1
Kondylakis H, Catalan R, Alabart SM, Barelle C, Bizopoulos P, Bobowicz M, Bona J, Fotiadis DI, Garcia T, Gomez I, Jimenez-Pastor A, Karatzanis G, Lekadir K, Kogut-Czarkowska M, Lalas A, Marias K, Marti-Bonmati L, Munuera J, Nikiforaki K, Pelissier M, Prior F, Rutherford M, Saint-Aubert L, Sakellariou Z, Seymour K, Trouillard T, Votis K, Tsiknakis M. Documenting the de-identification process of clinical and imaging data for AI for health imaging projects. Insights Imaging 2024; 15:130. [PMID: 38816658 PMCID: PMC11139818 DOI: 10.1186/s13244-024-01711-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Accepted: 04/26/2024] [Indexed: 06/01/2024]
Abstract
Artificial intelligence (AI) is revolutionizing the field of medical imaging, holding the potential to shift medicine from a reactive "sick-care" approach to a proactive focus on healthcare and prevention. The successful development of AI in this domain relies on access to large, comprehensive, and standardized real-world datasets that accurately represent diverse populations and diseases. However, images and data are sensitive, and as such, before they are used in any way, the data need to be modified to protect the privacy of the patients. This paper explores the approaches in the domain of five EU projects working on the creation of ethically compliant and GDPR-regulated European medical imaging platforms, focused on cancer-related data. It presents the individual approaches to the de-identification of imaging data, and describes the problems and the solutions adopted in each case. Further, lessons learned are provided, enabling future projects to optimally handle the problem of data de-identification. CRITICAL RELEVANCE STATEMENT: This paper presents key approaches from five flagship EU projects for the de-identification of imaging and clinical data, offering valuable insights and guidelines in the domain. KEY POINTS: AI models for health imaging require access to large amounts of data. Access to large imaging datasets requires an appropriate de-identification process. This paper provides de-identification guidelines from the AI for health imaging (AI4HI) projects.
Affiliation(s)
- Rocio Catalan
- La Fe University and Polytechnic Hospital, La Fe Health Research Institute, Valencia, Spain
- Paschalis Bizopoulos
- Centre for Research & Technology Hellas, Information Technologies Institute (CERTH-ITI), Central Directorate, Thermi, Thessaloniki, Greece
- Jonathan Bona
- University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Dimitrios I Fotiadis
- Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, Ioannina, Greece
- Teresa Garcia
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Ignacio Gomez
- La Fe University and Polytechnic Hospital, La Fe Health Research Institute, Valencia, Spain
- Karim Lekadir
- Artificial Intelligence in Medicine Lab, Universitat de Barcelona, Barcelona, Spain
- Antonios Lalas
- Centre for Research & Technology Hellas, Information Technologies Institute (CERTH-ITI), Central Directorate, Thermi, Thessaloniki, Greece
- Luis Marti-Bonmati
- Hospital Universitario y Politécnico La Fe, Grupo de Investigación Biomédica en Imagen IIS La Fe, Valencia, España
- Jose Munuera
- Quantitative Imaging Biomarkers in Medicine, Quibim, Valencia, Spain
- Fred Prior
- University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Zisis Sakellariou
- Centre for Research & Technology Hellas, Information Technologies Institute (CERTH-ITI), Central Directorate, Thermi, Thessaloniki, Greece
- Konstantinos Votis
- Centre for Research & Technology Hellas, Information Technologies Institute (CERTH-ITI), Central Directorate, Thermi, Thessaloniki, Greece
2
Cybersecurity in PACS and Medical Imaging: an Overview. J Digit Imaging 2020; 33:1527-1542. [PMID: 33123867 PMCID: PMC7728878 DOI: 10.1007/s10278-020-00393-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/02/2020] [Revised: 06/22/2020] [Accepted: 09/30/2020] [Indexed: 10/26/2022]
Abstract
This article provides an overview of the literature published on the topic of cybersecurity for PACS (Picture Archiving and Communications Systems) and medical imaging. From a practical perspective, PACS-specific security measures must be implemented together with the measures applicable to the IT infrastructure as a whole, in order to prevent incidents such as PACS systems exposed to access from the Internet. Therefore, the article first offers an overview of the physical, technical and organizational mitigation measures proposed in the literature on cybersecurity in healthcare information technology in general. It then reviews publications that discuss cybersecurity topics specific to PACS and medical imaging and that present the "building blocks" for a secure PACS environment available in the literature. These include image de-identification, transport security, the selective encryption of the DICOM (Digital Imaging and Communications in Medicine) header, encrypted DICOM files, digital signatures and watermarking techniques. The article concludes with a discussion of gaps in the body of published literature and a summary.
3
Steinkamp JM, Pomeranz T, Adleberg J, Kahn CE, Cook TS. Evaluation of Automated Public De-Identification Tools on a Corpus of Radiology Reports. Radiol Artif Intell 2020; 2:e190137. [PMID: 33937843 DOI: 10.1148/ryai.2020190137] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/05/2019] [Revised: 05/05/2020] [Accepted: 05/14/2020] [Indexed: 11/11/2022]
Abstract
Purpose To evaluate publicly available de-identification tools on a large corpus of narrative-text radiology reports. Materials and Methods In this retrospective study, 21 categories of protected health information (PHI) in 2503 radiology reports were annotated from a large multihospital academic health system, collected between January 1, 2012 and January 8, 2019. A subset consisting of 1023 reports served as a test set; the remainder were used as domain-specific training data. The types and frequencies of PHI present within the reports were tallied. Five public de-identification tools were evaluated: MITRE Identification Scrubber Toolkit, U.S. National Library of Medicine-Scrubber, Massachusetts Institute of Technology de-identification software, Emory Health Information DE-identification (HIDE) software, and Neuro named-entity recognition (NeuroNER). The tools were compared using metrics including recall, precision, and F1 score (the harmonic mean of recall and precision) for each category of PHI. Results The annotators identified 3528 spans of PHI text within the 2503 reports. Cohen κ for interrater agreement was 0.938. Dates accounted for the majority of PHI found in the dataset of radiology reports (n = 2755 [78%]). The two best-performing tools both used machine learning methods: NeuroNER (precision, 94.5%; recall, 92.6%; microaveraged F1 score [F1], 93.6%) and Emory HIDE (precision, 96.6%; recall, 88.2%; F1, 92.2%). However, none of the tools exceeded 50% F1 on the important patient names category. Conclusion PHI appeared infrequently within the corpus of reports studied, which created difficulties for training machine learning systems. Out-of-the-box de-identification tools achieved limited performance on the corpus of radiology reports, suggesting the need for further advancements in public datasets and trained models. Supplemental material is available for this article. See also the commentary by Tenenholtz and Wood in this issue. © RSNA, 2020.
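As a quick check on how the reported metrics relate, the sketch below recomputes F1 as the harmonic mean of the precision and recall figures quoted in the abstract, along with the share of PHI spans that were dates. The function name `f1_score` and the dictionary layout are illustrative choices, not part of the study. Because the published precision and recall are rounded (and the reported F1 is microaveraged), a recomputed value may differ from the reported one in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs (percent) as quoted in the abstract above.
reported = {
    "NeuroNER": (94.5, 92.6),
    "Emory HIDE": (96.6, 88.2),
}

for tool, (precision, recall) in reported.items():
    print(f"{tool}: F1 = {f1_score(precision, recall):.1f}%")

# Share of annotated PHI spans that were dates (2755 of 3528 spans).
date_share = 2755 / 3528
print(f"Dates: {date_share:.0%} of PHI spans")
```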
Affiliation(s)
- Jackson M Steinkamp
- Department of Radiology, Hospital of the University of Pennsylvania, 3400 Spruce St, Philadelphia, PA 19104 (J.M.S., T.P., J.A., C.E.K., T.S.C.); and Boston University School of Medicine, Boston, Mass (J.M.S.)
- Taylor Pomeranz
- Department of Radiology, Hospital of the University of Pennsylvania, 3400 Spruce St, Philadelphia, PA 19104 (J.M.S., T.P., J.A., C.E.K., T.S.C.); and Boston University School of Medicine, Boston, Mass (J.M.S.)
- Jason Adleberg
- Department of Radiology, Hospital of the University of Pennsylvania, 3400 Spruce St, Philadelphia, PA 19104 (J.M.S., T.P., J.A., C.E.K., T.S.C.); and Boston University School of Medicine, Boston, Mass (J.M.S.)
- Charles E Kahn
- Department of Radiology, Hospital of the University of Pennsylvania, 3400 Spruce St, Philadelphia, PA 19104 (J.M.S., T.P., J.A., C.E.K., T.S.C.); and Boston University School of Medicine, Boston, Mass (J.M.S.)
- Tessa S Cook
- Department of Radiology, Hospital of the University of Pennsylvania, 3400 Spruce St, Philadelphia, PA 19104 (J.M.S., T.P., J.A., C.E.K., T.S.C.); and Boston University School of Medicine, Boston, Mass (J.M.S.)
4
Identification and classification of DICOM files with burned-in text content. Int J Med Inform 2019; 126:128-137. [PMID: 31029254 DOI: 10.1016/j.ijmedinf.2019.02.011] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 01/09/2018] [Revised: 02/12/2019] [Accepted: 02/19/2019] [Indexed: 11/23/2022]
Abstract
BACKGROUND Protected health information burned in pixel data is not indicated for various reasons in DICOM. It complicates the secondary use of such data. In recent years, there have been several attempts to anonymize or de-identify DICOM files. Existing approaches have different constraints. No completely reliable solution exists. Especially for large datasets, it is necessary to quickly analyse and identify files potentially violating privacy. METHODS Classification is based on an adaptive-iterative algorithm designed to identify one of three classes. Several image transformations, optical character recognition, and filters are applied; then a local decision is made. A confirmed local decision is the final one. The classifier was trained on a dataset composed of 15,334 images of various modalities. RESULTS The false positive rates are in all cases below 4.00%, and 1.81% in the mission-critical problem of detecting protected health information. The classifier's weighted average recall was 94.85%, the weighted average inverse recall was 97.42% and Cohen's Kappa coefficient was 0.920. CONCLUSION The proposed novel approach for classification of burned-in text is highly configurable and able to analyse images from different modalities with a noisy background. The solution was validated and is intended to identify DICOM files that need to have restricted access or be thoroughly de-identified due to privacy issues. Unlike with existing tools, the recognised text, including its coordinates, can be further used for de-identification.
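The three summary metrics quoted above (weighted average recall, weighted average inverse recall, and Cohen's kappa) can all be derived from a multi-class confusion matrix. The following is a minimal sketch under stated assumptions: the function name `classification_summary`, the weighting by true-class support, and the 3x3 counts are hypothetical and chosen only for illustration; they are not the study's data, and the paper may define its weighting differently. Every class is assumed to have at least one true sample.

```python
def classification_summary(cm):
    """Summarise a square confusion matrix (rows = true class, cols = predicted).

    Returns (weighted recall, weighted inverse recall, Cohen's kappa).
    Weights are each class's share of true samples (an assumption).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(row) for row in cm]                              # true counts
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # predicted counts

    weighted_recall = 0.0
    weighted_inverse_recall = 0.0
    for k in range(n):
        tp = cm[k][k]
        fn = row_sums[k] - tp
        fp = col_sums[k] - tp
        tn = total - tp - fn - fp
        weight = row_sums[k] / total
        weighted_recall += weight * tp / (tp + fn)          # sensitivity
        weighted_inverse_recall += weight * tn / (tn + fp)  # specificity

    observed = sum(cm[k][k] for k in range(n)) / total
    expected = sum(row_sums[k] * col_sums[k] for k in range(n)) / total**2
    kappa = (observed - expected) / (1 - expected)
    return weighted_recall, weighted_inverse_recall, kappa


# Hypothetical counts for three classes; NOT taken from the paper.
example_cm = [
    [90, 6, 4],
    [5, 80, 15],
    [2, 8, 190],
]
recall, inverse_recall, kappa = classification_summary(example_cm)
print(f"weighted recall={recall:.4f}, "
      f"weighted inverse recall={inverse_recall:.4f}, kappa={kappa:.3f}")
```

With support-based weights, the weighted recall equals overall accuracy, which is why kappa (agreement corrected for chance) is usually reported alongside it.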
5
Monteiro E, Costa C, Oliveira JL. A De-Identification Pipeline for Ultrasound Medical Images in DICOM Format. J Med Syst 2017; 41:89. [PMID: 28405948 DOI: 10.1007/s10916-017-0736-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 11/16/2016] [Accepted: 04/03/2017] [Indexed: 11/24/2022]
Abstract
Clinical data sharing between healthcare institutions, and between practitioners, is often hindered by privacy protection requirements. This problem is critical in collaborative scenarios where data sharing is fundamental for establishing a workflow among parties. The anonymization of patient information burned in DICOM images requires elaborate processes somewhat more complex than simple de-identification of textual information. Usually, before sharing, there is a need for manual removal of specific areas containing sensitive information in the images. In this paper, we present a pipeline for ultrasound medical image de-identification, provided as a free anonymization REST service for medical image applications, and a Software-as-a-Service to streamline automatic de-identification of medical images, which is freely available for end-users. The proposed approach applies image processing functions and machine-learning models to bring about an automatic system to anonymize medical images. To perform character recognition, we evaluated several machine-learning models, with Convolutional Neural Networks (CNN) selected as the best approach. To assess the system quality, 500 processed images were manually inspected, showing an anonymization rate of 89.2%. The tool can be accessed at https://bioinformatics.ua.pt/dicom/anonymizer and it is available with the most recent version of Google Chrome, Mozilla Firefox and Safari. A Docker image containing the proposed service is also publicly available for the community.
Affiliation(s)
- Carlos Costa
- University of Aveiro, DETI/IEETA, Aveiro, Portugal