1. Jonnagaddala J, Chen A, Batongbacal S, Nekkantti C. The OpenDeID corpus for patient de-identification. Sci Rep 2021; 11:19973. [PMID: 34620985 PMCID: PMC8497517 DOI: 10.1038/s41598-021-99554-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/24/2021] [Accepted: 09/28/2021] [Indexed: 11/18/2022]
Abstract
For research purposes, protected health information is often redacted from unstructured electronic health records to preserve patient privacy and confidentiality. The OpenDeID corpus is designed to assist the development of automatic methods to redact sensitive information from unstructured electronic health records. We retrieved 4,548 unstructured surgical pathology reports from four urban Australian hospitals. The corpus was developed by two annotators under three different experimental settings: serial annotations, parallel annotations, and pre-annotations. The quality of the annotations was evaluated for each setting. Our results suggest that the pre-annotation approach is not reliable in terms of quality when compared to serial annotations but can drastically reduce annotation time. The OpenDeID corpus comprises 2,100 pathology reports from 1,833 cancer patients with an average of 737.49 tokens and 7.35 protected health information entities annotated per report. The overall inter-annotator agreement and deviation scores are 0.9464 and 0.9726, respectively. Realistic surrogates are also generated to make the corpus suitable for distribution to other researchers.
Affiliation(s)
- Aipeng Chen
- School of Computer Science and Engineering, UNSW Sydney, Sydney, Australia
- Sean Batongbacal
- School of Computer Science and Engineering, UNSW Sydney, Sydney, Australia
2. Tarling TE, Lasser F, Carter C, Matzke LA, Dhugga G, Arora N, Dee S, LeBlanc J, Babinsky S, O'Donoghue S, Cheah S, Watson P, Vercauteren SM. Business Planning for a Campus-Wide Biobank. Biopreserv Biobank 2017; 15:37-45. [DOI: 10.1089/bio.2016.0077] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 10/20/2022]
Affiliation(s)
- Tamsin E. Tarling
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
- Candace Carter
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
- Office of Biobank Education and Research (OBER), University of British Columbia, Vancouver, Canada
- Lise A.M. Matzke
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
- Office of Biobank Education and Research (OBER), University of British Columbia, Vancouver, Canada
- Gurm Dhugga
- BC Children's Hospital Research Institute, Vancouver, Canada
- Nidhi Arora
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
- Simon Dee
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
- Office of Biobank Education and Research (OBER), University of British Columbia, Vancouver, Canada
- Sheila O'Donoghue
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
- Office of Biobank Education and Research (OBER), University of British Columbia, Vancouver, Canada
- Stefanie Cheah
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
- Office of Biobank Education and Research (OBER), University of British Columbia, Vancouver, Canada
- Peter Watson
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
- Office of Biobank Education and Research (OBER), University of British Columbia, Vancouver, Canada
- BC Cancer Agency, Victoria, Canada
- Suzanne M. Vercauteren
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
- BC Children's Hospital Research Institute, Vancouver, Canada
- Department of Pathology and Laboratory Medicine, BC Children's Hospital, Vancouver, Canada