Reference Citation Analysis

For: Erlich Y, Narayanan A. Routes for breaching and protecting genetic privacy. Nat Rev Genet 2014;15:409-21. [PMID: 24805122 PMCID: PMC4151119 DOI: 10.1038/nrg3723] [Citation(s) in RCA: 177] [Impact Index Per Article: 17.7] [Indexed: 12/15/2022]


Cited by Other Article(s)

Brauneck A, Schmalhorst L, Weiss S, Baumbach L, Völker U, Ellinghaus D, Baumbach J, Buchholtz G. Legal aspects of privacy-enhancing technologies in genome-wide association studies and their impact on performance and feasibility. Genome Biol 2024;25:154. [PMID: 38872191 PMCID: PMC11170858 DOI: 10.1186/s13059-024-03296-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 06/03/2024] [Indexed: 06/15/2024]

Thomas M, Mackes N, Preuss-Dodhy A, Wieland T, Bundschus M. Assessing Privacy Vulnerabilities in Genetic Data Sets: Scoping Review. JMIR Bioinform Biotechnol 2024;5:e54332. [PMID: 38935957 PMCID: PMC11165293 DOI: 10.2196/54332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 03/26/2024] [Accepted: 03/29/2024] [Indexed: 06/29/2024]

Cavinato T, Rubinacci S, Malaspinas AS, Delaneau O. A resampling-based approach to share reference panels. Nat Comput Sci 2024;4:360-366. [PMID: 38745108 PMCID: PMC11136649 DOI: 10.1038/s43588-024-00630-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Accepted: 04/16/2024] [Indexed: 05/16/2024]

Malakar Y, Lacey J, Twine NA, McCrea R, Bauer DC. Balancing the safeguarding of privacy and data sharing: perceptions of genomic professionals on patient genomic data ownership in Australia. Eur J Hum Genet 2024;32:506-512. [PMID: 36631540 PMCID: PMC11061115 DOI: 10.1038/s41431-022-01273-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2022] [Revised: 11/09/2022] [Accepted: 12/15/2022] [Indexed: 01/13/2023]

Shuffling haplotypes to share reference panels for imputation. Nat Comput Sci 2024;4:320-321. [PMID: 38778210 DOI: 10.1038/s43588-024-00640-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024]

Bartels K, Afonso S, Brown L, Carriles C, Kim R, Lazier J, Mercimek-Andrews S, Nelson TN, Stedman I, Thain E, Vanneste R, Chad L. Next generation of free? Points to consider when navigating sponsored genetic testing. J Med Genet 2024;61:299-304. [PMID: 37932018 DOI: 10.1136/jmg-2023-109571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 09/28/2023] [Indexed: 11/08/2023]

Zhou J, Chen S, Wu Y, Li H, Zhang B, Zhou L, Hu Y, Xiang Z, Li Z, Chen N, Han W, Xu C, Wang D, Gao X. PPML-Omics: A privacy-preserving federated machine learning method protects patients' privacy in omic data. Sci Adv 2024;10:eadh8601. [PMID: 38295178 PMCID: PMC10830108 DOI: 10.1126/sciadv.adh8601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2023] [Accepted: 12/29/2023] [Indexed: 02/02/2024]

Affiliation(s)

Juexiao Zhou Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
Siyuan Chen Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
Yulian Wu Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
Haoyang Li Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
Bin Zhang Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
Longxi Zhou Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
Yan Hu Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
Zihang Xiang Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
Zhongxiao Li Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
Ningning Chen Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
Wenkai Han Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
Chencheng Xu Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
Di Wang Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
Xin Gao Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia


Schubach M, Maass T, Nazaretyan L, Röner S, Kircher M. CADD v1.7: using protein language models, regulatory CNNs and other nucleotide-level scores to improve genome-wide variant predictions. Nucleic Acids Res 2024;52:D1143-D1154. [PMID: 38183205 PMCID: PMC10767851 DOI: 10.1093/nar/gkad989] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/14/2023] [Revised: 10/14/2023] [Accepted: 10/17/2023] [Indexed: 01/07/2024]

Emani PS, Geradi MN, Gürsoy G, Grasty MR, Miranker A, Gerstein MB. Assessing and mitigating privacy risks of sparse, noisy genotypes by local alignment to haplotype databases. Genome Res 2023;33:gr.278322.123. [PMID: 38097386 PMCID: PMC10760520 DOI: 10.1101/gr.278322.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Accepted: 11/18/2023] [Indexed: 01/04/2024]

Ayday E, Vaidya J, Jiang X, Telenti A. Ensuring Trust in Genomics Research. ... IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications: (TPS-ISA ...) 2023;2023:1-12. [PMID: 38562180 PMCID: PMC10981793 DOI: 10.1109/tps-isa58951.2023.00011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]

Tamuhla T, Lulamba ET, Mutemaringa T, Tiffin N. Multiple modes of data sharing can facilitate secondary use of sensitive health data for research. BMJ Glob Health 2023;8:e013092. [PMID: 37802544 PMCID: PMC10565310 DOI: 10.1136/bmjgh-2023-013092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Accepted: 09/12/2023] [Indexed: 10/10/2023]

Abstract

Evidence-based healthcare relies on health data from diverse sources to inform decision-making across different domains, including disease prevention, aetiology, diagnostics, therapeutics and prognosis. Increasing volumes of highly granular data provide opportunities to strengthen the evidence base, alongside growing recognition that health data are highly sensitive and that onward research use may create privacy issues for the individuals who provided the data. Concerns are heightened for data collected without explicit informed consent for secondary research use. Additionally, researchers, especially those from under-resourced environments and the global South, may wish to participate in onward analysis of resources they collected, or to retain oversight of onward use to ensure that ethical constraints are respected. Different data-sharing approaches may be adopted according to data sensitivity and secondary-use restrictions, moving beyond the traditional Open Access model of unidirectional data transfer from generator to secondary user. We describe three: collaborative data sharing, which facilitates research by combining datasets and undertaking meta-analysis with collaborating partners; federated data analysis, in which partners undertake synchronous, harmonised analyses on their independent datasets and then combine their results in a coauthored report; and trusted research environments, in which data are analysed in a controlled environment and only aggregate results are exported. We review how deidentification and anonymisation methods, including data perturbation, can reduce the risks specifically associated with secondary use of health data. In addition, we present an innovative modular approach to building data-sharing agreements that takes a more nuanced approach to data sharing to protect privacy, and we provide a framework for building the agreements for each of these data-sharing scenarios.
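The abstract does not say how partners in a federated analysis combine their results. As an illustrative sketch only, assuming each site runs the same harmonised model and releases just an effect estimate and its standard error, a fixed-effect inverse-variance meta-analysis pools those aggregates without any row-level data leaving a site. The names `SiteResult` and `fixed_effect_meta` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class SiteResult:
    """Aggregate output one partner shares (hypothetical format): no row-level data."""
    site: str
    estimate: float   # e.g. a regression coefficient from the harmonised analysis
    std_error: float  # its standard error


def fixed_effect_meta(results: Iterable[SiteResult]) -> Tuple[float, float]:
    """Pool per-site estimates with inverse-variance weights w_i = 1 / se_i^2.

    Returns (pooled_estimate, pooled_standard_error), where the pooled
    standard error is sqrt(1 / sum(w_i)). This is one assumed pooling rule,
    not the method prescribed by the source.
    """
    weighted = []
    for r in results:
        if r.std_error <= 0:
            raise ValueError(f"site {r.site!r} reported a non-positive standard error")
        weighted.append((1.0 / r.std_error ** 2, r.estimate))
    if not weighted:
        raise ValueError("need at least one site result")
    total_weight = sum(w for w, _ in weighted)
    pooled = sum(w * b for w, b in weighted) / total_weight
    return pooled, math.sqrt(1.0 / total_weight)
```

Sites with smaller standard errors (typically larger cohorts) pull the pooled estimate towards their own value; a random-effects model would be the usual alternative when site results are heterogeneous.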


Sadhuka S, Fridman D, Berger B, Cho H. Assessing transcriptomic reidentification risks using discriminative sequence models. Genome Res 2023;33:1101-1112. [PMID: 37541758 PMCID: PMC10538488 DOI: 10.1101/gr.277699.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2023] [Accepted: 04/19/2023] [Indexed: 08/06/2023]

Sweeney SM, Hamadeh HK, Abrams N, Adam SJ, Brenner S, Connors DE, Davis GJ, Fiore L, Gawel SH, Grossman RL, Hanlon SE, Hsu K, Kelloff GJ, Kirsch IR, Louv B, McGraw D, Meng F, Milgram D, Miller RS, Morgan E, Mukundan L, O'Brien T, Robbins P, Rubin EH, Rubinstein WS, Salmi L, Schaller T, Shi G, Sigman CC, Srivastava S. Challenges to Using Big Data in Cancer. Cancer Res 2023;83:1175-1182. [PMID: 36625843 PMCID: PMC10102837 DOI: 10.1158/0008-5472.can-22-1274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2022] [Revised: 07/29/2022] [Accepted: 12/05/2022] [Indexed: 01/11/2023]

Affiliation(s)

Shawn M. Sweeney American Association for Cancer Research, Philadelphia, Pennsylvania
Hisham K. Hamadeh Genmab, Princeton, New Jersey
Natalie Abrams Division of Cancer Prevention, Early Detection Research Network, National Cancer Institute, Rockville, Maryland
Stacey J. Adam Foundation for the National Institutes of Health, Bethesda, Maryland
Sara Brenner Office of In Vitro Diagnostics, Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland
Dana E. Connors Foundation for the National Institutes of Health, Bethesda, Maryland
Gerard J. Davis Abbott Diagnostics Division, Abbott Laboratories, Lake Forest, Illinois
Louis Fiore Boston University School of Medicine, Boston and New England Department of Veterans Affairs, Bedford, Massachusetts
Susan H. Gawel Abbott Diagnostics Division, Abbott Laboratories, Lake Forest, Illinois
Robert L. Grossman Center for Translational Data Science, The University of Chicago, Chicago, Illinois
Sean E. Hanlon Center for Strategic Scientific Initiatives, National Cancer Institute, Bethesda, Maryland
Karl Hsu Sanofi, Bridgewater, New Jersey
Gary J. Kelloff Division of Cancer Treatment and Diagnosis, National Cancer Institute, Bethesda, Maryland
Ilan R. Kirsch Adaptive Biotechnologies, Seattle, Washington
Bill Louv Project Data Sphere, Morrisville, North Carolina
Deven McGraw Ciitizen Platform at Invitae, San Francisco, California
Frank Meng Boston University and Veterans Administration Boston Healthcare System, Boston, Massachusetts
Daniel Milgram CCS Associates, San Jose, California
Robert S. Miller CancerLinQ, American Society of Clinical Oncology, Alexandria, Virginia
Emily Morgan Foundation for the National Institutes of Health, Bethesda, Maryland
Lata Mukundan CCS Associates, San Jose, California
Thomas O'Brien Pfizer, Brooklyn, New York
Paul Robbins Pfizer, Brooklyn, New York
Eric H. Rubin Merck, New York, New York
Wendy S. Rubinstein Office of In Vitro Diagnostics, Center for Devices and Radiological Health, U.S. Food and Drug Administration, Silver Spring, Maryland
Liz Salmi Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts
Teilo Schaller Project Data Sphere, Morrisville, North Carolina
George Shi Abbott Diagnostics Division, Abbott Laboratories, Lake Forest, Illinois
Caroline C. Sigman Boston University and Veterans Administration Boston Healthcare System, Boston, Massachusetts
Sudhir Srivastava Cancer Biomarkers Research Group, Division of Cancer Prevention, National Cancer Institute, Rockville, Maryland


Akyüz K, Goisauf M, Chassang G, Kozera Ł, Mežinska S, Tzortzatou-Nanopoulou O, Mayrhofer MT. Post-identifiability in changing sociotechnological genomic data environments. BioSocieties 2023:1-28. [PMID: 37359141 PMCID: PMC10042674 DOI: 10.1057/s41292-023-00299-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/13/2023] [Indexed: 03/30/2023]

Guo Y, Liu F, Zhou T, Cai Z, Xiao N. Seeing is believing: Towards interactive visual exploration of data privacy in federated learning. Inf Process Manag 2023. [DOI: 10.1016/j.ipm.2022.103162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]

TogoVar: A comprehensive Japanese genetic variation database. Hum Genome Var 2022;9:44. [PMID: 36509753 PMCID: PMC9744889 DOI: 10.1038/s41439-022-00222-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Revised: 11/03/2022] [Accepted: 11/07/2022] [Indexed: 12/14/2022]

Fierro-Monti I, Wright JC, Choudhary JS, Vizcaíno JA. Identifying individuals using proteomics: are we there yet? Front Mol Biosci 2022;9:1062031. [PMID: 36523653 PMCID: PMC9744771 DOI: 10.3389/fmolb.2022.1062031] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/05/2022] [Accepted: 11/16/2022] [Indexed: 08/31/2023]

Woerner AE, Mandape S, Kapema KB, Duque TM, Smuts A, King JL, Crysup B, Wang X, Huang M, Ge J, Budowle B. Optimized variant calling for estimating kinship. Forensic Sci Int Genet 2022;61:102785. [DOI: 10.1016/j.fsigen.2022.102785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2022] [Revised: 08/07/2022] [Accepted: 09/29/2022] [Indexed: 11/16/2022]

Kim M, Wang S, Jiang X, Harmanci A. SVAT: Secure outsourcing of variant annotation and genotype aggregation. BMC Bioinformatics 2022;23:409. [PMID: 36182914 PMCID: PMC9526274 DOI: 10.1186/s12859-022-04959-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2021] [Accepted: 09/20/2022] [Indexed: 11/10/2022]

TrustGWAS: A full-process workflow for encrypted GWAS using multi-key homomorphic encryption and pseudorandom number perturbation. Cell Syst 2022;13:752-767.e6. [PMID: 36041458 DOI: 10.1016/j.cels.2022.08.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/19/2021] [Revised: 04/21/2022] [Accepted: 08/04/2022] [Indexed: 01/26/2023]

Gürsoy G. Genome Privacy and Trust. Annu Rev Biomed Data Sci 2022;5:163-181. [PMID: 35508070 DOI: 10.1146/annurev-biodatasci-122120-021311] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/09/2022]

Chen Z, Qian Y, Wang Y, Fang Y. Deep Convolutional Generative Adversarial Network-Based EMG Data Enhancement for Hand Motion Classification. Front Bioeng Biotechnol 2022;10:909653. [PMID: 36061423 PMCID: PMC9431769 DOI: 10.3389/fbioe.2022.909653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2022] [Accepted: 06/22/2022] [Indexed: 11/13/2022]

Abstract

Acquiring bio-signals from the human body requires a strict experimental setup and ethical approval, which limits the data available for training classifiers in the era of big data. Generating synthetic data from real data could ease this limitation. This article proposes such a method: a multichannel electromyography (EMG) data enhancement method using a deep convolutional generative adversarial network (DCGAN). The generation procedure is as follows. First, the multichannel EMG signals within sliding windows are converted to grayscale images through matrix transformation, normalization and histogram equalization. Second, the grayscale images of each class are used to train a DCGAN, so that synthetic grayscale images of each class can be generated from random-noise input. Classification accuracy is used to evaluate whether the synthetic data are similar to, yet diverse from, the real data. A public EMG dataset for hand motion recognition (ISR Myo-I) is used to demonstrate the usability of the proposed method. The experimental results show that adding synthetic data to the training data has little effect on classification performance, indicating similarity between the real and synthetic data. Moreover, the average accuracy (five classes) increases slightly, by 1%–2%, for support vector machine (SVM) and random forest (RF) classifiers when additional synthetic data are used for training. Although the improvement is not statistically significant, it suggests that the DCGAN-generated data carry new characteristics and can enrich the diversity of the training dataset. In addition, cross-validation analysis shows that the synthetic samples have a large inter-class distance, reflected in the higher cross-validation accuracy when classifying purely synthetic samples.
This article also demonstrates that histogram equalization can significantly improve the performance of EMG-based hand motion recognition.
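The preprocessing stage (sliding windows, normalization, histogram equalization) can be sketched as follows. This is an assumption-laden illustration, not the authors' code: the abstract does not describe its "matrix transformation", so here the channels × samples window is simply treated as the image, min-max scaled to 256 grey levels, then equalized with the classic CDF-based rule. DCGAN training is out of scope, and all function names are hypothetical.

```python
from typing import Iterator, List, Sequence

Image = List[List[int]]


def sliding_windows(signal: Sequence[Sequence[float]], width: int, step: int) -> Iterator[List[List[float]]]:
    """Yield channels x `width` windows from a channels x samples signal, advancing by `step`."""
    n_samples = len(signal[0])
    for start in range(0, n_samples - width + 1, step):
        yield [list(channel[start:start + width]) for channel in signal]


def to_uint8(window: List[List[float]]) -> Image:
    """Min-max normalize one window to integer grey levels 0..255 (assumed mapping)."""
    flat = [v for row in window for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat window: no contrast to preserve
        return [[0] * len(row) for row in window]
    return [[round((v - lo) / (hi - lo) * 255) for v in row] for row in window]


def equalize_histogram(img: Image) -> Image:
    """CDF-based histogram equalization over 256 grey levels."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)  # CDF value at the darkest level present
    total = cdf[-1]
    if total == cdf_min:  # only one grey level present: nothing to spread
        return [row[:] for row in img]
    # Levels below the darkest present level get negative values; clamp them to 0.
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * 255)) for c in cdf]
    return [[lut[p] for p in row] for row in img]


def emg_to_images(signal: Sequence[Sequence[float]], width: int, step: int) -> List[Image]:
    """Convert a multichannel EMG recording into equalized grayscale window images."""
    return [equalize_histogram(to_uint8(w)) for w in sliding_windows(signal, width, step)]
```

Equalization spreads the occupied grey levels across the full 0..255 range, which is one plausible reason the abstract reports a benefit from it: small amplitude differences between channels become more visible to a downstream classifier.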


Jiang Y, Mosquera L, Jiang B, Kong L, El Emam K. Measuring re-identification risk using a synthetic estimator to enable data sharing. PLoS One 2022;17:e0269097. [PMID: 35714132 PMCID: PMC9205507 DOI: 10.1371/journal.pone.0269097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2021] [Accepted: 05/13/2022] [Indexed: 11/18/2022]

Zhang C, Bonomi L. Mitigating Membership Inference in Deep Learning Applications with High Dimensional Genomic Data. IEEE International Conference on Healthcare Informatics 2022;2022:10.1109/ichi54592.2022.00101. [PMID: 36120416 PMCID: PMC9473339 DOI: 10.1109/ichi54592.2022.00101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]

Nakagawa Y, Ohata S, Shimizu K. Efficient privacy-preserving variable-length substring match for genome sequence. Algorithms Mol Biol 2022;17:9. [PMID: 35473587 PMCID: PMC9040336 DOI: 10.1186/s13015-022-00211-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/19/2021] [Accepted: 03/01/2022] [Indexed: 11/28/2022]

Abstract

The development of privacy-preserving technology is important for accelerating genome data sharing. This study proposes an algorithm that securely searches for a variable-length substring match between a query and a database sequence. Our concept hinges on a technique that efficiently applies the FM-index to a secret-sharing scheme. More precisely, we developed an algorithm that achieves a secure table lookup in which $V[V[\ldots V[p_0] \ldots]]$ is computed for a given depth of recursion, where $p_0$ is an initial position and $V$ is a vector. We used the secure table lookup for vectors created based on the FM-index. The notable feature of the secure table lookup is that its time, communication and round complexities do not depend on the table length $N$ once the query has been input. Therefore, a substring match by reference to the FM-index-based table can also be conducted independently of the database length, and the overall search time is dramatically improved compared with previous approaches. We conducted an experiment using a human genome sequence of length 10 million as the database and a query of length 100, and found that the query response time of our protocol was at least three orders of magnitude faster than that of a non-indexed database search protocol under a realistic computation/network environment.
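To show why FM-index search reduces to the nested lookup $V[V[\ldots V[p_0] \ldots]]$, here is a plaintext sketch with no secret sharing and therefore no privacy protection. It assumes the textbook FM-index vectors $V_c[p] = C[c] + \mathrm{Occ}(c, p)$, so each backward-search step is a single table lookup per interval end; the paper's protocol evaluates such lookups on secret-shared positions, and it targets variable-length matches rather than the exact occurrence count shown here. Function names are hypothetical.

```python
from typing import Dict, List


def build_fm_vectors(text: str) -> Dict[str, List[int]]:
    """Return V_c[p] = C[c] + Occ(c, p) for each character c of text + '$'.

    C[c] counts characters smaller than c; Occ(c, p) counts c in BWT[:p].
    Assumes `text` does not contain '$' (the sentinel, which sorts first).
    """
    s = text + "$"
    sa = sorted(range(len(s)), key=lambda i: s[i:])  # naive suffix array, fine for a demo
    bwt = "".join(s[i - 1] for i in sa)  # s[-1] is '$' when i == 0
    alphabet = sorted(set(s))
    C: Dict[str, int] = {}
    total = 0
    for c in alphabet:
        C[c] = total
        total += s.count(c)
    vectors: Dict[str, List[int]] = {}
    for c in alphabet:
        occ, row = 0, [C[c]]
        for ch in bwt:
            occ += ch == c
            row.append(C[c] + occ)
        vectors[c] = row  # length len(s) + 1, indexed by BWT prefix length p
    return vectors


def backward_search(vectors: Dict[str, List[int]], query: str) -> int:
    """Count occurrences of `query`, one lookup per character and interval end."""
    n = len(next(iter(vectors.values()))) - 1
    lo, hi = 0, n  # suffix-array interval [lo, hi) matching the empty suffix
    for c in reversed(query):
        if c not in vectors:
            return 0
        V = vectors[c]
        lo, hi = V[lo], V[hi]  # p_{i+1} = V_c[p_i]: the nested table lookup
        if lo >= hi:
            return 0
    return hi - lo
```

Each step touches one entry per interval end, independent of how many positions the table holds, which mirrors the abstract's point that the secure lookup's cost does not grow with $N$ after the query is input.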


Bonomi L, Wu Z, Fan L. Sharing personal ECG time-series data privately. J Am Med Inform Assoc 2022;29:1152-1160. [PMID: 35380666 PMCID: PMC9196703 DOI: 10.1093/jamia/ocac047] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/09/2021] [Revised: 03/16/2022] [Accepted: 03/31/2022] [Indexed: 11/13/2022]

Abstract

Objective: Emerging technologies (eg, wearable devices) have made it possible to collect data, such as time series, directly from individuals, providing new insights into the health and well-being of individual patients. Broadening access to these data would facilitate their integration with existing data sources (eg, clinical and genomic data) and advance medical research. Compared with traditional health data, these data are collected directly from individuals, are highly unique and provide fine-grained information, posing new privacy challenges. In this work, we study the applicability of a novel privacy model to enable individual-level time-series data sharing while maintaining usability for data analytics.

Methods and materials: We propose a privacy-protecting method for sharing individual-level electrocardiography (ECG) time-series data, which leverages a dimensionality-reduction technique and random sampling to achieve provable privacy protection. We show that our solution provides strong privacy protection against an informed adversarial model while enabling useful aggregate-level analysis.

Results: We conduct our evaluations on 2 real-world ECG datasets. Our empirical results show that the privacy risk is significantly reduced after sanitization, while data usability is retained for a variety of clinical tasks (eg, predictive modeling and clustering).

Discussion: Our study investigates the privacy risk in sharing individual-level ECG time-series data. We demonstrate that individual-level data can be highly unique, requiring new privacy solutions to protect data contributors.

Conclusion: The results suggest that our proposed privacy-protection method provides strong privacy protection while preserving the usefulness of the data.
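The abstract names the two ingredients (dimensionality reduction and random sampling) but not the specific techniques. Purely as an illustration, the sketch below uses piecewise aggregate approximation (PAA) as a stand-in reduction and independent Bernoulli sampling of records as a stand-in sampler; both choices are assumptions, and this sketch on its own provides no formal privacy guarantee. Function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def paa(series: Sequence[float], segments: int) -> List[float]:
    """Piecewise aggregate approximation: mean of each of `segments` near-equal chunks."""
    n = len(series)
    if not 0 < segments <= n:
        raise ValueError("segments must be between 1 and len(series)")
    reduced = []
    for k in range(segments):
        start, end = k * n // segments, (k + 1) * n // segments  # chunk is never empty
        chunk = series[start:end]
        reduced.append(sum(chunk) / len(chunk))
    return reduced


def sample_and_reduce(
    records: Sequence[Sequence[float]],
    segments: int,
    rate: float,
    seed: Optional[int] = None,
) -> List[List[float]]:
    """Keep each individual's record with probability `rate`, then reduce it with PAA."""
    rng = random.Random(seed)
    return [paa(r, segments) for r in records if rng.random() < rate]
```

Reducing each ECG trace to a few segment means discards the fine-grained shape that makes individual traces unique, while sampling means an adversary cannot be sure any given person is in the release; how the two combine into a provable guarantee is specific to the authors' privacy model.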

Functional genomics data: privacy risk assessment and technological mitigation. Nat Rev Genet 2022;23:245-258. [PMID: 34759381 DOI: 10.1038/s41576-021-00428-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Accepted: 10/18/2021] [Indexed: 12/15/2022]

Santaló J, Berdasco M. Ethical implications of epigenetics in the era of personalized medicine. Clin Epigenetics 2022;14:44. [PMID: 35337378 PMCID: PMC8953972 DOI: 10.1186/s13148-022-01263-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 01/27/2022] [Accepted: 03/17/2022] [Indexed: 11/10/2022]

Abstract

Given the increasing research activity on epigenetics to monitor human diseases, and its connection with lifestyle and environmental exposures, the field of epigenetics has also attracted considerable interest at the ethical and societal level. In this review, we identify and discuss current ethical, legal and social issues of epigenetics research in the context of personalized medicine. The review covers ethical aspects such as how epigenetic information should affect patient autonomy and the ability to make an intentional and voluntary decision, the data-protection measures related to privacy and confidentiality that arise from epigenome studies (e.g., risk of discrimination, patient re-identification and unexpected findings), and the debate over the distribution of responsibilities for health (i.e., personal versus public responsibilities). We pay special attention to the risk of social discrimination and stigmatization as a consequence of inferring information about lifestyle and environmental exposures potentially contained in epigenetic data. Furthermore, because environmental exposures and individual habits do not affect all populations equally, we discuss the violation of the principle of distributive justice in access to the benefits of clinical epigenetics. In this regard, epigenetics represents a great opportunity for integrating public policy measures aimed at creating healthier living environments. Whether these public policies will coexist or, in contrast, compete with strategies reinforcing personalized medicine interventions needs to be considered. The review ends with a reflection on the main challenges in epigenetic research, some of them technical (e.g., assessing causality or establishing reference epigenomes) and others ethical and social (e.g., the risk of adding an epigenetic determinism on top of the current genetic one).
In sum, integrating social experiences such as exposure to risk, nutritional habits, prejudice and stigma into life-science research is imperative to understand epigenetic variation in disease. This pragmatic approach is required to move clinical epigenetics out of experimental laboratories and facilitate its implementation in society.


Wan Z, Hazel JW, Clayton EW, Vorobeychik Y, Kantarcioglu M, Malin BA. Sociotechnical safeguards for genomic data privacy. Nat Rev Genet 2022;23:429-445. [PMID: 35246669 PMCID: PMC8896074 DOI: 10.1038/s41576-022-00455-y] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Accepted: 01/24/2022] [Indexed: 12/21/2022]

Akgün M, Pfeifer N, Kohlbacher O. Efficient privacy-preserving whole-genome variant queries. Bioinformatics 2022;38:2202-2210. [PMID: 35150254 PMCID: PMC9004657 DOI: 10.1093/bioinformatics/btac070] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/06/2021] [Revised: 01/13/2022] [Accepted: 02/03/2022] [Indexed: 02/03/2023]

Abstract

MOTIVATION

Diagnosis and treatment decisions based on genomic data have become widespread as the cost of genome sequencing has steadily decreased. In this context, disease-gene association studies are of great importance. However, genomic data are highly sensitive compared with other data types and contain information about individuals and their relatives. Many studies have shown that this information can be obtained from query-response pairs on genomic databases. In this work, we propose a method that uses secure multi-party computation to query genomic databases in a privacy-protected manner. The proposed solution privately outsources genomic data from arbitrarily many sources to two non-colluding proxies and allows genomic databases to be safely stored in semi-honest cloud environments. It provides data privacy, query privacy and output privacy by using XOR-based sharing and, unlike previous solutions, allows queries to run efficiently on hundreds of thousands of genomes.
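The abstract names XOR-based sharing between two non-colluding proxies but gives no protocol details. The sketch below illustrates the general technique only, assuming a textbook two-party XOR secret-sharing scheme with dealer-supplied Beaver triples for AND gates. It is not the authors' protocol: the toy database and all function names are hypothetical, the queried variant positions are public (so it offers no query privacy), and the final count is reconstructed in the clear purely for demonstration.

```python
import secrets

def share_bits(bits):
    """Split 0/1 values into two XOR shares; each share alone is uniformly random."""
    share0 = [secrets.randbits(1) for _ in bits]
    share1 = [b ^ r for b, r in zip(bits, share0)]
    return share0, share1

def reconstruct(share0, share1):
    """Recombine the two proxies' shares (done here only to check results)."""
    return [a ^ b for a, b in zip(share0, share1)]

def beaver_triple():
    """Dealer samples bits a, b, sets c = a AND b, and gives each proxy XOR shares."""
    a, b = secrets.randbits(1), secrets.randbits(1)
    c = a & b
    a0, b0, c0 = secrets.randbits(1), secrets.randbits(1), secrets.randbits(1)
    return (a0, b0, c0), (a ^ a0, b ^ b0, c ^ c0)

def and_shared(x, y):
    """AND of two XOR-shared bits; x and y are (proxy0_share, proxy1_share) pairs."""
    (a0, b0, c0), (a1, b1, c1) = beaver_triple()
    # Proxies open masked values; d and e are uniformly random, so they leak nothing.
    d = (x[0] ^ a0) ^ (x[1] ^ a1)  # = x XOR a
    e = (y[0] ^ b0) ^ (y[1] ^ b1)  # = y XOR b
    # Local step: x AND y = c ^ (d & b) ^ (e & a) ^ (d & e); only proxy 0 adds d & e.
    z0 = c0 ^ (d & b0) ^ (e & a0) ^ (d & e)
    z1 = c1 ^ (d & b1) ^ (e & a1)
    return z0, z1

def count_carriers(shared_rows, cols):
    """Count genomes carrying every variant in `cols` (a conjunctive query)."""
    total = 0
    for s0, s1 in shared_rows:
        z = (s0[cols[0]], s1[cols[0]])
        for c in cols[1:]:
            z = and_shared(z, (s0[c], s1[c]))
        total += z[0] ^ z[1]  # revealed here for illustration only
    return total

# Hypothetical toy data: one row per genome, one presence bit per variant.
genomes = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 1]]
shared_rows = [share_bits(row) for row in genomes]  # what the data owners upload
print(count_carriers(shared_rows, [0, 3]))  # prints 2
```

Each proxy only ever sees its own share plus the masked values d and e, which is why the proxies must not collude: pooling their shares reconstructs the data.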

RESULTS

We measure the performance of our solution with parameters similar to those of real-world applications. A genomic database with 3 000 000 variants can be queried with five genomic query predicates in under 400 ms. Querying 1 048 576 genomes, each containing 1 000 000 variants, for the presence of five different query variants takes approximately 6 min with a small amount of dedicated hardware and connectivity. These execution times are in the right range to enable real-world applications in medical research and healthcare. Unlike previous studies, our solution can query multiple databases with response times fast enough for practical application. To the best of our knowledge, this is the first solution that provides this performance for querying large-scale genomic data.
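As a back-of-envelope check on the scale of the second benchmark, the short calculation below multiplies out the figures quoted above. The storage estimate is an assumption introduced here (a dense matrix with one presence bit per genome-variant cell, so each XOR share is the same size as the data); the paper's actual encoding may differ.

```python
n_genomes = 1_048_576        # = 2**20, from the abstract
n_variants = 1_000_000       # variants per genome, from the abstract
runtime_s = 6 * 60           # "approximately 6 min"

entries = n_genomes * n_variants      # genome-variant cells scanned
throughput = entries / runtime_s      # cells processed per second
packed_bytes = entries // 8           # ASSUMPTION: 1 bit per cell, bit-packed

print(f"{entries:,} cells")                          # 1,048,576,000,000 cells
print(f"{throughput:.2e} cells per second")          # 2.91e+09 cells per second
print(f"{packed_bytes / 1e9:.1f} GB per XOR share")  # 131.1 GB per XOR share
```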

AVAILABILITY AND IMPLEMENTATION

https://gitlab.com/DIFUTURE/privacy-preserving-variant-queries.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.


Torkzadehmahani R, Nasirigerdeh R, Blumenthal DB, Kacprowski T, List M, Matschinske J, Spaeth J, Wenke NK, Baumbach J. Privacy-Preserving Artificial Intelligence Techniques in Biomedicine. Methods Inf Med 2022;61:e12-e27. [PMID: 35062032 PMCID: PMC9246509 DOI: 10.1055/s-0041-1740630] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/15/2022]

Jafarbeiki S, Sakzad A, Kasra Kermanshahi S, Gaire R, Steinfeld R, Lai S, Abraham G, Thapa C. PrivGenDB: Efficient and privacy-preserving query executions over encrypted SNP-Phenotype database. INFORMATICS IN MEDICINE UNLOCKED 2022. [DOI: 10.1016/j.imu.2022.100988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]

Akyüz K, Chassang G, Goisauf M, Kozera Ł, Mezinska S, Tzortzatou O, Mayrhofer MT. Biobanking and risk assessment: a comprehensive typology of risks for an adaptive risk governance. LIFE SCIENCES, SOCIETY AND POLICY 2021;17:10. [PMID: 34903285 PMCID: PMC8666836 DOI: 10.1186/s40504-021-00117-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/30/2021] [Accepted: 12/01/2021] [Indexed: 05/03/2023]

Wan Z, Vorobeychik Y, Xia W, Liu Y, Wooders M, Guo J, Yin Z, Clayton EW, Kantarcioglu M, Malin BA. Using game theory to thwart multistage privacy intrusions when sharing data. SCIENCE ADVANCES 2021;7:eabe9986. [PMID: 34890225 PMCID: PMC8664254 DOI: 10.1126/sciadv.abe9986] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/14/2020] [Accepted: 10/25/2021] [Indexed: 06/13/2023]

Affiliation(s)

Zhiyu Wan: Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37212, USA; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
Yevgeniy Vorobeychik: Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO 63130, USA
Weiyi Xia: Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
Yongtai Liu: Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37212, USA
Myrna Wooders: Department of Economics, Vanderbilt University, Nashville, TN 37235, USA
Jia Guo: Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37212, USA
Zhijun Yin: Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37212, USA; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
Ellen Wright Clayton: Center for Biomedical Ethics and Society, Vanderbilt University Medical Center, Nashville, TN 37203, USA; School of Law, Vanderbilt University, Nashville, TN 37203, USA; Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
Murat Kantarcioglu: Department of Computer Science, University of Texas at Dallas, Richardson, TX 75080, USA; Institute for Quantitative Social Science, Harvard University, Cambridge, MA 02138, USA; Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA 94720, USA
Bradley A. Malin: Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37212, USA; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA; Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, USA


Dupras C, Bunnik EM. Toward a Framework for Assessing Privacy Risks in Multi-Omic Research and Databases. THE AMERICAN JOURNAL OF BIOETHICS : AJOB 2021;21:46-64. [PMID: 33433298 DOI: 10.1080/15265161.2020.1863516] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 06/12/2023]

Ziegenhain C, Sandberg R. BAMboozle removes genetic variation from human sequence data for open data sharing. Nat Commun 2021;12:6216. [PMID: 34711808 PMCID: PMC8553849 DOI: 10.1038/s41467-021-26152-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/20/2021] [Accepted: 09/20/2021] [Indexed: 11/18/2022]

Hekel R, Budis J, Kucharik M, Radvanszky J, Pös Z, Szemes T. Privacy-preserving storage of sequenced genomic data. BMC Genomics 2021;22:712. [PMID: 34600465 PMCID: PMC8487550 DOI: 10.1186/s12864-021-07996-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/22/2021] [Accepted: 09/10/2021] [Indexed: 11/23/2022]

Keane TM, O'Donovan C, Vizcaíno JA. The growing need for controlled data access models in clinical proteomics and metabolomics. Nat Commun 2021;12:5787. [PMID: 34599180 PMCID: PMC8486822 DOI: 10.1038/s41467-021-26110-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/30/2021] [Accepted: 09/17/2021] [Indexed: 01/25/2023]

Daniels H, Jones KH, Heys S, Ford DV. Exploring the Use of Genomic and Routinely Collected Data: Narrative Literature Review and Interview Study. J Med Internet Res 2021;23:e15739. [PMID: 34559060 PMCID: PMC8501405 DOI: 10.2196/15739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2019] [Revised: 10/01/2020] [Accepted: 07/15/2021] [Indexed: 11/13/2022]

Abstract

Background

Advancing the use of genomic data with routinely collected health data holds great promise for health care and research. Increasing the use of these data is a high priority to understand and address the causes of disease.

Objective

This study aims to provide an outline of the use of genomic data alongside routinely collected data in health research to date. As this field prepares to move forward, it is important to take stock of the current state of play in order to highlight new avenues for development, identify challenges, and ensure that adequate data governance models are in place for safe and socially acceptable progress.

Methods

We conducted a literature review to draw information from past studies that have used genomic and routinely collected data and conducted interviews with individuals who use these data for health research. We collected data on the following: the rationale of using genomic data in conjunction with routinely collected data, types of genomic and routinely collected data used, data sources, project approvals, governance and access models, and challenges encountered.

Results

The main purpose of using genomic and routinely collected data was to conduct genome-wide and phenome-wide association studies. Routine data sources included electronic health records, disease and death registries, health insurance systems, and deprivation indices. The types of genomic data included polygenic risk scores, single nucleotide polymorphisms, and measures of genetic activity, generally provided by biobanks. Although the literature search showed that biobanks released data to researchers, the case studies revealed a growing tendency for the data to be used within a data safe haven. Challenges of working with these data revolved around data collection, data storage, technical issues, and data privacy.
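Polygenic risk scores, one of the genomic data types listed above, are conventionally computed as an additive score: for person j, PRS_j = sum over SNPs i of beta_i * g_ij, where g_ij in {0, 1, 2} is the number of effect alleles the person carries at SNP i and beta_i is that SNP's effect size from a GWAS. The sketch below shows only this standard formula; the function name and the SNP weights are hypothetical, and it is not a method taken from the reviewed studies.

```python
from typing import Sequence

def polygenic_risk_score(dosages: Sequence[int], weights: Sequence[float]) -> float:
    """Additive PRS: effect-allele dosage (0, 1 or 2) times effect size, summed over SNPs."""
    if len(dosages) != len(weights):
        raise ValueError("one weight per SNP is required")
    if any(d not in (0, 1, 2) for d in dosages):
        raise ValueError("dosages must be 0, 1 or 2 for hard-called genotypes")
    return sum(d * w for d, w in zip(dosages, weights))

# Hypothetical effect sizes (e.g. log odds ratios) for three SNPs
weights = [0.12, -0.05, 0.30]
print(polygenic_risk_score([2, 1, 0], weights))  # about 0.19
```

Because the score is a deterministic function of genotypes, sharing it alongside routinely collected records carries some of the same privacy considerations as sharing the underlying SNPs.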

Conclusions

Using genomic and routinely collected data holds great promise for advancing health research. Several challenges are involved, particularly in terms of privacy. Overcoming these barriers will allow these data to be exploited to their full potential in health research.


Liu YL, Stadler ZK. The Future of Parallel Tumor and Germline Genetic Testing: Is There a Role for All Patients With Cancer? J Natl Compr Canc Netw 2021;19:871-878. [PMID: 34340209 PMCID: PMC11123333 DOI: 10.6004/jnccn.2021.7044] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/13/2020] [Accepted: 04/09/2021] [Indexed: 11/17/2022]

Oestreich M, Chen D, Schultze JL, Fritz M, Becker M. Privacy considerations for sharing genomics data. EXCLI JOURNAL 2021;20:1243-1260. [PMID: 34345236 PMCID: PMC8326502 DOI: 10.17179/excli2021-4002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/19/2021] [Accepted: 07/07/2021] [Indexed: 01/23/2023]

Bu D, Wang X, Tang H. Haplotype-based membership inference from summary genomic data. Bioinformatics 2021;37:i161-i168. [PMID: 34252973 PMCID: PMC8275351 DOI: 10.1093/bioinformatics/btab305] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/22/2022]

Lu D, Zhang Y, Zhang L, Wang H, Weng W, Li L, Cai H. Methods of privacy-preserving genomic sequencing data alignments. Brief Bioinform 2021;22:6279828. [PMID: 34021302 DOI: 10.1093/bib/bbab151] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/05/2020] [Revised: 03/10/2021] [Accepted: 03/30/2021] [Indexed: 11/14/2022]

Arshad S, Arshad J, Khan MM, Parkinson S. Analysis of security and privacy challenges for DNA-genomics applications and databases. J Biomed Inform 2021;119:103815. [PMID: 34022422 DOI: 10.1016/j.jbi.2021.103815] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/28/2021] [Revised: 05/07/2021] [Accepted: 05/08/2021] [Indexed: 02/06/2023]

Gürsoy G, Emani P, Brannon CM, Jolanki OA, Harmanci A, Strattan JS, Cherry JM, Miranker AD, Gerstein M. Data Sanitization to Reduce Private Information Leakage from Functional Genomics. Cell 2021;183:905-917.e16. [PMID: 33186529 DOI: 10.1016/j.cell.2020.09.036] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 12/26/2019] [Revised: 07/23/2020] [Accepted: 09/11/2020] [Indexed: 12/30/2022]

Ayoz K, Ayday E, Cicek AE. Genome Reconstruction Attacks Against Genomic Data-Sharing Beacons. PROCEEDINGS ON PRIVACY ENHANCING TECHNOLOGIES. PRIVACY ENHANCING TECHNOLOGIES SYMPOSIUM 2021;2021:28-48. [PMID: 34746296 PMCID: PMC8570374 DOI: 10.2478/popets-2021-0036] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/21/2022]

Cho JC. Human microbiome privacy risks associated with summary statistics. PLoS One 2021;16:e0249528. [PMID: 33798253 PMCID: PMC8018636 DOI: 10.1371/journal.pone.0249528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2020] [Accepted: 03/21/2021] [Indexed: 11/25/2022]

Yang H, Chen L, Cheng Z, Yang M, Wang J, Lin C, Wang Y, Huang L, Chen Y, Peng S, Ke Z, Li W. Deep learning-based six-type classifier for lung cancer and mimics from histopathological whole slide images: a retrospective study. BMC Med 2021;19:80. [PMID: 33775248 PMCID: PMC8006383 DOI: 10.1186/s12916-021-01953-2] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 12/26/2020] [Accepted: 02/26/2021] [Indexed: 12/19/2022]

Abstract

BACKGROUND

Targeted therapy and immunotherapy place higher demands on accurate lung cancer classification, as well as on discrimination between benign and malignant disease. Digital whole slide images (WSIs) have enabled the transition from traditional histopathology to computational approaches, spurring interest in deep learning methods for histopathological analysis. We aimed to explore the potential of deep learning models for identifying lung cancer subtypes and cancer mimics from WSIs.

METHODS

We initially obtained 741 WSIs from the First Affiliated Hospital of Sun Yat-sen University (SYSUFH) for deep learning model development, optimization, and verification. An additional 318 WSIs from SYSUFH, 212 from Shenzhen People's Hospital, and 422 from The Cancer Genome Atlas were collected for multi-centre verification. EfficientNet-B5- and ResNet-50-based deep learning methods were developed and compared using recall, precision, F1-score, and area under the curve (AUC). A threshold-based tumour-first aggregation approach was proposed and implemented to infer slide-level labels for WSIs with complex tissue components. Four pathologists of different experience levels from SYSUFH reviewed all testing slides blindly, and their diagnoses were compared quantitatively with those of the best-performing deep learning model.
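The abstract does not spell out the tumour-first aggregation rule. One plausible reading, sketched below as a hypothetical illustration rather than the authors' algorithm, is: classify each tile of a slide, and if the fraction of tiles assigned to any tumour class reaches a threshold, label the slide with the most frequent tumour class; otherwise use the most frequent non-tumour class. The class abbreviations, the function name and the 0.1 threshold are all assumptions.

```python
from collections import Counter

# Hypothetical labels for the six classes named in the abstract.
TUMOUR = ("LUAD", "LUSC", "SCLC")      # adenocarcinoma, squamous cell, small cell
NON_TUMOUR = ("TB", "OP", "NORMAL")    # tuberculosis, organizing pneumonia, normal lung

def slide_label(tile_labels, tumour_threshold=0.1):
    """Aggregate tile-level predictions into one slide label, tumour classes first."""
    if not tile_labels:
        raise ValueError("slide has no classified tiles")
    counts = Counter(tile_labels)
    tumour_fraction = sum(counts[c] for c in TUMOUR) / len(tile_labels)
    pool = TUMOUR if tumour_fraction >= tumour_threshold else NON_TUMOUR
    # max() over a tuple returns the first class among ties, keeping results deterministic.
    return max(pool, key=lambda c: counts[c])

tiles = ["NORMAL"] * 80 + ["LUAD"] * 15 + ["OP"] * 5
print(slide_label(tiles))                   # LUAD (15% tumour tiles >= 10%)
print(Counter(tiles).most_common(1)[0][0])  # NORMAL under a plain majority vote
```

Unlike a plain majority vote, a rule of this kind lets a small tumour region override a slide dominated by normal tissue, which appears to be the point of prioritising tumour classes.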

RESULTS

We developed the first deep learning-based six-type classifier for histopathological WSI classification of lung adenocarcinoma, lung squamous cell carcinoma, small cell lung carcinoma, pulmonary tuberculosis, organizing pneumonia, and normal lung. The EfficientNet-B5-based model outperformed ResNet-50 and was selected as the backbone of the classifier. Tested on 1067 slides from four cohorts at different medical centres, the classifier achieved AUCs of 0.970, 0.918, 0.963, and 0.978, respectively. It showed high consistency with the ground truth and with attending pathologists, with intraclass correlation coefficients above 0.873.
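For a multi-class classifier like this one, AUC is usually reported per class in one-vs-rest fashion. The abstract does not say how its per-cohort AUCs were aggregated across the six classes, so the sketch below covers only the single-class case: AUC as the probability that a randomly chosen slide of the target class scores higher than a randomly chosen slide of any other class (the Mann-Whitney formulation, with ties counted as one half). The function name, scores and labels are hypothetical.

```python
def auc_one_vs_rest(scores, labels, positive):
    """Pairwise AUC for one class: P(score of positive slide > score of negative slide)."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative slide")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0  # ties count one half
    return wins / (len(pos) * len(neg))

# Hypothetical per-slide LUAD probabilities and true labels
scores = [0.9, 0.8, 0.4, 0.3]
labels = ["LUAD", "NORMAL", "LUAD", "NORMAL"]
print(auc_one_vs_rest(scores, labels, "LUAD"))  # 0.75
```

The quadratic pairwise loop keeps the definition visible; for thousands of slides a rank-based computation gives the same value more efficiently.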

CONCLUSIONS

Multi-cohort testing demonstrated that our six-type classifier achieved consistent performance comparable to that of experienced pathologists and showed advantages over other existing computational methods. Visualizing the prediction heatmap intuitively improved model interpretability. The classifier with the threshold-based tumour-first label inference method exhibited excellent accuracy and feasibility in classifying lung cancers and easily confused non-neoplastic tissues, indicating that deep learning can resolve complex multi-class tissue classification in real-world histopathological scenarios.


Lemieux VL, Hofman D, Hamouda H, Batista D, Kaur R, Pan W, Costanzo I, Regier D, Pollard S, Weymann D, Fraser R. Having Our “Omic” Cake and Eating It Too?: Evaluating User Response to Using Blockchain Technology for Private and Secure Health Data Management and Sharing. FRONTIERS IN BLOCKCHAIN 2021. [DOI: 10.3389/fbloc.2020.558705] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/13/2022]

PLEIO: a method to map and interpret pleiotropic loci with GWAS summary statistics. Am J Hum Genet 2021;108:36-48. [PMID: 33352115 DOI: 10.1016/j.ajhg.2020.11.017] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 07/22/2020] [Accepted: 11/23/2020] [Indexed: 12/31/2022]