1
|
Wang X, Zhao Q, Wang J. FedKD-CPI: Combining the federated knowledge distillation technique to accomplish synergistic compound-protein interaction prediction. Methods 2025; 234:275-283. [PMID: 39824374 DOI: 10.1016/j.ymeth.2024.12.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2024] [Revised: 12/20/2024] [Accepted: 12/31/2024] [Indexed: 01/20/2025] Open
Abstract
Compound-protein interaction (CPI) prediction is critical in the early stages of drug discovery, narrowing the search space for CPIs and reducing the cost and time required for traditional high-throughput screening. However, CPI-related data are usually distributed across different institutions and their sharing is restricted because of data privacy and intellectual property rights. Constructing a scheme that enhances multi-institutional collaboration to improve prediction accuracy while protecting data privacy is essential. To this end, we propose FedKD-CPI, the first framework based on federated knowledge distillation, to effectively facilitate multi-party CPI collaborative prediction and ensure data privacy and security. FedKD-CPI uses knowledge distillation technology to extract the updated knowledge of all client models and train the model on the server to achieve knowledge aggregation, which can effectively utilize the knowledge contained in public and private data. We evaluate FedKD-CPI on three benchmark datasets and compare it with four baselines. The results show that FedKD-CPI is very close to centralized learning and significantly better than localized learning. Furthermore, FedKD-CPI outperforms federated learning-based baselines on independent and identically distributed data and non-independent and identically distributed data. Overall, FedKD-CPI improves the CPI prediction while ensuring data security and promoting institutions' collaboration to accelerate drug discovery.
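The aggregation step described in this abstract can be illustrated with a minimal sketch. It assumes that each client returns raw logits for a shared public sample and that the server averages temperature-softened predictions to form soft targets. The function names, the temperature value, and the averaging rule are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over one list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_soft_labels(client_logits, temperature=2.0):
    """Average the clients' softened predictions for one public sample.

    Assumption: plain averaging; a real system may weight clients differently.
    """
    probs = [softmax(logits, temperature) for logits in client_logits]
    n_clients = len(probs)
    n_classes = len(probs[0])
    return [sum(p[k] for p in probs) / n_clients for k in range(n_classes)]


def distillation_loss(student_logits, soft_targets, temperature=2.0):
    """Cross-entropy of the server (student) prediction against soft targets."""
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(soft_targets, student))


# Two hypothetical clients that disagree on a binary interaction label.
targets = aggregate_soft_labels([[2.0, 0.0], [0.0, 2.0]])
loss = distillation_loss([0.0, 0.0], targets)
```

Only predictions on public samples leave each client in this sketch; private training data and model weights stay local, which is the property the abstract emphasizes.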
Affiliation(s)
- Xuetao Wang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China; Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
- Qichang Zhao
- School of Computer Science and Engineering, Central South University, Changsha 410083, China; Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
- Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China; Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
|
2
|
Ballhausen H, Corradini S, Belka C, Bogdanov D, Boldrini L, Bono F, Goelz C, Landry G, Panza G, Parodi K, Talviste R, Tran HE, Gambacorta MA, Marschner S. Privacy-friendly evaluation of patient data with secure multiparty computation in a European pilot study. NPJ Digit Med 2024; 7:280. [PMID: 39397162 PMCID: PMC11471812 DOI: 10.1038/s41746-024-01293-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2024] [Accepted: 10/06/2024] [Indexed: 10/15/2024] Open
Abstract
In multicentric studies, data sharing between institutions might negatively impact patient privacy or data security. An alternative is federated analysis by secure multiparty computation. This pilot study demonstrates an architecture and implementation addressing both technical challenges and legal difficulties in the particularly demanding setting of clinical research on cancer patients within the strict European regulation on patient privacy and data protection: 24 patients from LMU University Hospital in Munich, Germany, and 24 patients from Policlinico Universitario Fondazione Agostino Gemelli, Rome, Italy, were treated for adrenal gland metastasis with typically 40 Gy in 3 or 5 fractions of online-adaptive radiotherapy guided by real-time MR. High local control (21% complete remission, 27% partial remission, 40% stable disease) and low toxicity (73% reporting no toxicity) were observed. Median overall survival was 19 months. Federated analysis was found to improve clinical science through privacy-friendly evaluation of patient data in the European health data space.
Affiliation(s)
- Hendrik Ballhausen
- Ludwig-Maximilians-Universität München, Munich, Germany
- Department of Radiation Oncology, LMU University Hospital, LMU Munich, Munich, Germany
- Stefanie Corradini
- Department of Radiation Oncology, LMU University Hospital, LMU Munich, Munich, Germany
- Claus Belka
- Department of Radiation Oncology, LMU University Hospital, LMU Munich, Munich, Germany
- German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany
- Bavarian Cancer Research Center (BZKF), Munich, Germany
- Dan Bogdanov
- Information Security Research Institute, Cybernetica AS, Tartu, Estonia
- Luca Boldrini
- Fondazione Policlinico Universitario "Agostino Gemelli" IRCCS, Rome, Italy
- Francesco Bono
- Fondazione Policlinico Universitario "Agostino Gemelli" IRCCS, Rome, Italy
- Christian Goelz
- Department of Medicine I, LMU University Hospital, LMU Munich, Munich, Germany
- Guillaume Landry
- Department of Radiation Oncology, LMU University Hospital, LMU Munich, Munich, Germany
- Giulia Panza
- Fondazione Policlinico Universitario "Agostino Gemelli" IRCCS, Rome, Italy
- Katia Parodi
- Ludwig-Maximilians-Universität München, Munich, Germany
- Riivo Talviste
- Information Security Research Institute, Cybernetica AS, Tartu, Estonia
- Huong Elena Tran
- Fondazione Policlinico Universitario "Agostino Gemelli" IRCCS, Rome, Italy
- Sebastian Marschner
- Department of Radiation Oncology, LMU University Hospital, LMU Munich, Munich, Germany
- German Cancer Consortium (DKTK), Partner Site Munich, Munich, Germany
|
3
|
Cho H, Froelicher D, Dokmai N, Nandi A, Sadhuka S, Hong MM, Berger B. Privacy-Enhancing Technologies in Biomedical Data Science. Annu Rev Biomed Data Sci 2024; 7:317-343. [PMID: 39178425 PMCID: PMC11346580 DOI: 10.1146/annurev-biodatasci-120423-120107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/25/2024]
Abstract
The rapidly growing scale and variety of biomedical data repositories raise important privacy concerns. Conventional frameworks for collecting and sharing human subject data offer limited privacy protection, often necessitating the creation of data silos. Privacy-enhancing technologies (PETs) promise to safeguard these data and broaden their usage by providing means to share and analyze sensitive data while protecting privacy. Here, we review prominent PETs and illustrate their role in advancing biomedicine. We describe key use cases of PETs and their latest technical advances and highlight recent applications of PETs in a range of biomedical domains. We conclude by discussing outstanding challenges and social considerations that need to be addressed to facilitate a broader adoption of PETs in biomedical data science.
Affiliation(s)
- Hyunghoon Cho
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, Connecticut, USA
- David Froelicher
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Natnatee Dokmai
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, Connecticut, USA
- Anupama Nandi
- Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, Connecticut, USA
- Shuvom Sadhuka
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Matthew M Hong
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
|
4
|
Hong S, Choi YA, Joo DS, Gürsoy G. Privacy-preserving model evaluation for logistic and linear regression using homomorphically encrypted genotype data. J Biomed Inform 2024; 156:104678. [PMID: 38936565 PMCID: PMC11272436 DOI: 10.1016/j.jbi.2024.104678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Revised: 05/29/2024] [Accepted: 06/19/2024] [Indexed: 06/29/2024]
Abstract
OBJECTIVE: Linear and logistic regression are widely used statistical techniques in population genetics for analyzing genetic data and uncovering patterns and associations in large genetic datasets, such as identifying genetic variations linked to specific diseases or traits. However, obtaining statistically significant results from these studies requires large amounts of sensitive genotype and phenotype information from thousands of patients, which raises privacy concerns. Although cryptographic techniques such as homomorphic encryption offer a potential solution to these privacy concerns by allowing computations on encrypted data, previous methods leveraging homomorphic encryption have not addressed the confidentiality of shared models, which can leak information about the training data. METHODS: In this work, we present a secure model evaluation method for linear and logistic regression using homomorphic encryption for six prediction tasks, where input genotypes, output phenotypes, and model parameters are all encrypted. RESULTS: Our method ensures no private information leakage during inference and achieves high accuracy (≥93% for all outcomes) with each inference taking less than ten seconds for ∼200 genomes. CONCLUSION: Our study demonstrates that it is possible to perform linear and logistic regression model evaluation while protecting patient confidentiality with theoretical security guarantees. Our implementation and test data are available at https://github.com/G2Lab/privateML/.
Affiliation(s)
- Seungwan Hong
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA; New York Genome Center, New York, NY 10013, USA
- Yoolim A Choi
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA; New York Genome Center, New York, NY 10013, USA
- Daniel S Joo
- New York Genome Center, New York, NY 10013, USA; Department of Computer Science, Columbia University, New York, NY 10032, USA
- Gamze Gürsoy
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA; New York Genome Center, New York, NY 10013, USA; Department of Computer Science, Columbia University, New York, NY 10032, USA
|
5
|
Liu Y, Liu R, Ge J, Wang Y. Advancements in brain-machine interfaces for application in the metaverse. Front Neurosci 2024; 18:1383319. [PMID: 38919909 PMCID: PMC11198002 DOI: 10.3389/fnins.2024.1383319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Accepted: 05/14/2024] [Indexed: 06/27/2024] Open
Abstract
In recent years, with the shift of focus in metaverse research toward content exchange and social interaction, breaking through the current bottleneck of audio-visual media interaction has become an urgent issue. The use of brain-machine interfaces for sensory simulation is one of the proposed solutions. Currently, brain-machine interfaces have demonstrated irreplaceable potential as physiological signal acquisition tools in various fields within the metaverse. This study explores three application scenarios: generative art in the metaverse, serious gaming for healthcare in metaverse medicine, and brain-machine interface applications for facial expression synthesis in the virtual society of the metaverse. It investigates existing commercial products and patents (such as MindWave Mobile, GVS, and Galea), draws analogies with the development processes of network security and neurosecurity, bioethics and neuroethics, and discusses the challenges and potential issues that may arise when brain-machine interfaces mature and are widely applied. Furthermore, it looks ahead to the diverse possibilities of deep and varied applications of brain-machine interfaces in the metaverse in the future.
Affiliation(s)
- Yang Liu
- Department of Ophthalmology, First Hospital of China Medical University, Shenyang, China
- Ruibin Liu
- Department of Clinical Integration of Traditional Chinese and Western Medicine, Liaoning University of Traditional Chinese Medicine, Shenyang, China
- Department of General Surgery, Cancer Hospital of China Medical University, Liaoning Cancer Hospital & Institute, Shenyang, China
- Jinnian Ge
- Department of General Surgery, First Hospital of China Medical University, Shenyang, China
- Yue Wang
- Department of General Surgery, Cancer Hospital of China Medical University, Liaoning Cancer Hospital & Institute, Shenyang, China
|
6
|
Wu Z, Zhang T, Ma X, Guo S, Zhou Q, Zahoor A, Deng G. Recent advances in anti-inflammatory active components and action mechanisms of natural medicines. Inflammopharmacology 2023; 31:2901-2937. [PMID: 37947913 DOI: 10.1007/s10787-023-01369-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Accepted: 09/16/2023] [Indexed: 11/12/2023]
Abstract
Inflammation is a series of reactions caused by the body's resistance to external biological stimuli. Inflammation affects the occurrence and development of many diseases. Anti-inflammatory drugs have been used widely to treat inflammatory diseases, but long-term use can cause toxic side-effects and affect human functions. As immunomodulators with long-term conditioning effects and no drug residues, natural products are being investigated increasingly for the treatment of inflammatory diseases. In this review, we focus on the inflammatory process and cellular mechanisms in the development of diseases such as inflammatory bowel disease, atherosclerosis, and coronavirus disease-2019. Also, we focus on three signaling pathways (Nuclear factor-kappa B, p38 mitogen-activated protein kinase, Janus kinase/signal transducer and activator of transcription-3) to explain the anti-inflammatory effect of natural products. In addition, we also classified common natural products based on secondary metabolites and explained the association between current bidirectional prediction progress of natural product targets and inflammatory diseases.
Affiliation(s)
- Zhimin Wu
- Department of Clinical Veterinary Medicine, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Tao Zhang
- College of Animal Science and Technology, Anhui Agricultural University, Hefei, China
- Xiaofei Ma
- College of Veterinary Medicine, Gansu Agriculture University, Lanzhou, China
- Shuai Guo
- Department of Clinical Veterinary Medicine, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Qingqing Zhou
- Department of Clinical Veterinary Medicine, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
- Arshad Zahoor
- College of Veterinary Sciences, The University of Agriculture Peshawar, Peshawar, Pakistan
- Ganzhen Deng
- Department of Clinical Veterinary Medicine, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, China
|
7
|
Smajić A, Grandits M, Ecker GF. Privacy-preserving techniques for decentralized and secure machine learning in drug discovery. Drug Discov Today 2023; 28:103820. [PMID: 37935330 DOI: 10.1016/j.drudis.2023.103820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 10/17/2023] [Accepted: 11/01/2023] [Indexed: 11/09/2023]
Abstract
Data availability, data security, and privacy concerns often hamper optimal performance efficiency of machine learning (ML) techniques. Therefore, novel techniques for the utilization of private/sensitive data in the field of drug discovery have been proposed for ML model-building tasks. Some examples of the different techniques are secure multiparty computation, distributed deep learning, homomorphic encryption, blockchain-based peer-to-peer networking, differential privacy, and federated learning, as well as combinations of such techniques. In this paper, we present an overview of these techniques for decentralized ML to illustrate its benefits and drawbacks in the field of drug discovery.
Affiliation(s)
- Aljoša Smajić
- Department of Pharmaceutical Sciences, University of Vienna, Vienna, Austria
- Melanie Grandits
- Department of Pharmaceutical Sciences, University of Vienna, Vienna, Austria
- Gerhard F Ecker
- Department of Pharmaceutical Sciences, University of Vienna, Vienna, Austria
|
8
|
Pan L, Xiao X, Liu S, Peng S. An Integration Framework of Secure Multiparty Computation and Deep Neural Network for Improving Drug-Drug Interaction Predictions. J Comput Biol 2023; 30:1034-1045. [PMID: 37707993 DOI: 10.1089/cmb.2023.0076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/16/2023] Open
Abstract
Drug-drug interaction (DDI) is a key concern in drug development and pharmacovigilance. It is important to improve DDI predictions by integrating multisource data from various pharmaceutical companies. Unfortunately, data privacy and financial interests seriously hinder interinstitutional collaboration on DDI predictions. We propose multiparty computation DDI (MPCDDI), a secure MPC-based deep learning framework for DDI predictions. MPCDDI leverages secret sharing to incorporate drug-related feature data from multiple institutions and develops a deep learning model for DDI predictions. In MPCDDI, all data transmission and deep learning operations are integrated into secure MPC frameworks to enable high-quality collaboration among pharmaceutical institutions without divulging private drug-related information. The results suggest that MPCDDI is superior to eight other baselines and achieves performance similar to that of the corresponding plaintext collaborations. More interestingly, MPCDDI significantly outperforms methods that use private data from a single institution. In summary, MPCDDI is an effective framework for promoting collaborative and privacy-preserving drug discovery.
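The secret sharing mentioned in this abstract can be illustrated with additive sharing over a prime field, a common building block of MPC. The following is a minimal sketch under stated assumptions: the modulus, the function names, and the three-party setup are illustrative choices and are not taken from MPCDDI.

```python
import secrets

PRIME = 2**61 - 1  # field modulus (a Mersenne prime); an illustrative choice


def share(value, n_parties):
    """Split an integer into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recover the secret; requires every party's share."""
    return sum(shares) % PRIME


def add_shared(a_shares, b_shares):
    """Each party adds its own two shares locally, with no communication."""
    return [(a + b) % PRIME for a, b in zip(a_shares, b_shares)]


# Two institutions contribute private feature values 10 and 32.
sum_shares = add_shared(share(10, 3), share(32, 3))
total = reconstruct(sum_shares)  # 42, without either input being revealed
```

Any proper subset of shares is uniformly random and reveals nothing about the secret. Addition is local, whereas multiplication of shared values needs extra interaction, which is where full MPC frameworks do most of their work.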
Affiliation(s)
- Liang Pan
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Xia Xiao
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Shaoliang Peng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- The State Key Laboratory of Chemo/Biosensing and Chemometrics, Hunan University, Changsha, China
|
9
|
Geva R, Gusev A, Polyakov Y, Liram L, Rosolio O, Alexandru A, Genise N, Blatt M, Duchin Z, Waissengrin B, Mirelman D, Bukstein F, Blumenthal DT, Wolf I, Pelles-Avraham S, Schaffer T, Lavi LA, Micciancio D, Vaikuntanathan V, Badawi AA, Goldwasser S. Collaborative privacy-preserving analysis of oncological data using multiparty homomorphic encryption. Proc Natl Acad Sci U S A 2023; 120:e2304415120. [PMID: 37549296 PMCID: PMC10437415 DOI: 10.1073/pnas.2304415120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2023] [Accepted: 06/09/2023] [Indexed: 08/09/2023] Open
Abstract
Real-world healthcare data sharing is instrumental in constructing broader-based and larger clinical datasets that may improve clinical decision-making research and outcomes. Stakeholders are frequently reluctant to share their data without guaranteed patient privacy, proper protection of their datasets, and control over the usage of their data. Fully homomorphic encryption (FHE) is a cryptographic capability that can address these issues by enabling computation on encrypted data without intermediate decryptions, so the analytics results are obtained without revealing the raw data. This work presents a toolset for collaborative privacy-preserving analysis of oncological data using multiparty FHE. Our toolset supports survival analysis, logistic regression training, and several common descriptive statistics. We demonstrate using oncological datasets that the toolset achieves high accuracy and practical performance, which scales well to larger datasets. As part of this work, we propose a cryptographic protocol for interactive bootstrapping in multiparty FHE, which is of independent interest. The toolset we develop is general-purpose and can be applied to other collaborative medical and healthcare application domains.
Affiliation(s)
- Ravit Geva
- Tel Aviv Sorasky Medical Center, Tel Aviv 64239, Israel
- Alexander Gusev
- Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02215
- Lior Liram
- Duality Technologies, Inc., Hoboken, NJ 07103
- Dan Mirelman
- Tel Aviv Sorasky Medical Center, Tel Aviv 64239, Israel
- Ido Wolf
- Tel Aviv Sorasky Medical Center, Tel Aviv 64239, Israel
- Tali Schaffer
- Tel Aviv Sorasky Medical Center, Tel Aviv 64239, Israel
- Lee A. Lavi
- Tel Aviv Sorasky Medical Center, Tel Aviv 64239, Israel
- Daniele Micciancio
- Duality Technologies, Inc., Hoboken, NJ 07103
- University of California, San Diego, CA 92093
- Vinod Vaikuntanathan
- Duality Technologies, Inc., Hoboken, NJ 07103
- Massachusetts Institute of Technology, Cambridge, MA 02139
- Shafi Goldwasser
- Duality Technologies, Inc., Hoboken, NJ 07103
- Simons Institute for the Theory of Computing, University of California, Berkeley, CA 94720
|
10
|
Bi X, Shen X. Distribution-Invariant Differential Privacy. J Econom 2023; 235:444-453. [PMID: 37701878 PMCID: PMC10495082 DOI: 10.1016/j.jeconom.2022.05.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/14/2023]
Abstract
Differential privacy is becoming a gold standard for protecting the privacy of publicly shared data. It has been widely used in social science, data science, public health, information technology, and the U.S. decennial census. Nevertheless, to guarantee differential privacy, existing methods may unavoidably alter the conclusion of original data analysis, as privatization often changes the sample distribution. This phenomenon is known as the trade-off between privacy protection and statistical accuracy. In this work, we mitigate this trade-off by developing a distribution-invariant privatization (DIP) method to reconcile both high statistical accuracy and strict differential privacy. As a result, any downstream statistical or machine learning task yields essentially the same conclusion as if one used the original data. Numerically, under the same strictness of privacy protection, DIP achieves superior statistical accuracy in a wide range of simulation studies and real-world benchmarks.
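The privacy-accuracy trade-off this abstract refers to can be seen in the standard Laplace mechanism, sketched below. This is not the DIP method from the paper; it is the textbook baseline, and the function names, the clipping bounds, and the assumption that the sample size is public are all illustrative.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, lower, upper, epsilon, rng=None):
    """Epsilon-differentially-private mean of values clipped to [lower, upper].

    Assumes the number of values is public. Replacing one record moves the
    clipped mean by at most (upper - lower) / n, which is the sensitivity.
    """
    rng = rng or random.Random()
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / len(clipped)
    true_mean = sum(clipped) / len(clipped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


# Smaller epsilon means stronger privacy and a noisier released statistic.
rng = random.Random(0)
strict = private_mean([1, 2, 3, 4], lower=0, upper=5, epsilon=0.1, rng=rng)
loose = private_mean([1, 2, 3, 4], lower=0, upper=5, epsilon=1e9, rng=rng)
```

The noise scale grows as epsilon shrinks, so the released value drifts further from the true mean of 2.5; this distortion is what the abstract says existing methods cannot avoid.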
Affiliation(s)
- Xuan Bi
- Information and Decision Sciences, Carlson School of Management, University of Minnesota, Minneapolis, MN
- Xiaotong Shen
- School of Statistics, University of Minnesota, Minneapolis, MN
|
11
|
Wang H, Fu T, Du Y, Gao W, Huang K, Liu Z, Chandak P, Liu S, Van Katwyk P, Deac A, Anandkumar A, Bergen K, Gomes CP, Ho S, Kohli P, Lasenby J, Leskovec J, Liu TY, Manrai A, Marks D, Ramsundar B, Song L, Sun J, Tang J, Veličković P, Welling M, Zhang L, Coley CW, Bengio Y, Zitnik M. Scientific discovery in the age of artificial intelligence. Nature 2023; 620:47-60. [PMID: 37532811 DOI: 10.1038/s41586-023-06221-2] [Citation(s) in RCA: 190] [Impact Index Per Article: 95.0] [Received: 03/30/2022] [Accepted: 05/16/2023] [Indexed: 08/04/2023]
Abstract
Artificial intelligence (AI) is being increasingly integrated into scientific discovery to augment and accelerate research, helping scientists to generate hypotheses, design experiments, collect and interpret large datasets, and gain insights that might not have been possible using traditional scientific methods alone. Here we examine breakthroughs over the past decade that include self-supervised learning, which allows models to be trained on vast amounts of unlabelled data, and geometric deep learning, which leverages knowledge about the structure of scientific data to enhance model accuracy and efficiency. Generative AI methods can create designs, such as small-molecule drugs and proteins, by analysing diverse data modalities, including images and sequences. We discuss how these methods can help scientists throughout the scientific process and the central issues that remain despite such advances. Both developers and users of AI tools need a better understanding of when such approaches need improvement, and challenges posed by poor data quality and stewardship remain. These issues cut across scientific disciplines and require developing foundational algorithmic approaches that can contribute to scientific understanding or acquire it autonomously, making them critical areas of focus for AI innovation.
Affiliation(s)
- Hanchen Wang
- Department of Engineering, University of Cambridge, Cambridge, UK
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
- Department of Research and Early Development, Genentech Inc, South San Francisco, CA, USA
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Tianfan Fu
- Department of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- Yuanqi Du
- Department of Computer Science, Cornell University, Ithaca, NY, USA
- Wenhao Gao
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Kexin Huang
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Ziming Liu
- Department of Physics, Massachusetts Institute of Technology, Cambridge, MA, USA
- Payal Chandak
- Harvard-MIT Program in Health Sciences and Technology, Cambridge, MA, USA
- Shengchao Liu
- Mila - Quebec AI Institute, Montreal, Quebec, Canada
- Université de Montréal, Montreal, Quebec, Canada
- Peter Van Katwyk
- Department of Earth, Environmental and Planetary Sciences, Brown University, Providence, RI, USA
- Data Science Institute, Brown University, Providence, RI, USA
- Andreea Deac
- Mila - Quebec AI Institute, Montreal, Quebec, Canada
- Université de Montréal, Montreal, Quebec, Canada
- Anima Anandkumar
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA
- NVIDIA, Santa Clara, CA, USA
- Karianne Bergen
- Department of Earth, Environmental and Planetary Sciences, Brown University, Providence, RI, USA
- Data Science Institute, Brown University, Providence, RI, USA
- Carla P Gomes
- Department of Computer Science, Cornell University, Ithaca, NY, USA
- Shirley Ho
- Center for Computational Astrophysics, Flatiron Institute, New York, NY, USA
- Department of Astrophysical Sciences, Princeton University, Princeton, NJ, USA
- Department of Physics, Carnegie Mellon University, Pittsburgh, PA, USA
- Department of Physics and Center for Data Science, New York University, New York, NY, USA
- Joan Lasenby
- Department of Engineering, University of Cambridge, Cambridge, UK
- Jure Leskovec
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Arjun Manrai
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Debora Marks
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Le Song
- BioMap, Beijing, China
- Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates
- Jimeng Sun
- University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Jian Tang
- Mila - Quebec AI Institute, Montreal, Quebec, Canada
- HEC Montréal, Montreal, Quebec, Canada
- CIFAR AI Chair, Toronto, Ontario, Canada
- Petar Veličković
- Google DeepMind, London, UK
- Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
- Max Welling
- University of Amsterdam, Amsterdam, Netherlands
- Microsoft Research Amsterdam, Amsterdam, Netherlands
- Linfeng Zhang
- DP Technology, Beijing, China
- AI for Science Institute, Beijing, China
- Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Yoshua Bengio
- Mila - Quebec AI Institute, Montreal, Quebec, Canada
- Université de Montréal, Montreal, Quebec, Canada
- Marinka Zitnik
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Data Science Initiative, Cambridge, MA, USA
- Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA, USA
|
12
|
Singh R, Sledzieski S, Bryson B, Cowen L, Berger B. Contrastive learning in protein language space predicts interactions between drugs and protein targets. Proc Natl Acad Sci U S A 2023; 120:e2220778120. [PMID: 37289807 PMCID: PMC10268324 DOI: 10.1073/pnas.2220778120] [Citation(s) in RCA: 47] [Impact Index Per Article: 23.5] [Received: 12/06/2022] [Accepted: 04/10/2023] [Indexed: 06/10/2023] Open
Abstract
Sequence-based prediction of drug-target interactions has the potential to accelerate drug discovery by complementing experimental screens. Such computational prediction needs to be generalizable and scalable while remaining sensitive to subtle variations in the inputs. However, current computational techniques fail to simultaneously meet these goals, often sacrificing performance of one to achieve the others. We develop a deep learning model, ConPLex, successfully leveraging the advances in pretrained protein language models ("PLex") and employing a protein-anchored contrastive coembedding ("Con") to outperform state-of-the-art approaches. ConPLex achieves high accuracy, broad adaptivity to unseen data, and specificity against decoy compounds. It makes predictions of binding based on the distance between learned representations, enabling predictions at the scale of massive compound libraries and the human proteome. Experimental testing of 19 kinase-drug interaction predictions validated 12 interactions, including four with subnanomolar affinity, plus a strongly binding EPHB1 inhibitor (KD = 1.3 nM). Furthermore, ConPLex embeddings are interpretable, which enables us to visualize the drug-target embedding space and use embeddings to characterize the function of human cell-surface proteins. We anticipate that ConPLex will facilitate efficient drug discovery by making highly sensitive in silico drug screening feasible at the genome scale. ConPLex is available open source at https://ConPLex.csail.mit.edu.
Affiliation(s)
- Rohit Singh
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139
- Samuel Sledzieski
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139
- Bryan Bryson
  - Ragon Institute of MGH, MIT and Harvard, Cambridge, MA 02139
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- Lenore Cowen
  - Department of Computer Science, Tufts University, Medford, MA 02155
- Bonnie Berger
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139
  - Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139
13
Sequre: a high-performance framework for secure multiparty computation enables biomedical data sharing. Genome Biol 2023; 24:5. [PMID: 36631897 PMCID: PMC9832703 DOI: 10.1186/s13059-022-02841-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Accepted: 12/21/2022] [Indexed: 01/12/2023] Open
Abstract
Secure multiparty computation (MPC) is a cryptographic tool that allows computation on top of sensitive biomedical data without revealing private information to the involved entities. Here, we introduce Sequre, an easy-to-use, high-performance framework for developing performant MPC applications. Sequre offers a set of automatic compile-time optimizations that significantly improve the performance of MPC applications and incorporates the syntax of Python programming language to facilitate rapid application development. We demonstrate its usability and performance on various bioinformatics tasks showing up to 3-4 times increased speed over the existing pipelines with 7-fold reductions in codebase sizes.
14
Wirth FN, Kussel T, Müller A, Hamacher K, Prasser F. EasySMPC: a simple but powerful no-code tool for practical secure multiparty computation. BMC Bioinformatics 2022; 23:531. [PMID: 36494612 PMCID: PMC9733077 DOI: 10.1186/s12859-022-05044-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/31/2022] [Accepted: 11/08/2022] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Modern biomedical research is data-driven and relies heavily on the re-use and sharing of data. Biomedical data, however, is subject to strict data protection requirements. Due to the complexity of the data required and the scale of data use, obtaining informed consent is often infeasible. Other methods, such as anonymization or federation, in turn have their own limitations. Secure multi-party computation (SMPC) is a cryptographic technology for distributed calculations, which brings formally provable security and privacy guarantees and can be used to implement a wide-range of analytical approaches. As a relatively new technology, SMPC is still rarely used in real-world biomedical data sharing activities due to several barriers, including its technical complexity and lack of usability. RESULTS To overcome these barriers, we have developed the tool EasySMPC, which is implemented in Java as a cross-platform, stand-alone desktop application provided as open-source software. The tool makes use of the SMPC method Arithmetic Secret Sharing, which allows to securely sum up pre-defined sets of variables among different parties in two rounds of communication (input sharing and output reconstruction) and integrates this method into a graphical user interface. No additional software services need to be set up or configured, as EasySMPC uses the most widespread digital communication channel available: e-mails. No cryptographic keys need to be exchanged between the parties and e-mails are exchanged automatically by the software. To demonstrate the practicability of our solution, we evaluated its performance in a wide range of data sharing scenarios. The results of our evaluation show that our approach is scalable (summing up 10,000 variables between 20 parties takes less than 300 s) and that the number of participants is the essential factor. 
CONCLUSIONS We have developed an easy-to-use "no-code solution" for performing secure joint calculations on biomedical data using SMPC protocols, which is suitable for use by scientists without IT expertise and which has no special infrastructure requirements. We believe that innovative approaches to data sharing with SMPC are needed to foster the translation of complex protocols into practice.
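The two communication rounds named in the abstract (input sharing, then output reconstruction) can be sketched with additive secret sharing. This is an illustrative sketch only, not EasySMPC's code: the tool itself is a Java desktop application that exchanges shares by e-mail, whereas the modulus `PRIME`, the helper names `share` and `secure_sum`, and the in-memory exchange below are assumptions chosen for the example.

```python
import secrets

# Hypothetical modulus for illustration; it must exceed the true sum.
PRIME = 2**61 - 1

def share(value, n_parties):
    """Split `value` into n additive shares that sum to `value` modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]

def secure_sum(private_inputs):
    """Sum the parties' private inputs without any party revealing its own value."""
    n = len(private_inputs)
    # Round 1 (input sharing): party i splits its input and sends share j to party j.
    all_shares = [share(x, n) for x in private_inputs]
    # Each party j adds up the shares it received from all parties.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME
                    for j in range(n)]
    # Round 2 (output reconstruction): partial sums are exchanged and combined.
    return sum(partial_sums) % PRIME
```

For example, `secure_sum([12, 30, 7])` reconstructs the total 49, while each individual share exchanged in round 1 is a uniformly random number that reveals nothing about the input it came from.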
Affiliation(s)
- Felix Nikolaus Wirth
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Medical Informatics Group, Charitéplatz 1, 10117 Berlin, Germany (GRID grid.484013.a; ISNI 0000 0004 6879 971X)
- Tobias Kussel
  - Computational Biology and Simulation, TU Darmstadt, Darmstadt, Germany (GRID grid.6546.1; ISNI 0000 0001 0940 1669)
- Armin Müller
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Medical Informatics Group, Charitéplatz 1, 10117 Berlin, Germany (GRID grid.484013.a; ISNI 0000 0004 6879 971X)
- Kay Hamacher
  - Computational Biology and Simulation, TU Darmstadt, Darmstadt, Germany (GRID grid.6546.1; ISNI 0000 0001 0940 1669)
- Fabian Prasser
  - Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Medical Informatics Group, Charitéplatz 1, 10117 Berlin, Germany (GRID grid.484013.a; ISNI 0000 0004 6879 971X)
15
Kuo TT, Jiang X, Tang H, Wang X, Harmanci A, Kim M, Post K, Bu D, Bath T, Kim J, Liu W, Chen H, Ohno-Machado L. The evolving privacy and security concerns for genomic data analysis and sharing as observed from the iDASH competition. J Am Med Inform Assoc 2022; 29:2182-2190. [PMID: 36164820 PMCID: PMC9667175 DOI: 10.1093/jamia/ocac165] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/18/2022] [Revised: 08/25/2022] [Accepted: 09/13/2022] [Indexed: 01/11/2023] Open
Abstract
Concerns regarding inappropriate leakage of sensitive personal information as well as unauthorized data use are increasing with the growth of genomic data repositories. Therefore, privacy and security of genomic data have become increasingly important and need to be studied. With many proposed protection techniques, their applicability in support of biomedical research should be well understood. For this purpose, we have organized a community effort over the past 8 years through the Integrating Data for Analysis, Anonymization and Sharing (iDASH) consortium to address this practical challenge. In this article, we summarize our experience from these competitions, report lessons learned from the events in 2020/2021 as examples, and discuss potential future research directions in this emerging field.
Affiliation(s)
- Tsung-Ting Kuo
  - Corresponding Author: Tsung-Ting Kuo, PhD, UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093, USA
- Arif Harmanci
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Miran Kim
  - Department of Mathematics, Hanyang University, Seoul, Republic of Korea; Department of Computer Science, Hanyang University, Seoul, Republic of Korea
- Kai Post
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California, USA
- Diyue Bu
  - Luddy School of Informatics, Computing, and Engineering, Indiana University Bloomington, Bloomington, Indiana, USA
- Tyler Bath
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California, USA
- Jihoon Kim
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California, USA
- Weijie Liu
  - Luddy School of Informatics, Computing, and Engineering, Indiana University Bloomington, Bloomington, Indiana, USA
- Hongbo Chen
  - Luddy School of Informatics, Computing, and Engineering, Indiana University Bloomington, Bloomington, Indiana, USA
- Lucila Ohno-Machado
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California, USA; Division of Health Services Research & Development, Veteran Affairs San Diego Healthcare System, San Diego, California, USA
16
TrustGWAS: A full-process workflow for encrypted GWAS using multi-key homomorphic encryption and pseudorandom number perturbation. Cell Syst 2022; 13:752-767.e6. [PMID: 36041458 DOI: 10.1016/j.cels.2022.08.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/19/2021] [Revised: 04/21/2022] [Accepted: 08/04/2022] [Indexed: 01/26/2023]
Abstract
The statistical power of genome-wide association studies (GWASs) is affected by the effective sample size. However, the privacy and security concerns associated with individual-level genotype data pose great challenges for cross-institutional cooperation. The full-process cryptographic solutions are in demand but have not been covered, especially the essential principal-component analysis (PCA). Here, we present TrustGWAS, a complete solution for secure, large-scale GWAS, recapitulating gold standard results against PLINK without compromising privacy and supporting basic PLINK steps including quality control, linkage disequilibrium pruning, PCA, chi-square test, Cochran-Armitage trend test, covariate-supported logistic regression and linear regression, and their sequential combinations. TrustGWAS leverages pseudorandom number perturbations for PCA and multiparty scheme of multi-key homomorphic encryption for all other modules. TrustGWAS can evaluate 100,000 individuals with 1 million variants and complete QC-LD-PCA-regression workflow within 50 h. We further successfully discover gene loci associated with fasting blood glucose, consistent with the findings of the ChinaMAP project.
17
Soellner M, Koenigstorfer J. Motive perception pathways to the release of personal information to healthcare organizations. BMC Med Inform Decis Mak 2022; 22:240. [PMID: 36100876 PMCID: PMC9468521 DOI: 10.1186/s12911-022-01986-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2022] [Accepted: 09/07/2022] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND The goal of the study is to assess the downstream effects of who requests personal information from individuals for artificial intelligence-(AI) based healthcare research purposes-be it a pharmaceutical company (as an example of a for-profit organization) or a university hospital (as an example of a not-for-profit organization)-as well as their boundary conditions on individuals' likelihood to release personal information about their health. For the latter, the study considers two dimensions: the tendency to self-disclose (which is aimed to be high so that AI applications can reach their full potential) and the tendency to falsify (which is aimed to be low so that AI applications are based on both valid and reliable data). METHODS Across three experimental studies with Amazon Mechanical Turk workers from the U.S. (n = 204, n = 330, and n = 328, respectively), Covid-19 was used as the healthcare research context. RESULTS University hospitals (vs. pharmaceutical companies) score higher on altruism and lower on egoism. Individuals were more willing to disclose data if they perceived that the requesting organization acts based on altruistic motives (i.e., the motives function as gate openers). Individuals were more likely to protect their data by intending to provide false information when they perceived egoistic motives to be the main driver for the organization requesting their data (i.e., the motives function as a privacy protection tool). Two moderators, namely message appeal (Study 2) and message endorser credibility (Study 3) influence the two indirect pathways of the release of personal information. CONCLUSION The findings add to Communication Privacy Management Theory as well as Attribution Theory by suggesting motive-based pathways to the release of correct personal health data. 
Compared to not-for-profit organizations, for-profit organizations are particularly recommended to match their message appeal with the organizations' purposes (to provide personal benefit) and to use high-credibility endorsers in order to reduce inherent disadvantages in motive perceptions.
Affiliation(s)
- Michaela Soellner
  - Chair of Sport and Health Management, Technical University of Munich, Campus D - Uptown Munich, Georg-Brauchle-Ring 60/62, 80992, Munich, Germany
- Joerg Koenigstorfer
  - Chair of Sport and Health Management, Technical University of Munich, Campus D - Uptown Munich, Georg-Brauchle-Ring 60/62, 80992, Munich, Germany
18
Abstract
Genomics data are important for advancing biomedical research, improving clinical care, and informing other disciplines such as forensics and genealogy. However, privacy concerns arise when genomic data are shared. In particular, the identifying nature of genetic information, its direct relationship to health status, and the potential financial harm and stigmatization posed to individuals and their blood relatives call for a survey of the privacy issues related to sharing genetic and related data and potential solutions to overcome these issues. In this work, we provide an overview of the importance of genomic privacy, the information gleaned from genomics data, the sources of potential private information leakages in genomics, and ways to preserve privacy while utilizing the genetic information in research. We discuss the relationship between trust in the scientific community and protecting privacy, illuminating a future roadmap for data sharing and study participation.
Affiliation(s)
- Gamze Gürsoy
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA; New York Genome Center, New York, NY, USA
19
Liu X, Zheng Y, Yuan X, Yi X. Deep learning-based medical diagnostic services: A secure, lightweight, and accurate realization. Journal of Computer Security 2022. [DOI: 10.3233/jcs-210165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
In this paper, we propose CryptMed, a system framework that enables medical service providers to offer secure, lightweight, and accurate medical diagnostic service to their customers via an execution of neural network inference in the ciphertext domain. CryptMed ensures the privacy of both parties with cryptographic guarantees. Our technical contributions include: 1) presenting a secret sharing based inference protocol that can cope well with the commonly used linear and non-linear NN layers; 2) devising an optimized secure comparison function that can efficiently support comparison-based activation functions in NN architectures; 3) constructing a suite of secure smooth functions built on precise approximation approaches for accurate medical diagnoses. We evaluate CryptMed on 6 neural network architectures across a wide range of non-linear activation functions over two benchmark and four real-world medical datasets. We comprehensively compare our system with prior art in terms of end-to-end service workload and prediction accuracy. Our empirical results demonstrate that CryptMed achieves up to 413×, 19×, and 43× bandwidth savings for MNIST, CIFAR-10, and medical applications, respectively, compared with prior art. For the smooth activation based inference, the best choice of our proposed approximations preserves the precision of the original functions, with less than 1.2% accuracy loss, and could enhance the precision due to the newly introduced activation function family.
Affiliation(s)
- Xiaoning Liu
  - School of Computing Technologies, RMIT University, Melbourne, VIC 3001, Australia
- Yifeng Zheng
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
- Xingliang Yuan
  - Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia
- Xun Yi
  - School of Computing Technologies, RMIT University, Melbourne, VIC 3001, Australia
20
Privacy-preserving federated neural network learning for disease-associated cell classification. Patterns 2022; 3:100487. [PMID: 35607628 PMCID: PMC9122966 DOI: 10.1016/j.patter.2022.100487] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/18/2021] [Revised: 02/14/2022] [Accepted: 03/14/2022] [Indexed: 11/21/2022]
Abstract
Training accurate and robust machine learning models requires a large amount of data that is usually scattered across data silos. Sharing or centralizing the data of different healthcare institutions is, however, unfeasible or prohibitively difficult due to privacy regulations. In this work, we address this problem by using a privacy-preserving federated learning-based approach, PriCell, for complex models such as convolutional neural networks. PriCell relies on multiparty homomorphic encryption and enables the collaborative training of encrypted neural networks with multiple healthcare institutions. We preserve the confidentiality of each institution’s input data, of any intermediate values, and of the trained model parameters. We efficiently replicate the training of a published state-of-the-art convolutional neural network architecture in a decentralized and privacy-preserving manner. Our solution achieves an accuracy comparable with the one obtained with the centralized non-secure solution. PriCell guarantees patient privacy and ensures data utility for efficient multi-center studies involving complex healthcare data.
Highlights:
- We enable collaborative and privacy-preserving model training between institutions
- Training under encryption does not degrade the utility of the data
- We apply our solution to the single-cell analysis in a federated setting
- Our method is generalizable to other machine learning tasks in the healthcare domain
High-quality medical machine learning models will benefit greatly from collaboration between health care institutions. Yet, it is usually difficult to transfer data between these institutions due to strict privacy regulations. In this study, we propose a solution, PriCell, that relies on multiparty homomorphic encryption to enable privacy-preserving collaborative machine learning while protecting via encryption the institutions' input data, the model, and any value exchanged between the institutions. We show the maturity of our solution by training a published state-of-the-art convolutional neural network in a decentralized and privacy-preserving manner. We compare the accuracy achieved by PriCell with the centralized and non-secure solutions and show that PriCell guarantees privacy without reducing the utility of the data. The benefits of PriCell constitute an important landmark for real-world applications of collaborative training while preserving privacy.
21
Smajlović H, Shajii A, Berger B, Cho H, Numanagić I. Sequre: a high-performance framework for rapid development of secure bioinformatics pipelines. IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum 2022; 2022:164-165. [PMID: 35958356 PMCID: PMC9364365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Affiliation(s)
- Hyunghoon Cho
  - Broad Institute of MIT and Harvard, Massachusetts, USA
22
Functional genomics data: privacy risk assessment and technological mitigation. Nat Rev Genet 2022; 23:245-258. [PMID: 34759381 DOI: 10.1038/s41576-021-00428-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Accepted: 10/18/2021] [Indexed: 12/15/2022]
Abstract
The generation of functional genomics data by next-generation sequencing has increased greatly in the past decade. Broad sharing of these data is essential for research advancement but poses notable privacy challenges, some of which are analogous to those that occur when sharing genetic variant data. However, there are also unique privacy challenges that arise from cryptic information leakage during the processing and summarization of functional genomics data from raw reads to derived quantities, such as gene expression values. Here, we review these challenges and present potential solutions for mitigating privacy risks while allowing broad data dissemination and analysis.
23
Privacy-preserving genotype imputation with fully homomorphic encryption. Cell Syst 2022; 13:173-182.e3. [PMID: 34758288 PMCID: PMC8857019 DOI: 10.1016/j.cels.2021.10.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 08/21/2020] [Revised: 06/28/2021] [Accepted: 10/15/2021] [Indexed: 12/17/2022]
Abstract
Genotype imputation is the inference of unknown genotypes using known population structure observed in large genomic datasets; it can further our understanding of phenotype-genotype relationships and is useful for QTL mapping and GWASs. However, the compute-intensive nature of genotype imputation can overwhelm local servers for computation and storage. Hence, many researchers are moving toward using cloud services, raising privacy concerns. We address these concerns by developing an efficient, privacy-preserving algorithm called p-Impute. Our method uses homomorphic encryption, allowing calculations on ciphertext, thereby avoiding the decryption of private genotypes in the cloud. It is similar to k-nearest neighbor approaches, inferring missing genotypes in a genomic block based on the SNP genotypes of genetically related individuals in the same block. Our results demonstrate accuracy in agreement with the state-of-the-art plaintext solutions. Moreover, p-Impute is scalable to real-world applications as its memory and time requirements increase linearly with the increasing number of samples. p-Impute is freely available for download here: https://doi.org/10.5281/zenodo.5542001.
24
Martínez-García M, Hernández-Lemus E. Data Integration Challenges for Machine Learning in Precision Medicine. Front Med (Lausanne) 2022; 8:784455. [PMID: 35145977 PMCID: PMC8821900 DOI: 10.3389/fmed.2021.784455] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 09/27/2021] [Accepted: 12/28/2021] [Indexed: 12/19/2022] Open
Abstract
A main goal of Precision Medicine is to incorporate and integrate the vast corpora held in different databases about the molecular and environmental origins of disease into analytic frameworks, allowing the development of individualized, context-dependent diagnostics and therapeutic approaches. In this regard, artificial intelligence and machine learning approaches can be used to build analytical models of complex disease aimed at prediction of personalized health conditions and outcomes. Such models must handle the wide heterogeneity of individuals in both their genetic predisposition and their social and environmental determinants. Computational approaches to medicine need to be able to efficiently manage, visualize, and integrate large datasets combining structured and unstructured formats. This needs to be done while constrained by different levels of confidentiality, ideally within a unified analytical architecture. Efficient data integration and management is key to the successful application of computational intelligence approaches to medicine. A number of challenges arise in the design of successful approaches to medical data analytics under the currently demanding performance conditions of personalized medicine, while also subject to time, computational power, and bioethical constraints. Here, we will review some of these constraints and discuss possible avenues to overcome current challenges.
Affiliation(s)
- Mireya Martínez-García
  - Clinical Research Division, National Institute of Cardiology ‘Ignacio Chávez’, Mexico City, Mexico
- Enrique Hernández-Lemus
  - Computational Genomics Division, National Institute of Genomic Medicine (INMEGEN), Mexico City, Mexico
  - Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico
25
Greener JG, Kandathil SM, Moffat L, Jones DT. A guide to machine learning for biologists. Nat Rev Mol Cell Biol 2022; 23:40-55. [PMID: 34518686 DOI: 10.1038/s41580-021-00407-0] [Citation(s) in RCA: 682] [Impact Index Per Article: 227.3] [Accepted: 07/23/2021] [Indexed: 02/08/2023]
Abstract
The expanding scale and inherent complexity of biological data have encouraged a growing use of machine learning in biology to build informative and predictive models of the underlying biological processes. All machine learning techniques fit models to data; however, the specific methods are quite varied and can at first glance seem bewildering. In this Review, we aim to provide readers with a gentle introduction to a few key machine learning techniques, including the most recently developed and widely used techniques involving deep neural networks. We describe how different techniques may be suited to specific types of biological data, and also discuss some best practices and points to consider when one is embarking on experiments involving machine learning. Some emerging directions in machine learning methodology are also discussed.
Affiliation(s)
- Joe G Greener
  - Department of Computer Science, University College London, London, UK
- Shaun M Kandathil
  - Department of Computer Science, University College London, London, UK
- Lewis Moffat
  - Department of Computer Science, University College London, London, UK
- David T Jones
  - Department of Computer Science, University College London, London, UK
26
MacKinnon SS, Madani Tonekaboni SA, Windemuth A. Proteome-Scale Drug-Target Interaction Predictions: Approaches and Applications. Curr Protoc 2021; 1:e302. [PMID: 34794211 DOI: 10.1002/cpz1.302] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/16/2022]
Abstract
Drug-Target interaction predictions are an important cornerstone of computer-aided drug discovery. While predictive methods around individual targets have a long history, the application of proteome-scale models is relatively recent. In this overview, we will provide the context required to understand advances in this emerging field within computational drug discovery, evaluate emerging technologies for suitability to given tasks, and provide guidelines for the design and implementation of new drug-target interaction prediction models. We will discuss the validation approaches used, and propose a set of key criteria that should be applied to evaluate their validity. We note that we find widespread deficiencies in the existing literature, making it difficult to judge the practical effectiveness of some of the techniques proposed from their publications alone. We hope that this review may help remedy this situation and increase awareness of several sources of bias that may enter into commonly used cross-validation methods. © 2021 Cyclica Inc. Current Protocols published by Wiley Periodicals LLC.
27
Truly privacy-preserving federated analytics for precision medicine with multiparty homomorphic encryption. Nat Commun 2021; 12:5910. [PMID: 34635645 PMCID: PMC8505638 DOI: 10.1038/s41467-021-25972-y] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 03/12/2021] [Accepted: 09/01/2021] [Indexed: 01/10/2023] Open
Abstract
Using real-world evidence in biomedical research, an indispensable complement to clinical trials, requires access to large quantities of patient data that are typically held separately by multiple healthcare institutions. We propose FAMHE, a novel federated analytics system that, based on multiparty homomorphic encryption (MHE), enables privacy-preserving analyses of distributed datasets by yielding highly accurate results without revealing any intermediate data. We demonstrate the applicability of FAMHE to essential biomedical analysis tasks, including Kaplan-Meier survival analysis in oncology and genome-wide association studies in medical genetics. Using our system, we accurately and efficiently reproduce two published centralized studies in a federated setting, enabling biomedical insights that are not possible from individual institutions alone. Our work represents a necessary key step towards overcoming the privacy hurdle in enabling multi-centric scientific collaborations. Existing approaches to sharing of distributed medical data either provide only limited protection of patients’ privacy or sacrifice the accuracy of results. Here, the authors propose a federated analytics system, based on multiparty homomorphic encryption (MHE), to overcome these issues.
28
Chen S, Xue D, Chuai G, Yang Q, Liu Q. FL-QSAR: a federated learning-based QSAR prototype for collaborative drug discovery. Bioinformatics 2021; 36:5492-5498. [PMID: 33289524 DOI: 10.1093/bioinformatics/btaa1006] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 02/15/2020] [Revised: 10/25/2020] [Accepted: 11/19/2020] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Quantitative structure-activity relationship (QSAR) analysis is commonly used in drug discovery. Collaborations among pharmaceutical institutions can lead to better performance in QSAR prediction; however, intellectual property and related financial interests still substantially hinder inter-institutional collaborations in QSAR modeling for drug discovery. RESULTS For the first time, we verified the feasibility of applying horizontal federated learning (HFL), a recently developed collaborative and privacy-preserving learning framework, to perform QSAR analysis. A prototype platform of federated-learning-based QSAR modeling for collaborative drug discovery, i.e. FL-QSAR, is presented accordingly. We first compared the HFL framework with a classic privacy-preserving computation framework, i.e. secure multiparty computation, to indicate how the two differ from various perspectives. Then we compared FL-QSAR with the public collaboration in terms of QSAR modeling. Our extensive experiments demonstrated that (i) collaboration by FL-QSAR outperforms a single client using only its private data, and (ii) collaboration by FL-QSAR achieves almost the same performance as that of collaboration via cleartext learning algorithms using all shared information. Taken together, our results indicate that FL-QSAR under the HFL framework provides an efficient solution to break the barriers between pharmaceutical institutions in QSAR modeling, and therefore promotes the development of collaborative and privacy-preserving drug discovery, with the ability to extend to other privacy-related biomedical areas. AVAILABILITY AND IMPLEMENTATION The source code of FL-QSAR is available on GitHub: https://github.com/bm2-lab/FL-QSAR. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shaoqi Chen
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Dongyu Xue
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Guohui Chuai
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Qiang Yang
- Department of AI, WeBank, Shenzhen 518055, China; Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
- Qi Liu
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
|
29
|
Coley CW, Eyke NS, Jensen KF. Autonome Entdeckung in den chemischen Wissenschaften, Teil II: Ausblick [Autonomous Discovery in the Chemical Sciences, Part II: Outlook]. Angew Chem Int Ed Engl 2020. [DOI: 10.1002/ange.201909989] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/25/2022]
Affiliation(s)
- Connor W. Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Natalie S. Eyke
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Klavs F. Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
|
30
|
Yu YW, Weber GM. Balancing Accuracy and Privacy in Federated Queries of Clinical Data Repositories: Algorithm Development and Validation. J Med Internet Res 2020; 22:e18735. [PMID: 33141090 PMCID: PMC7671849 DOI: 10.2196/18735] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/15/2020] [Revised: 08/28/2020] [Accepted: 09/07/2020] [Indexed: 12/01/2022] Open
Abstract
Background
Over the past decade, the emergence of several large federated clinical data networks has enabled researchers to access data on millions of patients at dozens of health care organizations. Typically, queries are broadcast to each of the sites in the network, which then return aggregate counts of the number of matching patients. However, because patients can receive care from multiple sites in the network, simply adding the numbers frequently double counts patients. Various methods such as the use of trusted third parties or secure multiparty computation have been proposed to link patient records across sites. However, they either have large trade-offs in accuracy and privacy or are not scalable to large networks.
Objective
This study aims to enable accurate estimates of the number of patients matching a federated query while providing strong guarantees on the amount of protected medical information revealed.
Methods
We introduce a novel probabilistic approach to running federated network queries. It combines an algorithm called HyperLogLog with obfuscation in the form of hashing, masking, and homomorphic encryption. It is tunable, in that it allows networks to balance accuracy versus privacy, and it is computationally efficient even for large networks. We built a user-friendly free open-source benchmarking platform to simulate federated queries in large hospital networks. Using this platform, we compare the accuracy, k-anonymity privacy risk (with k=10), and computational runtime of our algorithm with several existing techniques.
Results
In simulated queries matching 1 to 100 million patients in a 100-hospital network, our method was significantly more accurate than adding aggregate counts while maintaining k-anonymity. On average, it required a total of 12 kilobytes of data to be sent to the network hub and added only 5 milliseconds to the overall federated query runtime. This was orders of magnitude better than other approaches, which guaranteed the exact answer.
Conclusions
Using our method, it is possible to run highly accurate federated queries of clinical data repositories that both protect patient privacy and scale to large networks.
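Editorial illustration: the double-counting problem described in this abstract can be sketched with a plain HyperLogLog estimator, in which each site summarizes its matching patients in a small register array and the hub merges arrays by taking element-wise maxima. This is a minimal sketch under stated assumptions, not the authors' implementation: it omits the masking and homomorphic encryption the abstract mentions, and the names `HyperLogLog`, `site_sketch`, and `federated_count`, the SHA-256 hash, and the default of 2^12 registers are all choices made for this example.

```python
import hashlib
import math
from typing import Iterable, List


class HyperLogLog:
    """Minimal HyperLogLog sketch with 2**p registers (illustrative only)."""

    def __init__(self, p: int = 12) -> None:
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16")
        self.p = p
        self.m = 1 << p
        self.registers: List[int] = [0] * self.m

    def add(self, item: str) -> None:
        # Use the first 64 bits of SHA-256 as the hash value.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        tail_bits = 64 - self.p
        index = h >> tail_bits                     # top p bits select a register
        tail = h & ((1 << tail_bits) - 1)          # remaining bits
        rank = tail_bits - tail.bit_length() + 1   # position of the leftmost 1-bit
        if rank > self.registers[index]:
            self.registers[index] = rank

    def merge(self, other: "HyperLogLog") -> None:
        # The union of two sketches is the element-wise maximum of registers,
        # so a patient seen at several sites is counted once.
        if other.p != self.p:
            raise ValueError("sketches must use the same precision")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self) -> float:
        alpha = 0.7213 / (1.0 + 1.079 / self.m)    # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


def site_sketch(patient_ids: Iterable[str], p: int = 12) -> HyperLogLog:
    """Build the sketch a single site would send to the hub."""
    sketch = HyperLogLog(p)
    for pid in patient_ids:
        sketch.add(pid)
    return sketch


def federated_count(sketches: Iterable[HyperLogLog]) -> float:
    """Estimate the number of distinct patients across all sites."""
    sketches = list(sketches)
    if not sketches:
        raise ValueError("need at least one site sketch")
    merged = HyperLogLog(sketches[0].p)
    for sketch in sketches:
        merged.merge(sketch)
    return merged.estimate()
```

With two sites whose patient lists overlap, `federated_count` should land near the size of the union, whereas summing the per-site estimates overshoots by roughly the size of the overlap. The expected relative error of this estimator is about 1.04 divided by the square root of the register count.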
Affiliation(s)
- Yun William Yu
- Computer & Mathematical Sciences, University of Toronto, Toronto, ON, Canada
- Griffin M Weber
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States
|
31
|
Hie B, Bryson BD, Berger B. Leveraging Uncertainty in Machine Learning Accelerates Biological Discovery and Design. Cell Syst 2020; 11:461-477.e9. [PMID: 33065027 DOI: 10.1016/j.cels.2020.09.007] [Citation(s) in RCA: 75] [Impact Index Per Article: 15.0] [Received: 01/16/2020] [Revised: 06/01/2020] [Accepted: 09/23/2020] [Indexed: 12/13/2022]
Abstract
Machine learning that generates biological hypotheses has transformative potential, but most learning algorithms are susceptible to pathological failure when exploring regimes beyond the training data distribution. A solution to address this issue is to quantify prediction uncertainty so that algorithms can gracefully handle novel phenomena that confound standard methods. Here, we demonstrate the broad utility of robust uncertainty prediction in biological discovery. By leveraging Gaussian process-based uncertainty prediction on modern pre-trained features, we train a model on just 72 compounds to make predictions over a 10,833-compound library, identifying and experimentally validating compounds with nanomolar affinity for diverse kinases and whole-cell growth inhibition of Mycobacterium tuberculosis. Uncertainty facilitates a tight iterative loop between computation and experimentation and generalizes across biological domains as diverse as protein engineering and single-cell transcriptomics. More broadly, our work demonstrates that uncertainty should play a key role in the increasing adoption of machine learning algorithms into the experimental lifecycle.
Affiliation(s)
- Brian Hie
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Bryan D Bryson
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Ragon Institute of Massachusetts General Hospital, MIT, and Harvard, Cambridge, MA 02139, USA
- Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
|
32
|
Telenti A, Jiang X. Treating medical data as a durable asset. Nat Genet 2020; 52:1005-1010. [PMID: 32929286 DOI: 10.1038/s41588-020-0698-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 03/30/2020] [Accepted: 08/21/2020] [Indexed: 11/09/2022]
Abstract
Access to medical data is central for conducting research on genomics. However, to tap these metadata (observable traits and phenotypes, diagnoses and medication, and labels), researchers must grapple with the complex and sensitive nature of the information. In this Perspective, we argue that, at this exciting time for genomics and artificial intelligence, several critical aspects of data generation, infrastructure and management are pillars of a modern data ecosystem. Many risks to privacy and many obstacles to medical research can be eliminated or mitigated by new secure data analytics. Finally, we discuss the potential consequences of medical data exiting the institutions and being managed by individuals. These shifts in data ownership have the potential for profound disruption and opportunity across many fields.
Affiliation(s)
- Amalio Telenti
- Department of Integrative Structural and Computational Biology, Scripps Research Institute, La Jolla, CA, USA.
- Xiaoqian Jiang
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA.
|
33
|
Coley CW, Eyke NS, Jensen KF. Autonomous Discovery in the Chemical Sciences Part II: Outlook. Angew Chem Int Ed Engl 2020; 59:23414-23436. [PMID: 31553509 DOI: 10.1002/anie.201909989] [Citation(s) in RCA: 105] [Impact Index Per Article: 21.0] [Received: 08/06/2019] [Indexed: 01/19/2023]
Abstract
This two-part Review examines how automation has contributed to different aspects of discovery in the chemical sciences. In this second part, we reflect on a selection of exemplary studies. It is increasingly important to articulate what the role of automation and computation has been in the scientific process and how that has or has not accelerated discovery. One can argue that even the best automated systems have yet to "discover" despite being incredibly useful as laboratory assistants. We must carefully consider how they have been and can be applied to future problems of chemical discovery in order to effectively design and interact with future autonomous platforms. The majority of this Review defines a large set of open research directions, including improving our ability to work with complex data, build empirical models, automate both physical and computational experiments for validation, select experiments, and evaluate whether we are making progress towards the ultimate goal of autonomous discovery. Addressing these practical and methodological challenges will greatly advance the extent to which autonomous systems can make meaningful discoveries.
Affiliation(s)
- Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Natalie S Eyke
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Klavs F Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
|
34
|
Cho H, Simmons S, Kim R, Berger B. Privacy-Preserving Biomedical Database Queries with Optimal Privacy-Utility Trade-Offs. Cell Syst 2020; 10:408-416.e9. [DOI: 10.1016/j.cels.2020.03.006] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 02/04/2020] [Revised: 02/26/2020] [Accepted: 03/25/2020] [Indexed: 11/29/2022]
|
35
|
Zhang JD, Sach-Peltason L, Kramer C, Wang K, Ebeling M. Multiscale modelling of drug mechanism and safety. Drug Discov Today 2020; 25:519-534. [DOI: 10.1016/j.drudis.2019.12.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/20/2019] [Revised: 12/06/2019] [Accepted: 12/23/2019] [Indexed: 12/19/2022]
|
36
|
Ma R, Li Y, Li C, Wan F, Hu H, Xu W, Zeng J. Secure multiparty computation for privacy-preserving drug discovery. Bioinformatics 2020; 36:2872-2880. [DOI: 10.1093/bioinformatics/btaa038] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/05/2019] [Revised: 01/08/2020] [Accepted: 01/15/2020] [Indexed: 01/24/2023] Open
Abstract
Motivation
Quantitative structure–activity relationship (QSAR) and drug–target interaction (DTI) prediction are both commonly used in drug discovery. Collaboration among pharmaceutical institutions can lead to better performance in both QSAR and DTI prediction. However, the drug-related data privacy and intellectual property issues have become a noticeable hindrance for inter-institutional collaboration in drug discovery.
Results
We have developed two novel algorithms under secure multiparty computation (MPC), including QSARMPC and DTIMPC, which enable pharmaceutical institutions to achieve high-quality collaboration to advance drug discovery without divulging private drug-related information. QSARMPC, a neural network model under MPC, displays good scalability and performance and is feasible for privacy-preserving collaboration on large-scale QSAR prediction. DTIMPC integrates drug-related heterogeneous network data and accurately predicts novel DTIs, while keeping the drug information confidential. Under several experimental settings that reflect the situations in real drug discovery scenarios, we have demonstrated that DTIMPC possesses significant performance improvement over the baseline methods, generates novel DTI predictions with supporting evidence from the literature and shows the feasible scalability to handle growing DTI data. All these results indicate that QSARMPC and DTIMPC can provide practically useful tools for advancing privacy-preserving drug discovery.
Availability and implementation
The source code of QSARMPC and DTIMPC is available on GitHub: https://github.com/rongma6/QSARMPC_DTIMPC.git.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rong Ma
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Yi Li
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Chenxing Li
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Fangping Wan
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Hailin Hu
- School of Medicine, Tsinghua University, Beijing 100084, China
- Wei Xu
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China
|
37
|
Dzobo K, Adotey S, Thomford NE, Dzobo W. Integrating Artificial and Human Intelligence: A Partnership for Responsible Innovation in Biomedical Engineering and Medicine. OMICS: A Journal of Integrative Biology 2019; 24:247-263. [PMID: 31313972 DOI: 10.1089/omi.2019.0038] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Indexed: 02/06/2023]
Abstract
Historically, the term "artificial intelligence" dates to 1956, when it was first used at a conference at Dartmouth College in the US. Since then, the development of artificial intelligence has in part been shaped by the field of neuroscience. By understanding the human brain, scientists have attempted to build new intelligent machines capable of performing complex tasks akin to humans. Indeed, future research into artificial intelligence will continue to benefit from the study of the human brain. While the development of artificial intelligence algorithms has been fast paced, the actual use of most artificial intelligence (AI) algorithms in biomedical engineering and clinical practice is still markedly below its conceivably broader potential. This is partly because, for any algorithm to be incorporated into existing workflows, it has to stand the test of scientific validation, clinical and personal utility, and application context, and be equitable as well. In this context, there is much to be gained by combining AI and human intelligence (HI). Harnessing Big Data, computing power and storage capacities, and addressing societal issues emergent from algorithm applications, demand deploying HI in tandem with AI. Few countries, even among economically developed states, have adequate and critical governance frames to best understand and steer the AI innovation trajectories in health care. Drug discovery and translational pharmaceutical research stand to gain from AI technology provided they are also informed by HI. In this expert review, we analyze the ways in which AI applications are likely to traverse the continuum of life from birth to death, encompassing not only humans but also all animal, plant, and other living organisms that are increasingly touched by AI. Examples of AI applications include digital health, diagnosis of diseases in newborns, remote monitoring of health by smart devices, real-time Big Data analytics for prompt diagnosis of heart attacks, and facial analysis software with consequences on civil liberties. While we underscore the need for integration of AI and HI, we note that AI technology does not have to replace medical specialists or scientists; rather, it is in need of such expert HI. Altogether, AI and HI offer synergy for responsible innovation and veritable prospects for improving health care from prevention to diagnosis to therapeutics, while the unintended consequences of automation emergent from AI and algorithms for scientific cultures, the workforce, and society at large should be borne in mind.
Affiliation(s)
- Kevin Dzobo
- International Centre for Genetic Engineering and Biotechnology (ICGEB), Cape Town Component, Wernher and Beit Building (South), UCT Medical Campus, Anzio Road, Observatory 7925, Cape Town, South Africa; Division of Medical Biochemistry and Institute of Infectious Disease and Molecular Medicine, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Sampson Adotey
- International Development Innovation Network, D-Lab, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Nicholas E Thomford
- Pharmacogenetics Research Group, Division of Human Genetics, Department of Pathology and Institute of Infectious Diseases and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Observatory 7925, Cape Town, South Africa
- Witness Dzobo
- Pathology and Immunology Department, University Hospital Southampton, Mail Point B, Tremona Road, Southampton, UK; University of Portsmouth, Faculty of Science, St Michael's Building, White Swan Road, Portsmouth, UK
|
38
|
Abstract
As the scale of genomic and health-related data explodes and our understanding of these data matures, the privacy of the individuals behind the data is increasingly at stake. Traditional approaches to protect privacy have fundamental limitations. Here we discuss emerging privacy-enhancing technologies that can enable broader data sharing and collaboration in genomics research.
Affiliation(s)
- Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, 02139, USA; Department of Mathematics, MIT, Cambridge, MA, 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.
- Hyunghoon Cho
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.
|
39
|
Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med 2019; 25:44-56. [PMID: 30617339 DOI: 10.1038/s41591-018-0300-7] [Citation(s) in RCA: 2468] [Impact Index Per Article: 411.3] [Received: 08/16/2018] [Accepted: 11/12/2018] [Indexed: 11/08/2022]
Abstract
The use of artificial intelligence, and the deep-learning subtype in particular, has been enabled by the use of labeled big data, along with markedly enhanced computing power and cloud storage, across all sectors. In medicine, this is beginning to have an impact at three levels: for clinicians, predominantly via rapid, accurate image interpretation; for health systems, by improving workflow and the potential for reducing medical errors; and for patients, by enabling them to process their own data to promote health. The current limitations, including bias, privacy and security, and lack of transparency, along with the future directions of these applications will be discussed in this article. Over time, marked improvements in accuracy, productivity, and workflow will likely be actualized, but whether that will be used to improve the patient-doctor relationship or facilitate its erosion remains to be seen.
Affiliation(s)
- Eric J Topol
- Department of Molecular Medicine, Scripps Research, La Jolla, CA, USA.
|