Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Jiang Y, Hamer J, Wang C, Jiang X, Kim M, Song Y, Xia Y, Mohammed N, Sadat MN, Wang S. SecureLR: Secure Logistic Regression Model via a Hybrid Cryptographic Protocol. IEEE/ACM Trans Comput Biol Bioinform 2019;16:113-123. [PMID: 29994005 DOI: 10.1109/tcbb.2018.2833463] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 06/08/2023]
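
The "Impact Index Per Article" figures in this listing are not defined on the page. As a hedged reading, every nonzero value shown is consistent with the citation count divided by the number of calendar years from publication through 2023, inclusive. The sketch below checks that arithmetic; the function name `impact_index_per_article` and the reference year of 2023 are assumptions inferred from the numbers, not a documented definition.

```python
def impact_index_per_article(citations, publication_year, reference_year=2023):
    """Citations per year since publication, counting the publication year.

    Assumption: reference_year=2023 is inferred from the listing, not documented.
    """
    years = reference_year - publication_year + 1
    return round(citations / years, 1)


# (citations, publication year, value shown in the listing)
listed = [
    (9, 2019, 1.8),   # SecureLR (the cited article)
    (7, 2021, 2.3),   # Front Big Data
    (8, 2021, 2.7),   # Proceedings on Privacy Enhancing Technologies
    (6, 2020, 1.5),   # J Med Internet Res
    (20, 2020, 5.0),  # Artif Intell Med
    (1, 2019, 0.2),   # Journal of Intelligent & Fuzzy Systems
    (21, 2018, 3.5),  # Symmetry (Basel)
]

for citations, year, shown in listed:
    print(citations, year, impact_index_per_article(citations, year), shown)
```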

Cited by Other Article(s)

Ai N, Yang Z, Yuan H, Ouyang D, Miao R, Ji Y, Liang Y. A distributed sparse logistic regression with $L_{1/2}$ regularization for microarray biomarker discovery in cancer classification. Soft Comput 2022. [DOI: 10.1007/s00500-022-07551-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]

Waqas Khan P, Byun YC. Multi-Fault Detection and Classification of Wind Turbines Using Stacking Classifier. Sensors (Basel) 2022;22:6955. [PMID: 36146299 PMCID: PMC9505315 DOI: 10.3390/s22186955] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2022] [Revised: 09/09/2022] [Accepted: 09/13/2022] [Indexed: 06/16/2023]

Qayyum A, Ijaz A, Usama M, Iqbal W, Qadir J, Elkhatib Y, Al-Fuqaha A. Securing Machine Learning in the Cloud: A Systematic Review of Cloud Machine Learning Security. Front Big Data 2021;3:587139. [PMID: 33693420 PMCID: PMC7931962 DOI: 10.3389/fdata.2020.587139] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/24/2020] [Accepted: 10/08/2020] [Indexed: 11/13/2022]

Scalable Privacy-Preserving Distributed Learning. Proceedings on Privacy Enhancing Technologies 2021. [DOI: 10.2478/popets-2021-0030] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/20/2022]

Lu Y, Zhou T, Tian Y, Zhu S, Li J. Web-Based Privacy-Preserving Multicenter Medical Data Analysis Tools Via Threshold Homomorphic Encryption: Design and Development Study. J Med Internet Res 2020;22:e22555. [PMID: 33289676 PMCID: PMC7755539 DOI: 10.2196/22555] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/16/2020] [Revised: 10/02/2020] [Accepted: 11/06/2020] [Indexed: 11/22/2022]

Abstract

Background

Data sharing in multicenter medical research can improve the generalizability of research, accelerate progress, enhance collaborations among institutions, and lead to new discoveries from data pooled from multiple sources. Despite these benefits, many medical institutions are unwilling to share their data, as sharing may cause sensitive information to be leaked to researchers, other institutions, and unauthorized users. Great progress has been made in the development of secure machine learning frameworks based on homomorphic encryption in recent years; however, nearly all such frameworks use a single secret key and lack a description of how to securely evaluate the trained model, which makes them impractical for multicenter medical applications.

Objective

The aim of this study is to provide a privacy-preserving machine learning protocol (eg, for logistic regression) that serves multiple data providers and researchers. This protocol allows researchers to train models and then evaluate them on medical data from multiple sources while providing privacy protection for both the sensitive data and the learned model.

Methods

We adapted a novel threshold homomorphic encryption scheme to guarantee privacy requirements. We devised new relinearization key generation techniques for greater scalability and multiplicative depth and new model training strategies for simultaneously training multiple models through x-fold cross-validation.
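
The cross-validation side of this training strategy can be illustrated in plaintext. The sketch below only partitions sample indices into folds so that one model per fold can be trained on the remaining data; it performs no encryption and is not the authors' protocol. The helper name `k_fold_splits` is hypothetical, and contiguous, unshuffled folds are an assumption made for simplicity. The sample count and fold count are taken from the results reported in this abstract.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Fold sizes differ by at most one when k does not divide n_samples.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# 10-fold split over 49,152 samples: each of the 10 models is trained on
# nine folds and evaluated on the fold that was held out.
splits = list(k_fold_splits(49152, 10))
for fold, (train, test) in enumerate(splits):
    print(fold, len(train), len(test))
```

Because each fold's training set is fixed up front, the 10 models are independent of one another, which is what makes training them side by side possible.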

Results

Using a client-server architecture, we evaluated the performance of our protocol. The experimental results demonstrated that, with 10-fold cross-validation, our privacy-preserving logistic regression model training and evaluation over 10 attributes in a data set of 49,152 samples took approximately 7 minutes and 20 minutes, respectively.

Conclusions

We present the first privacy-preserving multiparty logistic regression model training and evaluation protocol based on threshold homomorphic encryption. Our protocol is practical for real-world use and may promote multicenter medical research to some extent.

Anxie T, Bing L. Application of deep learning and artificial intelligence in the psychological mechanism of language activity. Journal of Intelligent & Fuzzy Systems 2020. [DOI: 10.3233/jifs-179806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]

Gong M, Wang S, Wang L, Liu C, Wang J, Guo Q, Zheng H, Xie K, Wang C, Hui Z. Evaluation of Privacy Risks of Patients' Data in China: Case Study. JMIR Med Inform 2020;8:e13046. [PMID: 32022691 PMCID: PMC7055805 DOI: 10.2196/13046] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/15/2018] [Revised: 06/07/2019] [Accepted: 09/26/2019] [Indexed: 11/13/2022]

Abstract

BACKGROUND

Patient privacy is a ubiquitous problem around the world. Many existing studies have demonstrated the potential privacy risks associated with sharing of biomedical data. Owing to the increasing need for data sharing and analysis, health care data privacy is drawing more attention. However, to better protect biomedical data privacy, it is essential to assess the privacy risk in the first place.

OBJECTIVE

In China, there is no clear regulation for health systems to deidentify data. It is also not known whether a mechanism such as the Health Insurance Portability and Accountability Act (HIPAA) safe harbor policy will achieve sufficient protection. This study aimed to conduct a pilot study using patient data from Chinese hospitals to understand and quantify the privacy risks of Chinese patients.

METHODS

We used g-distinct analysis to evaluate the reidentification risks with regard to the HIPAA safe harbor approach when applied to Chinese patients' data. More specifically, we estimated the risks based on the HIPAA safe harbor and limited dataset policies by assuming an attacker has background knowledge of the patient from the public domain.

RESULTS

The experiments were conducted on 0.83 million patients (with data fields of date of birth, gender, and surrogate ZIP codes generated based on home address) across 33 provincial-level administrative divisions in China. Under the Limited Dataset policy, 19.58% (163,262/833,235) of the population could be uniquely identified under the g-distinct metric (ie, 1-distinct). In contrast, the Safe Harbor policy significantly reduces privacy risk: only 0.072% (601/833,235) of individuals are uniquely identifiable, and the majority of the population is 3000-indistinguishable (ie, each individual is expected to share common attributes with 3000 or fewer people).
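
A minimal sketch of the g-distinct computation described here, assuming a record counts as g-distinct when its combination of quasi-identifiers is shared by at most g records in the data set (so g = 1 means uniquely identifiable). The function name `g_distinct_fraction`, the field names, and the four toy records are invented for illustration and are not taken from the study's data.

```python
from collections import Counter


def g_distinct_fraction(records, quasi_identifiers, g=1):
    """Fraction of records whose quasi-identifier combination
    is shared by at most g records (including the record itself)."""
    keys = [tuple(rec[q] for q in quasi_identifiers) for rec in records]
    group_sizes = Counter(keys)
    at_risk = sum(1 for key in keys if group_sizes[key] <= g)
    return at_risk / len(keys) if keys else 0.0


# Toy records (hypothetical): two patients share all three attributes.
records = [
    {"birth_date": "1980-01-01", "gender": "F", "zip": "100001"},
    {"birth_date": "1980-01-01", "gender": "F", "zip": "100001"},
    {"birth_date": "1975-06-30", "gender": "M", "zip": "200002"},
    {"birth_date": "1990-12-12", "gender": "F", "zip": "300003"},
]
fields = ["birth_date", "gender", "zip"]

print(g_distinct_fraction(records, fields, g=1))  # 0.5: two of four records are unique
print(g_distinct_fraction(records, fields, g=2))  # 1.0: every group has at most 2 records
```

Coarsening the fields before counting (for example, keeping only the birth year) enlarges the groups and lowers the fraction, which is the effect the policy comparison above measures.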

CONCLUSIONS

Through experiments based on real-world patient data, this work illustrates that the results of g-distinct analysis of Chinese patient privacy risk are similar to those from a previous US study, in which data from different organizations/regions might be vulnerable to different reidentification risks under different policies. This work provides a reference for Chinese health care entities estimating patients' privacy risk during data sharing, and lays a foundation for future privacy risk studies of Chinese patients' data.

A multicenter random forest model for effective prognosis prediction in collaborative clinical research network. Artif Intell Med 2020;103:101814. [PMID: 32143809 DOI: 10.1016/j.artmed.2020.101814] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 09/16/2019] [Revised: 02/04/2020] [Accepted: 02/04/2020] [Indexed: 12/17/2022]

Abstract

BACKGROUND

The accuracy of a prognostic prediction model has become an essential aspect of the quality and reliability of the health-related decisions made by clinicians in modern medicine. Unfortunately, individual institutions often lack sufficient samples, which might not provide sufficient statistical power for models. One mitigation is to expand data collection from a single institution to multiple centers to collectively increase the sample size. However, sharing sensitive biomedical data for research involves complicated issues. Machine learning models such as random forests (RF), though they are commonly used and achieve good performances for prognostic prediction, usually suffer worse performance under multicenter privacy-preserving data mining scenarios compared to a centrally trained version.

METHODS AND MATERIALS

In this study, a multicenter random forest prognosis prediction model is proposed that enables federated clinical data mining from horizontally partitioned datasets. By using a novel data enhancement approach based on a differentially private generative adversarial network customized to clinical prognosis data, the proposed model is able to provide a multicenter RF model with performance on par with, or even better than, a centrally trained RF, but without the need to aggregate the raw data. Moreover, our model also incorporates an importance ranking step designed for feature selection without sharing patient-level information.

RESULT

The proposed model was evaluated on colorectal cancer datasets from the US and China. Two groups of datasets with different levels of heterogeneity within the collaborative research network were selected. First, we compare the performance of the distributed random forest model under different privacy parameters with different percentages of enhancement datasets and validate the effectiveness and plausibility of our approach. Then, we compare the discrimination and calibration ability of the proposed multicenter random forest with a centrally trained random forest model and other tree-based classifiers as well as some commonly used machine learning methods. The results show that the proposed model can provide better prediction performance in terms of discrimination and calibration ability than the centrally trained RF model or the other candidate models while following the privacy-preserving rules in both groups. Additionally, good discrimination and calibration ability are shown on the simplified model based on the feature importance ranking in the proposed approach.

CONCLUSION

The proposed random forest model exhibits ideal prediction capability using multicenter clinical data and overcomes the performance limitation arising from privacy guarantees. It can also provide feature importance ranking across institutions without pooling the data at a central site. This study offers a practical solution for building a prognosis prediction model in the collaborative clinical research network and solves practical issues in real-world applications of medical artificial intelligence.

Li T, Sun J, Zhang X, Wang L, Zhu P, Wang N. Competition prediction and fitness behavior based on GA-SVM algorithm and PCA model. Journal of Intelligent & Fuzzy Systems 2019. [DOI: 10.3233/jifs-179202] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/15/2022]

A Two-Stage Big Data Analytics Framework with Real World Applications Using Spark Machine Learning and Long Short-Term Memory Network. Symmetry (Basel) 2018. [DOI: 10.3390/sym10100485] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Indexed: 11/16/2022]

Abstract

Every day we experience unprecedented data growth from numerous sources, which contribute to big data in terms of volume, velocity, and variability. These datasets impose great challenges on analytics frameworks and computational resources, making it difficult to extract meaningful information in a timely manner. Thus, developing an efficient big data analytics framework is an important research topic. To address these challenges by exploiting non-linear relationships in very large and high-dimensional datasets, machine learning (ML) and deep learning (DL) algorithms are being used in analytics frameworks. Apache Spark has been in use as the fastest big data processing arsenal, which helps to solve iterative ML tasks using a distributed ML library called Spark MLlib. Considering real-world research problems, DL architectures such as Long Short-Term Memory (LSTM) are an effective approach to overcoming practical issues such as reduced accuracy, long-term sequence dependency, and vanishing and exploding gradients in conventional deep architectures. In this paper, we propose an efficient analytics framework, which is technically a progressive machine learning technique merged with Spark-based linear models, Multilayer Perceptron (MLP), and LSTM, using a two-stage cascade structure in order to enhance the predictive accuracy. Our proposed architecture enables us to organize big data analytics in a scalable and efficient way. To show the effectiveness of our framework, we applied the cascading structure to two different real-life datasets to solve a multiclass and a binary classification problem, respectively. Experimental results show that our analytical framework outperforms state-of-the-art approaches with a high level of classification accuracy.