1
Zhang QX, Liu T, Guo X, Zhen J, Yang MY, Khederzadeh S, Zhou F, Han X, Zheng Q, Jia P, Ding X, He M, Zou X, Liao JK, Zhang H, He J, Zhu X, Lu D, Chen H, Zeng C, Liu F, Zheng HF, Liu S, Xu HM, Chen GB. Searching across-cohort relatives in 54,092 GWAS samples via encrypted genotype regression. PLoS Genet 2024; 20:e1011037. [PMID: 38206971 PMCID: PMC10783776 DOI: 10.1371/journal.pgen.1011037] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/08/2023] [Accepted: 12/13/2023] [Indexed: 01/13/2024] Open
Abstract
Explicitly sharing individual-level data in genomics studies has many merits compared with sharing summary statistics, including stricter quality control (QC), common statistical analyses, relative identification, and improved statistical power in GWAS, but it is hampered by privacy or ethical constraints. In this study, we developed encG-reg, a regression approach that can detect relatives of various degrees based on encrypted genomic data, which is immune to ethical constraints. The encryption properties of encG-reg are based on random matrix theory: the original genotypic matrix is masked without sacrificing the precision of individual-level genotype data. We established a connection between the dimension of the random matrix that masks the genotype matrices and the required precision of a study for encrypted genotype data. encG-reg has false positive and false negative rates equivalent to those obtained by sharing original individual-level data, and is computationally efficient when searching for relatives. We split the UK Biobank into its respective centers and then encrypted the genotype data. We observed that the relatives estimated using encG-reg were as accurate as those estimated by KING, a widely used software package that requires original genotype data. In a more complex application, we launched a finely devised multi-center collaboration across 5 research institutes in China, covering 9 cohorts of 54,092 GWAS samples. encG-reg again identified true relatives across the cohorts, even with different ethnic backgrounds and genotypic qualities. Our study clearly demonstrates that encrypted genomic data can be used for data sharing without loss of information or data-sharing barriers.
Affiliation(s)
- Qi-Xin Zhang
- Institute of Bioinformatics, Zhejiang University, Hangzhou, Zhejiang, China
- Center for Reproductive Medicine, Department of Genetic and Genomic Medicine, and Clinical Research Institute, Zhejiang Provincial People’s Hospital, People’s Hospital of Hangzhou Medical College, Hangzhou, Zhejiang, China
- Tianzi Liu
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
- CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Xinxin Guo
- School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, Guangdong, China
- Jianxin Zhen
- Central Laboratory, Shenzhen Baoan Women’s and Children’s Hospital, Shenzhen, Guangdong, China
- Meng-yuan Yang
- Diseases & Population (DaP) Geninfo Lab, School of Life Sciences, Westlake University, Hangzhou, Zhejiang, China
- Saber Khederzadeh
- Diseases & Population (DaP) Geninfo Lab, School of Life Sciences, Westlake University, Hangzhou, Zhejiang, China
- Fang Zhou
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, China
- Xiaotong Han
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, Guangdong, China
- Qiwen Zheng
- CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Peilin Jia
- CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Xiaohu Ding
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, Guangdong, China
- Mingguang He
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou, Guangdong, China
- Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, Melbourne, Victoria, Australia
- Ophthalmology, Department of Surgery, University of Melbourne, Melbourne, Victoria, Australia
- Xin Zou
- State Key Laboratory of CAD & GC, Zhejiang University, Hangzhou, Zhejiang, China
- Jia-Kai Liao
- School of Mathematics and Statistics and Research Institute of Mathematical Sciences (RIMS), Jiangsu Provincial Key Laboratory of Educational Big Data Science and Engineering, Jiangsu Normal University, Xuzhou, Jiangsu, China
- Ningbo Institute of Life and Health Industry, University of Chinese Academy of Sciences, Ningbo, Zhejiang, China
- Hongxin Zhang
- State Key Laboratory of CAD & GC, Zhejiang University, Hangzhou, Zhejiang, China
- Ji He
- Department of Neurology, Peking University Third Hospital, Beijing, China
- Xiaofeng Zhu
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, United States of America
- Daru Lu
- State Key Laboratory of Genetic Engineering and MOE Engineering Research Center of Gene Technology, School of Life Sciences and Zhongshan Hospital, Fudan University, Shanghai, China
- NHC Key Laboratory of Birth Defects and Reproductive Health, Chongqing Population and Family Planning Science and Technology Research Institute, Chongqing, China
- Hongyan Chen
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, China
- Changqing Zeng
- CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Henan Academy of Sciences, Zhengzhou, Henan, China
- Fan Liu
- CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Department of Forensic Sciences, College of Criminal Justice, Naif Arab University of Security Sciences, Riyadh, Kingdom of Saudi Arabia
- Hou-Feng Zheng
- Diseases & Population (DaP) Geninfo Lab, School of Life Sciences, Westlake University, Hangzhou, Zhejiang, China
- Siyang Liu
- School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, Guangdong, China
- Hai-Ming Xu
- Institute of Bioinformatics, Zhejiang University, Hangzhou, Zhejiang, China
- Guo-Bo Chen
- Center for Reproductive Medicine, Department of Genetic and Genomic Medicine, and Clinical Research Institute, Zhejiang Provincial People’s Hospital, People’s Hospital of Hangzhou Medical College, Hangzhou, Zhejiang, China
- Key Laboratory of Endocrine Gland Diseases of Zhejiang Province, Hangzhou, Zhejiang, China
3
Kim EE, Lee S, Lee CH, Oh H, Song K, Han B. FOLD: a method to optimize power in meta-analysis of genetic association studies with overlapping subjects. Bioinformatics 2017; 33:3947-3954. [PMID: 29036405 PMCID: PMC5860085 DOI: 10.1093/bioinformatics/btx463] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/06/2016] [Accepted: 07/19/2017] [Indexed: 11/26/2022] Open
Abstract
Motivation: In genetic association studies, meta-analyses are widely used to increase statistical power by aggregating information from multiple studies. In meta-analyses, participating studies often share the same individuals because of the shared use of publicly available control data or the accidental recruitment of the same subjects. Because such overlap can inflate the false positive rate, overlapping subjects are traditionally split among the studies prior to meta-analysis, which requires access to genotype data and is not always possible. Fortunately, recently developed meta-analysis methods can systematically account for overlapping subjects at the summary statistics level. Results: We identify and report a phenomenon in which these methods for overlapping subjects can yield low power. For instance, in our simulation involving a meta-analysis of five studies that share 20% of individuals, the traditional splitting method achieved 80% power, whereas none of the new methods exceeded 32% power. We found that this low power resulted from unaccounted differences between shared and unshared individuals in terms of their contributions towards the final statistic. Here, we propose an optimal summary-statistic-based method, termed FOLD, that increases the power of meta-analysis involving studies with overlapping subjects. Availability and implementation: Our method is available at http://software.buhmhan.com/FOLD. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Emma E Kim
- Asan Institute for Life Sciences, Asan Medical Center, Seoul 138-736, Korea; Department of Chemistry, Seoul National University, Seoul 151-747, Korea
- Seunghoon Lee
- Department of Chemistry, Seoul National University, Seoul 151-747, Korea
- Hyunjung Oh
- Department of Biochemistry and Molecular Biology, University of Ulsan College of Medicine, Seoul 138-736, Korea
- Kyuyoung Song
- Department of Biochemistry and Molecular Biology, University of Ulsan College of Medicine, Seoul 138-736, Korea
- Buhm Han
- Asan Institute for Life Sciences, Asan Medical Center, Seoul 138-736, Korea; Department of Convergence Medicine
5
Schönbach C, Horton P, Yiu SM, Tan TW, Ranganathan S. GIW and InCoB are advancing bioinformatics in the Asia-Pacific. BMC Bioinformatics 2015; 16:I1. [PMID: 28102114 PMCID: PMC6389036 DOI: 10.1186/1471-2105-16-s18-i1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] Open
Abstract
GIW/InCoB2015, the joint 26th International Conference on Genome Informatics (GIW) and 14th International Conference on Bioinformatics (InCoB), held in Tokyo on September 9-11, 2015, was attended by over 200 delegates. Fifty-one of the 89 oral presentations were based on research articles accepted for publication in four BMC journal supplements and three other journals. Sixteen articles in this supplement and six articles in the BMC Systems Biology GIW/InCoB2015 Supplement are covered by this introduction. The topics range from genome informatics, protein structure informatics, and image analysis to biological networks and biomarker discovery.
Affiliation(s)
- Christian Schönbach
- Department of Biology, School of Science and Technology, Nazarbayev University, Astana, 010000 Republic of Kazakhstan
- Center for AIDS Research and International Research Center for Medical Sciences, Kumamoto University, Kumamoto, 860-0811 Japan
- Paul Horton
- Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, 135-0064 Japan
- Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, Japan
- Siu-Ming Yiu
- Department of Computer Science, Faculty of Engineering, The University of Hong Kong, Hong Kong, HKSAR
- Tin Wee Tan
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 117599
- Shoba Ranganathan
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 117599
- Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW 2109 Australia
6
Naveed M, Ayday E, Clayton EW, Fellay J, Gunter CA, Hubaux JP, Malin BA, Wang X. Privacy in the Genomic Era. ACM Computing Surveys 2015; 48:6. [PMID: 26640318 PMCID: PMC4666540 DOI: 10.1145/2767007] [Citation(s) in RCA: 78] [Impact Index Per Article: 8.7] [Received: 05/01/2014] [Accepted: 04/01/2015] [Indexed: 05/19/2023]
Abstract
Genome sequencing technology has advanced at a rapid pace and it is now possible to generate highly detailed genotypes inexpensively. The collection and analysis of such data have the potential to support various applications, including personalized medical services. While the benefits of the genomics revolution are trumpeted by the biomedical community, the increased availability of such data has major implications for personal privacy, notably because the genome has certain essential features, which include (but are not limited to) (i) an association with traits and certain diseases, (ii) identification capability (e.g., forensics), and (iii) revelation of family relationships. Moreover, direct-to-consumer DNA testing increases the likelihood that genome data will be made available in less regulated environments, such as the Internet and for-profit companies. The problem of genome data privacy thus resides at the crossroads of computer science, medicine, and public policy. While computer scientists have addressed data privacy for various data types, less attention has been dedicated to genomic data. Thus, the goal of this paper is to provide a systematization of knowledge for the computer science community. In doing so, we address some of the (sometimes erroneous) beliefs of this field and we report on a survey we conducted about genome data privacy with biomedical specialists. Then, after characterizing the genome privacy problem, we review the state of the art regarding privacy attacks on genomic data and strategies for mitigating such attacks, and we contextualize these attacks from the perspective of medicine and public policy. This paper concludes with an enumeration of the challenges for genome data privacy and presents a framework to systematize the analysis of threats and the design of countermeasures as the field moves forward.