1
Rutter LA, Cope H, MacKay MJ, Herranz R, Das S, Ponomarev SA, Costes SV, Paul AM, Barker R, Taylor DM, Bezdan D, Szewczyk NJ, Muratani M, Mason CE, Giacomello S. Astronaut omics and the impact of space on the human body at scale. Nat Commun 2024; 15:4952. [PMID: 38862505 PMCID: PMC11166943 DOI: 10.1038/s41467-024-47237-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/12/2023] [Accepted: 03/22/2024] [Indexed: 06/13/2024] Open
Abstract
Future multi-year crewed planetary missions will motivate advances in aerospace nutrition and telehealth. On Earth, the Human Cell Atlas project aims to spatially map all cell types in the human body. Here, we propose that a parallel Human Cell Space Atlas could serve as an openly available, global resource for space life science research. As humanity becomes increasingly spacefaring, high-resolution omics on orbit could permit an advent of precision spaceflight healthcare. Alongside the scientific potential, we consider the complex ethical, cultural, and legal challenges intrinsic to the human space omics discipline, and how philosophical frameworks may benefit from international perspectives.
Affiliation(s)
- Lindsay A Rutter
- Transborder Medical Research Center, University of Tsukuba, 305-8575, Tsukuba, Japan
- Department of Genome Biology, Institute of Medicine, University of Tsukuba, 305-8575, Tsukuba, Japan
- School of Chemistry, University of Glasgow, Glasgow, G12 8QQ, UK
- Henry Cope
- School of Medicine, University of Nottingham, Derby, DE22 3DT, UK
- Matthew J MacKay
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10065, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, 10021, USA
- The WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY, 10065, USA
- Raúl Herranz
- Centro de Investigaciones Biológicas "Margarita Salas" (CSIC), Ramiro de Maeztu 9, Madrid, 28040, Spain
- Saswati Das
- Department of Biochemistry, Atal Bihari Vajpayee Institute of Medical Sciences & Dr. Ram Manohar Lohia Hospital, New Delhi, 110001, India
- Sergey A Ponomarev
- Department of Immunology and Microbiology, Institute for the Biomedical Problems, Russian Academy of Sciences, 123007, Moscow, Russia
- Sylvain V Costes
- Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA, 94035, USA
- Amber M Paul
- Embry-Riddle Aeronautical University, Department of Human Factors and Behavioral Neurobiology, Daytona Beach, FL, 32114, USA
- Richard Barker
- Department of Botany, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Deanne M Taylor
- Department of Biomedical and Health Informatics, The Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Daniela Bezdan
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, 72076, Germany
- NGS Competence Center Tübingen (NCCT), University of Tübingen, Tübingen, 72076, Germany
- yuri GmbH, Meckenbeuren, 88074, Germany
- Nathaniel J Szewczyk
- School of Medicine, University of Nottingham, Derby, DE22 3DT, UK
- Ohio Musculoskeletal and Neurological Institute (OMNI), Heritage College of Osteopathic Medicine, Ohio University, Athens, OH, 45701, USA
- Masafumi Muratani
- Transborder Medical Research Center, University of Tsukuba, 305-8575, Tsukuba, Japan
- Department of Genome Biology, Institute of Medicine, University of Tsukuba, 305-8575, Tsukuba, Japan
- Christopher E Mason
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10065, USA.
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, 10021, USA.
- The WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY, 10065, USA.
- The Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, 10065, USA.
2
Thomas M, Mackes N, Preuss-Dodhy A, Wieland T, Bundschus M. Assessing Privacy Vulnerabilities in Genetic Data Sets: Scoping Review. JMIR BIOINFORMATICS AND BIOTECHNOLOGY 2024; 5:e54332. [PMID: 38935957 PMCID: PMC11165293 DOI: 10.2196/54332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 03/26/2024] [Accepted: 03/29/2024] [Indexed: 06/29/2024]
Abstract
BACKGROUND Genetic data are widely considered inherently identifiable. However, genetic data sets come in many shapes and sizes, and the feasibility of privacy attacks depends on their specific content. Assessing the reidentification risk of genetic data is complex, yet there is a lack of guidelines or recommendations that support data processors in performing such an evaluation. OBJECTIVE This study aims to gain a comprehensive understanding of the privacy vulnerabilities of genetic data and create a summary that can guide data processors in assessing the privacy risk of genetic data sets. METHODS We conducted a 2-step search, in which we first identified 21 reviews published between 2017 and 2023 on the topic of genomic privacy and then analyzed all references cited in the reviews (n=1645) to identify 42 unique original research studies that demonstrate a privacy attack on genetic data. We then evaluated the type and components of genetic data exploited for these attacks as well as the effort and resources needed for their implementation and their probability of success. RESULTS From our literature review, we derived 9 nonmutually exclusive features of genetic data that are both inherent to any genetic data set and informative about privacy risk: biological modality, experimental assay, data format or level of processing, germline versus somatic variation content, content of single nucleotide polymorphisms, short tandem repeats, aggregated sample measures, structural variants, and rare single nucleotide variants. CONCLUSIONS On the basis of our literature review, the evaluation of these 9 features covers the great majority of privacy-critical aspects of genetic data and thus provides a foundation and guidance for assessing genetic data risk.
3
Naef A, Coduti E, Windisch PY. The Anonymous Data Warehouse: A Hands-On Framework for Anonymizing Data From Digital Health Applications. Cureus 2024; 16:e57519. [PMID: 38707006 PMCID: PMC11067565 DOI: 10.7759/cureus.57519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/02/2024] [Indexed: 05/07/2024] Open
Abstract
The digital health space is growing rapidly, and so is the interest in sharing anonymized health data. However, data anonymization techniques have yet to see much coverage in the medical literature. The purpose of this article is, therefore, to provide a practical framework for anonymization with a focus on the unique properties of data from digital health applications. Literature trends, as well as common anonymization techniques, were synthesized into a framework that considers the opportunities and challenges of digital health data. A rationale for each design decision is provided, and the advantages and disadvantages are discussed. We propose a framework based on storing data separately, anonymizing the data where the identified data is located, only exporting selected data, minimizing static attributes, ensuring k-anonymity of users and their static attributes, and preventing defined metrics from acting as quasi-identifiers by using aggregation, rounding, and capping. Data anonymization requires a pragmatic approach that preserves the utility of the data while minimizing reidentification risk. The proposed framework should be modified according to the characteristics of the respective data set.
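The last two steps of the framework described above (k-anonymity over static attributes, and capping plus rounding of metrics) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the example records, and the cap and step values are all hypothetical.

```python
from collections import Counter


def cap_and_round(value, cap, step):
    """Cap a metric at `cap`, then round it to the nearest multiple of `step`."""
    capped = min(value, cap)
    return step * round(capped / step)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) >= k


# Hypothetical raw records from a digital health app (values are invented).
users = [
    {"age": 23, "daily_steps": 4120},
    {"age": 27, "daily_steps": 3890},
    {"age": 24, "daily_steps": 31250},
    {"age": 26, "daily_steps": 4405},
]

# Export view: ages rounded to decades, step counts capped and rounded.
export = [
    {
        "age": cap_and_round(u["age"], cap=90, step=10),
        "daily_steps": cap_and_round(u["daily_steps"], cap=20000, step=5000),
    }
    for u in users
]

print(is_k_anonymous(users, ["age"], k=2))
print(is_k_anonymous(export, ["age"], k=2))
print(is_k_anonymous(export, ["age", "daily_steps"], k=2))
```

In this toy data the rounded ages form groups of two, but adding the step count as a second quasi-identifier breaks 2-anonymity again, which is why the abstract stresses adapting the framework to the characteristics of each data set.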
Affiliation(s)
- André Naef
- Innovation Team, dacadoo AG, Zürich, CHE
4
Dong X, Lu Y, Guo L, Li C, Ni Q, Wu B, Wang H, Yang L, Wu S, Sun Q, Zheng H, Zhou W, Wang S. PICOTEES: a privacy-preserving online service of phenotype exploration for genetic-diagnostic variants from Chinese children cohorts. J Genet Genomics 2024; 51:243-251. [PMID: 37714454 DOI: 10.1016/j.jgg.2023.09.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Revised: 08/31/2023] [Accepted: 09/03/2023] [Indexed: 09/17/2023]
Abstract
The growth in biomedical data resources has raised potential privacy concerns and risks of genetic information leakage. For instance, exome sequencing aids clinical decisions by comparing data through web services, but it requires significant trust between users and providers. To alleviate privacy concerns, the most commonly used strategy is to anonymize sensitive data. Unfortunately, studies have shown that anonymization is insufficient to protect against reidentification attacks. Recently, privacy-preserving technologies have been applied to preserve application utility while protecting the privacy of biomedical data. We present the PICOTEES framework, a privacy-preserving online service of phenotype exploration for genetic-diagnostic variants (https://birthdefectlab.cn:3000/). PICOTEES enables privacy-preserving queries of the phenotype spectrum for a single variant by utilizing trusted execution environment technology, which can protect the privacy of the user's query information, backend models, and data, as well as the final results. We demonstrate the utility and performance of PICOTEES by exploring a bioinformatics dataset. The dataset is from a cohort containing 20,909 genetic testing patients with 3,152,508 variants from the Children's Hospital of Fudan University in China, dominated by the Chinese Han population (>99.9%). Our query results yield a large number of unreported diagnostic variants and previously reported pathogenicity.
Affiliation(s)
- Xinran Dong
- Center for Molecular Medicine, Children's Hospital of Fudan University, Shanghai 201102, China; Key Laboratory of Birth Defects, Children's Hospital of Fudan University, Shanghai 201102, China
- Yulan Lu
- Center for Molecular Medicine, Children's Hospital of Fudan University, Shanghai 201102, China; Key Laboratory of Birth Defects, Children's Hospital of Fudan University, Shanghai 201102, China
- Lanting Guo
- Department of Bioinformatics, Hangzhou Nuowei Information Technology Co., Ltd, Hangzhou, Zhejiang 310000, China
- Chuan Li
- Center for Molecular Medicine, Children's Hospital of Fudan University, Shanghai 201102, China
- Qi Ni
- Center for Molecular Medicine, Children's Hospital of Fudan University, Shanghai 201102, China; Key Laboratory of Birth Defects, Children's Hospital of Fudan University, Shanghai 201102, China
- Bingbing Wu
- Center for Molecular Medicine, Children's Hospital of Fudan University, Shanghai 201102, China; Key Laboratory of Birth Defects, Children's Hospital of Fudan University, Shanghai 201102, China
- Huijun Wang
- Center for Molecular Medicine, Children's Hospital of Fudan University, Shanghai 201102, China; Key Laboratory of Birth Defects, Children's Hospital of Fudan University, Shanghai 201102, China
- Lin Yang
- Center for Molecular Medicine, Children's Hospital of Fudan University, Shanghai 201102, China; Key Laboratory of Birth Defects, Children's Hospital of Fudan University, Shanghai 201102, China
- Songyang Wu
- The Third Research Institute of the Ministry of Public Security, Shanghai 200031, China
- Qi Sun
- Department of Bioinformatics, Hangzhou Nuowei Information Technology Co., Ltd, Hangzhou, Zhejiang 310000, China
- Hao Zheng
- Department of Bioinformatics, Hangzhou Nuowei Information Technology Co., Ltd, Hangzhou, Zhejiang 310000, China
- Wenhao Zhou
- Center for Molecular Medicine, Children's Hospital of Fudan University, Shanghai 201102, China; Xiamen Campus of Children's Hospital of Fudan University, Xiamen, Fujian 361006, China.
- Shuang Wang
- Department of Bioinformatics, Hangzhou Nuowei Information Technology Co., Ltd, Hangzhou, Zhejiang 310000, China; Institutes for Systems Genetics, West China Hospital, Chengdu, Sichuan 610041, China; Shanghai Putuo People's Hospital, Tongji University, Shanghai 200060, China.
5
Oliva A, Kaphle A, Reguant R, Sng LMF, Twine NA, Malakar Y, Wickramarachchi A, Keller M, Ranbaduge T, Chan EKF, Breen J, Buckberry S, Guennewig B, Haas M, Brown A, Cowley MJ, Thorne N, Jain Y, Bauer DC. Future-proofing genomic data and consent management: a comprehensive review of technology innovations. Gigascience 2024; 13:giae021. [PMID: 38837943 PMCID: PMC11152178 DOI: 10.1093/gigascience/giae021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 01/15/2024] [Accepted: 04/09/2024] [Indexed: 06/07/2024] Open
Abstract
Genomic information is increasingly used to inform medical treatments and manage future disease risks. However, any personal and societal gains must be carefully balanced against the risk to individuals contributing their genomic data. Expanding our understanding of actionable genomic insights requires researchers to access large global datasets to capture the complexity of genomic contribution to diseases. Similarly, clinicians need efficient access to a patient's genome as well as population-representative historical records for evidence-based decisions. Both researchers and clinicians hence rely on participants to consent to the use of their genomic data, which in turn requires trust in the professional and ethical handling of this information. Here, we review existing and emerging solutions for secure and effective genomic information management, including storage, encryption, consent, and authorization that are needed to build participant trust. We discuss recent innovations in cloud computing, quantum-computing-proof encryption, and self-sovereign identity. These innovations can augment key developments from within the genomics community, notably GA4GH Passports and the Crypt4GH file container standard. We also explore how decentralized storage as well as the digital consenting process can offer culturally acceptable processes to encourage data contributions from ethnic minorities. We conclude that the individual and their right for self-determination needs to be put at the center of any genomics framework, because only on an individual level can the received benefits be accurately balanced against the risk of exposing private information.
Affiliation(s)
- Adrien Oliva
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, Level 3/160 Hawkesbury Rd, Westmead NSW 2145, Australia
- Anubhav Kaphle
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, Level 3/160 Hawkesbury Rd, Westmead NSW 2145, Australia
- Roc Reguant
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, Level 3/160 Hawkesbury Rd, Westmead NSW 2145, Australia
- Letitia M F Sng
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, Level 3/160 Hawkesbury Rd, Westmead NSW 2145, Australia
- Natalie A Twine
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, Level 3/160 Hawkesbury Rd, Westmead NSW 2145, Australia
- Yuwan Malakar
- Responsible Innovation Future Science Platform, Commonwealth Scientific and Industrial Research Organisation, Brisbane, 41 Boggo Rd, Dutton Park QLD 4102, Australia
- Anuradha Wickramarachchi
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, Level 3/160 Hawkesbury Rd, Westmead NSW 2145, Australia
- Marcel Keller
- Data61, Commonwealth Scientific and Industrial Research Organisation, Level 5/13 Garden St, Eveleigh NSW 2015, Australia
- Thilina Ranbaduge
- Data61, Commonwealth Scientific and Industrial Research Organisation, Building 101, Clunies Ross St, Black Mountain, Canberra, ACT 2601, Australia
- Eva K F Chan
- NSW Health Pathology, Sydney, 1 Reserve Road, St Leonards NSW 2065, Australia
- James Breen
- Telethon Kids Institute, Perth, WA 6009, Australia
- National Centre for Indigenous Genomics, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
- Sam Buckberry
- Telethon Kids Institute, Perth, WA 6009, Australia
- National Centre for Indigenous Genomics, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
- Boris Guennewig
- Sydney Medical School, Brain and Mind Centre, The University of Sydney, Sydney, 94 Mallett St, Camperdown NSW 2050, Australia
- Matilda Haas
- Australian Genomics, Parkville, VIC 3052, Australia
- Murdoch Children’s Research Institute, Parkville, Victoria 3052, Australia
- Alex Brown
- Telethon Kids Institute, Perth, WA 6009, Australia
- National Centre for Indigenous Genomics, The John Curtin School of Medical Research, Australian National University, Canberra, ACT 2601, Australia
- Mark J Cowley
- Children’s Cancer Institute, Lowy Cancer Research Centre, Level 4, Lowy Cancer Research Centre Corner Botany & High Streets UNSW Kensington Campus UNSW Sydney, Kensington NSW 2052, Australia
- School of Clinical Medicine, UNSW Medicine & Health, Wallace Wurth Building (C27), Cnr High St & Botany St, UNSW Sydney, Kensington NSW 2052, Australia
- Natalie Thorne
- University of Melbourne, Melbourne, Parkville VIC 3052, Australia
- Melbourne Genomics Health Alliance, Melbourne 1G, Walter and Eliza Hall Institute/1G Royal Parade, Parkville VIC 3052, Australia
- Walter and Eliza Hall Institute, Melbourne, 1G, Walter and Eliza Hall Institute/1G Royal Parade, Parkville VIC 3052, Australia
- Yatish Jain
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, Level 3/160 Hawkesbury Rd, Westmead NSW 2145, Australia
- Applied BioSciences, Faculty of Science and Engineering, Macquarie University, Applied BioSciences 205B Culloden Rd Macquarie University, NSW 2109, Australia
- Denis C Bauer
- Applied BioSciences, Faculty of Science and Engineering, Macquarie University, Applied BioSciences 205B Culloden Rd Macquarie University, NSW 2109, Australia
- Department of Biomedical Sciences, MQ Health General Practice - Macquarie University, Suite 305, Level 3/2 Technology Pl, Macquarie Park NSW 2109, Australia
- Australian e-Health Research Centre, Commonwealth Scientific and Industrial Research Organisation, Gate 13, Kintore Avenue University of Adelaide, Adelaide SA 5000, Australia
6
Hopman R. The face as folded object: Race and the problems with 'progress' in forensic DNA phenotyping. SOCIAL STUDIES OF SCIENCE 2023; 53:869-890. [PMID: 34338081 PMCID: PMC10696901 DOI: 10.1177/03063127211035562] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/13/2023]
Abstract
Forensic DNA phenotyping (FDP) encompasses a set of technologies aimed at predicting phenotypic characteristics from genotypes. Advocates of FDP present it as the future of forensics, with an ultimate goal of producing complete, individualised facial composites based on DNA. With a focus on individuals and promised advances in technology comes the assumption that modern methods are steadily moving away from racial science. Yet in the quantification of physical differences, FDP builds upon some nineteenth- and twentieth-century scientific practices that measured and categorised human variation in terms of race. In this article I complicate the linear temporal approach to scientific progress by building on the notion of the folded object. Drawing on ethnographic fieldwork conducted in various genetic laboratories, I show how nineteenth- and early twentieth-century anthropological measuring and data-collection practices and statistical averaging techniques are folded into the ordering of measurements of skin color data taken with a spectrophotometer, the analysis of facial shape based on computational landmarks and the collection of iris photographs. Attending to the historicity of FDP facial renderings, I bring into focus how race comes about as a consequence of temporal folds.
Affiliation(s)
- Roos Hopman
- University of Amsterdam, Amsterdam, The Netherlands
7
Resnik D. Openness in Scientific Research: A Historical and Philosophical Perspective. JOURNAL OF OPEN ACCESS TO LAW 2023; 11:132. [PMID: 37994350 PMCID: PMC10665006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2023]
Abstract
Openness is widely regarded as a pillar of scientific ethics because it promotes reproducibility and progress in science and benefits society. However, the sharing of scientific information can sometimes adversely impact the interests of human research participants, human communities or populations, scientists, and private research sponsors; and may threaten national security. Because openness may conflict with other important social values, solutions to ethical and policy dilemmas should include meaningful input from those who are impacted by the sharing and use of scientific information, including research participants, communities, and the public. Data sharing and use policies should be reviewed and revised periodically to account for ongoing changes in science, technology, and society.
Affiliation(s)
- David Resnik
- National Institute of Environmental Health Sciences (NIEHS)
8
Kayser M, Branicki W, Parson W, Phillips C. Recent advances in Forensic DNA Phenotyping of appearance, ancestry and age. Forensic Sci Int Genet 2023; 65:102870. [PMID: 37084623 DOI: 10.1016/j.fsigen.2023.102870] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 02/17/2023] [Accepted: 04/04/2023] [Indexed: 04/09/2023]
Abstract
Forensic DNA Phenotyping (FDP) comprises the prediction of a person's externally visible characteristics regarding appearance, biogeographic ancestry and age from DNA of crime scene samples, to provide investigative leads to help find unknown perpetrators that cannot be identified with forensic STR-profiling. In recent years, FDP has advanced considerably in all of its three components, which we summarize in this review article. Appearance prediction from DNA has broadened beyond eye, hair and skin color to additionally comprise other traits such as eyebrow color, freckles, hair structure, hair loss in men, and tall stature. Biogeographic ancestry inference from DNA has progressed from continental ancestry to sub-continental ancestry detection and the resolving of co-ancestry patterns in genetically admixed individuals. Age estimation from DNA has widened beyond blood to more somatic tissues such as saliva and bones as well as new markers and tools for semen. Technological progress has allowed forensically suitable DNA technology with largely increased multiplex capacity for the simultaneous analysis of hundreds of DNA predictors with targeted massively parallel sequencing (MPS). Forensically validated MPS-based FDP tools for predicting from crime scene DNA i) several appearance traits, ii) multi-regional ancestry, iii) several appearance traits together with multi-regional ancestry, and iv) age from different tissue types, are already available. Despite recent advances that will likely increase the impact of FDP in criminal casework in the near future, moving reliable appearance, ancestry and age prediction from crime scene DNA to the level of detail and accuracy police investigators may desire, requires further intensified scientific research together with technical developments and forensic validations as well as the necessary funding.
Affiliation(s)
- Manfred Kayser
- Department of Genetic Identification, Erasmus MC, University Medical Center Rotterdam, Rotterdam, the Netherlands.
- Wojciech Branicki
- Institute of Zoology and Biomedical Research, Jagiellonian University, Kraków, Poland; Institute of Forensic Research, Kraków, Poland
- Walther Parson
- Institute of Legal Medicine, Medical University of Innsbruck, Innsbruck, Austria; Forensic Science Program, The Pennsylvania State University, PA, USA
- Christopher Phillips
- Forensic Genetics Unit, Institute of Forensic Sciences, University of Santiago de Compostela, Spain
9
Jiang Y, Shang T, Liu J. Secure Counting Query Protocol for Genomic Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1457-1468. [PMID: 35666798 DOI: 10.1109/tcbb.2022.3178446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Statistical analysis on genomic data can explore the relationship between gene sequence and phenotype. In particular, counting the samples that carry a genomic mutation and associating them with related phenotypes for statistical analysis can annotate the variation sites and help to diagnose genovariation. Expanding the size of the variation sample data helps to increase the accuracy of statistical analysis. It is feasible to securely share data from genomic databases on cloud platforms. In this paper, we design a secure counting query protocol that can securely share genomic data on cloud platforms. Our protocol supports statistical analysis of the genomic data in VCF (Variant Call Format) files by counting query. There are three participants: the data owner, the cloud platform, and the query party. First, the genomic data is preprocessed to reduce the data size. Second, Paillier homomorphic encryption is used so that genomic data can be securely shared and computed on the cloud platform. Finally, the decrypted results are used to implement the counting function of the protocol. Experimental results show that the protocol can implement the query counting function after homomorphic encryption. The query time is less than 1 s, which provides a feasible solution for sharing genomic data securely on a cloud platform for statistical analysis.
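To make the idea of counting under Paillier encryption concrete, the sketch below shows a toy additively homomorphic count. It is not the protocol from the paper: the key size is deliberately tiny and insecure, the function names are hypothetical, and the VCF preprocessing step is replaced by a plain list of 0/1 carrier indicators. It assumes Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import random


def keygen(p, q):
    """Toy Paillier key pair from two distinct primes (far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    # With generator g = n + 1, L(g^lam mod n^2) equals lam mod n, so mu is its inverse.
    mu = pow(lam, -1, n)
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public modulus n, using g = n + 1."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) % n2 * pow(r, n, n2) % n2


def encrypted_count(n, ciphertexts):
    """Multiply ciphertexts modulo n^2, which adds the underlying plaintexts."""
    n2 = n * n
    total = 1  # 1 is a trivial encryption of 0
    for c in ciphertexts:
        total = total * c % n2
    return total


def decrypt(n, private_key, c):
    lam, mu = private_key
    return (pow(c, lam, n * n) - 1) // n * mu % n


# Data owner: one encrypted 0/1 indicator per sample (1 = sample carries the variant).
n, private_key = keygen(1009, 1013)
indicators = [1, 0, 1, 1, 0]
uploaded = [encrypt(n, bit) for bit in indicators]

# Cloud platform: aggregates without seeing any plaintext indicator.
aggregate = encrypted_count(n, uploaded)

# Key holder: decrypts only the final count.
print(decrypt(n, private_key, aggregate))
```

The cloud only ever multiplies ciphertexts, so it learns neither the individual indicators nor the count; who holds the private key and how results are released to the query party are design choices the sketch leaves open.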
10
Liu K, Chen Q, Huang GH. An Efficient Feature Selection Algorithm for Gene Families Using NMF and ReliefF. Genes (Basel) 2023; 14:421. [PMID: 36833348 PMCID: PMC9957060 DOI: 10.3390/genes14020421] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 12/13/2022] [Revised: 01/24/2023] [Accepted: 01/25/2023] [Indexed: 02/10/2023] Open
Abstract
Gene families, which are parts of a genome's information storage hierarchy, play a significant role in the development and diversity of multicellular organisms. Several studies have focused on the characteristics of gene families, such as function, homology, or phenotype. However, statistical and correlation analyses on the distribution of gene family members in the genome have yet to be conducted. Here, a novel framework incorporating gene family analysis and genome selection based on NMF-ReliefF is reported. Specifically, the proposed method starts by obtaining gene families from the TreeFam database and determining the number of gene families within the feature matrix. Then, NMF-ReliefF is used to select features from the gene feature matrix, which is a new feature selection algorithm that overcomes the inefficiencies of traditional methods. Finally, a support vector machine is utilized to classify the acquired features. The results show that the framework achieved an accuracy of 89.1% and an AUC of 0.919 on the insect genome test set. We also employed four microarray gene data sets to evaluate the performance of the NMF-ReliefF algorithm. The outcomes show that the proposed method may strike a delicate balance between robustness and discrimination. Additionally, the proposed method's categorization is superior to state-of-the-art feature selection approaches.
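For readers unfamiliar with the Relief family of feature-weighting methods, the following sketch shows the basic two-class Relief update that ReliefF generalizes. It is a simplified stand-in, not the NMF-ReliefF algorithm from the paper: it uses a single nearest hit and miss per instance, omits the NMF and support vector machine stages entirely, and assumes each class has at least two instances. The function name and toy data are hypothetical.

```python
def relief_weights(X, y):
    """Basic Relief: reward features that differ on the nearest miss and agree on the nearest hit."""
    n, d = len(X), len(X[0])
    # Per-feature value range, used to scale differences into [0, 1].
    spans = [
        (max(row[f] for row in X) - min(row[f] for row in X)) or 1.0
        for f in range(d)
    ]

    def diff(f, a, b):
        return abs(a[f] - b[f]) / spans[f]

    def dist(a, b):
        return sum(diff(f, a, b) for f in range(d))

    weights = [0.0] * d
    for i, x in enumerate(X):
        hits = [X[j] for j in range(n) if j != i and y[j] == y[i]]
        misses = [X[j] for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda row: dist(x, row))
        miss = min(misses, key=lambda row: dist(x, row))
        for f in range(d):
            weights[f] += (diff(f, x, miss) - diff(f, x, hit)) / n
    return weights


# Toy data: feature 0 separates the two classes, feature 1 is noise.
X = [[0.0, 0.5], [0.1, 0.9], [0.2, 0.1], [0.9, 0.4], [1.0, 0.8], [0.8, 0.0]]
y = [0, 0, 0, 1, 1, 1]
print(relief_weights(X, y))
```

Features would then be ranked by weight and the top-ranked ones passed to a classifier; ReliefF proper averages over k nearest hits and misses and handles multiple classes.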
Affiliation(s)
- Kai Liu
- College of Plant Protection, Hunan Agricultural University, Changsha 410128, China
- Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Nongda Road, Furong District, Changsha 410128, China
- College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China
- Qi Chen
- College of Plant Protection, Hunan Agricultural University, Changsha 410128, China
- Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Nongda Road, Furong District, Changsha 410128, China
- Guo-Hua Huang
- College of Plant Protection, Hunan Agricultural University, Changsha 410128, China
- Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Nongda Road, Furong District, Changsha 410128, China
11
Advancement in Human Face Prediction Using DNA. Genes (Basel) 2023; 14:genes14010136. [PMID: 36672878 PMCID: PMC9858985 DOI: 10.3390/genes14010136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Revised: 12/15/2022] [Accepted: 12/21/2022] [Indexed: 01/05/2023] Open
Abstract
The rapid improvements in identifying the genetic factors contributing to facial morphology have enabled the early identification of craniofacial syndromes. Similarly, this technology can be vital in forensic cases involving human identification from biological traces or human remains, especially when reference samples are not available in the deoxyribonucleic acid (DNA) database. This review summarizes the currently used methods for predicting human phenotypes such as age, ancestry, pigmentation, and facial features based on genetic variations. To identify the facial features affected by DNA, various two-dimensional (2D)- and three-dimensional (3D)-scanning techniques and analysis tools are reviewed. A comparison between the scanning technologies is also presented in this review. Face-landmarking techniques and face-phenotyping algorithms are discussed in chronological order. Then, the latest approaches in genetic to 3D face shape analysis are emphasized. A systematic review of the current markers that passed the threshold of a genome-wide association study (GWAS) of single nucleotide polymorphism (SNP)-face traits from the GWAS Catalog is also provided using the preferred reporting items for systematic reviews and meta-analyses (PRISMA) approach. Finally, the current challenges in forensic DNA phenotyping are analyzed and discussed.
12
|
Kaufmann M. DNA as in-formation. WIRES. FORENSIC SCIENCE 2023; 5:e1470. [PMID: 37070086 PMCID: PMC10103537 DOI: 10.1002/wfs2.1470] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/08/2021] [Revised: 07/07/2022] [Accepted: 07/21/2022] [Indexed: 11/09/2022]
Abstract
Traces are fundamental vectors of information. This is the first of seven forensic principles formulated by the 2022 Sydney declaration. To better understand the trace as information, this article proposes the notion of in-formation. DNA is matter in becoming. DNA changes as it travels across forensic sites and domains. New formations occur as humans, technologies and DNA interact. Understanding DNA as in-formation is of particular relevance vis-à-vis the increase of algorithmic technologies in the forensic sciences and the rendering of DNA into (big) data. The concept can help identify, acknowledge and communicate those moments of techno-scientific interaction that require discretion and methodical decisions. It can assist in tracing what form DNA will take and what consequences this may have. This article is categorized under: Crime Scene Investigation > From Traces to Intelligence and Evidence; Forensic Biology > Ethical and Social Implications; Forensic Biology > Forensic DNA Technologies.
Affiliation(s)
- Mareile Kaufmann
- Department of Criminology and Sociology of Law, University of Oslo, Oslo, Norway
13
Khalil AT, Shinwari ZK, Islam A. Fostering openness in open science: An ethical discussion of risks and benefits. Frontiers in Political Science 2022; 4. [DOI: 10.3389/fpos.2022.930574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
Transforming science by embracing the concepts of open science is an attractive strategy for enhancing the reliability of science. Open science policies embody the concepts of open data and open access, which encompass sharing of resources, dissemination of ideas, and synergizing the collaborative forums of research. Despite the opportunities in openness, however, there are grave ethical concerns too, and they present a dual-use dilemma. Access to sensitive information is seen as a security risk, and it also raises other concerns such as confidentiality, privacy, and affordability. There are arguments that open science can be harmful to marginalized groups. Through this study, we aim to discuss the opportunities of open science, as well as the ethical and security aspects, which require further deliberation before full-fledged acceptance in the science community.
14
TrustGWAS: A full-process workflow for encrypted GWAS using multi-key homomorphic encryption and pseudorandom number perturbation. Cell Syst 2022; 13:752-767.e6. [PMID: 36041458 DOI: 10.1016/j.cels.2022.08.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/19/2021] [Revised: 04/21/2022] [Accepted: 08/04/2022] [Indexed: 01/26/2023]
Abstract
The statistical power of genome-wide association studies (GWASs) is affected by the effective sample size. However, the privacy and security concerns associated with individual-level genotype data pose great challenges for cross-institutional cooperation. Full-process cryptographic solutions are in demand but have not been covered, especially for the essential principal-component analysis (PCA). Here, we present TrustGWAS, a complete solution for secure, large-scale GWAS, recapitulating gold standard results against PLINK without compromising privacy and supporting basic PLINK steps including quality control (QC), linkage disequilibrium (LD) pruning, PCA, chi-square test, Cochran-Armitage trend test, covariate-supported logistic regression and linear regression, and their sequential combinations. TrustGWAS leverages pseudorandom number perturbations for PCA and a multiparty scheme of multi-key homomorphic encryption for all other modules. TrustGWAS can evaluate 100,000 individuals with 1 million variants and complete the QC-LD-PCA-regression workflow within 50 h. We further successfully discover gene loci associated with fasting blood glucose, consistent with the findings of the ChinaMAP project.
15
Naqvi S, Hoskens H, Wilke F, Weinberg SM, Shaffer JR, Walsh S, Shriver MD, Wysocka J, Claes P. Decoding the Human Face: Progress and Challenges in Understanding the Genetics of Craniofacial Morphology. Annu Rev Genomics Hum Genet 2022; 23:383-412. [PMID: 35483406 PMCID: PMC9482780 DOI: 10.1146/annurev-genom-120121-102607] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Indexed: 11/24/2022]
Abstract
Variations in the form of the human face, which plays a role in our individual identities and societal interactions, have fascinated scientists and artists alike. Here, we review our current understanding of the genetics underlying variation in craniofacial morphology and disease-associated dysmorphology, synthesizing decades of progress on Mendelian syndromes in addition to more recent results from genome-wide association studies of human facial shape and disease risk. We also discuss the various approaches used to phenotype and quantify facial shape, which are of particular importance due to the complex, multipartite nature of the craniofacial form. We close by discussing how experimental studies have contributed and will further contribute to our understanding of human genetic variation and then proposing future directions and applications for the field.
Affiliation(s)
- Sahin Naqvi
- Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, California, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, California, USA
| | - Hanne Hoskens
- Center for Processing Speech and Images, Department of Electrical Engineering, KU Leuven, Leuven, Belgium
- Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
| | - Franziska Wilke
- Department of Biology, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana, USA
| | - Seth M Weinberg
- Department of Human Genetics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Department of Anthropology, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
| | - John R Shaffer
- Department of Human Genetics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Center for Craniofacial and Dental Genetics, Department of Oral and Craniofacial Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
| | - Susan Walsh
- Department of Biology, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana, USA
| | - Mark D Shriver
- Department of Anthropology, The Pennsylvania State University, University Park, Pennsylvania, USA
| | - Joanna Wysocka
- Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, California, USA
- Department of Developmental Biology, Stanford University School of Medicine, Stanford, California, USA
- Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, California, USA
| | - Peter Claes
- Center for Processing Speech and Images, Department of Electrical Engineering, KU Leuven, Leuven, Belgium
- Medical Imaging Research Center, University Hospitals Leuven, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Murdoch Children's Research Institute, Melbourne, Victoria, Australia
16
Joshi RS, Rigau M, García-Prieto CA, Castro de Moura M, Piñeyro D, Moran S, Davalos V, Carrión P, Ferrando-Bernal M, Olalde I, Lalueza-Fox C, Navarro A, Fernández-Tena C, Aspandi D, Sukno FM, Binefa X, Valencia A, Esteller M. Look-alike humans identified by facial recognition algorithms show genetic similarities. Cell Rep 2022; 40:111257. [PMID: 36001980 DOI: 10.1016/j.celrep.2022.111257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2021] [Revised: 06/05/2022] [Accepted: 08/01/2022] [Indexed: 11/03/2022] Open
Abstract
The human face is one of the most visible features of our unique identity as individuals. Interestingly, monozygotic twins share almost identical facial traits and the same DNA sequence but could exhibit differences in other biometrical parameters. The expansion of the world wide web and the possibility to exchange pictures of humans across the planet has increased the number of people identified online as virtual twins or doubles that are not family related. Herein, we have characterized in detail a set of "look-alike" humans, defined by facial recognition algorithms, for their multiomics landscape. We report that these individuals share similar genotypes and differ in their DNA methylation and microbiome landscape. These results not only provide insights about the genetics that determine our face but also might have implications for the establishment of other human anthropometric properties and even personality characteristics.
Affiliation(s)
- Ricky S Joshi
- Josep Carreras Leukaemia Research Institute (IJC), Badalona, 08916 Barcelona, Spain
| | - Maria Rigau
- Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
| | - Carlos A García-Prieto
- Josep Carreras Leukaemia Research Institute (IJC), Badalona, 08916 Barcelona, Spain; Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
| | | | - David Piñeyro
- Josep Carreras Leukaemia Research Institute (IJC), Badalona, 08916 Barcelona, Spain; Centro de Investigacion Biomedica en Red Cancer (CIBERONC), 28029 Madrid, Spain
| | - Sebastian Moran
- Josep Carreras Leukaemia Research Institute (IJC), Badalona, 08916 Barcelona, Spain
| | - Veronica Davalos
- Josep Carreras Leukaemia Research Institute (IJC), Badalona, 08916 Barcelona, Spain
| | - Pablo Carrión
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), 08003 Barcelona, Spain
| | - Manuel Ferrando-Bernal
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), 08003 Barcelona, Spain
| | - Iñigo Olalde
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), 08003 Barcelona, Spain
| | - Carles Lalueza-Fox
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), 08003 Barcelona, Spain
| | - Arcadi Navarro
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), 08003 Barcelona, Spain; Centre for Genomic Regulation (CNAG-CRG), 08003 Barcelona, Catalonia, Spain; Institucio Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain
| | | | - Decky Aspandi
- Departament de Tecnologies de la Informació i les Comunicaciones (DTIC), Universitat Pompeu Fabra (UPF), 08018 Barcelona, Spain
| | - Federico M Sukno
- Departament de Tecnologies de la Informació i les Comunicaciones (DTIC), Universitat Pompeu Fabra (UPF), 08018 Barcelona, Spain
| | - Xavier Binefa
- Departament de Tecnologies de la Informació i les Comunicaciones (DTIC), Universitat Pompeu Fabra (UPF), 08018 Barcelona, Spain
| | - Alfonso Valencia
- Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain; Institucio Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain
| | - Manel Esteller
- Josep Carreras Leukaemia Research Institute (IJC), Badalona, 08916 Barcelona, Spain; Centro de Investigacion Biomedica en Red Cancer (CIBERONC), 28029 Madrid, Spain; Institucio Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain; Physiological Sciences Department, School of Medicine and Health Sciences, University of Barcelona (UB), L'Hospitalet, 08907 Barcelona, Spain.
17
Hohl DM, González R, Di Santo Meztler GP, Patiño-Rico J, Dejean C, Avena S, Gutiérrez MDLÁ, Catanesi CI. Applicability of the IrisPlex system for eye color prediction in an admixed population from Argentina. Ann Hum Genet 2022; 86:297-327. [PMID: 35946314 DOI: 10.1111/ahg.12480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2021] [Revised: 07/07/2022] [Accepted: 07/11/2022] [Indexed: 11/29/2022]
Abstract
Eye color prediction based on an individual's genetic information is of interest in the field of forensic genetics. In recent years, researchers have studied different genes and markers associated with this externally visible characteristic and have developed methods for its prediction. The IrisPlex represents a validated tool for homogeneous populations, though its applicability in populations of mixed ancestry is limited, mainly regarding the prediction of intermediate eye colors. With the aim of validating the applicability of this system in an admixed population from Argentina (n = 302), we analyzed the six single nucleotide variants used in that multiplex for eye color and four additional SNPs, and evaluated its prediction ability. We also performed a genotype-phenotype association analysis. This system proved to be useful when dealing with the extreme ends of the eye color spectrum (blue and brown) but presented difficulties in determining the intermediate phenotypes (green), which were found in a large proportion of our population. We concluded that these genetic tools should be used with caution in admixed populations and that more studies are required in order to improve the prediction of intermediate phenotypes.
Affiliation(s)
- Diana María Hohl
- Laboratorio de Diversidad Genética, Instituto Multidisciplinario de Biología Celular IMBICE (CONICET-UNLP-CIC), La Plata, Buenos Aires, Argentina
| | - Rebeca González
- Laboratorio de Diversidad Genética, Instituto Multidisciplinario de Biología Celular IMBICE (CONICET-UNLP-CIC), La Plata, Buenos Aires, Argentina
| | - Gabriela Paula Di Santo Meztler
- Centro de Investigación de Proteínas Vegetales (CIPROVE-Centro Asociado CICPBA-UNLP), Depto. de Cs. Biológicas, Facultad de Cs. Exactas, Universidad Nacional de La Plata (UNLP), La Plata, Buenos Aires, Argentina
| | - Jessica Patiño-Rico
- Centro de Ciencias Naturales, Ambientales y Antropológicas, Universidad Maimónides, Buenos Aires, Argentina
| | - Cristina Dejean
- Centro de Ciencias Naturales, Ambientales y Antropológicas, Universidad Maimónides, Buenos Aires, Argentina; Universidad de Buenos Aires, Facultad de Filosofía y Letras, Instituto de Ciencias Antropológicas (ICA), Sección Antropología Biológica, Buenos Aires, Argentina
| | - Sergio Avena
- Centro de Ciencias Naturales, Ambientales y Antropológicas, Universidad Maimónides, Buenos Aires, Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas CONICET, Buenos Aires, Argentina
| | - María De Los Ángeles Gutiérrez
- Centro de Investigaciones del Medioambiente CIM, Facultad de Ciencias Exactas-CONICET, Universidad Nacional de La Plata, La Plata, Buenos Aires, Argentina
| | - Cecilia Inés Catanesi
- Laboratorio de Diversidad Genética, Instituto Multidisciplinario de Biología Celular IMBICE (CONICET-UNLP-CIC), La Plata, Buenos Aires, Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas CONICET, Buenos Aires, Argentina; Facultad de Ciencias Naturales y Museo, Universidad Nacional de La Plata, La Plata, Buenos Aires, Argentina
18
Abstract
Genomics data are important for advancing biomedical research, improving clinical care, and informing other disciplines such as forensics and genealogy. However, privacy concerns arise when genomic data are shared. In particular, the identifying nature of genetic information, its direct relationship to health status, and the potential financial harm and stigmatization posed to individuals and their blood relatives call for a survey of the privacy issues related to sharing genetic and related data and potential solutions to overcome these issues. In this work, we provide an overview of the importance of genomic privacy, the information gleaned from genomics data, the sources of potential private information leakages in genomics, and ways to preserve privacy while utilizing the genetic information in research. We discuss the relationship between trust in the scientific community and protecting privacy, illuminating a future roadmap for data sharing and study participation.
Affiliation(s)
- Gamze Gürsoy
- Department of Biomedical Informatics, Columbia University, New York, NY, USA; New York Genome Center, New York, NY, USA
19
Wan Z, Hazel JW, Clayton EW, Vorobeychik Y, Kantarcioglu M, Malin BA. Sociotechnical safeguards for genomic data privacy. Nat Rev Genet 2022; 23:429-445. [PMID: 35246669 PMCID: PMC8896074 DOI: 10.1038/s41576-022-00455-y] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Accepted: 01/24/2022] [Indexed: 12/21/2022]
Abstract
Recent developments in a variety of sectors, including health care, research and the direct-to-consumer industry, have led to a dramatic increase in the amount of genomic data that are collected, used and shared. This state of affairs raises new and challenging concerns for personal privacy, both legally and technically. This Review appraises existing and emerging threats to genomic data privacy and discusses how well current legal frameworks and technical safeguards mitigate these concerns. It concludes with a discussion of remaining and emerging challenges and illustrates possible solutions that can balance protecting privacy and realizing the benefits that result from the sharing of genetic information.
Affiliation(s)
- Zhiyu Wan
- Center for Genetic Privacy and Identity in Community Settings, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - James W Hazel
- Center for Genetic Privacy and Identity in Community Settings, Vanderbilt University Medical Center, Nashville, TN, USA
- Center for Biomedical Ethics and Society, Vanderbilt University, Nashville, TN, USA
| | - Ellen Wright Clayton
- Center for Genetic Privacy and Identity in Community Settings, Vanderbilt University Medical Center, Nashville, TN, USA
- Center for Biomedical Ethics and Society, Vanderbilt University, Nashville, TN, USA
- Vanderbilt University Law School, Nashville, TN, USA
| | - Yevgeniy Vorobeychik
- Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO, USA
| | - Murat Kantarcioglu
- Department of Computer Science, University of Texas at Dallas, Richardson, TX, USA
| | - Bradley A Malin
- Center for Genetic Privacy and Identity in Community Settings, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Computer Science, Vanderbilt University, Nashville, TN, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
20
Cope H, Willis CR, MacKay MJ, Rutter LA, Toh LS, Williams PM, Herranz R, Borg J, Bezdan D, Giacomello S, Muratani M, Mason CE, Etheridge T, Szewczyk NJ. Routine omics collection is a golden opportunity for European human research in space and analog environments. Patterns 2022; 3:100550. [PMID: 36277820 PMCID: PMC9583032 DOI: 10.1016/j.patter.2022.100550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
21
Chakraborty D, Sharma N, Kour S, Sodhi SS, Gupta MK, Lee SJ, Son YO. Applications of Omics Technology for Livestock Selection and Improvement. Front Genet 2022; 13:774113. [PMID: 35719396 PMCID: PMC9204716 DOI: 10.3389/fgene.2022.774113] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 09/11/2021] [Accepted: 05/16/2022] [Indexed: 12/16/2022] Open
Abstract
Conventional animal selection and breeding methods were based on the phenotypic performance of the animals. These methods have limitations, particularly for sex-limited traits and traits expressed later in the life cycle (e.g., carcass traits). Consequently, the genetic gain has been slow with high generation intervals. With the advent of high-throughput omics techniques and the availability of multi-omics technologies and sophisticated analytic packages, several promising tools and methods have been developed to estimate the actual genetic potential of the animals. It has now become possible to collect and access large and complex datasets comprising different genomics, transcriptomics, proteomics, metabolomics, and phenomics data as well as animal-level data (such as longevity, behavior, adaptation, etc.), which provides new opportunities to better understand the mechanisms regulating animals' actual performance. The cost of omics technology and the expertise it requires across fields such as biology, bioinformatics, statistics, and computational biology are impediments to its use in some cases. The population size and accurate phenotypic data recordings are other significant constraints for appropriate selection and breeding strategies. Nevertheless, omics technologies can estimate more accurate breeding values (BVs) and increase the genetic gain by assisting the selection of genetically superior, disease-free animals at an early stage of life for enhancing animal productivity and profitability. This manuscript provides an overview of various omics technologies and their limitations for animal genetic selection and breeding decisions.
Affiliation(s)
- Dibyendu Chakraborty
- Division of Animal Genetics and Breeding, Faculty of Veterinary Sciences and Animal Husbandry, Sher-e-Kashmir University of Agricultural Sciences and Technology of Jammu, Ranbir Singh Pura, India
| | - Neelesh Sharma
- Division of Veterinary Medicine, Faculty of Veterinary Sciences and Animal Husbandry, Sher-e-Kashmir University of Agricultural Sciences and Technology of Jammu, Ranbir Singh Pura, India
- *Correspondence: Neelesh Sharma; Young Ok Son
| | - Savleen Kour
- Division of Veterinary Medicine, Faculty of Veterinary Sciences and Animal Husbandry, Sher-e-Kashmir University of Agricultural Sciences and Technology of Jammu, Ranbir Singh Pura, India
| | - Simrinder Singh Sodhi
- Department of Animal Biotechnology, College of Animal Biotechnology, Guru Angad Dev Veterinary and Animal Sciences University, Ludhiana, India
| | - Mukesh Kumar Gupta
- Department of Biotechnology and Medical Engineering, National Institute of Technology, Rourkela, India
| | - Sung Jin Lee
- Department of Animal Biotechnology, College of Animal Life Sciences, Kangwon National University, Chuncheon-si, South Korea
| | - Young Ok Son
- Department of Animal Biotechnology, Faculty of Biotechnology, College of Applied Life Sciences and Interdisciplinary Graduate Program in Advanced Convergence Technology and Science, Jeju National University, Jeju, South Korea
- *Correspondence: Neelesh Sharma; Young Ok Son
22
Dabas P, Jain S, Khajuria H, Nayak BP. Forensic DNA phenotyping: Inferring phenotypic traits from crime scene DNA. J Forensic Leg Med 2022; 88:102351. [DOI: 10.1016/j.jflm.2022.102351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2021] [Revised: 03/01/2022] [Accepted: 04/04/2022] [Indexed: 10/18/2022]
23
Qian W, Zhang M, Wan K, Xie Y, Du S, Li J, Mu X, Qiu J, Xue X, Zhuang X, Wu Y, Liu F, Wang S. Genetic evidence for facial variation being a composite phenotype of cranial variation and facial soft tissue thickness. J Genet Genomics 2022; 49:934-942. [DOI: 10.1016/j.jgg.2022.02.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2021] [Revised: 02/17/2022] [Accepted: 02/20/2022] [Indexed: 10/18/2022]
24
Alsaffar MM, Hasan M, McStay GP, Sedky M. Digital DNA lifecycle security and privacy: an overview. Brief Bioinform 2022; 23:6518049. [PMID: 35106557 DOI: 10.1093/bib/bbab607] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/13/2021] [Revised: 12/29/2021] [Accepted: 12/30/2021] [Indexed: 11/14/2022] Open
Abstract
DNA sequencing technologies have advanced significantly in the last few years leading to advancements in biomedical research which has improved personalised medicine and the discovery of new treatments for diseases. Sequencing technology advancement has also reduced the cost of DNA sequencing, which has led to the rise of direct-to-consumer (DTC) sequencing, e.g. 23andme.com, ancestry.co.uk, etc. In the meantime, concerns have emerged over privacy and security in collecting, handling, analysing and sharing DNA and genomic data. DNA data are unique and can be used to identify individuals. Moreover, those data provide information on people's current disease status and disposition, e.g. mental health or susceptibility for developing cancer. DNA privacy violation does not only affect the owner but also affects their close consanguinity due to its hereditary nature. This article introduces and defines the term 'digital DNA life cycle' and presents an overview of privacy and security threats and their mitigation techniques for predigital DNA and throughout the digital DNA life cycle. It covers DNA sequencing hardware, software and DNA sequence pipeline in addition to common privacy attacks and their countermeasures when DNA digital data are stored, queried or shared. Likewise, the article examines DTC genomic sequencing privacy and security.
Affiliation(s)
- Muhalb M Alsaffar
- Department of Computing, AI and Robotics, School of Digital, Technologies and Arts, Staffordshire University, College Road, ST4 2DE, Staffordshire, United Kingdom
| | | | - Gavin P McStay
- Department of Biological Sciences, School of Health, Science and Wellbeing, Staffordshire University, College Road, Stoke-on-Trent, Staffordshire, ST4 2DE, United Kingdom
| | - Mohamed Sedky
- Department of Computing, AI and Robotics, School of Digital, Technologies and Arts, Staffordshire University, College Road, ST4 2DE, Staffordshire, United Kingdom
25
Pośpiech E, Teisseyre P, Mielniczuk J, Branicki W. Predicting Physical Appearance from DNA Data-Towards Genomic Solutions. Genes (Basel) 2022; 13:genes13010121. [PMID: 35052461 PMCID: PMC8774670 DOI: 10.3390/genes13010121] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/10/2021] [Revised: 01/03/2022] [Accepted: 01/04/2022] [Indexed: 02/04/2023] Open
Abstract
The idea of forensic DNA intelligence is to extract from genomic data any information that can help guide the investigation. The clues to the externally visible phenotype are of particular practical importance. The high heritability of the physical phenotype suggests that appearance can be readily predicted from genetic data, but this has so far become possible only for less polygenic traits. The forensic community has developed DNA-based predictive tools by employing a limited number of the most important markers analysed with targeted massively parallel sequencing. The complexity of the genetics of many other appearance phenotypes requires big data coupled with sophisticated machine learning methods to develop accurate genomic predictors. A significant challenge in developing universal genomic predictive methods will be the collection of sufficiently large data sets. These should be created using whole-genome sequencing technology to enable the identification of rare DNA variants implicated in phenotype determination. It is worth noting that the correctness of the forensic sketch generated from the DNA data depends on the inclusion of an age factor. This, however, can be predicted by analysing epigenetic data. An important limitation preventing whole-genome approaches from being commonly used in forensics is the slow progress in the development and implementation of high-throughput, low DNA input sequencing technologies. The example of palaeoanthropology suggests that such methods may possibly be developed in forensics.
Affiliation(s)
- Ewelina Pośpiech
- Malopolska Centre of Biotechnology, Jagiellonian University, 30-387 Kraków, Poland
| | - Paweł Teisseyre
- Institute of Computer Science, Polish Academy of Sciences, 01-248 Warsaw, Poland
- Faculty of Mathematics and Information Science, Warsaw University of Technology, 00-662 Warsaw, Poland
| | - Jan Mielniczuk
- Institute of Computer Science, Polish Academy of Sciences, 01-248 Warsaw, Poland
- Faculty of Mathematics and Information Science, Warsaw University of Technology, 00-662 Warsaw, Poland
| | - Wojciech Branicki
- Malopolska Centre of Biotechnology, Jagiellonian University, 30-387 Kraków, Poland
- Central Forensic Laboratory of the Police, 00-583 Warsaw, Poland
- Correspondence: Tel.: +48-126-645-024
26
DiEuliis D, Giordano J. Balancing Act: Precision Medicine and National Security. Mil Med 2021; 187:32-35. [PMID: 34967406 DOI: 10.1093/milmed/usab017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2020] [Revised: 01/12/2021] [Accepted: 01/14/2021] [Indexed: 11/13/2022] Open
Abstract
Developments in genetics, pharmacology, biomarker identification, imaging, and interventional biotechnology are enabling medicine to become increasingly precise in "personalized" approaches to assessing and treating individual patients. Here we describe current scientific and technological developments in precision medicine and elucidate the dual-use risks of employing these tools and capabilities to exert disruptive influence upon human health, economics, social structure, military capabilities, and global dimensions of power. We advocate continued enterprise toward more completely addressing nuances in the ethical systems and approaches that can, and should, be implemented (and communicated) to more effectively inform policy to guide and govern the biosecurity and use of current and emerging bioscience and technology on the rapidly shifting global stage.
Affiliation(s)
- Diane DiEuliis
- Center for the Study of Weapons of Mass Destruction, National Defense University, Washington, DC 20319, USA
| | - James Giordano
- Departments of Neurology and Biochemistry, Pellegrino Center for Clinical Bioethics, and Cyber-SMART Center, Georgetown University, Washington, DC 20057, USA; Program in Biosecurity, Technology and Ethics, US Naval War College, Newport, RI 02841, USA
|
27
|
Wan Z, Vorobeychik Y, Xia W, Liu Y, Wooders M, Guo J, Yin Z, Clayton EW, Kantarcioglu M, Malin BA. Using game theory to thwart multistage privacy intrusions when sharing data. Science Advances 2021; 7:eabe9986. [PMID: 34890225 PMCID: PMC8664254 DOI: 10.1126/sciadv.abe9986] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/14/2020] [Accepted: 10/25/2021] [Indexed: 06/13/2023]
Abstract
Person-specific biomedical data are now widely collected, but their sharing raises privacy concerns, specifically about the re-identification of seemingly anonymous records. Formal re-identification risk assessment frameworks can inform decisions about whether and how to share data; current techniques, however, focus on scenarios where the data recipients use only one resource for re-identification purposes. This is a concern because recent attacks show that adversaries can access multiple resources, combining them in a stage-wise manner, to enhance the chance of an attack’s success. In this work, we represent a re-identification game using a two-player Stackelberg game of perfect information, which can be applied to assess risk, and suggest an optimal data sharing strategy based on a privacy-utility tradeoff. We report on experiments with large-scale genomic datasets to show that, using game theoretic models accounting for adversarial capabilities to launch multistage attacks, most data can be effectively shared with low re-identification risk.
Affiliation(s)
- Zhiyu Wan
- Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37212, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
| | - Yevgeniy Vorobeychik
- Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO 63130, USA
| | - Weiyi Xia
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
| | - Yongtai Liu
- Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37212, USA
| | - Myrna Wooders
- Department of Economics, Vanderbilt University, Nashville, TN 37235, USA
| | - Jia Guo
- Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37212, USA
| | - Zhijun Yin
- Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37212, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
| | - Ellen Wright Clayton
- Center for Biomedical Ethics and Society, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- School of Law, Vanderbilt University, Nashville, TN 37203, USA
- Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Murat Kantarcioglu
- Department of Computer Science, University of Texas at Dallas, Richardson, TX 75080, USA
- Institute for Quantitative Social Science, Harvard University, Cambridge, MA 02138, USA
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Bradley A. Malin
- Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37212, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
|
28
|
Dupras C, Bunnik EM. Toward a Framework for Assessing Privacy Risks in Multi-Omic Research and Databases. The American Journal of Bioethics: AJOB 2021; 21:46-64. [PMID: 33433298 DOI: 10.1080/15265161.2020.1863516] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 06/12/2023]
Abstract
While the accumulation and increased circulation of genomic data have captured much attention over the past decade, privacy risks raised by the diversification and integration of omics have been largely overlooked. In this paper, we propose the outline of a framework for assessing privacy risks in multi-omic research and databases. Following a comparison of privacy risks associated with genomic and epigenomic data, we dissect ten privacy risk-impacting omic data properties that affect either the risk of re-identification of research participants, or the sensitivity of the information potentially conveyed by biological data. We then propose a three-step approach for the assessment of privacy risks in the multi-omic era. Thus, we lay grounds for a data property-based, 'pan-omic' approach that moves away from genetic exceptionalism. We conclude by inviting our peers to refine these theoretical foundations, put them to the test in their respective fields, and translate our approach into practical guidance.
|
29
|
Venkatesaramani R, Malin BA, Vorobeychik Y. Re-identification of individuals in genomic datasets using public face images. Science Advances 2021; 7:eabg3296. [PMID: 34788101 PMCID: PMC8597988 DOI: 10.1126/sciadv.abg3296] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 05/14/2023]
Abstract
Recent studies suggest that genomic data can be matched to images of human faces, raising the concern that genomic data can be re-identified with relative ease. However, such investigations assume access to well-curated images, which are rarely available in practice and challenging to derive from photos not generated in a controlled laboratory setting. In this study, we reconsider re-identification risk and find that, for most individuals, the actual risk posed by linkage attacks to typical face images is substantially smaller than claimed in prior investigations. Moreover, we show that only a small amount of well-calibrated noise, imperceptible to humans, can be added to images to markedly reduce such risk. The results of this investigation create an opportunity to create image filters that enable individuals to have better control over re-identification risk based on linkage.
Affiliation(s)
- Rajagopal Venkatesaramani
- Department of Computer Science and Engineering, Washington University in St. Louis, 1 Brookings Dr., St. Louis, MO 63108, USA
- Corresponding author.
| | - Bradley A. Malin
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Suite 1475, 2525 West End Avenue, Nashville, TN 37203, USA
- Department of Biostatistics, Vanderbilt University Medical Center, Suite 1475, 2525 West End Avenue, Nashville, TN 37203, USA
- Department of Electrical Engineering and Computer Science, Vanderbilt University, 2201 West End Ave, Nashville, TN 37235, USA
| | - Yevgeniy Vorobeychik
- Department of Computer Science and Engineering, Washington University in St. Louis, 1 Brookings Dr., St. Louis, MO 63108, USA
|
30
|
Warmerdam R, Lanting P, Deelen P, Franke L. Idéfix: identifying accidental sample mix-ups in biobanks using polygenic scores. Bioinformatics 2021; 38:1059-1066. [PMID: 34792549 PMCID: PMC8796367 DOI: 10.1093/bioinformatics/btab783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2021] [Revised: 10/07/2021] [Accepted: 11/15/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: Identifying sample mix-ups in biobanks is essential to allow the repurposing of genetic data for clinical pharmacogenetics. Pharmacogenetic advice based on the genetic information of another individual is potentially harmful. Existing methods for identifying mix-ups are limited to datasets in which additional omics data (e.g. gene expression) is available. Cohorts lacking such data can only use sex, which can reveal only half of the mix-ups. Here, we describe Idéfix, a method for the identification of accidental sample mix-ups in biobanks using polygenic scores. RESULTS: In the Lifelines population-based biobank, we calculated polygenic scores (PGSs) for 25 traits for 32 786 participants. We then applied Idéfix to compare the actual phenotypes to PGSs, and to use the relative discordance that is expected for mix-ups, compared to correct samples. In a simulation, using induced mix-ups, Idéfix reaches an AUC of 0.90 using 25 polygenic scores and sex. This is a substantial improvement over using only sex, which has an AUC of 0.75. Subsequent simulations present Idéfix's potential in varying datasets with more powerful PGSs. This suggests its performance will likely improve when more highly powered GWASs for commonly measured traits become available. Idéfix can be used to identify a set of high-quality participants for whom it is very unlikely that they reflect sample mix-ups, and for these participants we can use genetic data for clinical purposes, such as pharmacogenetic profiles. For instance, in Lifelines, we can select 34.4% of participants, reducing the sample mix-up rate from 0.15% to 0.01%. AVAILABILITY AND IMPLEMENTATION: Idéfix is freely available at https://github.com/molgenis/systemsgenetics/wiki/Idefix. The individual-level data that support the findings were obtained from the Lifelines biobank under project application number ov16_0365. Data is made available upon reasonable request submitted to the LifeLines Research office (research@lifelines.nl, https://www.lifelines.nl/researcher/how-to-apply/apply-here). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Robert Warmerdam
- Department of Genetics, University Medical Center Groningen, University of Groningen, 9700AB Groningen, The Netherlands
| | - Pauline Lanting
- Department of Genetics, University Medical Center Groningen, University of Groningen, 9700AB Groningen, The Netherlands
| | | | - Patrick Deelen
- Department of Genetics, University Medical Center Groningen, University of Groningen, 9700AB Groningen, The Netherlands
- Department of Genetics, University Medical Center Utrecht, 3508GA Utrecht, The Netherlands
|
31
|
Bu D, Wang X, Tang H. Haplotype-based membership inference from summary genomic data. Bioinformatics 2021; 37:i161-i168. [PMID: 34252973 PMCID: PMC8275351 DOI: 10.1093/bioinformatics/btab305] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/22/2022] Open
Abstract
Motivation: The availability of human genomic data, together with the enhanced capacity to process them, is leading to transformative technological advances in biomedical science and engineering. However, the public dissemination of such data has been difficult due to privacy concerns. Specifically, it has been shown that the presence of a human subject in a case group can be inferred from the shared summary statistics of the group, e.g. the allele frequencies, or even the presence/absence of genetic variants (e.g. shared by the Beacon project) in the group. These methods rely on the availability of the target’s genome, i.e. the DNA profile of a target human subject, and thus are often referred to as the membership inference method. Results: In this article, we demonstrate that haplotypes, i.e. the sequence of single nucleotide variations (SNVs) showing strong genetic linkages in human genome databases, may be inferred from the summary of genomic data without using a target’s genome. Furthermore, novel haplotypes that did not appear in the database may be reconstructed solely from the allele frequencies from genomic datasets. These reconstructed haplotypes can be used for a haplotype-based membership inference algorithm to identify target subjects in a case group with greater power than existing methods based on SNVs. Availability and implementation: The implementation of the membership inference algorithms is available at https://github.com/diybu/Haplotype-based-membership-inferences.
Affiliation(s)
- Diyue Bu
- Department of Informatics, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN 47408, USA
| | - Xiaofeng Wang
- Department of Informatics, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN 47408, USA
| | - Haixu Tang
- Department of Informatics, Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN 47408, USA
|
32
|
Kukla-Bartoszek M, Teisseyre P, Pośpiech E, Karłowska-Pik J, Zieliński P, Woźniak A, Boroń M, Dąbrowski M, Zubańska M, Jarosz A, Płoski R, Grzybowski T, Spólnicka M, Mielniczuk J, Branicki W. Searching for improvements in predicting human eye colour from DNA. Int J Legal Med 2021; 135:2175-2187. [PMID: 34259936 PMCID: PMC8523394 DOI: 10.1007/s00414-021-02645-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/31/2021] [Accepted: 06/17/2021] [Indexed: 01/29/2023]
Abstract
Increasing understanding of human genome variability allows for better use of the predictive potential of DNA. An obvious direct application is the prediction of the physical phenotypes. Significant success has been achieved, especially in predicting pigmentation characteristics, but the inference of some phenotypes is still challenging. In search of further improvements in predicting human eye colour, we conducted whole-exome (enriched in regulome) sequencing of 150 Polish samples to discover new markers. For this, we adopted quantitative characterization of eye colour phenotypes using high-resolution photographic images of the iris in combination with DIAT software analysis. An independent set of 849 samples was used for subsequent predictive modelling. Newly identified candidates and 114 additional literature-based selected SNPs, previously associated with pigmentation, and advanced machine learning algorithms were used. Whole-exome sequencing analysis found 27 previously unreported candidate SNP markers for eye colour. The highest overall prediction accuracies were achieved with LASSO-regularized and BIC-based selected regression models. A new candidate variant, rs2253104, located in the ARFIP2 gene and identified with the HyperLasso method, revealed predictive potential and was included in the best-performing regression models. Advanced machine learning approaches showed a significant increase in sensitivity of intermediate eye colour prediction (up to 39%) compared to 0% obtained for the original IrisPlex model. We identified a new potential predictor of eye colour and evaluated several widely used advanced machine learning algorithms in predictive analysis of this trait. Our results provide useful hints for developing future predictive models for eye colour in forensic and anthropological studies.
Affiliation(s)
- Magdalena Kukla-Bartoszek
- Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University, Kraków, Poland
- Malopolska Centre of Biotechnology of the Jagiellonian University, Kraków, Poland
| | - Paweł Teisseyre
- Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
- Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
| | - Ewelina Pośpiech
- Malopolska Centre of Biotechnology of the Jagiellonian University, Kraków, Poland
| | - Joanna Karłowska-Pik
- Faculty of Mathematics and Computer Science, Nicolaus Copernicus University in Toruń, Toruń, Poland
| | - Piotr Zieliński
- Institute of Environmental Sciences, Faculty of Biology, Jagiellonian University, Kraków, Poland
| | - Anna Woźniak
- Central Forensic Laboratory of the Police, Warsaw, Poland
| | - Michał Boroń
- Central Forensic Laboratory of the Police, Warsaw, Poland
| | - Michał Dąbrowski
- Laboratory of Bioinformatics, Neurobiology Centre, Nencki Institute of Experimental Biology of Polish Academy of Sciences, Warsaw, Poland
| | - Magdalena Zubańska
- Faculty of Law and Administration, Department of Criminology and Forensic Sciences, University of Warmia and Mazury in Olsztyn, Olsztyn, Poland
- Unit of Forensic Sciences, Faculty of Internal Security, Police Academy, Szczytno, Poland
| | - Agata Jarosz
- Malopolska Centre of Biotechnology of the Jagiellonian University, Kraków, Poland
| | - Rafał Płoski
- Department of Medical Genetics, Warsaw Medical University, Warsaw, Poland
| | - Tomasz Grzybowski
- Division of Molecular and Forensic Genetics, Department of Forensic Medicine, Nicolaus Copernicus University in Toruń, Collegium Medicum in Bydgoszcz, Bydgoszcz, Poland
| | | | - Jan Mielniczuk
- Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
- Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
| | - Wojciech Branicki
- Malopolska Centre of Biotechnology of the Jagiellonian University, Kraków, Poland
- Central Forensic Laboratory of the Police, Warsaw, Poland
|
33
|
You C, Zhou Z, Wen J, Li Y, Pang CH, Du H, Wang Z, Zhou XH, King DA, Liu CT, Huang J. Polygenic Scores and Parental Predictors: An Adult Height Study Based on the United Kingdom Biobank and the Framingham Heart Study. Front Genet 2021; 12:669441. [PMID: 34093660 PMCID: PMC8176283 DOI: 10.3389/fgene.2021.669441] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/18/2021] [Accepted: 03/26/2021] [Indexed: 11/19/2022] Open
Abstract
Human height is a polygenic trait, influenced by a large number of genomic loci. In the pre-genomic era, height prediction was based largely on parental height. More recent predictions of human height have made great strides by integrating genotypic data from large biobanks with improved statistical techniques. Nevertheless, recent studies have not leveraged parental height, an added feature that we hypothesized would offer complementary predictive value. In this study, we assessed the predictive power of polygenic risk scores (PRS) combined with the traditional parental height predictors. Our study analyzed genotypic data and parental height from 1,071 trios from the United Kingdom Biobank and 444 trios from the Framingham Heart Study. We explored a series of statistical models to fully evaluate the performance of several PRS constructed together with parental information and proposed a model we call PRS++ that includes gender, parental height, and PRSs of parents and proband. Our estimate of height with an R² of ∼0.82 is, to our knowledge, the most accurate estimate yet achieved for predicting human adult height. Without parental information, the R² from the best PRS-driven model is ∼0.73. In summary, using adult height prediction as an example, we demonstrated that traditional predictors still play important roles and merit integration into the current trends of intensive PRS approaches.
Affiliation(s)
- Chong You
- Department of Biostatistics, School of Public Health, Peking University, Beijing, China
| | - Zhenwei Zhou
- Department of Biostatistics, Boston University, Boston, MA, United States
| | - Jia Wen
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
| | - Cheng Heng Pang
- Faculty of Science and Engineering, University of Nottingham Ningbo, Ningbo, China
| | - Haoyang Du
- Department of Computer Science, School of Art and Science, Wake Forest University, Wake Forest, NC, United States
| | - Ziwen Wang
- Department of Bioengineering, School of Engineering, Rice University, Houston, TX, United States
| | - Xiao-Hua Zhou
- Department of Biostatistics, School of Public Health, Peking University, Beijing, China
| | - Daniel A King
- Department of Medicine, Stanford University, Palo Alto, CA, United States
| | - Ching-Ti Liu
- Department of Biostatistics, Boston University, Boston, MA, United States
| | - Jie Huang
- Department of Global Health, School of Public Health, Peking University, Beijing, China
- Institute for Global Health and Development, Peking University, Peking, China
|
34
|
Ayoz K, Ayday E, Cicek AE. Genome Reconstruction Attacks Against Genomic Data-Sharing Beacons. Proceedings on Privacy Enhancing Technologies. Privacy Enhancing Technologies Symposium 2021; 2021:28-48. [PMID: 34746296 PMCID: PMC8570374 DOI: 10.2478/popets-2021-0036] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/21/2022]
Abstract
Sharing genome data in a privacy-preserving way remains a major bottleneck to the scientific progress promised by the big data era in genomics. A community-driven protocol, the genomic data-sharing beacon protocol, has been widely adopted for sharing genomic data. The system aims to provide a secure, easy to implement, and standardized interface for data sharing by only allowing yes/no queries on the presence of specific alleles in the dataset. However, the beacon protocol was recently shown to be vulnerable to membership inference attacks. In this paper, we show that privacy threats against genomic data-sharing beacons are not limited to membership inference. We identify and analyze a novel vulnerability of genomic data-sharing beacons: genome reconstruction. We show that it is possible to successfully reconstruct a substantial part of the genome of a victim when the attacker knows the victim has been added to the beacon in a recent update. In particular, we show how an attacker can use the inherent correlations in the genome and clustering techniques to run such an attack in an efficient and accurate way. We also show that even if multiple individuals are added to the beacon during the same update, it is possible to identify the victim's genome with high confidence using traits that are easily accessible by the attacker (e.g., eye color or hair type). Moreover, we show how a genome reconstructed using a beacon that is not associated with a sensitive phenotype can be used for membership inference attacks on beacons with sensitive phenotypes (e.g., HIV+). The outcome of this work will guide beacon operators on when and how to update the content of the beacon and help them (along with the beacon participants) make informed decisions.
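The yes/no query interface and the update-based leak described in this abstract can be illustrated with a toy model. The Python sketch below is a simplified illustration under stated assumptions, not the authors' attack: `ToyBeacon`, `snapshot`, and `newly_present` are hypothetical names, alleles are modeled as (chromosome, position, base) tuples, and the correlation-based and clustering steps mentioned in the abstract are omitted.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumed toy encoding of an allele: (chromosome, position, alternate base).
Allele = Tuple[str, int, str]


class ToyBeacon:
    """Hypothetical in-memory beacon that answers only yes/no presence queries."""

    def __init__(self, genomes: Dict[str, Set[Allele]]) -> None:
        self._genomes = {pid: set(alleles) for pid, alleles in genomes.items()}

    def add(self, person_id: str, alleles: Iterable[Allele]) -> None:
        # Models a beacon update that adds one participant.
        self._genomes[person_id] = set(alleles)

    def query(self, allele: Allele) -> bool:
        # "Yes" if at least one participant carries the allele.
        return any(allele in genome for genome in self._genomes.values())


def snapshot(beacon: ToyBeacon, candidates: Iterable[Allele]) -> Dict[Allele, bool]:
    """Record the beacon's answer for each candidate allele."""
    return {allele: beacon.query(allele) for allele in candidates}


def newly_present(before: Dict[Allele, bool], after: Dict[Allele, bool]) -> Set[Allele]:
    """Alleles whose answer flipped from 'no' to 'yes' across an update."""
    return {a for a, present in after.items() if present and not before.get(a, False)}


candidates = [("1", 100, "A"), ("1", 200, "G"), ("2", 50, "T")]
beacon = ToyBeacon({"p1": {("1", 100, "A")}})

before = snapshot(beacon, candidates)
beacon.add("victim", {("1", 100, "A"), ("1", 200, "G")})
after = snapshot(beacon, candidates)

leaked = newly_present(before, after)
print(leaked)  # {('1', 200, 'G')}
```

In this sketch, only alleles that flip from "no" to "yes" are exposed directly. The victim's allele `("1", 100, "A")` was already present in the beacon, so the difference alone does not reveal it; the abstract indicates that correlations in the genome are what allow an attacker to go beyond this simple difference.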
|
35
|
Jacoba CMP, Celi LA, Silva PS. Biomarkers for Progression in Diabetic Retinopathy: Expanding Personalized Medicine through Integration of AI with Electronic Health Records. Semin Ophthalmol 2021; 36:250-257. [PMID: 33734908 DOI: 10.1080/08820538.2021.1893351] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/21/2022]
Abstract
The goal of personalized diabetes eye care is to accurately predict in real-time the risk of diabetic retinopathy (DR) progression and visual loss. The use of electronic health records (EHR) provides a platform for artificial intelligence (AI) algorithms that predict DR progression to be incorporated into clinical decision-making. By implementing an algorithm on data points from each patient, their risk for retinopathy progression and visual loss can be modeled, allowing them to receive timely treatment. Data can guide algorithms to create models for disease and treatment that may pave the way for more personalized care. Currently, there exist numerous challenges that need to be addressed before reliably building and deploying AI algorithms, including issues with data quality, privacy, intellectual property, and informed consent.
Affiliation(s)
- Cris Martin P Jacoba
- Joslin Diabetes Centre, Beetham Eye Institute, Boston, MA, USA
- Department of Ophthalmology, Harvard Medical School, Boston, MA, USA
| | - Leo Anthony Celi
- Division of Pulmonary, Critical Care and Pain Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Laboratory for Computational Physiology, Harvard-MIT Health Sciences and Technology Division, Cambridge, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Paolo S Silva
- Joslin Diabetes Centre, Beetham Eye Institute, Boston, MA, USA
- Department of Ophthalmology, Harvard Medical School, Boston, MA, USA
|
36
|
Scheibner J, Raisaro JL, Troncoso-Pastoriza JR, Ienca M, Fellay J, Vayena E, Hubaux JP. Revolutionizing Medical Data Sharing Using Advanced Privacy-Enhancing Technologies: Technical, Legal, and Ethical Synthesis. J Med Internet Res 2021; 23:e25120. [PMID: 33629963 PMCID: PMC7952236 DOI: 10.2196/25120] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 10/19/2020] [Revised: 01/06/2021] [Accepted: 01/16/2021] [Indexed: 12/03/2022] Open
Abstract
Multisite medical data sharing is critical in modern clinical practice and medical research. The challenge is to conduct data sharing that preserves individual privacy and data utility. The shortcomings of traditional privacy-enhancing technologies mean that institutions rely upon bespoke data sharing contracts. The lengthy process and administration induced by these contracts increase the inefficiency of data sharing and may disincentivize important clinical treatment and medical research. This paper provides a synthesis between 2 novel advanced privacy-enhancing technologies: homomorphic encryption and secure multiparty computation (defined together as multiparty homomorphic encryption). These privacy-enhancing technologies provide a mathematical guarantee of privacy, with multiparty homomorphic encryption providing a performance advantage over separately using homomorphic encryption or secure multiparty computation. We argue multiparty homomorphic encryption fulfills legal requirements for medical data sharing under the European Union's General Data Protection Regulation, which has set a global benchmark for data protection. Specifically, the data processed and shared using multiparty homomorphic encryption can be considered anonymized data. We explain how multiparty homomorphic encryption can reduce the reliance upon customized contractual measures between institutions. The proposed approach can accelerate the pace of medical research while offering additional incentives for health care and research institutes to employ common data interoperability standards.
Affiliation(s)
- James Scheibner
- Health Ethics and Policy Laboratory, Department of Health Sciences and Technology, Eidgenössische Technische Hochschule Zürich, Zürich, Switzerland
- College of Business, Government and Law, Flinders University, Adelaide, Australia
| | - Jean Louis Raisaro
- Precision Medicine Unit, Lausanne University Hospital, Lausanne, Switzerland
- Data Science Group, Lausanne University Hospital, Lausanne, Switzerland
| | - Juan Ramón Troncoso-Pastoriza
- Laboratory for Data Security, School of Computer and Communication Sciences, École polytechnique fédérale de Lausanne, Lausanne, Switzerland
| | - Marcello Ienca
- Health Ethics and Policy Laboratory, Department of Health Sciences and Technology, Eidgenössische Technische Hochschule Zürich, Zürich, Switzerland
| | - Jacques Fellay
- Precision Medicine Unit, Lausanne University Hospital, Lausanne, Switzerland
- School of Life Sciences, École polytechnique fédérale de Lausanne, Lausanne, Switzerland
- Host-Pathogen Genomics Laboratory, Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Effy Vayena
- Health Ethics and Policy Laboratory, Department of Health Sciences and Technology, Eidgenössische Technische Hochschule Zürich, Zürich, Switzerland
| | - Jean-Pierre Hubaux
- Laboratory for Data Security, School of Computer and Communication Sciences, École polytechnique fédérale de Lausanne, Lausanne, Switzerland
|
37
|
Chavarria-Soley G, Francis-Cartin F, Jimenez-Gonzalez F, Ávila-Aguirre A, Castro-Gomez MJ, Robarts L, Middleton A, Raventós H. Attitudes of Costa Rican individuals towards donation of personal genetic data for research. Per Med 2021; 18:141-152. [PMID: 33576268 PMCID: PMC8010325 DOI: 10.2217/pme-2020-0113] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/21/2022]
Abstract
Aim: We explore attitudes from the public in Costa Rica regarding willingness to donate DNA data for research. Materials & methods: A total of 224 Costa Rican individuals answered the anonymous online survey 'Your DNA, Your Say'. It covers attitudes toward DNA and medical data donation, trust in research professionals and concerns about consequences of reidentification. Results & conclusion: Most individuals (89%) are willing to donate their information for research purposes. When confronted with different potential uses of their data, participants are significantly less likely to donate data to for-profit researchers (34% willingness to donate). The most frequently cited concerns regarding donation of genetic data relate to possible discrimination by health/life insurance companies and employers. For the participants in the survey, the most trusted professionals are their own medical doctor and nonprofit researchers from their country. This is the first study regarding attitudes toward genetic data donation in Costa Rica.
Affiliation(s)
- Gabriela Chavarria-Soley
- Escuela de Biología/Universidad de Costa Rica/San José, Costa Rica
- Centro de Investigación en Biología Celular y Molecular/Universidad de Costa Rica/San José, Costa Rica
| | - Fernanda Francis-Cartin
- Escuela de Biología/Universidad de Costa Rica/San José, Costa Rica
- Centro de Investigación en Biología Celular y Molecular/Universidad de Costa Rica/San José, Costa Rica
| | - Fabiola Jimenez-Gonzalez
- Centro de Investigación en Biología Celular y Molecular/Universidad de Costa Rica/San José, Costa Rica
| | - Alejandro Ávila-Aguirre
- Centro de Investigación en Biología Celular y Molecular/Universidad de Costa Rica/San José, Costa Rica
| | - Maria Jose Castro-Gomez
- Centro de Investigación en Biología Celular y Molecular/Universidad de Costa Rica/San José, Costa Rica
| | - Lauren Robarts
- Society & Ethics Research Group, Connecting Science, Wellcome Genome Campus, Cambridge, UK
| | - Anna Middleton
- Society & Ethics Research Group, Connecting Science, Wellcome Genome Campus, Cambridge, UK
- Faculty of Education, University of Cambridge
| | - Henriette Raventós
- Escuela de Biología/Universidad de Costa Rica/San José, Costa Rica
- Centro de Investigación en Biología Celular y Molecular/Universidad de Costa Rica/San José, Costa Rica
|
38
|
Schumacher GJ, Sawaya S, Nelson D, Hansen AJ. Genetic Information Insecurity as State of the Art. Front Bioeng Biotechnol 2020; 8:591980. [PMID: 33381496 PMCID: PMC7768984 DOI: 10.3389/fbioe.2020.591980] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/05/2020] [Accepted: 11/16/2020] [Indexed: 11/16/2022] Open
Abstract
Genetic information is being generated at an increasingly rapid pace, offering advances in science and medicine that are paralleled only by the threats and risks present within the responsible systems. Human genetic information is identifiable and contains sensitive information, but genetic information security has only recently begun to gain attention. Genetic data is generated in an evolving and distributed cyber-physical system, with multiple subsystems that handle information and multiple partners that rely on and influence the whole ecosystem. This paper characterizes a general genetic information system from the point of biological material collection through long-term data sharing, storage and application in the security context. While all biotechnology stakeholders and ecosystems are valuable assets to the bioeconomy, genetic information systems are particularly vulnerable with great potential for harm and misuse. Others have focused on the security of the post-analysis phases of data dissemination and storage, but the security of wet and dry laboratories is also challenging due to distributed devices and systems that are neither designed nor implemented with security in mind. Consequently, industry standards and best operational practices threaten the security of genetic information systems. Extensive development of laboratory security will be required to realize the potential of this emerging field while protecting the bioeconomy and all of its stakeholders.
Affiliation(s)
- Garrett J. Schumacher
  - GeneInfoSec Inc., Boulder, CO, United States
  - Technology, Cybersecurity and Policy Program, College of Engineering and Applied Science, University of Colorado Boulder, Boulder, CO, United States
  - Department of Computer Science, College of Engineering and Applied Science, University of Colorado Boulder, Boulder, CO, United States
- Aaron J. Hansen
  - Technology, Cybersecurity and Policy Program, College of Engineering and Applied Science, University of Colorado Boulder, Boulder, CO, United States
  - Department of Computer Science, College of Engineering and Applied Science, University of Colorado Boulder, Boulder, CO, United States
|
39
|
Karimi S, Jiang X, Dolin RH, Kim M, Boxwala A. A secure system for genomics clinical decision support. J Biomed Inform 2020; 112:103602. [PMID: 33080397 PMCID: PMC8577277 DOI: 10.1016/j.jbi.2020.103602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2020] [Revised: 09/07/2020] [Accepted: 10/12/2020] [Indexed: 11/26/2022]
Abstract
We developed a prototype genomic archiving and communications system to securely store genome data and provide clinical decision support (CDS). This system operates on a client-server model. The client encrypts the data, and the server stores data and performs the computations necessary for CDS. Computations are directly performed on encrypted data, and the client decrypts results. The server cannot decrypt inputs or outputs, which provides strong guarantees of security. We have validated our system with three genomics-based CDS applications. The results demonstrate that it is possible to resolve a long-standing dilemma in genomic data privacy and accessibility, by using a principled cryptographical framework and a mathematical representation of genome data and CDS questions.
Affiliation(s)
- Xiaoqian Jiang
  - UT Health School of Biomedical Informatics, Houston, TX, United States
- Miran Kim
  - UT Health School of Biomedical Informatics, Houston, TX, United States
- Aziz Boxwala
  - Elimu Informatics Inc., Richmond, CA, United States
|
40
|
Abstract
Many questions can be explored thanks to whole-genome data. The aim of this study was to overcome the main limitations of such data, software availability and database accuracy, and to estimate the feasibility of red blood cell (RBC) antigen typing from whole-genome sequencing (WGS) data. We analyzed whole-genome data from 79 individuals for HLA-DRB1 and 9 RBC antigens. WGS data were analyzed with software that phases variable positions to define alleles or haplotypes and that has been validated for HLA typing from next-generation sequencing data. A dedicated database was set up with 1648 variable positions analyzed in KEL (KEL), ACKR1 (FY), SLC14A1 (JK), ACHE (YT), ART4 (DO), AQP1 (CO), CD44 (IN), SLC4A1 (DI) and ICAM4 (LW). WGS typing was compared to that previously obtained by amplicon-based monoallelic sequencing and by SNaPshot analysis. WGS data were also explored for other alleles. Our results showed 93% concordance for blood group polymorphisms and 91% for HLA-DRB1. Incorrect typing and unresolved results confirm that WGS should be considered reliable only at read depths strictly above 15x. Our results indicate that RBC antigen typing from WGS is feasible but requires greater read depth for accurate typing of SNV polymorphisms. We also showed the potential of WGS for screening donors with rare blood antigens, such as weak JK alleles. The development of WGS analysis in immunogenetics laboratories would offer personalized care in the management of RBC disorders.
|
41
|
Balanovska E, Lukianova E, Kagazezheva J, Maurer A, Leybova N, Agdzhoyan A, Gorin I, Petrushenko V, Zhabagin M, Pylev V, Kostryukova E, Balanovsky O. Optimizing the genetic prediction of the eye and hair color for North Eurasian populations. BMC Genomics 2020; 21:527. [PMID: 32912208 PMCID: PMC7488246 DOI: 10.1186/s12864-020-06923-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/14/2020] [Accepted: 07/17/2020] [Indexed: 01/05/2023] Open
Abstract
Background: Predicting eye and hair color from genotype has become an established and widely used tool in forensic genetics, as well as in studies of ancient human populations. However, the accuracy of this tool has been verified only in West and Central Europeans, while populations from border regions between Europe and Asia (like Caucasus and Ural) also carry the light pigmentation phenotypes. Results: We phenotyped 286 samples collected across North Eurasia, genotyped them by the standard HIrisPlex-S markers and found that predictive power in Caucasus/Ural/West Siberian populations is reasonable but lower than that in West Europeans. As these populations have genetic ancestries different from that of West Europeans, we hypothesized they may carry a somewhat different allele spectrum. Thus, for all samples we performed exome sequencing additionally enriched with the 53 genes and intergenic regions known to be associated with eye/hair color. Our association analysis replicated the importance of the key previously known SNPs but also identified five new markers whose eye color prediction power for the studied populations is comparable to that of the two major previously well-known SNPs. Four of these five SNPs lie within the HERC2 gene and the fifth in an intergenic region. These SNPs are found at high frequencies in most studied populations. The released dataset of exomes from Russian populations can be further used for population genetic and medical genetic studies. Conclusions: This study demonstrated that the precision of the established systems for eye/hair color prediction from genotype is slightly lower for populations from the border regions between Europe and Asia than for West Europeans. However, this precision can be improved if some newly revealed predictive SNPs are added to the panel. Replication of these pigmentation-associated SNPs in an independent North Eurasian sample is needed in future studies.
Affiliation(s)
- Elena Balanovska
  - Research Centre for Medical Genetics, Moscow, Russia; Biobank of North Eurasia, Moscow, Russia
- Janet Kagazezheva
  - Research Centre for Medical Genetics, Moscow, Russia; Vavilov Institute of General Genetics, Moscow, Russia; Krasnodar State Medical University, Krasnodar, Russia
- Andrey Maurer
  - Research Institute and Museum of Anthropology, Lomonosov Moscow State University, Moscow, Russia
- Natalia Leybova
  - Institute of Ethnology and Anthropology of Russian Academy of Sciences, Moscow, Russia
- Anastasiya Agdzhoyan
  - Research Centre for Medical Genetics, Moscow, Russia; Vavilov Institute of General Genetics, Moscow, Russia
- Igor Gorin
  - Vavilov Institute of General Genetics, Moscow, Russia; Moscow Institute of Physics and Technology, Moscow, Russia
- Valeria Petrushenko
  - Vavilov Institute of General Genetics, Moscow, Russia; Moscow Institute of Physics and Technology, Moscow, Russia
- Maxat Zhabagin
  - National Center for Biotechnology, Nursultan, Kazakhstan
- Elena Kostryukova
  - Federal Research and Clinical Centre of Physical-Chemical Medicine, Moscow, Russia
- Oleg Balanovsky
  - Research Centre for Medical Genetics, Moscow, Russia; Biobank of North Eurasia, Moscow, Russia; Vavilov Institute of General Genetics, Moscow, Russia
|
42
|
Abstract
The prediction of a person's appearance from analysis of an anonymous DNA sample has made significant progress in the last decade. Pigmentation (eyes, hair and, more recently, skin colour) can now be determined with good accuracy; face shape is still not amenable to prediction (except, in broad terms, from ancestry). Age can apparently also be determined from methylation profiles. Police forces are, understandably, very interested in this technology, with a tendency to over-estimate its accuracy. Legislation varies greatly, with some nations opting for complete prohibition (Germany) and others allowing wide application of the approach (United Kingdom).
Affiliation(s)
- Bertrand Jordan
  - UMR 7268 ADÉS, Aix-Marseille Université/EFS/CNRS; CoReBio PACA, case 901, Parc scientifique de Luminy, 13288 Marseille Cedex 09, France
|
43
|
Carpov S, Gama N, Georgieva M, Troncoso-Pastoriza JR. Privacy-preserving semi-parallel logistic regression training with fully homomorphic encryption. BMC Med Genomics 2020; 13:88. [PMID: 32693814 PMCID: PMC7372765 DOI: 10.1186/s12920-020-0723-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/16/2022] Open
Abstract
Background: Privacy-preserving computation on genomic data, and more generally on medical data, is a critical path technology for innovative, life-saving research to positively and equally impact the global population. It enables medical research algorithms to be securely deployed in the cloud because operations on encrypted genomic databases are conducted without revealing any individual genomes. Methods for secure computation have shown significant performance improvements over the last several years. However, it is still challenging to apply them to large biomedical datasets. Methods: The HE Track of the iDash 2018 competition focused on solving an important problem in practical machine learning scenarios, where a data analyst who has trained a regression model (both linear and logistic) with a certain set of features attempts to find all features in an encrypted database that will improve the quality of the model. Our solution is based on the hybrid framework Chimera, which allows for switching between different families of fully homomorphic schemes, namely TFHE and HEAAN. Results: Our solution is one of the finalists of Track 2 of the iDash 2018 competition. Among the submitted solutions, ours is the only bootstrapped approach that can be applied for different sets of parameters without re-encrypting the genomic database, making it practical for real-world applications. Conclusions: This is the first step towards the more general feature selection problem across large encrypted databases.
Affiliation(s)
- Sergiu Carpov
  - CEA, LIST, Point Courier 172, Gif-sur-Yvette cedex, 91191, France; Inpher, Innovation Park A, Lausanne, CH-1015, Switzerland
- Nicolas Gama
  - Inpher, Innovation Park A, Lausanne, CH-1015, Switzerland
- Mariya Georgieva
  - Inpher, Innovation Park A, Lausanne, CH-1015, Switzerland; EPFL, Route Cantonal, Lausanne, CH-1015, Switzerland
|
44
|
Robust genome-wide ancestry inference for heterogeneous datasets: illustrated using the 1,000 genome project with 3D facial images. Sci Rep 2020; 10:11850. [PMID: 32678112 PMCID: PMC7367291 DOI: 10.1038/s41598-020-68259-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/09/2020] [Accepted: 06/19/2020] [Indexed: 11/17/2022] Open
Abstract
Estimates of individual-level genomic ancestry are routinely used in human genetics, and related fields. The analysis of population structure and genomic ancestry can yield insights in terms of modern and ancient populations, allowing us to address questions regarding admixture, and the numbers and identities of the parental source populations. Unrecognized population structure is also an important confounder to correct for in genome-wide association studies. However, it remains challenging to work with heterogeneous datasets from multiple studies collected by different laboratories with diverse genotyping and imputation protocols. This work presents a new approach and an accompanying open-source toolbox that facilitates a robust integrative analysis for population structure and genomic ancestry estimates for heterogeneous datasets. We show robustness against individual outliers and different protocols for the projection of new samples into a reference ancestry space, and the ability to reveal and adjust for population structure in a simulated case–control admixed population. Given that visually evident and easily recognizable patterns of human facial characteristics co-vary with genomic ancestry, and based on the integration of three different sources of genome data, we generate average 3D faces to illustrate genomic ancestry variations within the 1,000 Genome project and for eight ancient-DNA profiles, respectively.
|
45
|
Bonomi L, Huang Y, Ohno-Machado L. Privacy challenges and research opportunities for genomic data sharing. Nat Genet 2020; 52:646-654. [PMID: 32601475 PMCID: PMC7761157 DOI: 10.1038/s41588-020-0651-0] [Citation(s) in RCA: 78] [Impact Index Per Article: 19.5] [Received: 12/11/2019] [Accepted: 05/22/2020] [Indexed: 12/17/2022]
Abstract
The sharing of genomic data holds great promise in advancing precision medicine and providing personalized treatments and other types of interventions. However, these opportunities come with privacy concerns, and data misuse could potentially lead to privacy infringement for individuals and their blood relatives. With the rapid growth and increased availability of genomic datasets, understanding the current genome privacy landscape and identifying the challenges in developing effective privacy-protecting solutions are imperative. In this work, we provide an overview of major privacy threats identified by the research community and examine the privacy challenges in the context of emerging direct-to-consumer genetic-testing applications. We additionally present general privacy-protection techniques for genomic data sharing and their potential applications in direct-to-consumer genomic testing and forensic analyses. Finally, we discuss limitations in current privacy-protection methods, highlight possible mitigation strategies and suggest future research opportunities for advancing genomic data sharing.
Affiliation(s)
- Luca Bonomi
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Yingxiang Huang
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Lucila Ohno-Machado
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
  - Division of Health Services Research & Development, VA San Diego Healthcare System, San Diego, La Jolla, CA, USA
|
46
|
Abstract
BACKGROUND The shape of pig scapula is complex and is important for sow robustness and health. To better understand the relationship between 3D shape of the scapula and functional traits, it is necessary to build a model that explains most of the morphological variation between animals. This requires point correspondence, i.e. a map that explains which points represent the same piece of tissue among individuals. The objective of this study was to further develop an automated computational pipeline for the segmentation of computed tomography (CT) scans to incorporate 3D modelling of the scapula, and to develop a genetic prediction model for 3D morphology. RESULTS The surface voxels of the scapula were identified on 2143 CT-scanned pigs, and point correspondence was established by predicting the coordinates of 1234 semi-landmarks on each animal, using the coherent point drift algorithm. A subsequent principal component analysis showed that the first 10 principal components covered more than 80% of the total variation in 3D shape of the scapula. Using principal component scores as phenotypes in a genetic model, estimates of heritability ranged from 0.4 to 0.8 (with standard errors from 0.07 to 0.08). To validate the entire computational pipeline, a statistical model was trained to predict scapula shape based on marker genotype data. The mean prediction reliability averaged over the whole scapula was equal to 0.18 (standard deviation = 0.05) with a higher reliability in convex than in concave regions. CONCLUSIONS Estimates of heritability of the principal components were high and indicated that the computational pipeline that processes CT data to principal component phenotypes was associated with little error. Furthermore, we showed that it is possible to predict the 3D shape of scapula based on marker genotype data. 
Taken together, these results show that the proposed computational pipeline closes the gap between a point cloud representing the shape of an animal and its underlying genetic components.
Affiliation(s)
- Øyvind Nordbø
  - Norsvin SA, Storhamargata 44, 2317, Hamar, Norway
  - Geno SA, Storhamargata 44, 2317, Hamar, Norway
|
47
|
Reconstructing Denisovan Anatomy Using DNA Methylation Maps. Cell 2020; 179:180-192.e10. [PMID: 31539495 DOI: 10.1016/j.cell.2019.08.035] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 03/07/2019] [Revised: 05/24/2019] [Accepted: 08/20/2019] [Indexed: 12/26/2022]
Abstract
Denisovans are an extinct group of humans whose morphology remains unknown. Here, we present a method for reconstructing skeletal morphology using DNA methylation patterns. Our method is based on linking unidirectional methylation changes to loss-of-function phenotypes. We tested performance by reconstructing Neanderthal and chimpanzee skeletal morphologies and obtained >85% precision in identifying divergent traits. We then applied this method to the Denisovan and offer a putative morphological profile. We suggest that Denisovans likely shared with Neanderthals traits such as an elongated face and a wide pelvis. We also identify Denisovan-derived changes, such as an increased dental arch and lateral cranial expansion. Our predictions match the only morphologically informative Denisovan bone to date, as well as the Xuchang skull, which was suggested by some to be a Denisovan. We conclude that DNA methylation can be used to reconstruct anatomical features, including some that do not survive in the fossil record.
|
48
|
Jones K, Daniels H, Heys S, Lacey A, Ford DV. Toward a Risk-Utility Data Governance Framework for Research Using Genomic and Phenotypic Data in Safe Havens: Multifaceted Review. J Med Internet Res 2020; 22:e16346. [PMID: 32412420 PMCID: PMC7260661 DOI: 10.2196/16346] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/20/2019] [Revised: 01/13/2020] [Accepted: 01/30/2020] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Research using genomic data opens up new insights into health and disease. Being able to use the data in association with health and administrative record data held in safe havens can multiply the benefits. However, there is much discussion about the use of genomic data with perceptions of particular challenges in doing so safely and effectively. OBJECTIVE This study aimed to work toward a risk-utility data governance framework for research using genomic and phenotypic data in an anonymized form for research in safe havens. METHODS We carried out a multifaceted review drawing upon data governance arrangements in published research, case studies of organizations working with genomic and phenotypic data, public views and expectations, and example studies using genomic and phenotypic data in combination. The findings were contextualized against a backdrop of legislative and regulatory requirements and used to create recommendations. RESULTS We proposed recommendations toward a risk-utility model with a flexible suite of controls to safeguard privacy and retain data utility for research. These were presented as overarching principles aligned to the core elements in the data sharing framework produced by the Global Alliance for Genomics and Health and as practical control measures distilled from published literature and case studies of operational safe havens to be applied as required at a project-specific level. CONCLUSIONS The recommendations presented can be used to contribute toward a proportionate data governance framework to promote the safe, socially acceptable use of genomic and phenotypic data in safe havens. They do not purport to eradicate risk but propose case-by-case assessment with transparency and accountability. If the risks are adequately understood and mitigated, there should be no reason that linked genomic and phenotypic data should not be used in an anonymized form for research in safe havens.
Affiliation(s)
- Kerina Jones
  - Population Data Science, Swansea University Medical School, Swansea University, Swansea, United Kingdom
- Helen Daniels
  - Population Data Science, Swansea University Medical School, Swansea University, Swansea, United Kingdom
- Sharon Heys
  - Population Data Science, Swansea University Medical School, Swansea University, Swansea, United Kingdom
- Arron Lacey
  - Population Data Science, Swansea University Medical School, Swansea University, Swansea, United Kingdom
- David V Ford
  - Population Data Science, Swansea University Medical School, Swansea University, Swansea, United Kingdom
|
49
|
Bunnik EM, Timmers M, Bolt IL. Ethical Issues in Research and Development of Epigenome-wide Technologies. Epigenet Insights 2020; 13:2516865720913253. [PMID: 32313869 PMCID: PMC7154555 DOI: 10.1177/2516865720913253] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/23/2019] [Accepted: 02/14/2020] [Indexed: 12/27/2022] Open
Abstract
To date, few scholarly discussions on ethical implications of epigenetics and epigenomics technologies have focused on the current phase of research and development, in which researchers are confronted with real and practical ethical dilemmas. In this article, a responsible research and innovation approach, using interviews and an expert meeting, is applied to a case of epigenomic test development for cervical cancer screening. This article provides an overview of ethical issues presently facing epigenomics researchers and test developers, and discusses 3 sets of issues in depth: (1) informed consent; (2) communication with donors and/or research participants; and (3) privacy and publication of data and research results. Although these issues are familiar to research ethics, some aspects are new and most require reinterpretation in the context of epigenomics technologies. With this article, we aim to start a discussion of the practical ethical issues arising in research and development of epigenomic technologies and to offer guidance for researchers working in the field of epigenetic and epigenomic technology.
Affiliation(s)
- Eline M Bunnik
  - Department of Medical Ethics, Philosophy and History of Medicine, Erasmus MC, Rotterdam, The Netherlands
- Marjolein Timmers
  - Department of Medical Ethics, Philosophy and History of Medicine, Erasmus MC, Rotterdam, The Netherlands
- Ineke Lle Bolt
  - Department of Medical Ethics, Philosophy and History of Medicine, Erasmus MC, Rotterdam, The Netherlands
|
50
|
Mabee PM, Balhoff JP, Dahdul WM, Lapp H, Mungall CJ, Vision TJ. A Logical Model of Homology for Comparative Biology. Syst Biol 2020; 69:345-362. [PMID: 31596473 PMCID: PMC7672696 DOI: 10.1093/sysbio/syz067] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/09/2019] [Revised: 09/20/2019] [Accepted: 09/26/2019] [Indexed: 01/09/2023] Open
Abstract
There is a growing body of research on the evolution of anatomy in a wide variety of organisms. Discoveries in this field could be greatly accelerated by computational methods and resources that enable these findings to be compared across different studies and different organisms and linked with the genes responsible for anatomical modifications. Homology is a key concept in comparative anatomy; two important types are historical homology (the similarity of organisms due to common ancestry) and serial homology (the similarity of repeated structures within an organism). We explored how to most effectively represent historical and serial homology across anatomical structures to facilitate computational reasoning. We assembled a collection of homology assertions from the literature with a set of taxon phenotypes for the skeletal elements of vertebrate fins and limbs from the Phenoscape Knowledgebase. Using seven competency questions, we evaluated the reasoning ramifications of two logical models: the Reciprocal Existential Axioms (REA) homology model and the Ancestral Value Axioms (AVA) homology model. The AVA model returned all user-expected results in addition to the search term and any of its subclasses. The AVA model also returns any superclass of the query term in which a homology relationship has been asserted. The REA model returned the user-expected results for five out of seven queries. We identify some challenges of implementing complete homology queries due to limitations of OWL reasoning. This work lays the foundation for homology reasoning to be incorporated into other ontology-based tools, such as those that enable synthetic supermatrix construction and candidate gene discovery. [Homology; ontology; anatomy; morphology; evolution; knowledgebase; phenoscape.].
Affiliation(s)
- Paula M Mabee
  - Department of Biology, University of South Dakota, 414 East Clark Street, Vermillion, SD 57069, USA
- James P Balhoff
  - Renaissance Computing Institute, University of North Carolina, 100 Europa Drive, Suite 540, Chapel Hill, NC 27517, USA
- Wasila M Dahdul
  - Department of Biology, University of South Dakota, 414 East Clark Street, Vermillion, SD 57069, USA
- Hilmar Lapp
  - Center for Genomic and Computational Biology, Duke University, 101 Science Drive, Durham, NC 27708, USA
- Christopher J Mungall
  - Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Todd J Vision
  - Department of Biology and School of Information and Library Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-3280, USA
|