1. Eidi M, Abdolalizadeh S, Moeini S, Garshasbi M, Zahiri J. 123VCF: an intuitive and efficient tool for filtering VCF files. BMC Bioinformatics 2024; 25:68. [PMID: 38350858 PMCID: PMC10865685 DOI: 10.1186/s12859-024-05661-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Accepted: 01/17/2024] [Indexed: 02/15/2024]
Abstract
BACKGROUND The advent of Next-Generation Sequencing (NGS) has catalyzed a paradigm shift in medical genetics, enabling the identification of disease-associated variants. However, the vast quantum of data produced by NGS necessitates a robust and dependable mechanism for filtering irrelevant variants. Annotation-based variant filtering, a pivotal step in this process, demands a profound understanding of the case-specific conditions and the relevant annotation instruments. To tackle this complex task, we sought to design an accessible, efficient and more importantly easy to understand variant filtering tool. RESULTS Our efforts culminated in the creation of 123VCF, a tool capable of processing both compressed and uncompressed Variant Calling Format (VCF) files. Built on a Java framework, the tool employs a disk-streaming real-time filtering algorithm, allowing it to manage sizable variant files on conventional desktop computers. 123VCF filters input variants in accordance with a predefined filter sequence applied to the input variants. Users are provided the flexibility to define various filtering parameters, such as quality, coverage depth, and variant frequency within the populations. Additionally, 123VCF accommodates user-defined filters tailored to specific case requirements, affording users enhanced control over the filtering process. We evaluated the performance of 123VCF by analyzing different types of variant files and comparing its runtimes to the most similar algorithms like BCFtools filter and GATK VariantFiltration. The results indicated that 123VCF performs relatively well. The tool's intuitive interface and potential for reproducibility make it a valuable asset for both researchers and clinicians. CONCLUSION The 123VCF filtering tool provides an effective, dependable approach for filtering variants in both research and clinical settings. As an open-source tool available at https://project123vcf.sourceforge.io , it is accessible to the global scientific and clinical community, paving the way for the discovery of disease-causing variants and facilitating the advancement of personalized medicine.
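The streaming, sequential-filter idea described in this abstract can be illustrated with a minimal Python sketch. This is not 123VCF's implementation (the tool itself is Java-based and its internals are not given here). The function names, the thresholds, the use of the `DP` and `AF` INFO keys, and the choice to keep variants that lack an `AF` annotation are all assumptions made for illustration.

```python
import gzip
from typing import Callable, Dict, Iterable, Iterator, List

# A filter takes the tab-split record and its parsed INFO field
# and returns True when the variant should be kept.
Filter = Callable[[List[str], Dict[str, str]], bool]


def open_vcf(path: str):
    """Open a plain or gzip-compressed VCF file as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def parse_info(info: str) -> Dict[str, str]:
    """Turn 'DP=30;AF=0.001;DB' into {'DP': '30', 'AF': '0.001', 'DB': ''}."""
    parsed: Dict[str, str] = {}
    for item in info.split(";"):
        if item and item != ".":
            key, _, value = item.partition("=")
            parsed[key] = value
    return parsed


def min_qual(threshold: float) -> Filter:
    """Keep records whose QUAL column is present and at least `threshold`."""
    def check(fields: List[str], info: Dict[str, str]) -> bool:
        return fields[5] != "." and float(fields[5]) >= threshold
    return check


def min_depth(threshold: int) -> Filter:
    """Keep records whose INFO/DP is an integer of at least `threshold`."""
    def check(fields: List[str], info: Dict[str, str]) -> bool:
        depth = info.get("DP", "")
        return depth.isdigit() and int(depth) >= threshold
    return check


def max_af(threshold: float) -> Filter:
    """Keep records whose INFO/AF values are all at most `threshold`.

    Assumption: a record without AF is treated as unobserved and kept.
    """
    def check(fields: List[str], info: Dict[str, str]) -> bool:
        values = [v for v in info.get("AF", "").split(",") if v and v != "."]
        return all(float(v) <= threshold for v in values)
    return check


def stream_filter(lines: Iterable[str], filters: List[Filter]) -> Iterator[str]:
    """Yield header lines unchanged and data lines that pass every filter.

    Lines are processed one at a time, so memory use does not grow
    with the size of the input file.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        info = parse_info(fields[7])
        if all(f(fields, info) for f in filters):
            yield line
```

In use, `stream_filter(open_vcf(path), [min_qual(30), min_depth(10), max_af(0.01)])` would apply the filters in the listed order to each record as it is read, which is one way to keep large files manageable on a desktop machine.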
Affiliation(s)
- Milad Eidi
  - Department of Medical Genetics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
- Samaneh Abdolalizadeh
  - Department of Genetics and Molecular Medicine, School of Medicine, Zanjan University of Medical Sciences (ZUMS), Zanjan, Iran
- Soheila Moeini
  - Département de Biochimie et Médecine Moléculaire, Université de Montréal, Montreal, QC, Canada
  - Research Centre, Montreal Heart Institute, Montreal, QC, Canada
- Masoud Garshasbi
  - Department of Medical Genetics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
- Javad Zahiri
  - Department of Neuroscience, University of California San Diego, San Diego, CA, USA

2. Corpas M, Megy K, Metastasio A, Lehmann E. Implementation of individualised polygenic risk score analysis: a test case of a family of four. BMC Med Genomics 2022; 15:207. [PMID: 36192731 PMCID: PMC9531350 DOI: 10.1186/s12920-022-01331-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2022] [Accepted: 08/05/2022] [Indexed: 11/16/2022]
Abstract
Background Polygenic risk scores (PRS) have been widely applied in research studies, showing how population groups can be stratified into risk categories for many common conditions. As healthcare systems consider applying PRS to keep their populations healthy, little work has been carried out demonstrating their implementation at an individual level. Case presentation We performed a systematic curation of PRS sources from established data repositories, selecting 15 phenotypes, comprising an excess of 37 million SNPs related to cancer, cardiovascular, metabolic and autoimmune diseases. We tested selected phenotypes using whole genome sequencing data for a family of four related individuals. Individual risk scores were given percentile values based upon reference distributions among 1000 Genomes Iberians, Europeans, or all samples. Over 96 billion allele effects were calculated in order to obtain the PRS for each of the individuals analysed here. Conclusions Our results highlight the need for further standardisation in the way PRS are developed and shared, the importance of individual risk assessment rather than the assumption of inherited averages, and the challenges currently posed when translating PRS into risk metrics.
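The two calculations mentioned in this abstract, summing allele effects into a score and placing that score on a reference distribution, can be sketched as follows. This is a generic additive PRS sketch, not the authors' pipeline. The SNP identifiers, weights, and reference scores in the usage note are invented for illustration, and treating a missing genotype as zero effect-allele copies is an assumption.

```python
from bisect import bisect_right
from typing import Dict, Sequence


def polygenic_score(weights: Dict[str, float], dosages: Dict[str, int]) -> float:
    """Additive score: sum over SNPs of effect weight times effect-allele count.

    `dosages` maps SNP id to 0, 1 or 2 copies of the effect allele.
    Assumption: SNPs absent from `dosages` contribute nothing.
    """
    return sum(beta * dosages.get(snp, 0) for snp, beta in weights.items())


def percentile(score: float, reference_scores: Sequence[float]) -> float:
    """Percentage of reference scores that are less than or equal to `score`."""
    ordered = sorted(reference_scores)
    return 100.0 * bisect_right(ordered, score) / len(ordered)
```

For example, with hypothetical weights `{"rs1": 0.2, "rs2": -0.1, "rs3": 0.5}` and dosages `{"rs1": 2, "rs2": 1, "rs3": 0}`, the score is 0.2 × 2 − 0.1 × 1 = 0.3, and its percentile depends entirely on which reference panel the comparison scores come from, which is one reason the choice of reference population matters.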
Affiliation(s)
- Manuel Corpas
  - Cambridge Precision Medicine Limited, ideaSpace, University of Cambridge Biomedical Innovation Hub, Cambridge, UK
  - Institute of Continuing Education, University of Cambridge, Cambridge, UK
  - Facultad de Ciencias de La Salud, Universidad Internacional de La Rioja, Madrid, Spain
- Karyn Megy
  - Cambridge Precision Medicine Limited, ideaSpace, University of Cambridge Biomedical Innovation Hub, Cambridge, UK
  - Department of Haematology, University of Cambridge & NHS Blood and Transplant, Cambridge, UK
- Antonio Metastasio
  - Cambridge Precision Medicine Limited, ideaSpace, University of Cambridge Biomedical Innovation Hub, Cambridge, UK
  - Camden and Islington NHS Foundation Trust, London, UK
- Edmund Lehmann
  - Cambridge Precision Medicine Limited, ideaSpace, University of Cambridge Biomedical Innovation Hub, Cambridge, UK

3. Corpas M, Megy K, Mistry V, Metastasio A, Lehmann E. Whole Genome Interpretation for a Family of Five. Front Genet 2021; 12:535123. [PMID: 33763108 PMCID: PMC7982663 DOI: 10.3389/fgene.2021.535123] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/14/2020] [Accepted: 02/15/2021] [Indexed: 12/19/2022]
Abstract
Although best practices have emerged on how to analyse and interpret personal genomes, the utility of whole genome screening remains underdeveloped. A large amount of information can be gathered from various types of analyses via whole genome sequencing including pathogenicity screening, genetic risk scoring, fitness, nutrition, and pharmacogenomic analysis. We recognize different levels of confidence when assessing the validity of genetic markers and apply rigorous standards for evaluation of phenotype associations. We illustrate the application of this approach on a family of five. By applying analyses of whole genomes from different methodological perspectives, we are able to build a more comprehensive picture to assist decision making in preventative healthcare and well-being management. Our interpretation and reporting outputs provide input for a clinician to develop a healthcare plan for the individual, based on genetic and other healthcare data.
Affiliation(s)
- Manuel Corpas
  - Cambridge Precision Medicine Limited, ideaSpace, University of Cambridge Biomedical Innovation Hub, Cambridge, United Kingdom
  - Institute of Continuing Education, Madingley Hall, Madingley, University of Cambridge, Cambridge, United Kingdom
  - Facultad de Ciencias de la Salud, Universidad Internacional de La Rioja, Madrid, Spain
- Karyn Megy
  - Cambridge Precision Medicine Limited, ideaSpace, University of Cambridge Biomedical Innovation Hub, Cambridge, United Kingdom
  - Department of Haematology, University of Cambridge & National Health Service (NHS) Blood and Transplant, Cambridge, United Kingdom
- Antonio Metastasio
  - Cambridge Precision Medicine Limited, ideaSpace, University of Cambridge Biomedical Innovation Hub, Cambridge, United Kingdom
  - Camden and Islington NHS Foundation Trust, London, United Kingdom
- Edmund Lehmann
  - Cambridge Precision Medicine Limited, ideaSpace, University of Cambridge Biomedical Innovation Hub, Cambridge, United Kingdom

4. Nabieva E, Sharma SM, Kapushev Y, Garushyants SK, Fedotova AV, Moskalenko VN, Serebrenikova TE, Glazyrina E, Kanivets IV, Pyankov DV, Neretina TV, Logacheva MD, Bazykin GA, Yarotsky D. Accurate fetal variant calling in the presence of maternal cell contamination. Eur J Hum Genet 2020; 28:1615-1623. [PMID: 32728107 PMCID: PMC7576216 DOI: 10.1038/s41431-020-0697-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/15/2020] [Revised: 07/07/2020] [Accepted: 07/14/2020] [Indexed: 11/09/2022]
Abstract
High-throughput sequencing of fetal DNA is a promising and increasingly common method for the discovery of all (or all coding) genetic variants in the fetus, either as part of prenatal screening or diagnosis, or for genetic diagnosis of spontaneous abortions. In many cases, the fetal DNA (from chorionic villi, amniotic fluid, or abortive tissue) can be contaminated with maternal cells, resulting in the mixture of fetal and maternal DNA. This maternal cell contamination (MCC) undermines the assumption, made by traditional variant callers, that each allele in a heterozygous site is covered, on average, by 50% of the reads, and therefore can lead to erroneous genotype calls. We present a panel of methods for reducing the genotyping error in the presence of MCC. All methods start with the output of GATK HaplotypeCaller on the sequencing data for the (contaminated) fetal sample and both of its parents, and additionally rely on information about the MCC fraction (which itself is readily estimated from the high-throughput sequencing data). The first of these methods uses a Bayesian probabilistic model to correct the fetal genotype calls produced by MCC-unaware HaplotypeCaller. The other two methods "learn" the genotype-correction model from examples. We use simulated contaminated fetal data to train and test the models. Using the test sets, we show that all three methods lead to substantially improved accuracy when compared with the original MCC-unaware HaplotypeCaller calls. We then apply the best-performing method to three chorionic villus samples from spontaneously terminated pregnancies.
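To make concrete why contamination breaks the 50% assumption for heterozygous sites, the sketch below computes the expected alternate-allele read fraction in a fetal/maternal mixture and picks the fetal genotype with the highest binomial likelihood. It is a simplified illustration, not the Bayesian or learned models presented in the paper. The function names, the flat prior over fetal genotypes, the absence of Mendelian constraints, and the 1% error floor are all assumptions.

```python
from math import comb


def expected_alt_fraction(fetal_alt: int, maternal_alt: int, mcc: float) -> float:
    """Expected fraction of reads carrying the alternate allele.

    fetal_alt and maternal_alt are alternate-allele counts (0, 1 or 2);
    mcc is the fraction of the sample's DNA that comes from maternal cells.
    """
    return ((1.0 - mcc) * fetal_alt + mcc * maternal_alt) / 2.0


def most_likely_fetal_genotype(alt_reads: int, total_reads: int,
                               maternal_alt: int, mcc: float,
                               error: float = 0.01) -> int:
    """Return the fetal alternate-allele count with the highest likelihood.

    Assumptions: binomial read sampling, a flat prior over the three
    genotypes, and expected fractions clamped to [error, 1 - error].
    """
    best_genotype, best_likelihood = 0, -1.0
    for genotype in (0, 1, 2):
        p = expected_alt_fraction(genotype, maternal_alt, mcc)
        p = min(max(p, error), 1.0 - error)
        likelihood = (comb(total_reads, alt_reads)
                      * p ** alt_reads
                      * (1.0 - p) ** (total_reads - alt_reads))
        if likelihood > best_likelihood:
            best_genotype, best_likelihood = genotype, likelihood
    return best_genotype
```

With 40% contamination and a heterozygous mother, a homozygous-reference fetus is expected to show about 20% alternate reads, so 20 alternate reads out of 100 is best explained by a fetal alternate-allele count of 0 even though a contamination-unaware caller might report a heterozygote.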
Affiliation(s)
- Elena Nabieva
  - Skolkovo Institute of Science and Technology, Skolkovo, Russia
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, Russia
- Yermek Kapushev
  - Skolkovo Institute of Science and Technology, Skolkovo, Russia
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, Russia
- Sofya K Garushyants
  - Skolkovo Institute of Science and Technology, Skolkovo, Russia
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, Russia
- Anna V Fedotova
  - Skolkovo Institute of Science and Technology, Skolkovo, Russia
  - Lomonosov Moscow State University, Moscow, Russia
- Tatyana V Neretina
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, Russia
  - Lomonosov Moscow State University, Moscow, Russia
- Maria D Logacheva
  - Skolkovo Institute of Science and Technology, Skolkovo, Russia
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, Russia
  - Lomonosov Moscow State University, Moscow, Russia
- Georgii A Bazykin
  - Skolkovo Institute of Science and Technology, Skolkovo, Russia
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, Russia
- Dmitry Yarotsky
  - Skolkovo Institute of Science and Technology, Skolkovo, Russia
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, Russia

5. Greshake Tzovaras B, Angrist M, Arvai K, Dulaney M, Estrada-Galiñanes V, Gunderson B, Head T, Lewis D, Nov O, Shaer O, Tzovara A, Bobe J, Price Ball M. Open Humans: A platform for participant-centered research and personal data exploration. Gigascience 2019; 8:giz076. [PMID: 31241153 PMCID: PMC6593360 DOI: 10.1093/gigascience/giz076] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 11/12/2018] [Revised: 05/02/2019] [Accepted: 06/03/2019] [Indexed: 01/15/2023]
Abstract
BACKGROUND Many aspects of our lives are now digitized and connected to the internet. As a result, individuals are now creating and collecting more personal data than ever before. This offers an unprecedented chance for human-participant research ranging from the social sciences to precision medicine. With this potential wealth of data comes practical problems (e.g., how to merge data streams from various sources), as well as ethical problems (e.g., how best to balance risks and benefits when enabling personal data sharing by individuals). RESULTS To begin to address these problems in real time, we present Open Humans, a community-based platform that enables personal data collections across data streams, giving individuals more personal data access and control of sharing authorizations, and enabling academic research as well as patient-led projects. We showcase data streams that Open Humans combines (e.g., personal genetic data, wearable activity monitors, GPS location records, and continuous glucose monitor data), along with use cases of how the data facilitate various projects. CONCLUSIONS Open Humans highlights how a community-centric ecosystem can be used to aggregate personal data from various sources, as well as how these data can be used by academic and citizen scientists through practical, iterative approaches to sharing that strive to balance considerations with participant autonomy, inclusion, and privacy.
Affiliation(s)
- Bastian Greshake Tzovaras
  - Open Humans Foundation, 500 Westover Dr #10553, Sanford, NC 27330, USA
  - Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA
- Misha Angrist
  - Social Science Research Institute, Duke University, 140 Science Drive, Durham, NC 27708, USA
- Mairi Dulaney
  - Open Humans Foundation, 500 Westover Dr #10553, Sanford, NC 27330, USA
- Vero Estrada-Galiñanes
  - QoL Lab, Department of Computer Science, University of Copenhagen, Sigurdsgade 41, DK-2200 Copenhagen, Denmark
  - IDE, University of Stavanger, Kjell Arholmsgate 41, 4036 Stavanger, Norway
- Tim Head
  - Wild Tree Tech, Froehlichstrasse 42, 5200 Brugg, Switzerland
- Oded Nov
  - Tandon School of Engineering, New York University, 6 MetroTech Center, Brooklyn, NY 11201, USA
- Orit Shaer
  - Wellesley College, 106 Central Street, Wellesley, MA 02481, USA
- Athina Tzovara
  - Helen Wills Neuroscience Institute, University of California, Berkeley, 174 Li Ka Shing Center, Berkeley, CA 94720, USA
  - Institute of Computer Science, University of Bern, Neubrückstrasse 10, 3012 Bern, Switzerland
- Jason Bobe
  - Institute for Next Generation Healthcare, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY 10029-5674, USA
- Mad Price Ball
  - Open Humans Foundation, 500 Westover Dr #10553, Sanford, NC 27330, USA

6. Parveen A, Khurana S, Kumar A. Overview of Genomic Tools for Circular Visualization in the Next-generation Genomic Sequencing Era. Curr Genomics 2019; 20:90-99. [PMID: 31555060 PMCID: PMC6728899 DOI: 10.2174/1389202920666190314092044] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/25/2019] [Revised: 03/07/2019] [Accepted: 03/07/2019] [Indexed: 12/13/2022]
Abstract
After human genome sequencing and rapid changes in genome sequencing methods, we have entered into the era of rapidly accumulating genome-sequencing data. This has derived the development of several types of methods for representing results of genome sequencing data. Circular genome visualization tools are also critical in this area as they provide rapid interpretation and simple visualization of overall data. In the last 15 years, we have seen rapid changes in circular visualization tools after the development of the circos tool with 1-2 tools published per year. Herein we have summarized and revisited all these tools until the third quarter of 2018.
Affiliation(s)
- Alisha Parveen
  - Medical Research Center, Medical Faculty of Mannheim, University of Heidelberg, Mannheim, Germany
  - Pharmacology Department, Central Drug Research Institute, Lucknow, Uttar Pradesh, India
  - Department of Genetics & Molecular Biology in Botany, Institute of Botany, Christian-Albrechts-University at Kiel, Kiel, Germany
- Sukant Khurana
  - Medical Research Center, Medical Faculty of Mannheim, University of Heidelberg, Mannheim, Germany
  - Pharmacology Department, Central Drug Research Institute, Lucknow, Uttar Pradesh, India
  - Department of Genetics & Molecular Biology in Botany, Institute of Botany, Christian-Albrechts-University at Kiel, Kiel, Germany
- Abhishek Kumar
  - Medical Research Center, Medical Faculty of Mannheim, University of Heidelberg, Mannheim, Germany
  - Pharmacology Department, Central Drug Research Institute, Lucknow, Uttar Pradesh, India
  - Department of Genetics & Molecular Biology in Botany, Institute of Botany, Christian-Albrechts-University at Kiel, Kiel, Germany

7. Jesser KJ, Valdivia-Granda W, Jones JL, Noble RT. Clustering of Vibrio parahaemolyticus Isolates Using MLST and Whole-Genome Phylogenetics and Protein Motif Fingerprinting. Front Public Health 2019; 7:66. [PMID: 31139608 PMCID: PMC6519141 DOI: 10.3389/fpubh.2019.00066] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/14/2018] [Accepted: 03/06/2019] [Indexed: 01/22/2023]
Abstract
Vibrio parahaemolyticus is a ubiquitous and abundant member of native microbial assemblages in coastal waters and shellfish. Though V. parahaemolyticus is predominantly environmental, some strains have infected human hosts and caused outbreaks of seafood-related gastroenteritis. In order to understand differences among clinical and environmental V. parahaemolyticus strains, we used high quality DNA sequencing data to compare the genomes of V. parahaemolyticus isolates (n = 43) from a variety of geographic locations and clinical and environmental sample matrices. We used phylogenetic trees inferred from multilocus sequence typing (MLST) and whole-genome (WG) alignments, as well as a novel classification and genome clustering approach that relies on protein motif fingerprints (MFs), to assess relationships between V. parahaemolyticus strains and identify novel molecular targets associated with virulence. Differences in strain clustering at more than one position were observed between the MLST and WG phylogenetic trees. The WG phylogeny had higher support values and strain resolution since isolates of the same sequence type could be differentiated. The MF analysis revealed groups of protein motifs that were associated with the pathogenic MLST type ST36 and a large group of clinical strains isolated from human stool. A subset of the stool and ST36-associated protein motifs were selected for further analysis and the motif sequences were found in genes with a variety of functions, including transposases, secretion system components and effectors, and hypothetical proteins. DNA sequences associated with these protein motifs are candidate targets for future molecular assays in order to improve surveys of pathogenic V. parahaemolyticus in the environment and seafood.
Affiliation(s)
- Kelsey J Jesser
  - Institute of Marine Sciences, University of North Carolina at Chapel Hill, Morehead City, NC, United States
- Jessica L Jones
  - Gulf Coast Seafood Laboratory, Division of Seafood Science and Technology, U.S. Food and Drug Administration, Dauphin Island, AL, United States
- Rachel T Noble
  - Institute of Marine Sciences, University of North Carolina at Chapel Hill, Morehead City, NC, United States

8. Murray MF, Evans JP, Angrist M, Uhlmann WR, Lochner Doyle D, Fullerton SM, Ganiats TG, Hagenkord J, Imhof S, Rim SH, Ortmann L, Aziz N, Dotson WD, Matloff E, Young K, Kaphingst K, Bradbury A, Scott J, Wang C, Zauber A, Levine M, Korf B, Leonard DG, Wicklund C, Isham G, Khoury MJ. A Proposed Approach for Implementing Genomics-Based Screening Programs for Healthy Adults. NAM Perspect 2018. [DOI: 10.31478/201812a] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Indexed: 12/16/2022]
Affiliation(s)
- Joan Scott
  - Health Resources and Services Administration

9. Créquit P, Mansouri G, Benchoufi M, Vivot A, Ravaud P. Mapping of Crowdsourcing in Health: Systematic Review. J Med Internet Res 2018; 20:e187. [PMID: 29764795 PMCID: PMC5974463 DOI: 10.2196/jmir.9330] [Citation(s) in RCA: 70] [Impact Index Per Article: 11.7] [Received: 11/02/2017] [Revised: 02/10/2018] [Accepted: 03/14/2018] [Indexed: 11/22/2022]
Abstract
Background Crowdsourcing involves obtaining ideas, needed services, or content by soliciting Web-based contributions from a crowd. The 4 types of crowdsourced tasks (problem solving, data processing, surveillance or monitoring, and surveying) can be applied in the 3 categories of health (promotion, research, and care). Objective This study aimed to map the different applications of crowdsourcing in health to assess the fields of health that are using crowdsourcing and the crowdsourced tasks used. We also describe the logistics of crowdsourcing and the characteristics of crowd workers. Methods MEDLINE, EMBASE, and ClinicalTrials.gov were searched for available reports from inception to March 30, 2016, with no restriction on language or publication status. Results We identified 202 relevant studies that used crowdsourcing, including 9 randomized controlled trials, of which only one had posted results at ClinicalTrials.gov. Crowdsourcing was used in health promotion (91/202, 45.0%), research (73/202, 36.1%), and care (38/202, 18.8%). The 4 most frequent areas of application were public health (67/202, 33.2%), psychiatry (32/202, 15.8%), surgery (22/202, 10.9%), and oncology (14/202, 6.9%). Half of the reports (99/202, 49.0%) referred to data processing, 34.6% (70/202) referred to surveying, 10.4% (21/202) referred to surveillance or monitoring, and 5.9% (12/202) referred to problem-solving. Labor market platforms (eg, Amazon Mechanical Turk) were used in most studies (190/202, 94%). The crowd workers’ characteristics were poorly reported, and crowdsourcing logistics were missing from two-thirds of the reports. When reported, the median size of the crowd was 424 (first and third quartiles: 167-802); crowd workers’ median age was 34 years (32-36). Crowd workers were mainly recruited nationally, particularly in the United States. For many studies (58.9%, 119/202), previous experience in crowdsourcing was required, and passing a qualification test or training was seldom needed (11.9% of studies; 24/202). For half of the studies, monetary incentives were mentioned, with mainly less than US $1 to perform the task. The time needed to perform the task was mostly less than 10 min (58.9% of studies; 119/202). Data quality validation was used in 54/202 studies (26.7%), mainly by attention check questions or by replicating the task with several crowd workers. Conclusions The use of crowdsourcing, which allows access to a large pool of participants as well as saving time in data collection, lowering costs, and speeding up innovations, is increasing in health promotion, research, and care. However, the description of crowdsourcing logistics and crowd workers’ characteristics is frequently missing in study reports and needs to be precisely reported to better interpret the study findings and replicate them.
Affiliation(s)
- Perrine Créquit
  - INSERM UMR1153, Methods Team, Epidemiology and Statistics Sorbonne Paris Cité Research Center, Paris Descartes University, Paris, France
  - Centre d'Epidémiologie Clinique, Hôpital Hôtel Dieu, Assistance Publique des Hôpitaux de Paris, Paris, France
  - Cochrane France, Paris, France
- Ghizlène Mansouri
  - INSERM UMR1153, Methods Team, Epidemiology and Statistics Sorbonne Paris Cité Research Center, Paris Descartes University, Paris, France
- Mehdi Benchoufi
  - Centre d'Epidémiologie Clinique, Hôpital Hôtel Dieu, Assistance Publique des Hôpitaux de Paris, Paris, France
- Alexandre Vivot
  - INSERM UMR1153, Methods Team, Epidemiology and Statistics Sorbonne Paris Cité Research Center, Paris Descartes University, Paris, France
  - Centre d'Epidémiologie Clinique, Hôpital Hôtel Dieu, Assistance Publique des Hôpitaux de Paris, Paris, France
- Philippe Ravaud
  - INSERM UMR1153, Methods Team, Epidemiology and Statistics Sorbonne Paris Cité Research Center, Paris Descartes University, Paris, France
  - Centre d'Epidémiologie Clinique, Hôpital Hôtel Dieu, Assistance Publique des Hôpitaux de Paris, Paris, France
  - Cochrane France, Paris, France
  - Department of Epidemiology, Columbia University, Mailman School of Public Health, New York, NY, United States

10. Haeusermann T, Greshake B, Blasimme A, Irdam D, Richards M, Vayena E. Open sharing of genomic data: Who does it and why? PLoS One 2017; 12:e0177158. [PMID: 28486511 PMCID: PMC5423632 DOI: 10.1371/journal.pone.0177158] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 12/28/2016] [Accepted: 04/24/2017] [Indexed: 01/12/2023]
Abstract
We explored the characteristics and motivations of people who, having obtained their genetic or genomic data from Direct-To-Consumer genetic testing (DTC-GT) companies, voluntarily decide to share them on the publicly accessible web platform openSNP. The study is the first attempt to describe open data sharing activities undertaken by individuals without institutional oversight. In the paper we provide a detailed overview of the distribution of the demographic characteristics and motivations of people engaged in genetic or genomic open data sharing. The geographical distribution of the respondents showed the USA as dominant. There was no significant gender divide, the age distribution was broad, educational background varied and respondents with and without children were equally represented. Health, even though prominent, was not the respondents' primary or only motivation to be tested. As to their motivations to openly share their data, 86.05% indicated wanting to learn about themselves as relevant, followed by contributing to the advancement of medical research (80.30%), improving the predictability of genetic testing (76.02%) and considering it fun to explore genotype and phenotype data (75.51%). Whereas most respondents were well aware of the privacy risks of their involvement in open genetic data sharing and considered the possibility of direct, personal repercussions troubling, they estimated the risk of this happening to be negligible. Our findings highlight the diversity of DTC-GT consumers who decide to openly share their data. Instead of focusing exclusively on health-related aspects of genetic testing and data sharing, our study emphasizes the importance of taking into account benefits and risks that stretch beyond the health spectrum. Our results thus lend further support to the call for a broader and multi-faceted conceptualization of genomic utility.
Affiliation(s)
- Tobias Haeusermann
  - Health Ethics and Policy Lab, Epidemiology, Biostatistics & Prevention Institute (EBPI), University of Zurich, Zurich, Switzerland
  - Department of Sociology, University of Cambridge, Cambridge, United Kingdom
- Bastian Greshake
  - Department for Applied Bioinformatics, Institute for Cell Biology and Neuroscience, Goethe University, Frankfurt am Main, Germany
- Alessandro Blasimme
  - Health Ethics and Policy Lab, Epidemiology, Biostatistics & Prevention Institute (EBPI), University of Zurich, Zurich, Switzerland
- Darja Irdam
  - Department of Sociology, University of Cambridge, Cambridge, United Kingdom
- Martin Richards
  - Centre for Family Research, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
- Effy Vayena
  - Health Ethics and Policy Lab, Epidemiology, Biostatistics & Prevention Institute (EBPI), University of Zurich, Zurich, Switzerland

11. Wright CF, Middleton A, Barrett JC, Firth HV, FitzPatrick DR, Hurles ME, Parker M. Returning genome sequences to research participants: Policy and practice. Wellcome Open Res 2017; 2:15. [PMID: 28317033 PMCID: PMC5351846 DOI: 10.12688/wellcomeopenres.10942.1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Accepted: 02/22/2017] [Indexed: 12/18/2022]
Abstract
Despite advances in genomic science stimulating an explosion of literature around returning health-related findings, the possibility of returning entire genome sequences to individual research participants has not been widely considered. Through direct involvement in large-scale translational genomics studies, we have identified a number of logistical challenges that would need to be overcome prior to returning individual genome sequence data, including verifying that the data belong to the requestor and providing appropriate informatics support. In addition, we identify a number of ethico-legal issues that require careful consideration, including returning data to family members, mitigating against unintended consequences, and ensuring appropriate governance. Finally, recognising that there is an opportunity cost to addressing these issues, we make some specific pragmatic suggestions for studies that are considering whether to share individual genomic datasets with individual study participants. If data are shared, research should be undertaken into the personal, familial and societal impact of receiving individual genome sequence data.
Affiliation(s)
- Caroline F. Wright
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Anna Middleton
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Jeffrey C. Barrett
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Helen V. Firth
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
  - East Anglian Medical Genetics Service, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, UK
- David R. FitzPatrick
  - MRC Human Genetics Unit, MRC IGMM, University of Edinburgh, Western General Hospital, Edinburgh, EH4 2XU, UK
- Matthew E. Hurles
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Michael Parker
  - The Ethox Centre, Nuffield Department of Population Health, University of Oxford, Oxford, OX3 7LF, UK