1
Chan Y, Tung M, Garruss AS, Zaranek SW, Chan YK, Lunshof JE, Zaranek AW, Ball MP, Chou MF, Lim ET, Church GM. An unbiased index to quantify participant's phenotypic contribution to an open-access cohort. Sci Rep 2017; 7:46148. [PMID: 28387241; PMCID: PMC5384003; DOI: 10.1038/srep46148] [Received: 10/18/2016] [Accepted: 03/10/2017]
Abstract
The Personal Genome Project (PGP) is an effort to enroll many participants to create an open-access repository of genome, health and trait data for research. However, PGP participants are not enrolled to study any specific trait, and participants choose which phenotypes to disclose. To measure the extent of and willingness toward phenotype contribution, and to encourage and guide participants to contribute phenotypes, we developed an algorithm to score and rank the phenotypes and participants of the PGP. The scoring algorithm calculates a participation index (P-index) for every participant, where 0 indicates no reported phenotypes and 100 indicates complete phenotype reporting. We calculated the P-index for all 5,015 participants in the PGP; scores ranged from 0 to 96.7. We found that participants mainly have either high scores (P-index > 90, 29.5%) or low scores (P-index < 10, 57.8%). Although there are significantly more male than female participants (1,793 versus 1,271), females tend to have higher P-indexes on average (P = 0.015). We also report P-indexes by demographic group and find that participants from states such as Missouri and Massachusetts have higher P-indexes than those from states such as Utah and Minnesota. The P-index can therefore be used as an unbiased way to measure and rank participants' phenotypic contributions to the PGP.
Affiliation(s)
- Yingleong Chan
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
- Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA
- Michael Tung
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
- Alexander S. Garruss
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
- Program in Bioinformatics and Integrative Genomics, Division of Medical Sciences, Graduate School of Arts and Sciences, Harvard University, Cambridge, MA 02138, USA
- Ying Kai Chan
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
- Jeantine E. Lunshof
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA
- Department of Genetics, University Medical Centre Groningen, University of Groningen, 9700 RB Groningen, The Netherlands
- Michael F. Chou
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
- Elaine T. Lim
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
- Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA 02115, USA
- George M. Church
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
2
Lubin IM, Aziz N, Babb LJ, Ballinger D, Bisht H, Church DM, Cordes S, Eilbeck K, Hyland F, Kalman L, Landrum M, Lockhart ER, Maglott D, Marth G, Pfeifer JD, Rehm HL, Roy S, Tezak Z, Truty R, Ullman-Cullere M, Voelkerding KV, Worthey EA, Zaranek AW, Zook JM. Principles and Recommendations for Standardizing the Use of the Next-Generation Sequencing Variant File in Clinical Settings. J Mol Diagn 2017; 19:417-426. [PMID: 28315672; DOI: 10.1016/j.jmoldx.2016.12.001] [Received: 07/22/2016] [Revised: 12/05/2016] [Accepted: 12/23/2016]
Abstract
A national workgroup convened by the Centers for Disease Control and Prevention identified principles and made recommendations for standardizing the description of sequence data contained within the variant file generated during clinical next-generation sequence analysis for diagnosing human heritable conditions. The specifications for variant files were initially developed to be flexible with regard to content representation, to support a variety of research applications. This flexibility permits variation in how sequence findings are described, depending in part on the conventions used. For clinical laboratory testing, this poses a problem: these differences can compromise the ability to compare sequence findings among laboratories to confirm results, and to query databases to identify clinically relevant variants. To provide a more consistent representation of sequence findings within variant files, the workgroup made several recommendations addressing alignment to a common reference sequence, variant caller settings, use of genomic coordinates, and gene and variant naming conventions. These recommendations were considered with regard to the existing variant file specifications presently used in the clinical setting. Adoption of these recommendations is anticipated to reduce the potential for ambiguity in describing sequence findings and to facilitate the sharing of genomic data among clinical laboratories and other entities.
Affiliation(s)
- Ira M Lubin
- Division of Laboratory Systems, Centers for Disease Control and Prevention, Atlanta, Georgia
- Nazneen Aziz
- College of American Pathologists, Chicago, Illinois; Kaiser Permanente Research Bank, Oakland, California
- Lawrence J Babb
- Partners Healthcare Personalized Medicine, Cambridge, Massachusetts; GeneInsight, a Sunquest Company, Boston, Massachusetts
- Himani Bisht
- Center for Devices and Radiological Health, US Food and Drug Administration, Silver Spring, Maryland
- Deanna M Church
- Personalis, Menlo Park, California; National Center for Biotechnology Information, NIH, Bethesda, Maryland; 10× Genomics, Pleasanton, California
- Karen Eilbeck
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, Utah
- Lisa Kalman
- Division of Laboratory Systems, Centers for Disease Control and Prevention, Atlanta, Georgia
- Melissa Landrum
- National Center for Biotechnology Information, NIH, Bethesda, Maryland
- Edward R Lockhart
- Division of Laboratory Systems, Centers for Disease Control and Prevention, Atlanta, Georgia
- Donna Maglott
- National Center for Biotechnology Information, NIH, Bethesda, Maryland
- Gabor Marth
- Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, Utah; Boston College, Chestnut Hill, Massachusetts
- John D Pfeifer
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, Missouri
- Heidi L Rehm
- Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts
- Somak Roy
- Division of Molecular and Genomic Pathology, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania
- Zivana Tezak
- Center for Devices and Radiological Health, US Food and Drug Administration, Silver Spring, Maryland
- Rebecca Truty
- Complete Genomics, Mountain View, California; Invitae Corporation, San Francisco, California
- Karl V Voelkerding
- Department of Pathology, University of Utah and the Institute for Clinical and Experimental Pathology, Associated Regional and University Pathologists Laboratories, Salt Lake City, Utah
- Elizabeth A Worthey
- Department of Pediatrics, Medical College of Wisconsin, Milwaukee, Wisconsin
- Alexander W Zaranek
- Personal Genome Project, Harvard Medical School, Boston, Massachusetts; Curoverse, Inc., Somerville, Massachusetts
- Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, Maryland
3
Zook JM, Catoe D, McDaniel J, Vang L, Spies N, Sidow A, Weng Z, Liu Y, Mason CE, Alexander N, Henaff E, McIntyre AB, Chandramohan D, Chen F, Jaeger E, Moshrefi A, Pham K, Stedman W, Liang T, Saghbini M, Dzakula Z, Hastie A, Cao H, Deikus G, Schadt E, Sebra R, Bashir A, Truty RM, Chang CC, Gulbahce N, Zhao K, Ghosh S, Hyland F, Fu Y, Chaisson M, Xiao C, Trow J, Sherry ST, Zaranek AW, Ball M, Bobe J, Estep P, Church GM, Marks P, Kyriazopoulou-Panagiotopoulou S, Zheng GX, Schnall-Levin M, Ordonez HS, Mudivarti PA, Giorda K, Sheng Y, Rypdal KB, Salit M. Extensive sequencing of seven human genomes to characterize benchmark reference materials. Sci Data 2016; 3:160025. [PMID: 27271295; PMCID: PMC4896128; DOI: 10.1038/sdata.2016.25] [Received: 09/09/2015] [Accepted: 03/15/2016]
Abstract
The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST), is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazi Jewish ancestry and one of Chinese ancestry. The data come from 12 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available. Therefore, we expect these data to be useful for revealing novel information about the human genome and for improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly.
Affiliation(s)
- Justin M. Zook
- National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
- David Catoe
- National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
- Jennifer McDaniel
- National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
- Lindsay Vang
- National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
- Noah Spies
- National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
- Stanford University, Stanford, California 94305, USA
- Arend Sidow
- Stanford University, Stanford, California 94305, USA
- Ziming Weng
- Stanford University, Stanford, California 94305, USA
- Yuling Liu
- Stanford University, Stanford, California 94305, USA
- Christopher E. Mason
- Department of Physiology and Biophysics, the Feil Family Brain and Mind Research Institute, and HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College, Cornell University, New York, New York 10065, USA
- Noah Alexander
- Department of Physiology and Biophysics, the Feil Family Brain and Mind Research Institute, and HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College, Cornell University, New York, New York 10065, USA
- Elizabeth Henaff
- Department of Physiology and Biophysics, the Feil Family Brain and Mind Research Institute, and HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College, Cornell University, New York, New York 10065, USA
- Alexa B.R. McIntyre
- Department of Physiology and Biophysics, the Feil Family Brain and Mind Research Institute, and HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College, Cornell University, New York, New York 10065, USA
- Dhruva Chandramohan
- Department of Physiology and Biophysics, the Feil Family Brain and Mind Research Institute, and HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College, Cornell University, New York, New York 10065, USA
- Feng Chen
- Illumina Mission Bay, San Francisco, California 94158, USA
- Erich Jaeger
- Illumina Mission Bay, San Francisco, California 94158, USA
- Ali Moshrefi
- Illumina Mission Bay, San Francisco, California 94158, USA
- Khoa Pham
- BioNano Genomics, San Diego, California 92121, USA
- Alex Hastie
- BioNano Genomics, San Diego, California 92121, USA
- Han Cao
- BioNano Genomics, San Diego, California 92121, USA
- Gintaras Deikus
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
- Eric Schadt
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
- Robert Sebra
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
- Ali Bashir
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
- Natali Gulbahce
- Complete Genomics Inc., Mountain View, California 94043, USA
- Keyan Zhao
- Thermo Fisher Scientific, South San Francisco, California 94080, USA
- Srinka Ghosh
- Thermo Fisher Scientific, South San Francisco, California 94080, USA
- Fiona Hyland
- Thermo Fisher Scientific, South San Francisco, California 94080, USA
- Yutao Fu
- Thermo Fisher Scientific, South San Francisco, California 94080, USA
- Mark Chaisson
- Genome Sciences, University of Washington, Seattle, Washington 98105, USA
- Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, Maryland 20892, USA
- Jonathan Trow
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, Maryland 20892, USA
- Stephen T. Sherry
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, Maryland 20892, USA
- Jason Bobe
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
- PersonalGenomes.org, Boston, Massachusetts 02115, USA
- Preston Estep
- PersonalGenomes.org, Boston, Massachusetts 02115, USA
- Harvard Medical School, Boston, Massachusetts 02115, USA
- George M. Church
- PersonalGenomes.org, Boston, Massachusetts 02115, USA
- Harvard Medical School, Boston, Massachusetts 02115, USA
- Ying Sheng
- Department of Medical Genetics, Oslo University Hospital, Kirkeveien 166, Bygg 25, Oslo 0450, Norway
- Marc Salit
- National Institute of Standards and Technology, Gaithersburg, Maryland 20899, USA
- Stanford University, Stanford, California 94305, USA
4
5
Dewey FE, Chen R, Cordero SP, Ormond KE, Caleshu C, Karczewski KJ, Whirl-Carrillo M, Wheeler MT, Dudley JT, Byrnes JK, Cornejo OE, Knowles JW, Woon M, Sangkuhl K, Gong L, Thorn CF, Hebert JM, Capriotti E, David SP, Pavlovic A, West A, Thakuria JV, Ball MP, Zaranek AW, Rehm HL, Church GM, West JS, Bustamante CD, Snyder M, Altman RB, Klein TE, Butte AJ, Ashley EA. Phased whole-genome genetic risk in a family quartet using a major allele reference sequence. PLoS Genet 2011; 7:e1002280. [PMID: 21935354; PMCID: PMC3174201; DOI: 10.1371/journal.pgen.1002280] [Received: 04/21/2011] [Accepted: 07/26/2011]
Abstract
Whole-genome sequencing harbors unprecedented potential for characterization of individual and family genetic variation. Here, we develop a novel synthetic human reference sequence that is ethnically concordant and use it for the analysis of genomes from a nuclear family with a history of familial thrombophilia. We demonstrate that the use of the major allele reference sequence results in improved genotype accuracy for disease-associated variant loci. We infer recombination sites to the lowest median resolution demonstrated to date (<1,000 base pairs). We use family inheritance state analysis to control sequencing error and inform family-wide haplotype phasing, allowing quantification of genome-wide compound heterozygosity. We develop a sequence-based methodology for Human Leukocyte Antigen typing that contributes to disease risk prediction. Finally, we advance methods for analysis of disease and pharmacogenomic risk across the coding and non-coding genome that incorporate phased variant data. We show these methods are capable of identifying multigenic risk for inherited thrombophilia and informing the appropriate pharmacological therapy. These ethnicity-specific, family-based approaches to interpretation of genetic variation are emblematic of the next generation of genetic risk assessment using whole-genome sequencing.
An individual's genetic profile plays an important role in determining risk for disease and response to medical therapy. The development of technologies that facilitate rapid whole-genome sequencing will provide unprecedented power in the estimation of disease risk. Here we develop methods to characterize genetic determinants of disease risk and response to medical therapy in a nuclear family of four, leveraging population genetic profiles from recent large-scale sequencing projects. We trace how genetic information flows through the family to identify sequencing errors and inheritance patterns of genes contributing to disease risk. In doing so, we identify genetic risk factors associated with an inherited predisposition to blood clot formation and response to blood-thinning medications. We find that this aligns precisely with the most significant disease to occur to date in the family, namely pulmonary embolism, a blood clot in the lung. These ethnicity-specific, family-based approaches to interpretation of individual genetic profiles are emblematic of the next generation of genetic risk assessment using whole-genome sequencing.
Affiliation(s)
- Frederick E. Dewey
- Center for Inherited Cardiovascular Disease, Division of Cardiovascular Medicine, Stanford University, Stanford, California, United States of America
- Rong Chen
- Division of Systems Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States of America
- Sergio P. Cordero
- Biomedical Informatics Graduate Training Program, Stanford University School of Medicine, Stanford, California, United States of America
- Kelly E. Ormond
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Center for Biomedical Ethics, Stanford University, Stanford, California, United States of America
- Colleen Caleshu
- Center for Inherited Cardiovascular Disease, Division of Cardiovascular Medicine, Stanford University, Stanford, California, United States of America
- Konrad J. Karczewski
- Biomedical Informatics Graduate Training Program, Stanford University School of Medicine, Stanford, California, United States of America
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Michelle Whirl-Carrillo
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Matthew T. Wheeler
- Center for Inherited Cardiovascular Disease, Division of Cardiovascular Medicine, Stanford University, Stanford, California, United States of America
- Joel T. Dudley
- Division of Systems Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States of America
- Biomedical Informatics Graduate Training Program, Stanford University School of Medicine, Stanford, California, United States of America
- Jake K. Byrnes
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Omar E. Cornejo
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Joshua W. Knowles
- Center for Inherited Cardiovascular Disease, Division of Cardiovascular Medicine, Stanford University, Stanford, California, United States of America
- Mark Woon
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Katrin Sangkuhl
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Li Gong
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Caroline F. Thorn
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Joan M. Hebert
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Emidio Capriotti
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Sean P. David
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Aleksandra Pavlovic
- Center for Inherited Cardiovascular Disease, Division of Cardiovascular Medicine, Stanford University, Stanford, California, United States of America
- Anne West
- Wellesley College, Wellesley, Massachusetts, United States of America
- Joseph V. Thakuria
- Division of Genetics, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Madeleine P. Ball
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- Alexander W. Zaranek
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- Heidi L. Rehm
- Department of Pathology, Harvard Medical School, Boston, Massachusetts, United States of America
- George M. Church
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- John S. West
- Personalis, Palo Alto, California, United States of America
- Carlos D. Bustamante
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Michael Snyder
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Russ B. Altman
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Department of Bioengineering, Stanford University, Stanford, California, United States of America
- Teri E. Klein
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Atul J. Butte
- Division of Systems Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States of America
- Euan A. Ashley
- Center for Inherited Cardiovascular Disease, Division of Cardiovascular Medicine, Stanford University, Stanford, California, United States of America