Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Telenti A, Pierce LC, Biggs WH, di Iulio J, Wong EH, Fabani MM, Kirkness EF, Moustafa A, Shah N, Xie C, Brewerton SC, Bulsara N, Garner C, Metzker G, Sandoval E, Perkins BA, Och FJ, Turpaz Y, Venter JC. Deep sequencing of 10,000 human genomes. Proc Natl Acad Sci U S A 2016;113:11901-6. [PMID: 27702888 DOI: 10.1073/pnas.1613365113] [Citation(s) in RCA: 260] [Impact Index Per Article: 32.5] [Indexed: 12/21/2022] Open


Number

Cited by Other Article(s)

201

Stenson PD, Mort M, Ball EV, Evans K, Hayden M, Heywood S, Hussain M, Phillips AD, Cooper DN. The Human Gene Mutation Database: towards a comprehensive repository of inherited mutation data for medical research, genetic diagnosis and next-generation sequencing studies. Hum Genet 2017. [PMID: 28349240 DOI: 10.1007/s00439-017-1779-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/01/2022]

202

Stenson PD, Mort M, Ball EV, Evans K, Hayden M, Heywood S, Hussain M, Phillips AD, Cooper DN. The Human Gene Mutation Database: towards a comprehensive repository of inherited mutation data for medical research, genetic diagnosis and next-generation sequencing studies. Hum Genet 2017;136:665-677. [PMID: 28349240 PMCID: PMC5429360 DOI: 10.1007/s00439-017-1779-6] [Citation(s) in RCA: 905] [Impact Index Per Article: 129.3] [Received: 02/24/2017] [Accepted: 03/14/2017] [Indexed: 02/06/2023]

203

Moustafa A, Xie C, Kirkness E, Biggs W, Wong E, Turpaz Y, Bloom K, Delwart E, Nelson KE, Venter JC, Telenti A. The blood DNA virome in 8,000 humans. PLoS Pathog 2017;13:e1006292. [PMID: 28328962 PMCID: PMC5378407 DOI: 10.1371/journal.ppat.1006292] [Citation(s) in RCA: 199] [Impact Index Per Article: 28.4] [Received: 12/08/2016] [Revised: 04/03/2017] [Accepted: 03/14/2017] [Indexed: 02/06/2023] Open

Abstract

The characterization of the blood virome is important for the safety of blood-derived transfusion products, and for the identification of emerging pathogens. We explored non-human sequence data from whole-genome sequencing of blood from 8,240 individuals, none of whom were ascertained for any infectious disease. Viral sequences were extracted from the pool of sequence reads that did not map to the human reference genome. Analyses sifted through close to 1 Petabyte of sequence data and performed 0.5 trillion similarity searches. With a lower bound for identification of 2 viral genomes/100,000 cells, we mapped sequences to 94 different viruses, including sequences from 19 human DNA viruses, proviruses and RNA viruses (herpesviruses, anelloviruses, papillomaviruses, three polyomaviruses, adenovirus, HIV, HTLV, hepatitis B, hepatitis C, parvovirus B19, and influenza virus) in 42% of the study participants. Of possible relevance to transfusion medicine, we identified Merkel cell polyomavirus in 49 individuals, papillomavirus in blood of 13 individuals, parvovirus B19 in 6 individuals, and the presence of herpesvirus 8 in 3 individuals. The presence of DNA sequences from two RNA viruses was unexpected: the hepatitis C virus sequence points to an integration event, while the influenza virus sequence resulted from immunization with a DNA vaccine. Age, sex and ancestry contributed significantly to the prevalence of infection. The remaining 75 viruses mostly reflect extensive contamination from commercial reagents and the environment. These technical problems represent a major challenge for the identification of novel human pathogens. Increasing availability of human whole-genome sequences will contribute substantial amounts of data on the composition of the normal and pathogenic human blood virome. Distinguishing contaminants from real human viruses is challenging.

Novel sequencing technologies offer insight into the virome in human samples. Here, we identify the viral DNA sequences in the blood of over 8,000 individuals undergoing whole-genome sequencing. This approach serves to identify 94 viruses; however, many are shown to reflect widespread DNA contamination of commercial reagents or to be of environmental origin. While this represents a significant limitation to reliably identifying novel viruses infecting humans, we could confidently detect sequences and quantify the abundance of 19 human viruses in 42% of individuals. Ancestry, sex, and age were important determinants of viral prevalence. This large study calls attention to the challenge of interpreting next-generation sequencing data for the identification of novel viruses. However, it serves to categorize the abundance of human DNA viruses using an unbiased technique.

204

Whole-genome sequencing identifies common-to-rare variants associated with human blood metabolites. Nat Genet 2017;49:568-578. [PMID: 28263315 DOI: 10.1038/ng.3809] [Citation(s) in RCA: 268] [Impact Index Per Article: 38.3] [Received: 06/29/2016] [Accepted: 02/10/2017] [Indexed: 02/07/2023]

205

Diversity in non-repetitive human sequences not found in the reference genome. Nat Genet 2017;49:588-593. [PMID: 28250455 DOI: 10.1038/ng.3801] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 10/10/2016] [Accepted: 02/03/2017] [Indexed: 12/15/2022]

206

Allendorf FW. Genetics and the conservation of natural populations: allozymes to genomes. Mol Ecol 2017;26:420-430. [DOI: 10.1111/mec.13948] [Citation(s) in RCA: 180] [Impact Index Per Article: 25.7] [Received: 10/17/2016] [Accepted: 11/28/2016] [Indexed: 12/14/2022]

207

Freedman JE, Miano JM. Challenges and Opportunities in Linking Long Noncoding RNAs to Cardiovascular, Lung, and Blood Diseases. Arterioscler Thromb Vasc Biol 2016;37:21-25. [PMID: 27856459 DOI: 10.1161/atvbaha.116.308513] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 09/25/2016] [Accepted: 11/04/2016] [Indexed: 01/16/2023]

208

Genetic variation: Diving deep into the genome. Nat Rev Genet 2016;17:716-717. [PMID: 27773921 DOI: 10.1038/nrg.2016.144] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/24/2023]

209

It takes a genome to understand a village: Population scale precision medicine. Proc Natl Acad Sci U S A 2016;113:12344-12346. [PMID: 27791179 DOI: 10.1073/pnas.1615329113] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022] Open

210

Mao Q, Ciotlos S, Zhang RY, Ball MP, Chin R, Carnevali P, Barua N, Nguyen S, Agarwal MR, Clegg T, Connelly A, Vandewege W, Zaranek AW, Estep PW, Church GM, Drmanac R, Peters BA. The whole genome sequences and experimentally phased haplotypes of over 100 personal genomes. Gigascience 2016;5:42. [PMID: 27724973 PMCID: PMC5057367 DOI: 10.1186/s13742-016-0148-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 05/20/2016] [Accepted: 09/19/2016] [Indexed: 02/01/2023] Open

Abstract

Background

Since the completion of the Human Genome Project in 2003, it is estimated that more than 200,000 individual whole human genomes have been sequenced, a stunning accomplishment in such a short period of time. However, most of these were sequenced without experimental haplotype data and are therefore missing an important aspect of genome biology. In addition, much of the genomic data is not available to the public and lacks phenotypic information.

Findings

As part of the Personal Genome Project, blood samples from 184 participants were collected and processed using Complete Genomics’ Long Fragment Read technology. Here, we present the experimental whole genome haplotyping and sequencing of these samples to an average read coverage depth of 100X. This is approximately three-fold higher than the read coverage applied to most whole human genome assemblies and ensures the highest quality results. Currently, 114 genomes from this dataset are freely available in the GigaDB repository and are associated with rich phenotypic data; the remaining 70 should be added in the near future as they are approved through the PGP data release process. For reproducibility analyses, 20 genomes were sequenced at least twice using independent LFR barcoded libraries. Seven genomes were also sequenced using Complete Genomics’ standard non-barcoded library process. In addition, we report 2.6 million high-quality, rare variants not previously identified in the Single Nucleotide Polymorphisms database or the 1000 Genomes Project Phase 3 data.

Conclusions

These genomes represent a unique source of haplotype and phenotype data for the scientific community and should help to expand our understanding of human genome evolution and function.

Electronic supplementary material

The online version of this article (doi:10.1186/s13742-016-0148-z) contains supplementary material, which is available to authorized users.

Affiliation(s)

Qing Mao: Complete Genomics, Inc., 2071 Stierlin Ct., Mountain View, CA, 94043, USA
Serban Ciotlos: Complete Genomics, Inc., 2071 Stierlin Ct., Mountain View, CA, 94043, USA
Rebecca Yu Zhang: Complete Genomics, Inc., 2071 Stierlin Ct., Mountain View, CA, 94043, USA
Madeleine P Ball: Harvard Personal Genome Project, Harvard Medical School, NRB 238, 77 Avenue Louis Pasteur, Boston, MA, 02115, USA; PersonalGenomes.org, 423 Brookline Avenue, #323, Boston, MA, 02215, USA
Robert Chin: Complete Genomics, Inc., 2071 Stierlin Ct., Mountain View, CA, 94043, USA
Paolo Carnevali: Complete Genomics, Inc., 2071 Stierlin Ct., Mountain View, CA, 94043, USA
Nina Barua: Complete Genomics, Inc., 2071 Stierlin Ct., Mountain View, CA, 94043, USA
Staci Nguyen: Complete Genomics, Inc., 2071 Stierlin Ct., Mountain View, CA, 94043, USA
Misha R Agarwal: Complete Genomics, Inc., 2071 Stierlin Ct., Mountain View, CA, 94043, USA
Tom Clegg: Harvard Personal Genome Project, Harvard Medical School, NRB 238, 77 Avenue Louis Pasteur, Boston, MA, 02115, USA; Curoverse Inc., 212 Elm St, 3rd Floor, Somerville, MA, 02144, USA
Abram Connelly: Harvard Personal Genome Project, Harvard Medical School, NRB 238, 77 Avenue Louis Pasteur, Boston, MA, 02115, USA; Curoverse Inc., 212 Elm St, 3rd Floor, Somerville, MA, 02144, USA
Ward Vandewege: Harvard Personal Genome Project, Harvard Medical School, NRB 238, 77 Avenue Louis Pasteur, Boston, MA, 02115, USA; Curoverse Inc., 212 Elm St, 3rd Floor, Somerville, MA, 02144, USA
Alexander Wait Zaranek: Harvard Personal Genome Project, Harvard Medical School, NRB 238, 77 Avenue Louis Pasteur, Boston, MA, 02115, USA; Curoverse Inc., 212 Elm St, 3rd Floor, Somerville, MA, 02144, USA
Preston W Estep: Harvard Personal Genome Project, Harvard Medical School, NRB 238, 77 Avenue Louis Pasteur, Boston, MA, 02115, USA
George M Church: Harvard Personal Genome Project, Harvard Medical School, NRB 238, 77 Avenue Louis Pasteur, Boston, MA, 02115, USA
Radoje Drmanac: Complete Genomics, Inc., 2071 Stierlin Ct., Mountain View, CA, 94043, USA; BGI-Shenzhen, Shenzhen, 518083, China
Brock A Peters: Complete Genomics, Inc., 2071 Stierlin Ct., Mountain View, CA, 94043, USA; BGI-Shenzhen, Shenzhen, 518083, China

211

Popitsch N, Schuh A, Taylor JC. ReliableGenome: annotation of genomic regions with high/low variant calling concordance. Bioinformatics 2016;33:155-160. [PMID: 27605105 PMCID: PMC5903559 DOI: 10.1093/bioinformatics/btw587] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 04/27/2016] [Revised: 08/12/2016] [Accepted: 09/04/2016] [Indexed: 12/30/2022] Open

Abstract

Motivation

The increasing adoption of clinical whole-genome resequencing (WGS) demands highly accurate and reproducible variant calling (VC) methods. The observed discordance between state-of-the-art VC pipelines, however, indicates that current practice still suffers from non-negligible numbers of false positive and false negative SNV and INDEL calls, which were shown to be enriched among discordant calls and also in genomic regions with low sequence complexity.

Results

Here, we describe our method ReliableGenome (RG) for partitioning genomes into high and low concordance regions with respect to a set of surveyed VC pipelines. Our method combines call sets derived by multiple pipelines from arbitrary numbers of datasets and interpolates expected concordance for genomic regions without data. By applying RG to 219 deep human WGS datasets, we demonstrate that VC concordance depends predominantly on genomic context rather than the actual sequencing data which manifests in high recurrence of regions that can/cannot be reliably genotyped by a single method. This enables the application of pre-computed regions to other data created with comparable sequencing technology and software. RG outperforms comparable efforts in predicting VC concordance and false positive calls in low-concordance regions which underlines its usefulness for variant filtering, annotation and prioritization. RG allows focusing resource-intensive algorithms (e.g. consensus calling methods) on the smaller, discordant share of the genome (20–30%) which might result in increased overall accuracy at reasonable costs. Our method and analysis of discordant calls may further be useful for development, benchmarking and optimization of VC algorithms and for the relative comparison of call sets between different studies/pipelines.

Availability and Implementation

RG was implemented in Java; source code and binaries are freely available for non-commercial use at https://github.com/popitsch/wtchg-rg/.

Supplementary information

Supplementary data are available at Bioinformatics online.
