Maretty L, Jensen JM, Petersen B, Sibbesen JA, Liu S, Villesen P, Skov L, Belling K, Theil Have C, Izarzugaza JMG, Grosjean M, Bork-Jensen J, Grove J, Als TD, Huang S, Chang Y, Xu R, Ye W, Rao J, Guo X, Sun J, Cao H, Ye C, van Beusekom J, Espeseth T, Flindt E, Friborg RM, Halager AE, Le Hellard S, Hultman CM, Lescai F, Li S, Lund O, Løngren P, Mailund T, Matey-Hernandez ML, Mors O, Pedersen CNS, Sicheritz-Pontén T, Sullivan P, Syed A, Westergaard D, Yadav R, Li N, Xu X, Hansen T, Krogh A, Bolund L, Sørensen TIA, Pedersen O, Gupta R, Rasmussen S, Besenbacher S, Børglum AD, Wang J, Eiberg H, Kristiansen K, Brunak S, Schierup MH. Sequencing and de novo assembly of 150 genomes from Denmark as a population reference. Nature 2017; 548:87-91. [PMID: 28746312 DOI: 10.1038/nature23264] [Citation(s) in RCA: 87] [Impact Index Per Article: 10.9] [Received: 06/15/2016] [Accepted: 06/04/2017] [Indexed: 12/17/2022]
Abstract
Hundreds of thousands of human genomes are now being sequenced to characterize genetic variation and use this information to augment association mapping studies of complex disorders and other phenotypic traits. Genetic variation is identified mainly by mapping short reads to the reference genome or by performing local assembly. However, these approaches are biased against discovery of structural variants and variation in the more complex parts of the genome. Hence, large-scale de novo assembly is needed. Here we show that it is possible to construct excellent de novo assemblies from high-coverage sequencing with mate-pair libraries extending up to 20 kilobases. We report de novo assemblies of 150 individuals (50 trios) from the GenomeDenmark project. The quality of these assemblies is similar to those obtained using the more expensive long-read technology. We use the assemblies to identify a rich set of structural variants including many novel insertions and demonstrate how this variant catalogue enables further deciphering of known association mapping signals. We leverage the assemblies to provide 100 completely resolved major histocompatibility complex haplotypes and to resolve major parts of the Y chromosome. Our study provides a regional reference genome that we expect will improve the power of future association mapping studies and hence pave the way for precision medicine initiatives, which now are being launched in many countries including Denmark.
Affiliation(s)
- Lasse Maretty
- Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark
- Jacob Malte Jensen
- Bioinformatics Research Centre, Aarhus University, 8000 Aarhus, Denmark; iSEQ, Centre for Integrative Sequencing, Aarhus University, 8000 Aarhus, Denmark
- Bent Petersen
- DTU Bioinformatics, Department of Bio and Health Informatics, Technical University of Denmark, Kemitorvet, 2800 Kongens Lyngby, Denmark
- Jonas Andreas Sibbesen
- Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark
- Siyang Liu
- Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark; BGI-Europe, Ole Maaløes Vej 3, 2200 Copenhagen, Denmark
- Palle Villesen
- Bioinformatics Research Centre, Aarhus University, 8000 Aarhus, Denmark; iSEQ, Centre for Integrative Sequencing, Aarhus University, 8000 Aarhus, Denmark; Department of Clinical Medicine, Aarhus University, 8000 Aarhus, Denmark
- Laurits Skov
- Bioinformatics Research Centre, Aarhus University, 8000 Aarhus, Denmark; iSEQ, Centre for Integrative Sequencing, Aarhus University, 8000 Aarhus, Denmark
- Kirstine Belling
- DTU Bioinformatics, Department of Bio and Health Informatics, Technical University of Denmark, Kemitorvet, 2800 Kongens Lyngby, Denmark
- Christian Theil Have
- Novo Nordisk Foundation Center for Basic Metabolic Research, Section of Metabolic Genetics, University of Copenhagen, 2100 Copenhagen, Denmark
- Jose M G Izarzugaza
- DTU Bioinformatics, Department of Bio and Health Informatics, Technical University of Denmark, Kemitorvet, 2800 Kongens Lyngby, Denmark
- Marie Grosjean
- DTU Bioinformatics, Department of Bio and Health Informatics, Technical University of Denmark, Kemitorvet, 2800 Kongens Lyngby, Denmark
- Jette Bork-Jensen
- Novo Nordisk Foundation Center for Basic Metabolic Research, Section of Metabolic Genetics, University of Copenhagen, 2100 Copenhagen, Denmark
- Jakob Grove
- iSEQ, Centre for Integrative Sequencing, Aarhus University, 8000 Aarhus, Denmark; Department of Biomedicine, Aarhus University, 8000 Aarhus, Denmark; The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8000 Aarhus, Denmark
- Thomas D Als
- iSEQ, Centre for Integrative Sequencing, Aarhus University, 8000 Aarhus, Denmark; Department of Biomedicine, Aarhus University, 8000 Aarhus, Denmark; The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8000 Aarhus, Denmark
- Shujia Huang
- BGI-Shenzhen, Shenzhen 518083, China; School of Bioscience and Biotechnology, South China University of Technology, Guangzhou 510006, China
- Ruiqi Xu
- BGI-Europe, Ole Maaløes Vej 3, 2200 Copenhagen, Denmark
- Weijian Ye
- BGI-Europe, Ole Maaløes Vej 3, 2200 Copenhagen, Denmark
- Junhua Rao
- BGI-Europe, Ole Maaløes Vej 3, 2200 Copenhagen, Denmark
- Xiaosen Guo
- BGI-Shenzhen, Shenzhen 518083, China; Laboratory of Genomics and Molecular Biomedicine, Department of Biology, University of Copenhagen, 2100 Copenhagen, Denmark
- Jihua Sun
- BGI-Europe, Ole Maaløes Vej 3, 2200 Copenhagen, Denmark; Novo Nordisk Foundation Center for Basic Metabolic Research, Section of Metabolic Genetics, University of Copenhagen, 2100 Copenhagen, Denmark
- Chen Ye
- BGI-Shenzhen, Shenzhen 518083, China
- Johan van Beusekom
- DTU Bioinformatics, Department of Bio and Health Informatics, Technical University of Denmark, Kemitorvet, 2800 Kongens Lyngby, Denmark
- Thomas Espeseth
- Department of Psychology, University of Oslo, 0317 Oslo, Norway; NORMENT, KG Jebsen Centre for Psychosis Research, Department of Clinical Science, University of Bergen, Bergen 5021, Norway
- Esben Flindt
- Laboratory of Genomics and Molecular Biomedicine, Department of Biology, University of Copenhagen, 2100 Copenhagen, Denmark
- Rune M Friborg
- Bioinformatics Research Centre, Aarhus University, 8000 Aarhus, Denmark; iSEQ, Centre for Integrative Sequencing, Aarhus University, 8000 Aarhus, Denmark
- Anders E Halager
- Bioinformatics Research Centre, Aarhus University, 8000 Aarhus, Denmark; iSEQ, Centre for Integrative Sequencing, Aarhus University, 8000 Aarhus, Denmark
- Stephanie Le Hellard
- NORMENT, KG Jebsen Centre for Psychosis Research, Department of Clinical Science, University of Bergen, Bergen 5021, Norway; Dr E. Martens Research Group of Biological Psychiatry, Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, Bergen 5021, Norway
- Christina M Hultman
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm 17177, Sweden
- Francesco Lescai
- iSEQ, Centre for Integrative Sequencing, Aarhus University, 8000 Aarhus, Denmark; Department of Biomedicine, Aarhus University, 8000 Aarhus, Denmark; The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8000 Aarhus, Denmark
- Shengting Li
- iSEQ, Centre for Integrative Sequencing, Aarhus University, 8000 Aarhus, Denmark; Department of Biomedicine, Aarhus University, 8000 Aarhus, Denmark; The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8000 Aarhus, Denmark
- Ole Lund
- DTU Bioinformatics, Department of Bio and Health Informatics, Technical University of Denmark, Kemitorvet, 2800 Kongens Lyngby, Denmark
- Peter Løngren
- DTU Bioinformatics, Department of Bio and Health Informatics, Technical University of Denmark, Kemitorvet, 2800 Kongens Lyngby, Denmark
- Thomas Mailund
- Bioinformatics Research Centre, Aarhus University, 8000 Aarhus, Denmark; iSEQ, Centre for Integrative Sequencing, Aarhus University, 8000 Aarhus, Denmark
- Maria Luisa Matey-Hernandez
- DTU Bioinformatics, Department of Bio and Health Informatics, Technical University of Denmark, Kemitorvet, 2800 Kongens Lyngby, Denmark
- Ole Mors
- iSEQ, Centre for Integrative Sequencing, Aarhus University, 8000 Aarhus, Denmark; Department of Clinical Medicine, Aarhus University, 8000 Aarhus, Denmark; The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8000 Aarhus, Denmark
- Christian N S Pedersen
- Bioinformatics Research Centre, Aarhus University, 8000 Aarhus, Denmark; iSEQ, Centre for Integrative Sequencing, Aarhus University, 8000 Aarhus, Denmark
- Thomas Sicheritz-Pontén
- DTU Bioinformatics, Department of Bio and Health Informatics, Technical University of Denmark, Kemitorvet, 2800 Kongens Lyngby, Denmark
- Patrick Sullivan
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm 17177, Sweden; Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27599-7264, USA
- Ali Syed
- DTU Bioinformatics, Department of Bio and Health Informatics, Technical University of Denmark, Kemitorvet, 2800 Kongens Lyngby, Denmark
- David Westergaard
- DTU Bioinformatics, Department of Bio and Health Informatics, Technical University of Denmark, Kemitorvet, 2800 Kongens Lyngby, Denmark
- Rachita Yadav
- DTU Bioinformatics, Department of Bio and Health Informatics, Technical University of Denmark, Kemitorvet, 2800 Kongens Lyngby, Denmark
- Ning Li
- BGI-Europe, Ole Maaløes Vej 3, 2200 Copenhagen, Denmark
- Xun Xu
- BGI-Shenzhen, Shenzhen 518083, China
- Torben Hansen
- Novo Nordisk Foundation Center for Basic Metabolic Research, Section of Metabolic Genetics, University of Copenhagen, 2100 Copenhagen, Denmark
- Anders Krogh
- Bioinformatics Centre, Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark
- Lars Bolund
- Department of Biomedicine, Aarhus University, 8000 Aarhus, Denmark; BGI-Shenzhen, Shenzhen 518083, China
- Thorkild I A Sørensen
- Novo Nordisk Foundation Center for Basic Metabolic Research, Section of Metabolic Genetics, University of Copenhagen, 2100 Copenhagen, Denmark; Department of Clinical Epidemiology, Bispebjerg and Frederiksberg Hospital, The Capital Region, Copenhagen, 2000 Frederiksberg, Denmark; Department of Public Health, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
- Oluf Pedersen
- Novo Nordisk Foundation Center for Basic Metabolic Research, Section of Metabolic Genetics, University of Copenhagen, 2100 Copenhagen, Denmark
- Ramneek Gupta
- DTU Bioinformatics, Department of Bio and Health Informatics, Technical University of Denmark, Kemitorvet, 2800 Kongens Lyngby, Denmark
- Simon Rasmussen
- DTU Bioinformatics, Department of Bio and Health Informatics, Technical University of Denmark, Kemitorvet, 2800 Kongens Lyngby, Denmark
- Søren Besenbacher
- Bioinformatics Research Centre, Aarhus University, 8000 Aarhus, Denmark; Department of Clinical Medicine, Aarhus University, 8000 Aarhus, Denmark
- Anders D Børglum
- iSEQ, Centre for Integrative Sequencing, Aarhus University, 8000 Aarhus, Denmark; Department of Biomedicine, Aarhus University, 8000 Aarhus, Denmark; The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8000 Aarhus, Denmark
- Jun Wang
- iSEQ, Centre for Integrative Sequencing, Aarhus University, 8000 Aarhus, Denmark; BGI-Shenzhen, Shenzhen 518083, China; Laboratory of Genomics and Molecular Biomedicine, Department of Biology, University of Copenhagen, 2100 Copenhagen, Denmark
- Hans Eiberg
- Department of Cellular and Molecular Medicine, University of Copenhagen, 2200 Copenhagen, Denmark
- Karsten Kristiansen
- BGI-Shenzhen, Shenzhen 518083, China; Laboratory of Genomics and Molecular Biomedicine, Department of Biology, University of Copenhagen, 2100 Copenhagen, Denmark
- Søren Brunak
- DTU Bioinformatics, Department of Bio and Health Informatics, Technical University of Denmark, Kemitorvet, 2800 Kongens Lyngby, Denmark; Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
- Mikkel Heide Schierup
- Bioinformatics Research Centre, Aarhus University, 8000 Aarhus, Denmark; iSEQ, Centre for Integrative Sequencing, Aarhus University, 8000 Aarhus, Denmark; Department of Bioscience, Aarhus University, 8000 Aarhus, Denmark