1
Usoltsev D, Kolosov N, Rotar O, Loboda A, Boyarinova M, Moguchaya E, Kolesova E, Erina A, Tolkunova K, Rezapova V, Molotkov I, Melnik O, Freylikhman O, Paskar N, Alieva A, Baranova E, Bazhenova E, Beliaeva O, Vasilyeva E, Kibkalo S, Skitchenko R, Babenko A, Sergushichev A, Dushina A, Lopina E, Basyrova I, Libis R, Duplyakov D, Cherepanova N, Donner K, Laiho P, Kostareva A, Konradi A, Shlyakhto E, Palotie A, Daly MJ, Artomov M. Complex trait susceptibilities and population diversity in a sample of 4,145 Russians. Nat Commun 2024; 15:6212. [PMID: 39043636 PMCID: PMC11266540 DOI: 10.1038/s41467-024-50304-1] [Received: 03/27/2023] [Accepted: 07/02/2024] [Indexed: 07/25/2024]
Abstract
The population of Russia consists of more than 150 local ethnicities. The ethnic diversity and geographic origins, which extend from eastern Europe to Asia, make the population uniquely positioned to investigate the shared properties of inherited disease risks between European and Asian ancestries. We present the analysis of genetic and phenotypic data from a cohort of 4,145 individuals collected in three metro areas in western Russia. We show the presence of multiple admixed genetic ancestry clusters spanning from primarily European to Asian and high identity-by-descent sharing with the Finnish population. As a result, there was notable enrichment of Finnish-specific variants in Russia. We illustrate the utility of Russian-descent cohorts for discovery of novel population-specific genetic associations, as well as replication of previously identified associations that were thought to be population-specific in other cohorts. Finally, we provide access to a database of allele frequencies and GWAS results for 464 phenotypes.
Affiliation(s)
- Dmitrii Usoltsev
  - Almazov National Medical Research Centre, St Petersburg, Russia
  - ITMO University, St Petersburg, Russia
  - Broad Institute, Cambridge, MA, USA
  - The Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA
  - Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
- Nikita Kolosov
  - Almazov National Medical Research Centre, St Petersburg, Russia
  - ITMO University, St Petersburg, Russia
  - Broad Institute, Cambridge, MA, USA
  - The Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA
  - Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
- Oxana Rotar
  - Almazov National Medical Research Centre, St Petersburg, Russia
- Alexander Loboda
  - Almazov National Medical Research Centre, St Petersburg, Russia
  - ITMO University, St Petersburg, Russia
  - Broad Institute, Cambridge, MA, USA
- Anastasia Erina
  - Almazov National Medical Research Centre, St Petersburg, Russia
- Valeriia Rezapova
  - Almazov National Medical Research Centre, St Petersburg, Russia
  - ITMO University, St Petersburg, Russia
  - Broad Institute, Cambridge, MA, USA
- Ivan Molotkov
  - The Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA
  - Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
- Olesya Melnik
  - Almazov National Medical Research Centre, St Petersburg, Russia
- Nadezhda Paskar
  - Almazov National Medical Research Centre, St Petersburg, Russia
- Asiiat Alieva
  - Almazov National Medical Research Centre, St Petersburg, Russia
- Elena Baranova
  - Almazov National Medical Research Centre, St Petersburg, Russia
- Elena Bazhenova
  - Almazov National Medical Research Centre, St Petersburg, Russia
- Olga Beliaeva
  - Almazov National Medical Research Centre, St Petersburg, Russia
- Elena Vasilyeva
  - Almazov National Medical Research Centre, St Petersburg, Russia
- Sofia Kibkalo
  - Almazov National Medical Research Centre, St Petersburg, Russia
- Alina Babenko
  - Almazov National Medical Research Centre, St Petersburg, Russia
- Roman Libis
  - Orenburg State Medical University, Orenburg, Russia
- Dmitrii Duplyakov
  - Samara State Medical University, Samara, Russia
  - Samara Regional Cardiology Dispensary, Samara, Russia
- Kati Donner
  - Institute for Molecular Medicine Finland (FIMM), Helsinki, Finland
- Paivi Laiho
  - Finnish Institute for Health and Welfare (THL), Helsinki, Finland
- Anna Kostareva
  - Almazov National Medical Research Centre, St Petersburg, Russia
  - ITMO University, St Petersburg, Russia
- Alexandra Konradi
  - Almazov National Medical Research Centre, St Petersburg, Russia
  - ITMO University, St Petersburg, Russia
- Aarno Palotie
  - Broad Institute, Cambridge, MA, USA
  - Institute for Molecular Medicine Finland (FIMM), Helsinki, Finland
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Mark J Daly
  - Broad Institute, Cambridge, MA, USA
  - Institute for Molecular Medicine Finland (FIMM), Helsinki, Finland
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Mykyta Artomov
  - Almazov National Medical Research Centre, St Petersburg, Russia
  - ITMO University, St Petersburg, Russia
  - Broad Institute, Cambridge, MA, USA
  - The Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA
  - Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
  - Institute for Molecular Medicine Finland (FIMM), Helsinki, Finland
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
2
Guguchkin E, Kasianov A, Belenikin M, Zobkova G, Kosova E, Makeev V, Karpulevich E. Enhancing SNV identification in whole-genome sequencing data through the incorporation of known genetic variants into the minimap2 index. BMC Bioinformatics 2024; 25:238. [PMID: 39003441 PMCID: PMC11246581 DOI: 10.1186/s12859-024-05862-y] [Received: 02/13/2024] [Accepted: 07/10/2024] [Indexed: 07/15/2024]
Abstract
MOTIVATION: Alignment of reads to a reference genome sequence is one of the key steps in the analysis of human whole-genome sequencing data obtained through next-generation sequencing (NGS) technologies. The quality of the subsequent steps of the analysis, such as the results of clinical interpretation of genetic variants or the results of a genome-wide association study, depends on the correct identification of the position of the read as a result of its alignment. The amount of human NGS whole-genome sequencing data is constantly growing. There are a number of human genome sequencing projects worldwide that have resulted in the creation of large-scale databases of genetic variants of sequenced human genomes. Such information about known genetic variants can be used to improve the quality of alignment at the read alignment stage when analysing sequencing data obtained for a new individual, for example, by creating a genomic graph. While existing methods for aligning reads to a linear reference genome have high alignment speed, methods for aligning reads to a genomic graph have greater accuracy in variable regions of the genome. The development of a read alignment method that takes into account known genetic variants in the linear reference sequence index allows combining the advantages of both sets of methods.
RESULTS: In this paper, we present the minimap2_index_modifier tool, which enables the construction of a modified index of a reference genome using known single nucleotide variants and insertions/deletions (indels) specific to a given human population. The use of the modified minimap2 index improves variant calling quality without modifying the bioinformatics pipeline and without significant additional computational overhead. Using the PrecisionFDA Truth Challenge V2 benchmark data (for HG002 short-read data aligned to the GRCh38 linear reference (GCA_000001405.15) with parameters k = 27 and w = 14), it was demonstrated that the number of false negative genetic variants decreased by more than 9500, and the number of false positives decreased by more than 7000, when modifying the index with genetic variants from the Human Pangenome Reference Consortium.
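The general idea in the abstract above, adding k-mers that carry known alternative alleles to a minimizer index of a linear reference, can be illustrated with a toy sketch. This is not the minimap2_index_modifier implementation: the function names `minimizers` and `build_index` are hypothetical, k-mers are ordered lexicographically instead of by the hash that minimap2 uses, only SNVs are handled, and the tiny values k = 5 and w = 3 stand in for the k = 27 and w = 14 quoted in the abstract.

```python
from collections import defaultdict


def minimizers(seq, k, w):
    """Return {(kmer, position)}: the smallest k-mer in every window of w consecutive k-mers.

    Assumption: lexicographic order (leftmost on ties) replaces minimap2's hash ordering.
    """
    kmers = [(seq[i:i + k], i) for i in range(len(seq) - k + 1)]
    if not kmers:
        return set()
    n_windows = max(len(kmers) - w + 1, 1)
    return {min(kmers[s:s + w]) for s in range(n_windows)}


def build_index(ref, k, w, snvs=()):
    """Map each minimizer k-mer to its reference positions.

    snvs is a hypothetical list of (0-based position, alternative base) pairs.
    For each SNV, minimizers of the locally substituted sequence that overlap
    the variant are added at the corresponding reference coordinates.
    """
    index = defaultdict(set)
    for kmer, pos in minimizers(ref, k, w):
        index[kmer].add(pos)
    span = k + w - 1  # number of bases covered by one window of w k-mers
    for pos, alt in snvs:
        lo = max(pos - span + 1, 0)
        hi = min(pos + span, len(ref))
        alt_seq = ref[lo:pos] + alt + ref[pos + 1:hi]
        for kmer, offset in minimizers(alt_seq, k, w):
            if offset <= pos - lo < offset + k:  # keep only k-mers covering the variant
                index[kmer].add(lo + offset)
    return index


REF = "ACGTACGTTAGCCGATAGGCTA"
plain = build_index(REF, k=5, w=3)
modified = build_index(REF, k=5, w=3, snvs=[(10, "T")])  # G>T at position 10

# k-mers present only in the variant-aware index
new_kmers = sorted(set(modified) - set(plain))
print(new_kmers)
```

In this sketch the modified index is a superset of the plain one, so a read carrying the alternative allele can seed on k-mers that would otherwise be absent, while lookups for reference-matching reads are unchanged.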
Affiliation(s)
- Egor Guguchkin
  - Ivannikov Institute for System Programming, Moscow, Russia
- Artem Kasianov
  - Institute for Information Transmission Problems, Moscow, Russia
- Vsevolod Makeev
  - Vavilov Institute of General Genetics, Moscow, Russia
  - Institute of Biochemistry and Genetics of Ufa Scientific Centre, Ufa, Russia
  - Cancer Research UK National Biomarker Centre, University of Manchester, Manchester, M20 4BX, UK