1
Barbitoff YA, Ushakov MO, Lazareva TE, Nasykhova YA, Glotov AS, Predeus AV. Bioinformatics of germline variant discovery for rare disease diagnostics: current approaches and remaining challenges. Brief Bioinform 2024; 25:bbad508. [PMID: 38271481 PMCID: PMC10810331 DOI: 10.1093/bib/bbad508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 11/18/2023] [Accepted: 12/12/2023] [Indexed: 01/27/2024] Open
Abstract
Next-generation sequencing (NGS) has revolutionized the field of rare disease diagnostics. Whole exome and whole genome sequencing are now routinely used for diagnostic purposes; however, the overall diagnosis rate remains lower than expected. In this work, we review current approaches used for calling and interpretation of germline genetic variants in the human genome, and discuss the most important challenges that persist in the bioinformatic analysis of NGS data in medical genetics. We describe and attempt to quantitatively assess the remaining problems, such as the quality of the reference genome sequence, reproducible coverage biases, or variant calling accuracy in complex regions of the genome. We also discuss the prospects of switching to the complete human genome assembly or the human pan-genome and important caveats associated with such a switch. We touch on arguably the hardest problem of NGS data analysis for medical genomics, namely, the annotation of genetic variants and their subsequent interpretation. We highlight the most challenging aspects of annotation and prioritization of both coding and non-coding variants. Finally, we demonstrate the persistent prevalence of pathogenic variants in the coding genome, and outline research directions that may enhance the efficiency of NGS-based disease diagnostics.
Affiliation(s)
- Yury A Barbitoff
  - Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
  - Bioinformatics Institute, Kentemirovskaya st. 2A, 197342, St. Petersburg, Russia
- Mikhail O Ushakov
  - Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
- Tatyana E Lazareva
  - Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
- Yulia A Nasykhova
  - Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
- Andrey S Glotov
  - Dpt. of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology, Mendeleevskaya line 3, 199034, St. Petersburg, Russia
- Alexander V Predeus
  - Bioinformatics Institute, Kentemirovskaya st. 2A, 197342, St. Petersburg, Russia
2
Sopic M, Vilne B, Gerdts E, Trindade F, Uchida S, Khatib S, Wettinger SB, Devaux Y, Magni P. Multiomics tools for improved atherosclerotic cardiovascular disease management. Trends Mol Med 2023; 29:983-995. [PMID: 37806854 DOI: 10.1016/j.molmed.2023.09.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/02/2023] [Revised: 09/20/2023] [Accepted: 09/21/2023] [Indexed: 10/10/2023]
Abstract
Multiomics studies offer accurate preventive and therapeutic strategies for atherosclerotic cardiovascular disease (ASCVD) beyond traditional risk factors. By using artificial intelligence (AI) and machine learning (ML) approaches, it is possible to integrate multiple 'omics and clinical data sets into tools that can be utilized for the development of personalized diagnostic and therapeutic approaches. However, multiple challenges in data quality, integration, and privacy still need to be addressed. In this opinion, we emphasize that joint efforts, exemplified by the AtheroNET COST Action, have a pivotal role in overcoming the challenges to advance multiomics approaches in ASCVD research, with the aim of fostering more precise and effective patient care.
Affiliation(s)
- Miron Sopic
  - Cardiovascular Research Unit, Department of Precision Health, 1A-B rue Edison, Luxembourg Institute of Health, L-1445 Strassen, Luxembourg; Department of Medical Biochemistry, Faculty of Pharmacy, University of Belgrade, Belgrade, 11000, Serbia
- Baiba Vilne
  - Bioinformatics Laboratory, Rīga Stradiņš University, Rīga, LV-1007, Latvia
- Eva Gerdts
  - Center for Research on Cardiac Disease in Women, Department of Clinical Science, University of Bergen, Bergen, 5020, Norway
- Fábio Trindade
  - Cardiovascular R&D Centre - UnIC@RISE, Department of Surgery and Physiology, Faculty of Medicine of the University of Porto, Porto, 4099-002, Portugal
- Shizuka Uchida
  - Center for RNA Medicine, Department of Clinical Medicine, Aalborg University, Copenhagen, SV, DK-2450, Denmark
- Soliman Khatib
  - Natural Compounds and Analytical Chemistry Laboratory, MIGAL-Galilee Research Institute, Kiryat Shemona, 11016, Israel; Department of Biotechnology, Tel-Hai College, Upper Galilee 12210, Israel
- Stephanie Bezzina Wettinger
  - Department of Applied Biomedical Science, Faculty of Health Sciences, University of Malta, Msida, 2080, Malta
- Yvan Devaux
  - Cardiovascular Research Unit, Department of Precision Health, 1A-B rue Edison, Luxembourg Institute of Health, L-1445 Strassen, Luxembourg
- Paolo Magni
  - Department of Pharmacological and Biomolecular Sciences 'Rodolfo Paoletti', Università degli Studi di Milano, Via G. Balzaretti 9, 20133 Milano, Italy; IRCCS MultiMedica, Via Milanese 300, 20099 Sesto S. Giovanni, Milan, Italy
3
Glotov OS, Chernov AN, Glotov AS. Human Exome Sequencing and Prospects for Predictive Medicine: Analysis of International Data and Own Experience. J Pers Med 2023; 13:1236. [PMID: 37623486 PMCID: PMC10455459 DOI: 10.3390/jpm13081236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 07/25/2023] [Accepted: 07/26/2023] [Indexed: 08/26/2023] Open
Abstract
Today, whole-exome sequencing (WES) is used to conduct the massive screening of structural and regulatory genes in order to identify the allele frequencies of disease-associated polymorphisms in various populations and thus detect pathogenic genetic changes (mutations or polymorphisms) conducive to malfunctional protein sequences. With its extensive capabilities, exome sequencing today allows both the diagnosis of monogenic diseases (MDs) and the examination of seemingly healthy populations to reveal a wide range of potential risks prior to disease manifestation (in the future, exome sequencing may outpace costly and less informative genome sequencing to become the first-line examination technique). This review establishes the human genetic passport as a new WES-based clinical concept for the identification of new candidate genes, gene variants, and molecular mechanisms in the diagnosis, prediction, and treatment of monogenic, oligogenic, and multifactorial diseases. Various diseases are addressed to demonstrate the extensive potential of WES and consider its advantages as well as disadvantages. Thus, WES can become a general test with a broad spectrum of applications, including opportunistic screening.
Affiliation(s)
- Oleg S. Glotov
  - Department of Genomic Medicine, D. O. Ott Research Institute of Obstetrics, Gynecology and Reproductology, 199034 St. Petersburg, Russia
  - Department of Experimental Medical Virology, Molecular Genetics and Biobanking of Pediatric Research and Clinical Center for Infectious Diseases, 197022 St. Petersburg, Russia
- Alexander N. Chernov
  - Department of Genomic Medicine, D. O. Ott Research Institute of Obstetrics, Gynecology and Reproductology, 199034 St. Petersburg, Russia
  - Department of General Pathology and Pathological Physiology, Institute of Experimental Medicine, 197376 St. Petersburg, Russia
- Andrey S. Glotov
  - Department of Genomic Medicine, D. O. Ott Research Institute of Obstetrics, Gynecology and Reproductology, 199034 St. Petersburg, Russia
4
Robinson M, Joshi A, Vidyarthi A, Maccoun M, Rangavajjhala S, Glusman G. Quality control of large genome datasets. HGG ADVANCES 2022; 3:100123. [PMID: 35789587 PMCID: PMC9250042 DOI: 10.1016/j.xhgg.2022.100123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2021] [Accepted: 06/02/2022] [Indexed: 11/26/2022] Open
Abstract
The 1000 Genomes Project (TGP) is a foundational resource that serves the biomedical community as a standard reference cohort for human genetic variation. There are now seven public versions of these genomes. The TGP Consortium produced the first by mapping its final data release against human reference sequence GRCh37, then "lifted over" these genomes to the improved reference sequence (GRCh38) when it was released, and remapped the original data to GRCh38 with two similar pipelines. As best-practice quality validation, the pipelines that generated these versions were benchmarked against the Genome In A Bottle Consortium's "platinum quality" genome (NA12878). The New York Genome Center recently released the results of independently resequencing the cohort at greater depth (30×), a phased version informed by the inclusion of related individuals, and independently remapped the original variant calls to GRCh38. We performed a cross-comparison evaluation of all seven versions using genome fingerprinting, which supports ultrafast genome comparison even across reference versions. We noted multiple issues, including discrepancies in cohort membership, disagreement on the overall level of variation, evidence of substandard pipeline performance on specific genomes and in specific regions of the genome, cryptic relationships between individuals, inconsistent phasing, and annotation distortions caused by the history of the reference genome itself. We therefore recommend global quality assessment by rapid genome comparisons, alongside benchmarking as part of best-practice quality assessment of large genome datasets. Our observations also help inform the decision of which version to use, to support analyses by individual researchers.
Affiliation(s)
- Max Robinson
  - Institute for Systems Biology, 401 Terry Avenue N, Seattle, WA 98109, USA
- Arpita Joshi
  - Institute for Systems Biology, 401 Terry Avenue N, Seattle, WA 98109, USA
- Ansh Vidyarthi
  - Institute for Systems Biology, 401 Terry Avenue N, Seattle, WA 98109, USA
- Mary Maccoun
  - Institute for Systems Biology, 401 Terry Avenue N, Seattle, WA 98109, USA
- Gustavo Glusman
  - Institute for Systems Biology, 401 Terry Avenue N, Seattle, WA 98109, USA
5
Kaminow B, Ballouz S, Gillis J, Dobin A. Pan-human consensus genome significantly improves the accuracy of RNA-seq analyses. Genome Res 2022; 32:738-749. [PMID: 35256454 PMCID: PMC8997357 DOI: 10.1101/gr.275613.121] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/07/2021] [Accepted: 03/02/2022] [Indexed: 11/25/2022]
Abstract
The Human Reference Genome serves as the foundation for modern genomic analyses. However, in its present form, it does not adequately represent the vast genetic diversity of the human population. In this study, we explored the consensus genome as a potential successor of the current reference genome and assessed its effect on the accuracy of RNA-seq read alignment. In order to find the best haploid genome representation, we constructed consensus genomes at the pan-human, super-population, and population levels, utilizing variant information from the 1000 Genomes Project. Using personal haploid genomes as the ground truth, we compared mapping errors for real RNA-seq reads aligned to the consensus genomes versus the reference genome. For reads overlapping homozygous variants, we found that the mapping error decreased by a factor of ~2-3 when the reference was replaced with the pan-human consensus genome. We also found that using more population-specific consensuses resulted in little to no improvement over using the pan-human consensus, suggesting a limit in the utility of incorporating more specific genomic variation. Replacing reference with consensus genomes impacts functional analyses, such as differential expression of isoforms, genes, and splice junctions.
Affiliation(s)
- Benjamin Kaminow
  - Cold Spring Harbor Laboratory; Weill Cornell Graduate School of Medical Sciences
- Sara Ballouz
  - Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research; School of Medical Sciences, University of New South Wales; Cold Spring Harbor Laboratory
6
Barbitoff YA, Abasov R, Tvorogova VE, Glotov AS, Predeus AV. Systematic benchmark of state-of-the-art variant calling pipelines identifies major factors affecting accuracy of coding sequence variant discovery. BMC Genomics 2022; 23:155. [PMID: 35193511 PMCID: PMC8862519 DOI: 10.1186/s12864-022-08365-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 07/29/2021] [Accepted: 02/03/2022] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND Accurate variant detection in the coding regions of the human genome is a key requirement for molecular diagnostics of Mendelian disorders. Efficiency of variant discovery from next-generation sequencing (NGS) data depends on multiple factors, including reproducible coverage biases of NGS methods and the performance of read alignment and variant calling software. Although variant caller benchmarks are published constantly, no previous publications have leveraged the full extent of available gold standard whole-genome (WGS) and whole-exome (WES) sequencing datasets. RESULTS In this work, we systematically evaluated the performance of 4 popular short read aligners (Bowtie2, BWA, Isaac, and Novoalign) and 9 novel and well-established variant calling and filtering methods (Clair3, DeepVariant, Octopus, GATK, FreeBayes, and Strelka2) using a set of 14 "gold standard" WES and WGS datasets available from Genome In A Bottle (GIAB) consortium. Additionally, we have indirectly evaluated each pipeline's performance using a set of 6 non-GIAB samples of African and Russian ethnicity. In our benchmark, Bowtie2 performed significantly worse than other aligners, suggesting it should not be used for medical variant calling. When other aligners were considered, the accuracy of variant discovery mostly depended on the variant caller and not the read aligner. Among the tested variant callers, DeepVariant consistently showed the best performance and the highest robustness. Other actively developed tools, such as Clair3, Octopus, and Strelka2, also performed well, although their efficiency had greater dependence on the quality and type of the input data. We have also compared the consistency of variant calls in GIAB and non-GIAB samples. With few important caveats, best-performing tools have shown little evidence of overfitting. 
CONCLUSIONS The results show surprisingly large differences in the performance of cutting-edge tools even in high confidence regions of the coding genome. This highlights the importance of regular benchmarking of quickly evolving tools and pipelines. We also discuss the need for a more diverse set of gold standard genomes that would include samples of African, Hispanic, or mixed ancestry. Additionally, there is also a need for better variant caller assessment in the repetitive regions of the coding genome.
Affiliation(s)
- Yury A Barbitoff
  - Bioinformatics Institute, St. Petersburg, Russia
  - Department of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology and Reproductology, St. Petersburg, Russia
  - Department of Genetics and Biotechnology, St. Petersburg State University, St. Petersburg, Russia
- Ruslan Abasov
  - Bioinformatics Institute, St. Petersburg, Russia
  - Dmitry Rogachev National Research Center of Pediatric Hematology-Oncology and Immunology, Moscow, Russia
- Varvara E Tvorogova
  - Bioinformatics Institute, St. Petersburg, Russia
  - Department of Genetics and Biotechnology, St. Petersburg State University, St. Petersburg, Russia
- Andrey S Glotov
  - Department of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology and Reproductology, St. Petersburg, Russia
7
Neubert K, Zuchantke E, Leidenfrost RM, Wünschiers R, Grützke J, Malorny B, Brendebach H, Al Dahouk S, Homeier T, Hotzel H, Reinert K, Tomaso H, Busch A. Testing assembly strategies of Francisella tularensis genomes to infer an evolutionary conservation analysis of genomic structures. BMC Genomics 2021; 22:822. [PMID: 34773979 PMCID: PMC8590783 DOI: 10.1186/s12864-021-08115-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/04/2021] [Accepted: 10/12/2021] [Indexed: 02/08/2023] Open
Abstract
Background We benchmarked sequencing technology and assembly strategies for short-read, long-read, and hybrid assemblers with respect to correctness, contiguity, and completeness of assemblies in genomes of Francisella tularensis. Benchmarking allowed in-depth analyses of genomic structures of the Francisella pathogenicity islands and insertion sequences. Five major high-throughput sequencing technologies were applied, including next-generation “short-read” and third-generation “long-read” sequencing methods. Results We focused on short-read assemblers, hybrid assemblers, and analysis of the genomic structure with particular emphasis on insertion sequences and the Francisella pathogenicity island. Among eight short-read assembly methods, the A5-miseq pipeline performed best for MiSeq data, Mira for Ion Torrent data, and ABySS for HiSeq data. Two approaches were applied to benchmark long-read and hybrid assembly strategies: long-read-first assembly followed by correction with short reads (Canu/Pilon, Flye/Pilon) and short-read-first assembly along with scaffolding based on long reads (Unicycler, SPAdes). Hybrid assembly can resolve large repetitive regions best with a “long-read first” approach. Conclusions Genomic structures of the Francisella pathogenicity islands frequently showed misassembly. Insertion sequences (IS) could be used to perform an evolutionary conservation analysis. A phylogenetic structure of insertion sequences and the evolution within the clades elucidated the clade structure of the highly conservative F. tularensis. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-08115-x.
Affiliation(s)
- Kerstin Neubert
  - Department of Mathematics and Computer Science, Algorithmic Bioinformatics, Freie Universität Berlin, Institute of Computer Science, Takustr. 9, 14195, Berlin, Germany; German Federal Institute for Risk Assessment, Diedersdorfer Weg 1, 12277, Berlin, Germany
- Eric Zuchantke
  - Friedrich-Loeffler-Institut, Institute of Bacterial Infections and Zoonoses, Naumburger Str. 96a, 07749, Jena, Germany
- Robert Maximilian Leidenfrost
  - Department of Biotechnology and Chemistry, Mittweida University of Applied Sciences, Technikumplatz 17a, 09648, Mittweida, Germany
- Röbbe Wünschiers
  - Department of Biotechnology and Chemistry, Mittweida University of Applied Sciences, Technikumplatz 17a, 09648, Mittweida, Germany
- Josephine Grützke
  - German Federal Institute for Risk Assessment, Diedersdorfer Weg 1, 12277, Berlin, Germany
- Burkhard Malorny
  - German Federal Institute for Risk Assessment, Diedersdorfer Weg 1, 12277, Berlin, Germany
- Holger Brendebach
  - German Federal Institute for Risk Assessment, Diedersdorfer Weg 1, 12277, Berlin, Germany
- Sascha Al Dahouk
  - German Federal Institute for Risk Assessment, Diedersdorfer Weg 1, 12277, Berlin, Germany
- Timo Homeier
  - Friedrich-Loeffler-Institut, Institute of Epidemiology, Südufer 10, 17493 Greifswald, Insel Riems, Germany
- Helmut Hotzel
  - Friedrich-Loeffler-Institut, Institute of Bacterial Infections and Zoonoses, Naumburger Str. 96a, 07749, Jena, Germany
- Knut Reinert
  - Department of Mathematics and Computer Science, Algorithmic Bioinformatics, Freie Universität Berlin, Institute of Computer Science, Takustr. 9, 14195, Berlin, Germany
- Herbert Tomaso
  - Friedrich-Loeffler-Institut, Institute of Bacterial Infections and Zoonoses, Naumburger Str. 96a, 07749, Jena, Germany
- Anne Busch
  - Friedrich-Loeffler-Institut, Institute of Bacterial Infections and Zoonoses, Naumburger Str. 96a, 07749, Jena, Germany; Department of Anaesthesiology and Intensive Care Medicine, University Hospital Jena, Jena, Germany
8
Shumate A, Zimin AV, Sherman RM, Puiu D, Wagner JM, Olson ND, Pertea M, Salit ML, Zook JM, Salzberg SL. Assembly and annotation of an Ashkenazi human reference genome. Genome Biol 2020; 21:129. [PMID: 32487205 PMCID: PMC7265644 DOI: 10.1186/s13059-020-02047-7] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 04/06/2020] [Accepted: 05/15/2020] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND Thousands of experiments and studies use the human reference genome as a resource each year. This single reference genome, GRCh38, is a mosaic created from a small number of individuals, representing a very small sample of the human population. There is a need for reference genomes from multiple human populations to avoid potential biases. RESULTS Here, we describe the assembly and annotation of the genome of an Ashkenazi individual and the creation of a new, population-specific human reference genome. This genome is more contiguous and more complete than GRCh38, the latest version of the human reference genome, and is annotated with highly similar gene content. The Ashkenazi reference genome, Ash1, contains 2,973,118,650 nucleotides as compared to 2,937,639,212 in GRCh38. Annotation identified 20,157 protein-coding genes, of which 19,563 are >99% identical to their counterparts on GRCh38. Most of the remaining genes have small differences. Forty of the protein-coding genes in GRCh38 are missing from Ash1; however, all of these genes are members of multi-gene families for which Ash1 contains other copies. Eleven genes appear on different chromosomes from their homologs in GRCh38. Alignment of DNA sequences from an unrelated Ashkenazi individual to Ash1 identified ~1 million fewer homozygous SNPs than alignment of those same sequences to the more-distant GRCh38 genome, illustrating one of the benefits of population-specific reference genomes. CONCLUSIONS The Ash1 genome is presented as a reference for any genetic studies involving Ashkenazi Jewish individuals.
Affiliation(s)
- Alaina Shumate
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Aleksey V Zimin
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Rachel M Sherman
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Daniela Puiu
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Justin M Wagner
  - National Institute of Standards and Technology, Gaithersburg, MD, USA
- Nathan D Olson
  - National Institute of Standards and Technology, Gaithersburg, MD, USA
- Mihaela Pertea
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Marc L Salit
  - Joint Initiative for Metrology in Biology, Stanford University, Stanford, CA, USA
- Justin M Zook
  - National Institute of Standards and Technology, Gaithersburg, MD, USA
- Steven L Salzberg
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
9
Balashova MS, Tuluzanovskaya IG, Glotov OS, Glotov AS, Barbitoff YA, Fedyakov MA, Alaverdian DA, Ivashchenko TE, Romanova OV, Sarana AM, Scherbak SG, Baranov VS, Filimonov MI, Skalny AV, Zhuchenko NA, Ignatova TM, Asanov AY. The spectrum of pathogenic variants of the ATP7B gene in Wilson disease in the Russian Federation. J Trace Elem Med Biol 2020; 59:126420. [PMID: 31708252 DOI: 10.1016/j.jtemb.2019.126420] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/09/2019] [Revised: 10/21/2019] [Accepted: 10/22/2019] [Indexed: 02/06/2023]
Abstract
BACKGROUND Wilson's disease (WD) is a rare inherited disorder caused by mutations in the ATP7B gene resulting in copper accumulation in different organs. However, data on the ATP7B mutation spectrum in Russia and worldwide are insufficient and contradictory. The objective of the present study was to estimate the frequency of ATP7B gene mutations in the Russian population of WD patients. MATERIALS AND METHODS 75 WD patients were examined by next-generation sequencing (NGS). A targeted panel, NimbleGen SeqCap EZ Choice: 151012_HG38_CysFib_EZ_HX3 (ROCHE), was designed for analysis of the ATP7B gene and possible modifier genes. Retrospective assessment of a diagnostic WD score (Leipzig, 2001) was also performed. RESULTS 31 mutations in the ATP7B gene were detected. The two most frequent mutations were c.3207C>A (51.85% of alleles) and c.3190G>A (8.64% of alleles). Single rare mutations were detected in 29% of cases. In 96% of cases, mutations in both copies of ATP7B were found. We also observed 3 novel potentially pathogenic variants that were not previously described: c.1870-8A>G, c.3655A>T (p.Ile1219Phe), and c.3036dupC (p.Lys1013fs). For 25% of patients, the diagnosis of WD could not be established at the time of manifestation using the previously proposed diagnostic score. There was a remarkable delay in diagnosis for the majority of patients: WD was diagnosed within three months of the first symptoms in only 33% of patients, within 3-12 months in 29%, within 1-10 years in 30%, and after more than 10 years in 8%. Generally, the clinical appearance of WD may be rather variable at manifestation, and genetic profiling at this step is the only way to confirm the presence of WD.
Affiliation(s)
- Mariya S Balashova
  - Sechenov First Moscow State Medical University, Moscow, Russia; Center of Genetics and Reproductive Medicine «Genetico», Moscow, Russia
- Oleg S Glotov
  - D.O.Ott Research Institute of Obstetrics, Gynecology and Reproductology, St. Petersburg, Russia; St.Petersburg State Health Care Establishment the City Hospital №40, St. Petersburg, Russia; Saint Petersburg State University, St. Petersburg, Russia
- Andrey S Glotov
  - D.O.Ott Research Institute of Obstetrics, Gynecology and Reproductology, St. Petersburg, Russia; St.Petersburg State Health Care Establishment the City Hospital №40, St. Petersburg, Russia; Saint Petersburg State University, St. Petersburg, Russia
- Yury A Barbitoff
  - Saint Petersburg State University, St. Petersburg, Russia; Bioinformatics Institute, St. Petersburg, Russia
- Mikhail A Fedyakov
  - St.Petersburg State Health Care Establishment the City Hospital №40, St. Petersburg, Russia; Saint Petersburg State University, St. Petersburg, Russia
- Diana A Alaverdian
  - St.Petersburg State Health Care Establishment the City Hospital №40, St. Petersburg, Russia
- Tatiana E Ivashchenko
  - D.O.Ott Research Institute of Obstetrics, Gynecology and Reproductology, St. Petersburg, Russia
- Olga V Romanova
  - D.O.Ott Research Institute of Obstetrics, Gynecology and Reproductology, St. Petersburg, Russia; St.Petersburg State Health Care Establishment the City Hospital №40, St. Petersburg, Russia
- Andrey M Sarana
  - St.Petersburg State Health Care Establishment the City Hospital №40, St. Petersburg, Russia; Saint Petersburg State University, St. Petersburg, Russia
- Sergey G Scherbak
  - St.Petersburg State Health Care Establishment the City Hospital №40, St. Petersburg, Russia; Saint Petersburg State University, St. Petersburg, Russia
- Vladislav S Baranov
  - D.O.Ott Research Institute of Obstetrics, Gynecology and Reproductology, St. Petersburg, Russia; Saint Petersburg State University, St. Petersburg, Russia
- Anatoly V Skalny
  - Sechenov First Moscow State Medical University, Moscow, Russia; Taipei Medical University, Taipei, Taiwan
- Tatiana M Ignatova
  - Center of Endosurgery and Lithotripsy (CELT), Moscow, Russia; Burnasyan Federal Medical Biophysical Center of Federal Medical Biological Agency, Moscow, Russia
- Aliy Y Asanov
  - Sechenov First Moscow State Medical University, Moscow, Russia
10
Barbitoff YA, Polev DE, Glotov AS, Serebryakova EA, Shcherbakova IV, Kiselev AM, Kostareva AA, Glotov OS, Predeus AV. Systematic dissection of biases in whole-exome and whole-genome sequencing reveals major determinants of coding sequence coverage. Sci Rep 2020; 10:2057. [PMID: 32029882 PMCID: PMC7005158 DOI: 10.1038/s41598-020-59026-y] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Received: 06/21/2019] [Accepted: 01/22/2020] [Indexed: 12/30/2022] Open
Abstract
Advantages and diagnostic effectiveness of the two most widely used resequencing approaches, whole exome (WES) and whole genome (WGS) sequencing, are often debated. WES dominated large-scale resequencing projects because of lower cost and easier data storage and processing. Rapid development of 3rd generation sequencing methods and novel exome sequencing kits predicate the need for a robust statistical framework allowing informative and easy performance comparison of the emerging methods. In our study we developed a set of statistical tools to systematically assess coverage of coding regions provided by several modern WES platforms, as well as PCR-free WGS. We identified a substantial problem in most previously published comparisons, which did not account for mappability limitations of short reads. Using regression analysis and simple machine learning, as well as several novel metrics of coverage evenness, we analyzed the contribution from the major determinants of CDS coverage. Contrary to a common view, most of the observed bias in modern WES stems from mappability limitations of short reads and exome probe design rather than sequence composition. We also identified a ~500 kb region of the human exome that could not be effectively characterized using short read technology and should receive special attention during variant analysis. Using our novel metrics of sequencing coverage, we identified the main determinants of WES and WGS performance. Overall, our study points out avenues for improvement of enrichment-based methods and development of novel approaches that would maximize variant discovery at optimal cost.
Affiliation(s)
- Yury A Barbitoff
- Bioinformatics Institute, Saint Petersburg, Russia; Department of Genomic Medicine, D. O. Ott Research Institute of Obstetrics, Gynecology, and Reproduction, Saint Petersburg, Russia; Department of Genetics and Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia; Cerbalab LTD, Saint Petersburg, Russia
| | | | - Andrey S Glotov
- Department of Genomic Medicine, D. O. Ott Research Institute of Obstetrics, Gynecology, and Reproduction, Saint Petersburg, Russia; Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia; City Hospital №40, Saint Petersburg, Russia; Institute of Living Systems, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
| | - Elena A Serebryakova
- Department of Genomic Medicine, D. O. Ott Research Institute of Obstetrics, Gynecology, and Reproduction, Saint Petersburg, Russia
| | - Irina V Shcherbakova
- Molecular Biology Division, Biomedical Center, LMU Munich, 82152, Planegg-Martinsried, Germany
| | - Artem M Kiselev
- Almazov National Medical Research Centre, Saint Petersburg, Russia
| | - Anna A Kostareva
- Almazov National Medical Research Centre, Saint Petersburg, Russia
| | - Oleg S Glotov
- Department of Genomic Medicine, D. O. Ott Research Institute of Obstetrics, Gynecology, and Reproduction, Saint Petersburg, Russia; City Hospital №40, Saint Petersburg, Russia
|
11
|
Barbitoff YA, Skitchenko RK, Poleshchuk OI, Shikov AE, Serebryakova EA, Nasykhova YA, Polev DE, Shuvalova AR, Shcherbakova IV, Fedyakov MA, Glotov OS, Glotov AS, Predeus AV. Whole-exome sequencing provides insights into monogenic disease prevalence in Northwest Russia. Mol Genet Genomic Med 2019; 7:e964. [PMID: 31482689 PMCID: PMC6825859 DOI: 10.1002/mgg3.964] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 05/09/2019] [Accepted: 08/07/2019] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND: Allele frequency data from large exome and genome aggregation projects such as the Genome Aggregation Database (gnomAD) are of utmost importance to the interpretation of medical resequencing data. However, allele frequencies might differ significantly in poorly studied populations that are underrepresented in large-scale projects, such as the Russian population. METHODS: In this work, we leveraged our access to a large dataset of 694 exome samples to analyze genetic variation in Northwest Russia. We compared the spectrum of genetic variants to dbSNP build 151, and estimated ClinVar-based autosomal recessive (AR) disease allele prevalence as compared to gnomAD r. 2.1. RESULTS: An estimated 9.3% of discovered variants were not present in dbSNP. We report statistically significant overrepresentation of pathogenic variants for several Mendelian disorders, including phenylketonuria (PAH, rs5030858), Wilson's disease (ATP7B, rs76151636), factor VII deficiency (F7, rs36209567), kyphoscoliosis type of Ehlers-Danlos syndrome (FKBP14, rs542489955), and several other recessive pathologies. We also make primary estimates of monogenic disease incidence in the population, with retinal dystrophy, cystic fibrosis, and phenylketonuria being the most frequent AR pathologies. CONCLUSION: Our observations demonstrate the utility of population-specific allele frequency data for the diagnosis of monogenic disorders using high-throughput technologies.
Affiliation(s)
- Yury A. Barbitoff
- Bioinformatics Institute, St. Petersburg, Russia
- Department of Genetics and Biotechnology, St. Petersburg State University, St. Petersburg, Russia
| | | | | | - Anton E. Shikov
- Bioinformatics Institute, St. Petersburg, Russia
- City Hospital No. 40, St. Petersburg, Russia
| | - Elena A. Serebryakova
- Department of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology and Reproduction, St. Petersburg, Russia
| | - Yulia A. Nasykhova
- Department of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology and Reproduction, St. Petersburg, Russia
- Laboratory of Biobanking and Genomic Medicine of Institute of Translation Biomedicine, St. Petersburg State University, St. Petersburg, Russia
| | | | | | - Irina V. Shcherbakova
- Laboratory of Biobanking and Genomic Medicine of Institute of Translation Biomedicine, St. Petersburg State University, St. Petersburg, Russia
| | | | - Oleg S. Glotov
- City Hospital No. 40, St. Petersburg, Russia
- Department of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology and Reproduction, St. Petersburg, Russia
| | - Andrey S. Glotov
- City Hospital No. 40, St. Petersburg, Russia
- Department of Genomic Medicine, D.O. Ott Research Institute of Obstetrics, Gynaecology and Reproduction, St. Petersburg, Russia
- Laboratory of Biobanking and Genomic Medicine of Institute of Translation Biomedicine, St. Petersburg State University, St. Petersburg, Russia
- Institute of Living Systems, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
|
12
|
Glotov OS, Serebryakova EA, Turkunova ME, Efimova OA, Glotov AS, Barbitoff YA, Nasykhova YA, Predeus AV, Polev DE, Fedyakov MA, Polyakova IV, Ivashchenko TE, Shved NY, Shabanova ES, Tiselko AV, Romanova OV, Sarana AM, Pendina AA, Scherbak SG, Musina EV, Petrovskaia-Kaminskaia AV, Lonishin LR, Ditkovskaya LV, Zhelenina LA, Tyrtova LV, Berseneva OS, Skitchenko RK, Suspitsin EN, Bashnina EB, Baranov VS. Whole‑exome sequencing in Russian children with non‑type 1 diabetes mellitus reveals a wide spectrum of genetic variants in MODY‑related and unrelated genes. Mol Med Rep 2019; 20:4905-4914. [PMID: 31638168 PMCID: PMC6854535 DOI: 10.3892/mmr.2019.10751] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 03/16/2019] [Accepted: 08/28/2019] [Indexed: 12/13/2022] Open
Abstract
The present study reports on the frequency and the spectrum of genetic variants causative of monogenic diabetes in Russian children with non-type 1 diabetes mellitus. The present study included 60 unrelated Russian children with non-type 1 diabetes mellitus diagnosed before the age of 18 years. Genetic variants were screened using whole-exome sequencing (WES) in a panel of 35 genes causative of maturity onset diabetes of the young (MODY) and transient or permanent neonatal diabetes. Verification of the WES results was performed using PCR-direct sequencing. A total of 38 genetic variants were identified in 33 out of 60 patients (55%). The majority of patients (27/33, 81.8%) had variants in MODY-related genes: GCK (n=19), HNF1A (n=2), PAX4 (n=1), ABCC8 (n=1), KCNJ11 (n=1), GCK+HNF1A (n=1), GCK+BLK (n=1) and GCK+BLK+WFS1 (n=1). A total of 6 patients (6/33, 18.2%) had variants in MODY-unrelated genes: GATA6 (n=1), WFS1 (n=3), EIF2AK3 (n=1) and SLC19A2 (n=1). A total of 15 out of 38 variants were novel, including GCK, HNF1A, BLK, WFS1, EIF2AK3 and SLC19A2. To summarize, the present study demonstrates a high frequency and a wide spectrum of genetic variants causative of monogenic diabetes in Russian children with non-type 1 diabetes mellitus. The spectrum includes previously known and novel variants in MODY-related and unrelated genes, with multiple variants in a number of patients. The prevalence of GCK variants indicates that diagnostics of monogenic diabetes in Russian children may begin with testing for MODY2. However, the remaining variants are present at low frequencies in 9 different genes, altogether amounting to ~50% of the cases and highlighting the efficiency of using WES in non-GCK-MODY cases.
Affiliation(s)
- Oleg S Glotov
- D.O. Ott Research Institute of Obstetrics, Gynecology and Reproductology, 199034 St. Petersburg, Russia
| | - Elena A Serebryakova
- D.O. Ott Research Institute of Obstetrics, Gynecology and Reproductology, 199034 St. Petersburg, Russia
| | - Mariia E Turkunova
- St. Petersburg State Pediatric Medical University, 194100 St. Petersburg, Russia
| | - Olga A Efimova
- D.O. Ott Research Institute of Obstetrics, Gynecology and Reproductology, 199034 St. Petersburg, Russia
| | - Andrey S Glotov
- D.O. Ott Research Institute of Obstetrics, Gynecology and Reproductology, 199034 St. Petersburg, Russia
| | | | - Yulia A Nasykhova
- D.O. Ott Research Institute of Obstetrics, Gynecology and Reproductology, 199034 St. Petersburg, Russia
| | | | - Dmitrii E Polev
- St. Petersburg State University, 199034 St. Petersburg, Russia
| | | | | | - Tatyana E Ivashchenko
- D.O. Ott Research Institute of Obstetrics, Gynecology and Reproductology, 199034 St. Petersburg, Russia
| | - Natalia Y Shved
- D.O. Ott Research Institute of Obstetrics, Gynecology and Reproductology, 199034 St. Petersburg, Russia
| | - Elena S Shabanova
- D.O. Ott Research Institute of Obstetrics, Gynecology and Reproductology, 199034 St. Petersburg, Russia
| | - Alena V Tiselko
- D.O. Ott Research Institute of Obstetrics, Gynecology and Reproductology, 199034 St. Petersburg, Russia
| | - Olga V Romanova
- City Hospital Number 40, Sestroretsk, 197706 St. Petersburg, Russia
| | - Andrey M Sarana
- St. Petersburg State University, 199034 St. Petersburg, Russia
| | - Anna A Pendina
- D.O. Ott Research Institute of Obstetrics, Gynecology and Reproductology, 199034 St. Petersburg, Russia
| | | | - Ekaterina V Musina
- D.O. Ott Research Institute of Obstetrics, Gynecology and Reproductology, 199034 St. Petersburg, Russia
| | | | | | - Liliya V Ditkovskaya
- St. Petersburg State Pediatric Medical University, 194100 St. Petersburg, Russia
| | - Liudmila A Zhelenina
- St. Petersburg State Pediatric Medical University, 194100 St. Petersburg, Russia
| | - Ludmila V Tyrtova
- St. Petersburg State Pediatric Medical University, 194100 St. Petersburg, Russia
| | - Olga S Berseneva
- St. Petersburg State Pediatric Medical University, 194100 St. Petersburg, Russia
| | | | - Evgenii N Suspitsin
- St. Petersburg State Pediatric Medical University, 194100 St. Petersburg, Russia
| | - Elena B Bashnina
- North‑Western State Medical University Named After I.I. Mechnikov, 191015 St. Petersburg, Russia
| | - Vladislav S Baranov
- D.O. Ott Research Institute of Obstetrics, Gynecology and Reproductology, 199034 St. Petersburg, Russia
|
13
|
Abstract
The use of the human reference genome has shaped methods and data across modern genomics. This has offered many benefits while creating a few constraints. In the following opinion, we outline the history, properties, and pitfalls of the current human reference genome. In a few illustrative analyses, we focus on its use for variant-calling, highlighting its nearness to a 'type specimen'. We suggest that switching to a consensus reference would offer important advantages over the continued use of the current reference with few disadvantages.
Affiliation(s)
- Sara Ballouz
- Cold Spring Harbor Laboratory, The Stanley Institute for Cognitive Genomics, Cold Spring Harbor, NY, 11724, USA
| | - Alexander Dobin
- Cold Spring Harbor Laboratory, The Stanley Institute for Cognitive Genomics, Cold Spring Harbor, NY, 11724, USA
| | - Jesse A Gillis
- Cold Spring Harbor Laboratory, The Stanley Institute for Cognitive Genomics, Cold Spring Harbor, NY, 11724, USA.
|
14
|
Fedorenko OY, Golimbet VE, Ivanova SA, Levchenko A, Gainetdinov RR, Semke AV, Simutkin GG, Gareeva AE, Glotov AS, Gryaznova A, Iourov IY, Krupitsky EM, Lebedev IN, Mazo GE, Kaleda VG, Abramova LI, Oleichik IV, Nasykhova YA, Nasyrova RF, Nikolishin AE, Kasyanov ED, Rukavishnikov GV, Timerbulatov IF, Brodyansky VM, Vorsanova SG, Yurov YB, Zhilyaeva TV, Sergeeva AV, Blokhina EA, Zvartau EE, Blagonravova AS, Aftanas LI, Bokhan NA, Kekelidze ZI, Klimenko TV, Anokhina IP, Khusnutdinova EK, Klyushnik TP, Neznanov NG, Stepanov VA, Schulze TG, Kibitov AO. Opening up new horizons for psychiatric genetics in the Russian Federation: moving toward a national consortium. Mol Psychiatry 2019; 24:1099-1111. [PMID: 30664668 PMCID: PMC6756082 DOI: 10.1038/s41380-019-0354-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 08/11/2018] [Revised: 12/27/2018] [Accepted: 12/31/2018] [Indexed: 12/18/2022]
Abstract
We provide an overview of the recent achievements in psychiatric genetics research in the Russian Federation and present genotype-phenotype, population, epigenetic, cytogenetic, functional, ENIGMA, and pharmacogenetic studies, with an emphasis on genome-wide association studies. The genetic backgrounds of mental illnesses in the polyethnic and multicultural population of the Russian Federation are still understudied. Furthermore, genetic, genomic, and pharmacogenetic data from the Russian Federation are not adequately represented in the international scientific literature, are currently not available for meta-analyses and have never been compared with data from other populations. Most of these problems cannot be solved by individual centers working in isolation but warrant a truly collaborative effort that brings together all the major psychiatric genetic research centers in the Russian Federation in a national consortium. For this reason, we have established the Russian National Consortium for Psychiatric Genetics (RNCPG) with the aim to strengthen the power and rigor of psychiatric genetics research in the Russian Federation and enhance the international compatibility of this research. The consortium is set up as an open organization that will facilitate collaborations on complex biomedical research projects in human mental health in the Russian Federation and abroad. These projects will include genotyping, sequencing, transcriptome and epigenome analysis, metabolomics, and a wide array of other state-of-the-art analyses. Here, we discuss the challenges we face and the approaches we will take to unlock the huge potential that the Russian Federation holds for the worldwide psychiatric genetics community.
Affiliation(s)
- Olga Yu Fedorenko
- Mental Health Research Institute, Tomsk National Research Medical Center of Russian Academy of Sciences, Tomsk, Russian Federation.
- National Research Tomsk Polytechnic University, Tomsk, Russian Federation.
| | | | - Svetlana A Ivanova
- Mental Health Research Institute, Tomsk National Research Medical Center of Russian Academy of Sciences, Tomsk, Russian Federation
- National Research Tomsk Polytechnic University, Tomsk, Russian Federation
| | - Anastasia Levchenko
- Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russian Federation
| | - Raul R Gainetdinov
- Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russian Federation
| | - Arkady V Semke
- Mental Health Research Institute, Tomsk National Research Medical Center of Russian Academy of Sciences, Tomsk, Russian Federation
| | - German G Simutkin
- Mental Health Research Institute, Tomsk National Research Medical Center of Russian Academy of Sciences, Tomsk, Russian Federation
| | - Anna E Gareeva
- Institute of Biochemistry and Genetics, Ufa Federal Research Center, Russian Academy of Sciences, Ufa, Russian Federation
- Federal State Educational Institution of Highest Education Bashkir State Medical University of Public Health Ministry of Russian Federation, Ufa, Russian Federation
| | - Andrey S Glotov
- Laboratory of Biobanking and Genomic Medicine of Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russian Federation
| | - Anna Gryaznova
- Institute of Psychiatric Phenomics and Genomics (IPPG), University Hospital, LMU, Munich, Germany
| | - Ivan Y Iourov
- Mental Health Research Center, Moscow, Russian Federation
| | - Evgeny M Krupitsky
- V.M. Bekhterev National Medical Research Center for Psychiatry and Neurology, Saint Petersburg, Russian Federation
| | - Igor N Lebedev
- Research Institute of Medical Genetics, Tomsk National Research Medical Center of Russian Academy of Sciences, Tomsk, Russian Federation
| | - Galina E Mazo
- V.M. Bekhterev National Medical Research Center for Psychiatry and Neurology, Saint Petersburg, Russian Federation
| | | | | | | | - Yulia A Nasykhova
- Laboratory of Biobanking and Genomic Medicine of Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russian Federation
| | - Regina F Nasyrova
- V.M. Bekhterev National Medical Research Center for Psychiatry and Neurology, Saint Petersburg, Russian Federation
| | - Anton E Nikolishin
- Serbsky National Medical Research Center on Psychiatry and Addictions, Moscow, Russian Federation
| | - Evgeny D Kasyanov
- V.M. Bekhterev National Medical Research Center for Psychiatry and Neurology, Saint Petersburg, Russian Federation
| | - Grigory V Rukavishnikov
- V.M. Bekhterev National Medical Research Center for Psychiatry and Neurology, Saint Petersburg, Russian Federation
| | - Ilgiz F Timerbulatov
- Federal State Educational Institution of Highest Education Bashkir State Medical University of Public Health Ministry of Russian Federation, Ufa, Russian Federation
| | - Vadim M Brodyansky
- Serbsky National Medical Research Center on Psychiatry and Addictions, Moscow, Russian Federation
| | - Svetlana G Vorsanova
- Veltischev Research and Clinical Institute for Pediatrics, the Pirogov Russian National Research Medical University, Moscow, Russian Federation
| | - Yury B Yurov
- Mental Health Research Center, Moscow, Russian Federation
| | - Tatyana V Zhilyaeva
- Privolzhskiy Research Medical University, Nizhny Novgorod, Russian Federation
| | | | - Elena A Blokhina
- First Saint Petersburg Pavlov State Medical University, Saint Petersburg, Russian Federation
| | - Edwin E Zvartau
- First Saint Petersburg Pavlov State Medical University, Saint Petersburg, Russian Federation
| | - Anna S Blagonravova
- Privolzhskiy Research Medical University, Nizhny Novgorod, Russian Federation
| | - Lyubomir I Aftanas
- Federal State Scientific Budgetary Institution "Scientific Research Institute of Physiology and Basic Medicine", Novosibirsk, Russian Federation
| | - Nikolay A Bokhan
- Mental Health Research Institute, Tomsk National Research Medical Center of Russian Academy of Sciences, Tomsk, Russian Federation
- National Research Tomsk State University, Tomsk, Russian Federation
| | - Zurab I Kekelidze
- Serbsky National Medical Research Center on Psychiatry and Addictions, Moscow, Russian Federation
| | - Tatyana V Klimenko
- Serbsky National Medical Research Center on Psychiatry and Addictions, Moscow, Russian Federation
| | - Irina P Anokhina
- Serbsky National Medical Research Center on Psychiatry and Addictions, Moscow, Russian Federation
| | - Elza K Khusnutdinova
- Institute of Biochemistry and Genetics, Ufa Federal Research Center, Russian Academy of Sciences, Ufa, Russian Federation
- Federal State Educational Institution of Highest Education Bashkir State Medical University of Public Health Ministry of Russian Federation, Ufa, Russian Federation
| | | | - Nikolay G Neznanov
- V.M. Bekhterev National Medical Research Center for Psychiatry and Neurology, Saint Petersburg, Russian Federation
| | - Vadim A Stepanov
- Research Institute of Medical Genetics, Tomsk National Research Medical Center of Russian Academy of Sciences, Tomsk, Russian Federation
- National Research Tomsk State University, Tomsk, Russian Federation
| | - Thomas G Schulze
- Institute of Psychiatric Phenomics and Genomics (IPPG), University Hospital, LMU, Munich, Germany
| | - Aleksandr O Kibitov
- Serbsky National Medical Research Center on Psychiatry and Addictions, Moscow, Russian Federation
|
15
|
Shukla HG, Bawa PS, Srinivasan S. hg19KIndel: ethnicity normalized human reference genome. BMC Genomics 2019; 20:459. [PMID: 31170919 PMCID: PMC6555027 DOI: 10.1186/s12864-019-5854-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 02/25/2019] [Accepted: 05/29/2019] [Indexed: 11/22/2022] Open
Abstract
Background: The most widely used human genome reference assembly, hg19, harbors minor alleles at 2.18 million positions, as revealed by the 1000 Genomes Phase 3 dataset. Although this is less than 2% of the 89 million variants reported, it has been shown that the minor alleles can result in 30% false positives in individual genomes, thus misleading and burdening downstream interpretation. More alarming is the fact that a significant percentage of variants that are homozygous recessive for these minor alleles, with potential disease implications, are masked from reporting. Results: We have demonstrated that the false positives (FP) and false negatives (FN) can be corrected for by simply replacing nucleotides at the minor allele positions in hg19 with the corresponding major allele. Here, we have replaced 2.18 million minor alleles, comprising single nucleotide polymorphisms (SNPs), insertions and deletions (INDELs), and multiple nucleotide polymorphisms (MNPs), in hg19 with the corresponding major alleles to create an ethnically normalized reference genome called hg19KIndel. In doing so, hg19KIndel has both corrected for sequencing errors acknowledged to be present in hg19 and improved read alignment near the minor alleles in hg19. Conclusion: We have created and made available a new version of the human reference genome called hg19KIndel. It has been shown that variant calling using hg19KIndel significantly reduces false-positive calls, which in turn reduces the burden of downstream analysis and validation. It also improves false-negative variant calling, which means that variants that were missed due to the presence of minor alleles in hg19 will now be called using hg19KIndel. Using hg19KIndel, one even gets a better mapping percentage compared to the currently available human reference genome. The hg19KIndel reference genome and its auxiliary datasets are available at 10.5281/zenodo.2638113
Affiliation(s)
- Harsh G Shukla
- Institute of Bioinformatics and Applied Biotechnology, Biotech Park, Electronic City Phase I, Bangalore, 560100, India
| | - Pushpinder Singh Bawa
- Institute of Bioinformatics and Applied Biotechnology, Biotech Park, Electronic City Phase I, Bangalore, 560100, India; Manipal Academy of Higher Education (MAHE), Manipal, India
| | - Subhashini Srinivasan
- Institute of Bioinformatics and Applied Biotechnology, Biotech Park, Electronic City Phase I, Bangalore, 560100, India.
|
16
|
Barbitoff YA, Serebryakova EA, Nasykhova YA, Predeus AV, Polev DE, Shuvalova AR, Vasiliev EV, Urazov SP, Sarana AM, Scherbak SG, Gladyshev DV, Pokrovskaya MS, Sivakova OV, Meshkov AN, Drapkina OM, Glotov OS, Glotov AS. Identification of Novel Candidate Markers of Type 2 Diabetes and Obesity in Russia by Exome Sequencing with a Limited Sample Size. Genes (Basel) 2018; 9:415. [PMID: 30126146 PMCID: PMC6115942 DOI: 10.3390/genes9080415] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 07/06/2018] [Revised: 08/11/2018] [Accepted: 08/13/2018] [Indexed: 12/22/2022] Open
Abstract
Type 2 diabetes (T2D) and obesity are common chronic disorders with multifactorial etiology. In our study, we performed an exome sequencing analysis of 110 patients of Russian ethnicity together with a multi-perspective approach based on biologically meaningful filtering criteria to detect novel candidate variants and loci for T2D and obesity. We have identified several known single nucleotide polymorphisms (SNPs) as markers for obesity (rs11960429), T2D (rs9379084, rs1126930), and body mass index (BMI) (rs11553746, rs1956549 and rs7195386) (p < 0.05). We show that a method based on scoring of case-specific variants together with selection of protein-altering variants can allow for the interrogation of novel and known candidate markers of T2D and obesity in small samples. Using this method, we identified rs328 in LPL (p = 0.023), rs11863726 in HBQ1 (p = 8 × 10⁻⁵), rs112984085 in VAV3 (p = 4.8 × 10⁻⁴) for T2D and obesity, rs6271 in DBH (p = 0.043), rs62618693 in QSER1 (p = 0.021), rs61758785 in RAD51B (p = 1.7 × 10⁻⁴), rs34042554 in PCDHA1 (p = 1 × 10⁻⁴), and rs144183813 in PLEKHA5 (p = 1.7 × 10⁻⁴) for obesity; and rs9379084 in RREB1 (p = 0.042), rs2233984 in C6orf15 (p = 0.030), rs61737764 in ITGB6 (p = 0.035), rs17801742 in COL2A1 (p = 8.5 × 10⁻⁵), and rs685523 in ADAMTS13 (p = 1 × 10⁻⁶) for T2D as important susceptibility loci in Russian population. Our results demonstrate the effectiveness of whole exome sequencing (WES) technologies for searching for novel markers of multifactorial diseases in cohorts of limited size in poorly studied populations.
Affiliation(s)
- Yury A Barbitoff
- Biobank of the Research Park, Saint Petersburg State University, 199034 Saint Petersburg, Russia.
- Bioinformatics Institute, 194100 Saint Petersburg, Russia.
- Department of Genetics and Biotechnology, Saint Petersburg State University, 199034 Saint Petersburg, Russia.
- Institute of Translation Biomedicine, Saint Petersburg State University, 199034 Saint Petersburg, Russia.
| | - Elena A Serebryakova
- Biobank of the Research Park, Saint Petersburg State University, 199034 Saint Petersburg, Russia.
- Laboratory of Prenatal Diagnostics of Hereditary Diseases, FSBSI «The Research Institute of Obstetrics, Gynaecology and Reproductology Named after D.O. Ott», 199034 Saint Petersburg, Russia.
- City Hospital No. 40, Sestroretsk, 197706 Saint Petersburg, Russia.
| | - Yulia A Nasykhova
- Biobank of the Research Park, Saint Petersburg State University, 199034 Saint Petersburg, Russia.
- Laboratory of Prenatal Diagnostics of Hereditary Diseases, FSBSI «The Research Institute of Obstetrics, Gynaecology and Reproductology Named after D.O. Ott», 199034 Saint Petersburg, Russia.
| | | | - Dmitrii E Polev
- Biobank of the Research Park, Saint Petersburg State University, 199034 Saint Petersburg, Russia.
| | - Anna R Shuvalova
- Biobank of the Research Park, Saint Petersburg State University, 199034 Saint Petersburg, Russia.
| | | | | | - Andrey M Sarana
- Institute of Translation Biomedicine, Saint Petersburg State University, 199034 Saint Petersburg, Russia.
- City Hospital No. 40, Sestroretsk, 197706 Saint Petersburg, Russia.
| | - Sergey G Scherbak
- Institute of Translation Biomedicine, Saint Petersburg State University, 199034 Saint Petersburg, Russia.
- City Hospital No. 40, Sestroretsk, 197706 Saint Petersburg, Russia.
| | | | - Maria S Pokrovskaya
- Federal State Institution «National Medical Research Center for Preventive Medicine» of the Ministry of Healthcare of the Russian Federation, 101990 Moscow, Russia.
| | - Oksana V Sivakova
- Federal State Institution «National Medical Research Center for Preventive Medicine» of the Ministry of Healthcare of the Russian Federation, 101990 Moscow, Russia.
| | - Aleksey N Meshkov
- Federal State Institution «National Medical Research Center for Preventive Medicine» of the Ministry of Healthcare of the Russian Federation, 101990 Moscow, Russia.
| | - Oxana M Drapkina
- Federal State Institution «National Medical Research Center for Preventive Medicine» of the Ministry of Healthcare of the Russian Federation, 101990 Moscow, Russia.
| | - Oleg S Glotov
- Laboratory of Prenatal Diagnostics of Hereditary Diseases, FSBSI «The Research Institute of Obstetrics, Gynaecology and Reproductology Named after D.O. Ott», 199034 Saint Petersburg, Russia.
- City Hospital No. 40, Sestroretsk, 197706 Saint Petersburg, Russia.
| | - Andrey S Glotov
- Biobank of the Research Park, Saint Petersburg State University, 199034 Saint Petersburg, Russia.
- Laboratory of Prenatal Diagnostics of Hereditary Diseases, FSBSI «The Research Institute of Obstetrics, Gynaecology and Reproductology Named after D.O. Ott», 199034 Saint Petersburg, Russia.
|
17
|
De novo human genome assemblies reveal spectrum of alternative haplotypes in diverse populations. Nat Commun 2018; 9:3040. [PMID: 30072691 PMCID: PMC6072799 DOI: 10.1038/s41467-018-05513-w] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Received: 03/16/2018] [Accepted: 07/11/2018] [Indexed: 12/20/2022] Open
Abstract
The human reference genome is used extensively in modern biological research. However, a single consensus representation is inadequate to provide a universal reference structure because it is a haplotype among many in the human population. Using 10× Genomics (10×G) “Linked-Read” technology, we perform whole genome sequencing (WGS) and de novo assembly on 17 individuals across five populations. We identify 1842 breakpoint-resolved non-reference unique insertions (NUIs) that, in aggregate, add up to 2.1 Mb of so far undescribed genomic content. Among these, 64% are considered ancestral to humans since they are found in non-human primate genomes. Furthermore, 37% of the NUIs can be found in the human transcriptome and 14% likely arose from Alu-recombination-mediated deletion. Our results underline the need of a set of human reference genomes that includes a comprehensive list of alternative haplotypes to depict the complete spectrum of genetic diversity across populations. The majority of the human reference genome assembly is represented as a single consensus haplotype. Here, Wong et al. analyze de novo assemblies of 17 diverse, haplotype-resolved genomes to gain insights into the structure of genetic diversity and compile a list of alternative haplotypes across populations.
|
18
|
Interpretation of genomic sequencing: variants should be considered uncertain until proven guilty. Genet Med 2018; 20:291-293. [PMID: 29388946 DOI: 10.1038/gim.2017.269] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 12/18/2017] [Accepted: 12/18/2017] [Indexed: 01/09/2023] Open
|