1
Heumos S, Guarracino A, Schmelzle JNM, Li J, Zhang Z, Hagmann J, Nahnsen S, Prins P, Garrison E. Pangenome graph layout by Path-Guided Stochastic Gradient Descent. Bioinformatics 2024; 40:btae363. [PMID: 38960860 PMCID: PMC11227364 DOI: 10.1093/bioinformatics/btae363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 02/20/2024] [Accepted: 07/02/2024] [Indexed: 07/05/2024] Open
Abstract
Motivation: The increasing availability of complete genomes calls for models to study genomic variability within entire populations. Pangenome graphs capture the full genomic similarity and diversity between multiple genomes. In order to understand them, we need to see them. For visualization, we need a human-readable graph layout: a graph embedding in a low-dimensional (e.g. two-dimensional) depiction. Due to a pangenome graph's potentially excessive size, this is a significant challenge.
Results: In response, we introduce a novel graph layout algorithm: the Path-Guided Stochastic Gradient Descent (PG-SGD). PG-SGD uses the genomes, represented in the pangenome graph as paths, as an embedded positional system to sample genomic distances between pairs of nodes. This avoids the quadratic cost seen in previous versions of graph drawing by SGD. We show that our implementation efficiently computes the low-dimensional layouts of gigabase-scale pangenome graphs, unveiling their biological features.
Availability and implementation: We integrated PG-SGD in ODGI, which is released as free software under the MIT open source license. Source code is available at https://github.com/pangenome/odgi.
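The sampling idea described in this abstract can be illustrated with a small sketch. It is not the ODGI implementation: the function names (`step_offsets`, `pg_sgd_layout`, `path_stress`), the parameters, the annealing schedule, and the use of a single 2D point per node are all assumptions made for illustration. Each update picks a path, picks two steps on it, takes their nucleotide distance along that path as the target distance, and nudges the two nodes toward it, so no all-pairs distance matrix is ever built.

```python
import math
import random


def step_offsets(path, node_len):
    """Nucleotide offset at which each step of a path begins."""
    offsets, pos = [], 0
    for node in path:
        offsets.append(pos)
        pos += node_len[node]
    return offsets


def pg_sgd_layout(paths, node_len, iterations=30, updates_per_iter=200,
                  eps=0.1, seed=0):
    """Toy path-guided SGD layout; one 2D point per node (a simplification)."""
    rng = random.Random(seed)
    coords = {n: [rng.random(), rng.random()] for n in node_len}
    offsets = [step_offsets(p, node_len) for p in paths]
    usable = [k for k, p in enumerate(paths) if len(p) >= 2]
    if not usable:
        return coords

    # Assumed annealing schedule: decay the learning rate exponentially
    # from d_max^2 down to eps * d_min^2.
    d_max = max(offsets[k][-1] for k in usable)
    d_min = min(node_len.values())
    eta_max, eta_min = float(d_max) ** 2, eps * float(d_min) ** 2
    decay = math.log(eta_min / eta_max) / max(iterations - 1, 1)

    for it in range(iterations):
        eta = eta_max * math.exp(decay * it)
        for _ in range(updates_per_iter):
            k = rng.choice(usable)  # pick a path (uniformly, for brevity)
            a, b = rng.sample(range(len(paths[k])), 2)  # two distinct steps
            i, j = paths[k][a], paths[k][b]
            d = abs(offsets[k][a] - offsets[k][b])  # distance along the path
            if i == j or d == 0:
                continue
            mu = min(eta / (d * d), 1.0)  # weight d^-2, step capped at 1
            dx = coords[i][0] - coords[j][0]
            dy = coords[i][1] - coords[j][1]
            mag = math.hypot(dx, dy) or 1e-9
            r = mu * (mag - d) / (2.0 * mag)
            # Move both nodes along their connecting line toward distance d.
            coords[i][0] -= r * dx
            coords[i][1] -= r * dy
            coords[j][0] += r * dx
            coords[j][1] += r * dy
    return coords


def path_stress(coords, paths, node_len):
    """Weighted stress over all step pairs that share a path (toy sizes only)."""
    total = 0.0
    for path in paths:
        off = step_offsets(path, node_len)
        for a in range(len(path)):
            for b in range(a + 1, len(path)):
                i, j = path[a], path[b]
                d = off[b] - off[a]
                if i == j or d == 0:
                    continue
                dist = math.hypot(coords[i][0] - coords[j][0],
                                  coords[i][1] - coords[j][1])
                total += (dist - d) ** 2 / (d * d)
    return total


# Hypothetical toy graph: the second path skips n2, like a small deletion.
node_len = {"n1": 10, "n2": 10, "n3": 10, "n4": 10}
paths = [["n1", "n2", "n3", "n4"], ["n1", "n3", "n4"]]
layout = pg_sgd_layout(paths, node_len)
```

The work per iteration is fixed by `updates_per_iter` rather than by the number of node pairs, which is the property the abstract contrasts with the quadratic cost of earlier SGD-based graph drawing. `path_stress` is included only to inspect tiny examples; it enumerates all step pairs and would not scale.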
Affiliation(s)
- Simon Heumos
  - Quantitative Biology Center (QBiC), University of Tübingen, 72076 Tübingen, Germany
  - Biomedical Data Science, Department of Computer Science, University of Tübingen, 72076 Tübingen, Germany
  - M3 Research Center, University Hospital Tübingen, 72076 Tübingen, Germany
  - Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, 72076 Tübingen, Germany
- Andrea Guarracino
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, United States
  - Genomics Research Centre, Human Technopole, 20157 Milan, Italy
- Jan-Niklas M Schmelzle
  - Department of Computer Engineering, School of Computation, Information and Technology (CIT), Technical University of Munich, 80333 Munich, Germany
  - School of Electrical and Computer Engineering, Cornell University, Ithaca, NY 14853, United States
- Jiajie Li
  - School of Electrical and Computer Engineering, Cornell University, Ithaca, NY 14853, United States
- Zhiru Zhang
  - School of Electrical and Computer Engineering, Cornell University, Ithaca, NY 14853, United States
- Sven Nahnsen
  - Quantitative Biology Center (QBiC), University of Tübingen, 72076 Tübingen, Germany
  - Biomedical Data Science, Department of Computer Science, University of Tübingen, 72076 Tübingen, Germany
  - M3 Research Center, University Hospital Tübingen, 72076 Tübingen, Germany
  - Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, 72076 Tübingen, Germany
- Pjotr Prins
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, United States
- Erik Garrison
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, United States
2
Abondio P, Bruno F, Passarino G, Montesanto A, Luiselli D. Pangenomics: A new era in the field of neurodegenerative diseases. Ageing Res Rev 2024; 94:102180. [PMID: 38163518 DOI: 10.1016/j.arr.2023.102180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 12/14/2023] [Accepted: 12/28/2023] [Indexed: 01/03/2024]
Abstract
A pangenome is composed of all the genetic variability of a group of individuals, and its application to the study of neurodegenerative diseases may provide valuable insights into the underlying aspects of genetic heterogeneity for these complex ailments, including gene expression, epigenetics, and translation mechanisms. Furthermore, a reference pangenome allows for the identification of previously undetected structural commonalities and differences among individuals, which may help in the diagnosis of a disease, support the prediction of what will happen over time (prognosis) and aid in developing novel treatments in the perspective of personalized medicine. Therefore, in the present review, the application of the pangenome concept to the study of neurodegenerative diseases will be discussed and analyzed for its potential to enable an improvement in diagnosis and prognosis for these illnesses, leading to the development of tailored treatments for individual patients from the knowledge of the genomic composition of a whole population.
Affiliation(s)
- Paolo Abondio
  - Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
- Francesco Bruno
  - Academy of Cognitive Behavioral Sciences of Calabria (ASCoC), Lamezia Terme, Italy; Regional Neurogenetic Centre (CRN), Department of Primary Care, Azienda Sanitaria Provinciale Di Catanzaro, Viale A. Perugini, 88046 Lamezia Terme, CZ, Italy; Association for Neurogenetic Research (ARN), Lamezia Terme, CZ, Italy
- Giuseppe Passarino
  - Department of Biology, Ecology and Earth Sciences, University of Calabria, Rende 87036, Italy
- Alberto Montesanto
  - Department of Biology, Ecology and Earth Sciences, University of Calabria, Rende 87036, Italy
- Donata Luiselli
  - Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
3
Heumos S, Guarracino A, Schmelzle JNM, Li J, Zhang Z, Hagmann J, Nahnsen S, Prins P, Garrison E. Pangenome graph layout by Path-Guided Stochastic Gradient Descent. bioRxiv: the preprint server for biology 2023:2023.09.22.558964. [PMID: 37790531 PMCID: PMC10542513 DOI: 10.1101/2023.09.22.558964] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]
Abstract
Motivation: The increasing availability of complete genomes calls for models to study genomic variability within entire populations. Pangenome graphs capture the full genomic similarity and diversity between multiple genomes. In order to understand them, we need to see them. For visualization, we need a human-readable graph layout: a graph embedding in a low-dimensional (e.g. two-dimensional) depiction. Due to a pangenome graph's potentially excessive size, this is a significant challenge.
Results: In response, we introduce a novel graph layout algorithm: the Path-Guided Stochastic Gradient Descent (PG-SGD). PG-SGD uses the genomes, represented in the pangenome graph as paths, as an embedded positional system to sample genomic distances between pairs of nodes. This avoids the quadratic cost seen in previous versions of graph drawing by Stochastic Gradient Descent (SGD). We show that our implementation efficiently computes the low-dimensional layouts of gigabase-scale pangenome graphs, unveiling their biological features.
Availability: We integrated PG-SGD in ODGI, which is released as free software under the MIT open source license. Source code is available at https://github.com/pangenome/odgi.
Affiliation(s)
- Simon Heumos
  - Quantitative Biology Center (QBiC), University of Tübingen, Tübingen 72076, Germany
  - Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen 72076, Germany
- Andrea Guarracino
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
  - Genomics Research Centre, Human Technopole, Milan 20157, Italy
- Jan-Niklas M. Schmelzle
  - Department of Computer Engineering, School of Computation, Information and Technology (CIT), Technical University of Munich, Munich 80333, Germany
  - School of Electrical and Computer Engineering, Cornell University, Ithaca, NY 14853, USA
- Jiajie Li
  - School of Electrical and Computer Engineering, Cornell University, Ithaca, NY 14853, USA
- Zhiru Zhang
  - School of Electrical and Computer Engineering, Cornell University, Ithaca, NY 14853, USA
- Jörg Hagmann
  - Computomics GmbH, Eisenbahnstr. 1, 72072 Tübingen, Germany
- Sven Nahnsen
  - Quantitative Biology Center (QBiC), University of Tübingen, Tübingen 72076, Germany
  - Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen 72076, Germany
  - M3 Research Center, University Hospital Tübingen, 72076 Tübingen, Germany
- Pjotr Prins
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Erik Garrison
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
4
Abondio P, Cilli E, Luiselli D. Human Pangenomics: Promises and Challenges of a Distributed Genomic Reference. Life (Basel) 2023; 13:1360. [PMID: 37374141 DOI: 10.3390/life13061360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 06/02/2023] [Accepted: 06/08/2023] [Indexed: 06/29/2023] Open
Abstract
A pangenome is a collection of the common and unique genomes that are present in a given species. It combines the genetic information of all the genomes sampled, resulting in a large and diverse range of genetic material. Pangenomic analysis offers several advantages compared to traditional genomic research. For example, a pangenome is not bound by the physical constraints of a single genome, so it can capture more genetic variability. Thanks to the introduction of the concept of pangenome, it is possible to use exceedingly detailed sequence data to study the evolutionary history of two different species, or how populations within a species differ genetically. In the wake of the Human Pangenome Project, this review aims at discussing the advantages of the pangenome around human genetic variation, which are then framed around how pangenomic data can inform population genetics, phylogenetics, and public health policy by providing insights into the genetic basis of diseases or determining personalized treatments, targeting the specific genetic profile of an individual. Moreover, technical limitations, ethical concerns, and legal considerations are discussed.
Affiliation(s)
- Paolo Abondio
  - Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
- Elisabetta Cilli
  - Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
- Donata Luiselli
  - Laboratory of Ancient DNA, Department of Cultural Heritage, University of Bologna, Via degli Ariani 1, 48121 Ravenna, Italy
5
Bogomolov A, Filonov S, Chadaeva I, Rasskazov D, Khandaev B, Zolotareva K, Kazachek A, Oshchepkov D, Ivanisenko VA, Demenkov P, Podkolodnyy N, Kondratyuk E, Ponomarenko P, Podkolodnaya O, Mustafin Z, Savinkova L, Kolchanov N, Tverdokhleb N, Ponomarenko M. Candidate SNP Markers Significantly Altering the Affinity of TATA-Binding Protein for the Promoters of Human Hub Genes for Atherogenesis, Atherosclerosis and Atheroprotection. Int J Mol Sci 2023; 24:ijms24109010. [PMID: 37240358 DOI: 10.3390/ijms24109010] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/12/2023] [Revised: 05/13/2023] [Accepted: 05/17/2023] [Indexed: 05/28/2023] Open
Abstract
Atherosclerosis is a systemic disease in which focal lesions in arteries promote the build-up of lipoproteins and cholesterol they are transporting. The development of atheroma (atherogenesis) narrows blood vessels, reduces the blood supply and leads to cardiovascular diseases. According to the World Health Organization (WHO), cardiovascular diseases are the leading cause of death, which has been especially boosted since the COVID-19 pandemic. There is a variety of contributors to atherosclerosis, including lifestyle factors and genetic predisposition. Antioxidant diets and recreational exercises act as atheroprotectors and can retard atherogenesis. The search for molecular markers of atherogenesis and atheroprotection for predictive, preventive and personalized medicine appears to be the most promising direction for the study of atherosclerosis. In this work, we have analyzed 1068 human genes associated with atherogenesis, atherosclerosis and atheroprotection. The hub genes regulating these processes have been found to be the most ancient. In silico analysis of all 5112 SNPs in their promoters has revealed 330 candidate SNP markers, which statistically significantly change the affinity of the TATA-binding protein (TBP) for these promoters. These molecular markers have made us confident that natural selection acts against underexpression of the hub genes for atherogenesis, atherosclerosis and atheroprotection. At the same time, upregulation of the one for atheroprotection promotes human health.
Affiliation(s)
- Anton Bogomolov
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
- Sergey Filonov
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
  - The Natural Sciences Department, Novosibirsk State University, Novosibirsk 630090, Russia
- Irina Chadaeva
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
- Dmitry Rasskazov
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
- Bato Khandaev
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
  - The Natural Sciences Department, Novosibirsk State University, Novosibirsk 630090, Russia
- Karina Zolotareva
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
  - The Natural Sciences Department, Novosibirsk State University, Novosibirsk 630090, Russia
- Anna Kazachek
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
  - The Natural Sciences Department, Novosibirsk State University, Novosibirsk 630090, Russia
- Dmitry Oshchepkov
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
- Vladimir A Ivanisenko
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
- Pavel Demenkov
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
- Nikolay Podkolodnyy
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
  - Institute of Computational Mathematics and Mathematical Geophysics, Novosibirsk 630090, Russia
- Ekaterina Kondratyuk
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
- Petr Ponomarenko
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
- Olga Podkolodnaya
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
- Zakhar Mustafin
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
- Ludmila Savinkova
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
- Nikolay Kolchanov
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
- Natalya Tverdokhleb
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia
- Mikhail Ponomarenko
  - Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk 630090, Russia