1
Leger A, Brettell I, Monahan J, Barton C, Wolf N, Kusminski N, Herder C, Aadepu N, Becker C, Gierten J, Hammouda OT, Hasel E, Lischik C, Lust K, Sokolova N, Suzuki R, Tavhelidse T, Thumberger T, Tsingos E, Watson P, Welz B, Naruse K, Loosli F, Wittbrodt J, Birney E, Fitzgerald T. Genomic variations and epigenomic landscape of the Medaka Inbred Kiyosu-Karlsruhe (MIKK) panel. Genome Biol 2022; 23:58. [PMID: 35189951 PMCID: PMC8862245 DOI: 10.1186/s13059-022-02602-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/30/2021] [Accepted: 01/05/2022] [Indexed: 12/23/2022]
Abstract
BACKGROUND The teleost medaka (Oryzias latipes) is a well-established vertebrate model system, with a long history of genetic research and high-quality reference genomes available for several inbred strains. Medaka has a high tolerance to inbreeding from the wild, which makes it possible to establish inbred lines from wild founder individuals. RESULTS We exploit this feature to create an inbred panel resource: the Medaka Inbred Kiyosu-Karlsruhe (MIKK) panel. This panel of 80 near-isogenic inbred lines contains a large amount of genetic variation inherited from the original wild population. We use Oxford Nanopore Technologies (ONT) long-read data to further investigate the genomic and epigenomic landscapes of a subset of the MIKK panel. Nanopore sequencing allows us to identify a large variety of high-quality structural variants, and we present results and methods using a pan-genome graph representation of 12 individual medaka lines. This graph-based reference MIKK panel genome reveals novel differences between the MIKK panel lines and standard linear reference genomes. We find additional MIKK panel-specific genomic content that would be missed by linear reference alignment approaches. We are also able to identify and quantify the presence of repeat elements in each of the lines. Finally, we investigate line-specific CpG methylation and perform differential DNA methylation analysis across these 12 lines. CONCLUSIONS We present a detailed analysis of the MIKK panel genomes using long- and short-read sequencing technologies, creating a MIKK panel-specific pan-genome reference dataset that allows investigation of novel variation types that would be elusive using standard approaches.
Affiliation(s)
- Adrien Leger
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Ian Brettell
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Jack Monahan
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Carl Barton
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Nadeshda Wolf
  - Institute of Biological and Chemical Systems, Biological Information Processing (IBCS-BIP), Karlsruhe Institute of Technology, 76131, Karlsruhe, Germany
- Natalja Kusminski
  - Institute of Biological and Chemical Systems, Biological Information Processing (IBCS-BIP), Karlsruhe Institute of Technology, 76131, Karlsruhe, Germany
- Cathrin Herder
  - Institute of Biological and Chemical Systems, Biological Information Processing (IBCS-BIP), Karlsruhe Institute of Technology, 76131, Karlsruhe, Germany
- Narendar Aadepu
  - Institute of Biological and Chemical Systems, Biological Information Processing (IBCS-BIP), Karlsruhe Institute of Technology, 76131, Karlsruhe, Germany; Centre for Organismal Studies, University of Heidelberg, Campus Im Neuenheimer Feld, Heidelberg, Germany
- Clara Becker
  - Centre for Organismal Studies, University of Heidelberg, Campus Im Neuenheimer Feld, Heidelberg, Germany
- Jakob Gierten
  - Centre for Organismal Studies, University of Heidelberg, Campus Im Neuenheimer Feld, Heidelberg, Germany
- Omar T Hammouda
  - Centre for Organismal Studies, University of Heidelberg, Campus Im Neuenheimer Feld, Heidelberg, Germany
- Eva Hasel
  - Centre for Organismal Studies, University of Heidelberg, Campus Im Neuenheimer Feld, Heidelberg, Germany
- Colin Lischik
  - Centre for Organismal Studies, University of Heidelberg, Campus Im Neuenheimer Feld, Heidelberg, Germany
- Katharina Lust
  - Centre for Organismal Studies, University of Heidelberg, Campus Im Neuenheimer Feld, Heidelberg, Germany
- Natalia Sokolova
  - Centre for Organismal Studies, University of Heidelberg, Campus Im Neuenheimer Feld, Heidelberg, Germany
- Risa Suzuki
  - Centre for Organismal Studies, University of Heidelberg, Campus Im Neuenheimer Feld, Heidelberg, Germany
- Tinatini Tavhelidse
  - Centre for Organismal Studies, University of Heidelberg, Campus Im Neuenheimer Feld, Heidelberg, Germany
- Thomas Thumberger
  - Centre for Organismal Studies, University of Heidelberg, Campus Im Neuenheimer Feld, Heidelberg, Germany
- Erika Tsingos
  - Centre for Organismal Studies, University of Heidelberg, Campus Im Neuenheimer Feld, Heidelberg, Germany
- Philip Watson
  - Centre for Organismal Studies, University of Heidelberg, Campus Im Neuenheimer Feld, Heidelberg, Germany
- Bettina Welz
  - Centre for Organismal Studies, University of Heidelberg, Campus Im Neuenheimer Feld, Heidelberg, Germany
- Kiyoshi Naruse
  - National Institute for Basic Biology, Laboratory of Bioresources, Okazaki, Japan
- Felix Loosli
  - Institute of Biological and Chemical Systems, Biological Information Processing (IBCS-BIP), Karlsruhe Institute of Technology, 76131, Karlsruhe, Germany
- Joachim Wittbrodt
  - Centre for Organismal Studies, University of Heidelberg, Campus Im Neuenheimer Feld, Heidelberg, Germany
- Ewan Birney
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Tomas Fitzgerald
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
2
Guo J, Pang E, Song H, Lin K. A tri-tuple coordinate system derived for fast and accurate analysis of the colored de Bruijn graph-based pangenomes. BMC Bioinformatics 2021; 22:282. [PMID: 34044757 PMCID: PMC8161984 DOI: 10.1186/s12859-021-04149-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2021] [Accepted: 04/25/2021] [Indexed: 11/25/2022]
Abstract
Background With the rapid development of accurate sequencing and assembly technologies, an increasing number of high-quality chromosome-level and haplotype-resolved genome assemblies have become available, creating great opportunities for computational pangenomics. Although genome graphs are among the most useful models for pangenome representation, their structural complexity makes it difficult to present genome information as intuitively as a linear reference genome does. Thus, efficiently and accurately analyzing the spatial structure of a genome graph and coordinating its information remains a substantial challenge. Results We developed a new method, the colored superbubble (cSupB), that can overcome the complexity of graphs and organize a set of species- or population-specific haplotype sequences of interest. Based on this model, we propose a tri-tuple coordinate system that combines an offset value, topological structure and sample information. Additionally, cSupB provides a novel method that utilizes complete topological information and efficiently detects small indels (< 50 bp) for highly similar samples, which can be validated by simulated datasets. Moreover, we demonstrated that cSupB can adapt to complex cycle structures. Conclusions Although the solution is made suitable for increasingly complex genome graphs by relaxing the directed acyclic graph constraint, the cSupB motif and the cSupB method can be extended to any colored directed acyclic graph. We anticipate that our method will facilitate the analysis of individual haplotype variants and population genomic diversity. We have developed a C++ program implementing our method, available at https://github.com/eggleader/cSupB. Supplementary information The online version contains supplementary material available at 10.1186/s12859-021-04149-w.
Affiliation(s)
- Jindan Guo
  - State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China
- Erli Pang
  - State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China
- Hongtao Song
  - State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China
- Kui Lin
  - State Key Laboratory of Earth Surface Processes and Resource Ecology, Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China
3
Abstract
Motivation Whole-genome alignment (WGA) methods do not scale well enough to generate large-scale WGAs. Profile alignment-based approaches revolutionized the field of multiple sequence alignment construction by significantly reducing computational complexity and runtime. However, WGAs need to consider genomic rearrangements between genomes, which makes the profile-based extension of several whole genomes challenging. Currently, none of the available methods offers the possibility to align or extend WGA profiles. Results Here, we present genome profile alignment (GPA), an approach that aligns the profiles of WGAs and that is capable of producing large-scale WGAs many times faster than conventional methods. Our concept relies on already available whole-genome aligners, which are used to compute several smaller sets of aligned genomes that are combined into a full WGA with a divide-and-conquer approach. To align or extend WGA profiles, we make use of the SuperGenome data structure, which features a bidirectional mapping between individual sequence and alignment coordinates. This data structure is used to efficiently transfer different coordinate systems into a common one based on the principles of profile alignments. The approach allows the computation of a WGA where alignments are subsequently merged along a guide tree. The current implementation uses progressiveMauve and offers the possibility for parallel computation of independent genome alignments. Our results based on various bacterial datasets of up to several hundred genomes show that we can reduce the runtime from months to hours with a quality that is negligibly worse than that of the WGA computed with the conventional progressiveMauve tool. Availability and implementation GPA is freely available at https://lambda.informatik.uni-tuebingen.de/gitlab/ahennig/GPA. GPA is implemented in Java, uses progressiveMauve and offers a parallel computation of WGAs.
Supplementary information Supplementary data are available at Bioinformatics online.
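The bidirectional mapping between sequence and alignment coordinates mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the SuperGenome or GPA implementation (which is written in Java and must also handle rearranged alignment blocks); the function names `build_mapping` and `translate` are hypothetical, and the sketch assumes a single collinear alignment block given as gapped strings.

```python
def build_mapping(aligned_row, gap="-"):
    """Build both directions of the coordinate mapping for one aligned sequence.

    Returns (seq_to_aln, aln_to_seq):
      seq_to_aln[i] is the alignment column of the i-th ungapped base;
      aln_to_seq[c] is the sequence index at column c, or None for a gap.
    """
    seq_to_aln = []
    aln_to_seq = []
    for col, ch in enumerate(aligned_row):
        if ch == gap:
            aln_to_seq.append(None)
        else:
            aln_to_seq.append(len(seq_to_aln))
            seq_to_aln.append(col)
    return seq_to_aln, aln_to_seq


def translate(pos, src_maps, dst_maps):
    """Translate a position in the source sequence to the destination sequence.

    The shared alignment column acts as the common coordinate system.
    Returns None when the destination has a gap in that column.
    """
    col = src_maps[0][pos]
    return dst_maps[1][col]


# Toy alignment of two sequences (hypothetical data).
maps_a = build_mapping("AC-GT")
maps_b = build_mapping("A-TGT")

print(translate(2, maps_a, maps_b))  # 2: base G of A sits in column 3, which is index 2 in B
print(translate(1, maps_a, maps_b))  # None: B has a gap where A has C
```

Routing every lookup through the alignment column is what lets two separately computed alignments share one coordinate system once their columns have been merged.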
Affiliation(s)
- André Hennig
  - Center for Bioinformatics (ZBIT), Integrative Transcriptomics, Eberhard Karls University of Tübingen, Tübingen, Germany
- Kay Nieselt
  - Center for Bioinformatics (ZBIT), Integrative Transcriptomics, Eberhard Karls University of Tübingen, Tübingen, Germany
4
Gärtner F, Müller L, Stadler PF. Superbubbles revisited. Algorithms Mol Biol 2018; 13:16. [PMID: 30519278 PMCID: PMC6271648 DOI: 10.1186/s13015-018-0134-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 07/05/2018] [Accepted: 11/21/2018] [Indexed: 11/10/2022]
Abstract
BACKGROUND Superbubbles are distinctive subgraphs in directed graphs that play an important role in assembly algorithms for high-throughput sequencing (HTS) data. Their practical importance derives from the fact that they are connected to their host graph by a single entrance and a single exit vertex, thus allowing them to be handled independently. Efficient algorithms for the enumeration of superbubbles are therefore of importance for the processing of HTS data. Superbubbles can be identified within the strongly connected components of the input digraph after transforming them into directed acyclic graphs. The algorithm by Sung et al. (IEEE ACM Trans Comput Biol Bioinform 12:770-777, 2015) achieves this task in O(m log(m)) time. The extraction of superbubbles from the transformed components was later improved by Brankovic et al. (Theor Comput Sci 609:374-383, 2016), resulting in an overall O(m + n)-time algorithm. RESULTS A re-analysis of the mathematical structure of superbubbles showed that the construction of auxiliary DAGs from the strongly connected components in the work of Sung et al. missed some details that can lead to the reporting of false positive superbubbles. We propose an alternative, even simpler auxiliary graph that solves the problem and retains the linear running time for general digraphs. Furthermore, we describe a simpler, space-efficient O(m + n)-time algorithm for detecting superbubbles in DAGs that uses only simple data structures. IMPLEMENTATION We present a reference implementation of the algorithm that accepts many commonly used formats for the input graph and provides convenient access to the improved algorithm. https://github.com/Fabianexe/Superbubble.
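To make the single-entrance, single-exit property concrete, the following Python sketch checks whether a given vertex is the entrance of a superbubble and, if so, returns the exit. It is the straightforward per-entrance traversal, not the linear-time algorithm of the paper or the reference implementation linked above; the function names `build_graph` and `superbubble_exit` and the toy edge list are assumptions made for illustration.

```python
from collections import defaultdict


def build_graph(edges):
    """Return (children, parents) adjacency maps for a directed edge list."""
    children = defaultdict(set)
    parents = defaultdict(set)
    for u, v in edges:
        children[u].add(v)
        parents[v].add(u)
    return children, parents


def superbubble_exit(children, parents, s):
    """Return the exit t if s is the entrance of a superbubble, else None.

    A vertex is expanded only after all of its parents have been visited,
    so every path into it must originate inside the candidate bubble.
    """
    visited = set()
    seen = set()
    stack = [s]
    while stack:
        v = stack.pop()
        visited.add(v)
        seen.discard(v)
        kids = children.get(v, ())
        if not kids:
            return None  # dead end (tip) inside the candidate bubble
        for u in kids:
            if u == s:
                return None  # cycle back to the entrance
            seen.add(u)
            if all(p in visited for p in parents.get(u, ())):
                stack.append(u)
        # Exactly one frontier vertex left and nothing else pending: exit found.
        if len(stack) == 1 and seen == {stack[0]}:
            t = stack[0]
            return None if s in children.get(t, ()) else t
    return None


# Toy graph (hypothetical): a simple bubble 1 -> {2, 3} -> 4, followed by 4 -> 5.
children, parents = build_graph([(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)])
print(superbubble_exit(children, parents, 1))  # 4
print(superbubble_exit(children, parents, 2))  # None
```

Running this check from every vertex enumerates all superbubbles at quadratic cost in the worst case, which is the overhead the O(m + n)-time approach described in the abstract avoids. Note that under this check a single edge such as 4 -> 5 also counts as a (trivial) superbubble.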