1
Bohutínská M, Vlček J, Monnahan P, Kolář F. Population Genomic Analysis of Diploid-Autopolyploid Species. Methods Mol Biol 2023; 2545:297-324. [PMID: 36720820 DOI: 10.1007/978-1-0716-2561-3_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2023]
Abstract
This chapter outlines an empirical analysis of genome-wide single-nucleotide polymorphism (SNP) variation and its underlying drivers among multiple natural populations within a diploid-autopolyploid species. The aim is to reconstruct the genetic structure among natural populations of varying ploidy and infer footprints of selection in these populations, framed around specific questions that are typically encountered when analyzing a mixed-ploidy data set, e.g., addressing the relevance of natural whole-genome duplication for speciation and adaptation. We briefly review the options for the analysis of polyploid population genomic data involving variant calling, population structure, demographic history inference, and selection scanning approaches. Further, we provide suggestions for methods and associated software, possible caveats, and examples of their application to mixed-ploidy and autopolyploid data sets.
Affiliation(s)
- Magdalena Bohutínská
  - Department of Botany, Faculty of Science, Charles University, Prague, Czech Republic
  - Institute of Botany of the Czech Academy of Sciences, Průhonice, Czech Republic
- Jakub Vlček
  - Department of Botany, Faculty of Science, Charles University, Prague, Czech Republic
- Patrick Monnahan
  - Department of Pediatrics, University of Minnesota, Minneapolis, MN, USA
- Filip Kolář
  - Department of Botany, Faculty of Science, Charles University, Prague, Czech Republic
  - Institute of Botany of the Czech Academy of Sciences, Průhonice, Czech Republic
2
Saada OA, Friedrich A, Schacherer J. Towards accurate, contiguous and complete alignment-based polyploid phasing algorithms. Genomics 2022; 114:110369. [PMID: 35483655 DOI: 10.1016/j.ygeno.2022.110369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2021] [Revised: 03/09/2022] [Accepted: 04/11/2022] [Indexed: 01/14/2023]
Abstract
Phasing, and polyploid phasing in particular, has been a challenging problem, held back by the limited read length of high-throughput short-read sequencing methods, which cannot span the distance between heterozygous sites, and by the high labor cost of alternative methods such as the physical separation of chromosomes. Recently developed single-molecule long-read sequencing methods provide much longer reads, which overcome this limitation. Here we review the alignment-based methods of polyploid phasing that rely on four main strategies: population inference methods, which leverage the genetic information of several individuals to phase a sample; objective function minimization methods, which minimize a function such as the Minimum Error Correction (MEC); graph partitioning methods, which represent the read data as a graph and split it into k haplotype subgraphs; and cluster building methods, which iteratively grow clusters of similar reads into a final set of clusters that represent the haplotypes. We discuss the advantages and limitations of these methods and the metrics used to assess their performance, proposing that accuracy and contiguity are the most meaningful metrics. Finally, we propose that the field of alignment-based polyploid phasing would greatly benefit from the use of a well-designed benchmarking dataset with appropriate evaluation metrics. We consider that significant improvements can still be achieved to obtain more accurate and contiguous polyploid phasing results that reflect the complexity of polyploid genome architectures.
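As an illustration of the objective function mentioned in this abstract, the following is a minimal sketch of how an MEC score could be computed for a candidate set of haplotypes. It assumes biallelic variants coded 0/1, reads stored as position-to-allele mappings, and a toy tetraploid example; the names `Read`, `mismatches`, and `mec_score` are hypothetical and do not come from any of the reviewed tools.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: a read maps variant index -> observed allele (0 or 1).
Read = Dict[int, int]


def mismatches(read: Read, haplotype: Sequence[int]) -> int:
    """Count positions covered by the read where it disagrees with the haplotype."""
    return sum(1 for pos, allele in read.items() if haplotype[pos] != allele)


def mec_score(reads: List[Read], haplotypes: List[Sequence[int]]) -> int:
    """Sum, over all reads, the mismatches to the best-fitting haplotype.

    This is the number of allele calls that would have to be corrected so
    that every read agrees perfectly with one of the k haplotypes.
    """
    return sum(min(mismatches(r, h) for h in haplotypes) for r in reads)


# Toy tetraploid example (k = 4 haplotypes over 4 variant sites).
haplotypes = [
    [0, 0, 1, 1],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
]
reads = [
    {0: 0, 1: 0, 2: 1},  # agrees with the first haplotype
    {1: 1, 2: 0, 3: 1},  # agrees with the second haplotype
    {0: 1, 1: 1, 2: 1},  # one mismatch to its closest haplotype
    {2: 0, 3: 0},        # agrees with the third haplotype
]

score = mec_score(reads, haplotypes)
print(score)
```

A phasing method of the objective-minimization family would search over candidate haplotype sets for one with a low score; the sketch only evaluates a given candidate and does not perform that search.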
Affiliation(s)
- Omar Abou Saada
  - Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
- Anne Friedrich
  - Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
- Joseph Schacherer
  - Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
  - Institut Universitaire de France (IUF), Paris, France
3
Shaw J, Yu YW. flopp: Extremely Fast Long-Read Polyploid Haplotype Phasing by Uniform Tree Partitioning. J Comput Biol 2022; 29:195-211. [PMID: 35041529 PMCID: PMC8892958 DOI: 10.1089/cmb.2021.0436] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/11/2022]
Abstract
Resolving haplotypes in polyploid genomes using phase information from sequencing reads is an important and challenging problem. We introduce two new mathematical formulations of polyploid haplotype phasing: (1) the min-sum max tree partition problem, which is a more flexible graphical metric compared with the standard minimum error correction (MEC) model in the polyploid setting, and (2) the uniform probabilistic error minimization model, which is a probabilistic analogue of the MEC model. We incorporate both formulations into a long-read based polyploid haplotype phasing method called flopp. We show that flopp compares favorably with state-of-the-art algorithms: it is up to 30 times faster with 2 times fewer switch errors on 6× ploidy simulated data. Further, we show using real nanopore data that flopp can quickly reveal reasonable haplotype structures from the autotetraploid Solanum tuberosum (potato).
Affiliation(s)
- Jim Shaw
  - Department of Mathematics, University of Toronto, Toronto, Canada
- Yun William Yu
  - Department of Mathematics, University of Toronto, Toronto, Canada
  - Computer and Mathematical Sciences, University of Toronto at Scarborough, Scarborough, Canada
4
Schrinner SD, Mari RS, Ebler J, Rautiainen M, Seillier L, Reimer JJ, Usadel B, Marschall T, Klau GW. Haplotype threading: accurate polyploid phasing from long reads. Genome Biol 2020; 21:252. [PMID: 32951599 PMCID: PMC7504856 DOI: 10.1186/s13059-020-02158-1] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 02/03/2020] [Accepted: 08/26/2020] [Indexed: 01/19/2023]
Abstract
Resolving genomes at haplotype level is crucial for understanding the evolutionary history of polyploid species and for designing advanced breeding strategies. Polyploid phasing still presents considerable challenges, especially in regions of collapsing haplotypes. We present WhatsHap polyphase, a novel two-stage approach that addresses these challenges by (i) clustering reads and (ii) threading the haplotypes through the clusters. Our method outperforms the state-of-the-art in terms of phasing quality. Using a real tetraploid potato dataset, we demonstrate how to assemble local genomic regions of interest at the haplotype level. Our algorithm is implemented as part of the widely used open source tool WhatsHap.
Affiliation(s)
- Sven D Schrinner
  - Algorithmic Bioinformatics, Heinrich Heine University Düsseldorf, Universitätsstr. 1, Düsseldorf, 40225, Germany
- Rebecca Serra Mari
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Moorenstraße 5, Düsseldorf, 40225, Germany
  - Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, Saarbrücken, 66123, Germany
  - Graduate School of Computer Science, Saarland Informatics Campus E1.3, Saarbrücken, 66123, Germany
- Jana Ebler
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Moorenstraße 5, Düsseldorf, 40225, Germany
- Mikko Rautiainen
  - Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, Saarbrücken, 66123, Germany
  - Graduate School of Computer Science, Saarland Informatics Campus E1.3, Saarbrücken, 66123, Germany
  - Max Planck Institute for Informatics, Saarbrücken, 66123, Germany
- Lancelot Seillier
  - Institute for Biology I, RWTH Aachen, Worringer Weg 3, Aachen, 52074, Germany
- Julia J Reimer
  - Institute for Biology I, RWTH Aachen, Worringer Weg 3, Aachen, 52074, Germany
- Björn Usadel
  - Forschungszentrum Jülich IBG-4, Wilhelm-Johnen-Str., Jülich, 52428, Germany
  - Institute for Biology I, RWTH Aachen, Worringer Weg 3, Aachen, 52074, Germany
  - Cluster of Excellence on Plant Sciences (CEPLAS), Heinrich Heine University Düsseldorf, Universitätsstr. 1, Düsseldorf, 40225, Germany
- Tobias Marschall
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Moorenstraße 5, Düsseldorf, 40225, Germany
- Gunnar W Klau
  - Algorithmic Bioinformatics, Heinrich Heine University Düsseldorf, Universitätsstr. 1, Düsseldorf, 40225, Germany
  - Cluster of Excellence on Plant Sciences (CEPLAS), Heinrich Heine University Düsseldorf, Universitätsstr. 1, Düsseldorf, 40225, Germany