1
Mbo Nkoulou LF, Ngalle HB, Cros D, Adje COA, Fassinou NVH, Bell J, Achigan-Dako EG. Perspective for genomic-enabled prediction against black sigatoka disease and drought stress in polyploid species. Frontiers in Plant Science 2022; 13:953133. [PMID: 36388523 PMCID: PMC9650417 DOI: 10.3389/fpls.2022.953133] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/25/2022] [Accepted: 09/28/2022] [Indexed: 06/16/2023]
Abstract
Genomic selection (GS) in plant breeding is explored as a promising tool to address biotic and abiotic threats. Polyploid plants such as bananas (Musa spp.) face drought and black sigatoka disease (BSD), both of which restrict production. Conventional plant breeding faces difficulties, particularly phenotyping costs and long generation intervals. To overcome these difficulties, GS is explored as an alternative with great potential for reducing the cost and time of the selection process. So far, GS has not had the same success in polyploid plants as in diploid plants because of the complexity of polyploid genomes. In this review, we present the main constraints on the application of GS in polyploid plants and the prospects for overcoming them. Particular emphasis is placed on breeding for BSD and drought, two major threats to banana production, with banana used in this review as a model polyploid plant. It emerges that the difficulty of obtaining good-quality markers is the first challenge for GS in polyploid plants, because the main tools used were developed for diploid species. In addition, it is a major challenge to account for genetic interactions such as dominance and epistasis effects, as well as genotype-by-environment interaction, which are very common in polyploid plants. To get around these challenges, we present bioinformatics tools as well as artificial intelligence approaches, including machine learning. Furthermore, a scheme for applying GS to banana for BSD and drought is proposed. This review is of paramount importance for breeding programs that seek to shorten the selection cycle of polyploids despite the complexity of their genomes.
Affiliation(s)
- Luther Fort Mbo Nkoulou
- Genetics, Biotechnology, and Seed Science Unit (GBioS), Department of Plant Sciences, Faculty of Agronomic Sciences, University of Abomey Calavi, Cotonou, Benin
- Unit of Genetics and Plant Breeding (UGAP), Department of Plant Biology, Faculty of Sciences, University of Yaoundé 1, Yaoundé, Cameroon
- Institute of Agricultural Research for Development, Centre de Recherche Agricole de Mbalmayo (CRAM), Mbalmayo, Cameroon
- Hermine Bille Ngalle
- Unit of Genetics and Plant Breeding (UGAP), Department of Plant Biology, Faculty of Sciences, University of Yaoundé 1, Yaoundé, Cameroon
- David Cros
- Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), Unité Mixte de Recherche (UMR) Amélioration Génétique et Adaptation des Plantes méditerranéennes et tropicales (AGAP) Institut, Montpellier, France
- Unité Mixte de Recherche (UMR) Amélioration Génétique et Adaptation des Plantes méditerranéennes et tropicales (AGAP) Institut, University of Montpellier, Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Institut Agro, Montpellier, France
- Charlotte O. A. Adje
- Genetics, Biotechnology, and Seed Science Unit (GBioS), Department of Plant Sciences, Faculty of Agronomic Sciences, University of Abomey Calavi, Cotonou, Benin
- Nicodeme V. H. Fassinou
- Genetics, Biotechnology, and Seed Science Unit (GBioS), Department of Plant Sciences, Faculty of Agronomic Sciences, University of Abomey Calavi, Cotonou, Benin
- Joseph Bell
- Unit of Genetics and Plant Breeding (UGAP), Department of Plant Biology, Faculty of Sciences, University of Yaoundé 1, Yaoundé, Cameroon
- Enoch G. Achigan-Dako
- Genetics, Biotechnology, and Seed Science Unit (GBioS), Department of Plant Sciences, Faculty of Agronomic Sciences, University of Abomey Calavi, Cotonou, Benin
2
Voorrips RE, Tumino G. PolyHaplotyper: haplotyping in polyploids based on bi-allelic marker dosage data. BMC Bioinformatics 2022; 23:442. [PMID: 36274121 PMCID: PMC9590153 DOI: 10.1186/s12859-022-04989-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/12/2021] [Accepted: 10/16/2022] [Indexed: 11/18/2022]
Abstract
Background: For genetic analyses, multi-allelic markers have an advantage over bi-allelic markers like SNPs (single nucleotide polymorphisms) in that they carry more information about the genetic constitution of individuals. This is especially the case in polyploids, where individuals carry more than two alleles at each locus. Haploblocks are multi-allelic markers that can be derived by phasing sets of closely linked SNP markers. Phased haploblocks, similarly to other multi-allelic markers, will therefore be advantageous in genetic tasks like linkage mapping, QTL mapping and genome-wide association studies. Results: We present a new method to reconstruct haplotypes from SNP dosages derived from genotyping arrays, which is applicable to polyploids. This method is implemented in the software package PolyHaplotyper. In contrast to existing packages for polyploids, it makes use of full-sib families among the samples to guide the haplotyping process. We show that in this situation it is much more accurate than other available software, using experimental hexaploid data and simulated tetraploid data. Conclusions: Our method and the software package PolyHaplotyper in which it is implemented extend the available tools for haplotyping in polyploids. They perform especially well in situations where one or more full-sib families are present. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04989-0.
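To make the relation between haploblocks and SNP dosages concrete, the following is a minimal sketch, not PolyHaplotyper's algorithm: it only shows the forward direction, in which the phased haplotypes of one individual determine its per-SNP dosages, while haplotyping has to solve the harder inverse problem. The function name `dosages_from_haplotypes` and the example tetraploid are hypothetical.

```python
from collections import Counter


def dosages_from_haplotypes(haplotypes):
    """Sum the alternative-allele count at each SNP across all homologues.

    `haplotypes` is a list of equal-length tuples of 0/1 alleles,
    one tuple per homologous chromosome (hypothetical input format).
    """
    n_snps = len(haplotypes[0])
    if any(len(h) != n_snps for h in haplotypes):
        raise ValueError("all haplotypes must span the same SNPs")
    return [sum(h[i] for h in haplotypes) for i in range(n_snps)]


# Hypothetical tetraploid individual: four homologues over three linked SNPs.
individual = [(0, 1, 1), (0, 1, 1), (1, 0, 1), (0, 0, 0)]

dosages = dosages_from_haplotypes(individual)   # bi-allelic view: one dosage per SNP
haploblock_alleles = Counter(individual)        # multi-allelic view: copies of each haplotype
print(dosages, dict(haploblock_alleles))
```

The dosage vector keeps only the column sums, whereas the haploblock view distinguishes three alleles here, which is the extra information the abstract refers to.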
3
Saada OA, Friedrich A, Schacherer J. Towards accurate, contiguous and complete alignment-based polyploid phasing algorithms. Genomics 2022; 114:110369. [PMID: 35483655 DOI: 10.1016/j.ygeno.2022.110369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2021] [Revised: 03/09/2022] [Accepted: 04/11/2022] [Indexed: 01/14/2023]
Abstract
Phasing, and polyploid phasing in particular, has been a challenging problem, held back both by the limited read length of high-throughput short-read sequencing methods, which cannot span the distance between heterozygous sites, and by the high labor cost of alternative methods such as the physical separation of chromosomes. Recently developed single-molecule long-read sequencing methods provide much longer reads, which overcome this limitation. Here we review the alignment-based methods of polyploid phasing, which rely on four main strategies: population inference methods, which leverage the genetic information of several individuals to phase a sample; objective function minimization methods, which minimize a function such as the Minimum Error Correction (MEC); graph partitioning methods, which represent the read data as a graph and split it into k haplotype subgraphs; and cluster building methods, which iteratively grow clusters of similar reads into a final set of clusters that represent the haplotypes. We discuss the advantages and limitations of these methods and the metrics used to assess their performance, proposing that accuracy and contiguity are the most meaningful metrics. Finally, we propose that the field of alignment-based polyploid phasing would greatly benefit from a well-designed benchmarking dataset with appropriate evaluation metrics. We consider that significant improvements can still be achieved to obtain more accurate and contiguous polyploid phasing results that reflect the complexity of polyploid genome architectures.
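As an illustration of the objective named above, here is a minimal sketch of how an MEC score can be evaluated for a fixed set of candidate haplotypes. It is not taken from any of the reviewed tools: the function name `mec_score`, the string encoding of reads, and the toy data are assumptions, and a real phasing method would search over haplotypes to minimize this score rather than just compute it.

```python
def mec_score(reads, haplotypes):
    """Minimum Error Correction score for fixed haplotypes.

    Each read is assigned to the haplotype it disagrees with least; the
    score is the total number of disagreements ("corrections") over all
    reads. '-' marks a site the read does not cover (assumed encoding).
    """
    total = 0
    for read in reads:
        total += min(
            sum(1 for r, h in zip(read, hap) if r != "-" and r != h)
            for hap in haplotypes
        )
    return total


# Hypothetical triploid example: k = 3 haplotypes over three heterozygous sites.
haplotypes = ["011", "100", "110"]
reads = ["01-", "011", "-00", "110", "1-0", "111"]

score = mec_score(reads, haplotypes)
print(score)  # only the last read needs a single correction
```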
Affiliation(s)
- Omar Abou Saada
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
- Anne Friedrich
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
- Joseph Schacherer
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France; Institut Universitaire de France (IUF), Paris, France
4
Gerard D. Pairwise linkage disequilibrium estimation for polyploids. Mol Ecol Resour 2021; 21:1230-1242. [PMID: 33559321 DOI: 10.1111/1755-0998.13349] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/28/2020] [Revised: 01/18/2021] [Accepted: 02/01/2021] [Indexed: 12/31/2022]
Abstract
Many tasks in statistical genetics involve pairwise estimation of linkage disequilibrium (LD). The study of LD in diploids is mature. However, in polyploids, the field lacks a comprehensive characterization of LD. Polyploids also exhibit greater levels of genotype uncertainty than diploids, yet no methods currently exist to estimate LD in polyploids in the presence of such genotype uncertainty. Furthermore, most LD estimation methods do not quantify the level of uncertainty in their LD estimates. Our study contains three major contributions. (i) We characterize haplotypic and composite measures of LD in polyploids. These composite measures of LD turn out to be functions of common statistical measures of association. (ii) We derive procedures to estimate haplotypic and composite LD in polyploids in the presence of genotype uncertainty. We do this by estimating LD directly from genotype likelihoods, which may be obtained from many genotyping platforms. (iii) We derive standard errors of all LD estimators that we discuss. We validate our methods on both real and simulated data. Our methods are implemented in the R package ldsep, available on the Comprehensive R Archive Network https://cran.r-project.org/package=ldsep.
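A rough sketch of what a composite LD measure built from "common statistical measures of association" can look like: it assumes one convention in which the composite coefficient is the covariance of the two dosage vectors divided by the ploidy, and composite r² is their squared Pearson correlation. This is an assumption for illustration, it treats dosages as known (the genotype uncertainty addressed in the abstract is ignored), and it is not the ldsep API; `composite_ld` and the data are hypothetical.

```python
from statistics import mean


def composite_ld(dos_a, dos_b, ploidy):
    """Return (delta, r2) from allele dosages at two loci.

    delta: dosage covariance divided by ploidy (assumed convention).
    r2:    squared Pearson correlation between the dosages.
    """
    if len(dos_a) != len(dos_b) or not dos_a:
        raise ValueError("dosage vectors must be non-empty and equally long")
    n = len(dos_a)
    ma, mb = mean(dos_a), mean(dos_b)
    cov = sum((a - ma) * (b - mb) for a, b in zip(dos_a, dos_b)) / n
    var_a = sum((a - ma) ** 2 for a in dos_a) / n
    var_b = sum((b - mb) ** 2 for b in dos_b) / n
    if var_a == 0 or var_b == 0:
        raise ValueError("both loci must be polymorphic")
    return cov / ploidy, cov ** 2 / (var_a * var_b)


# Hypothetical tetraploid sample in which the two loci are perfectly associated.
locus_a = [0, 1, 2, 3, 4]
locus_b = [0, 1, 2, 3, 4]
delta, r2 = composite_ld(locus_a, locus_b, ploidy=4)
print(delta, r2)
```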
Affiliation(s)
- David Gerard
- Department of Mathematics and Statistics, American University, Washington, DC, USA
5
Zhou C, Olukolu B, Gemenet DC, Wu S, Gruneberg W, Cao MD, Fei Z, Zeng ZB, George AW, Khan A, Yencho GC, Coin LJM. Assembly of whole-chromosome pseudomolecules for polyploid plant genomes using outbred mapping populations. Nat Genet 2020; 52:1256-1264. [PMID: 33128049 DOI: 10.1038/s41588-020-00717-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/10/2017] [Accepted: 09/15/2020] [Indexed: 12/31/2022]
Abstract
Despite advances in sequencing technologies, assembly of complex plant genomes remains elusive due to polyploidy and high repeat content. Here we report PolyGembler for grouping and ordering contigs into pseudomolecules by genetic linkage analysis. Our approach also provides an accurate method with which to detect and fix assembly errors. Using simulated data, we demonstrate that our approach is of high accuracy and outperforms three existing state-of-the-art genetic mapping tools. Particularly, our approach is more robust to the presence of missing genotype data and genotyping errors. We used our method to construct pseudomolecules for allotetraploid lawn grass utilizing PacBio long reads in combination with restriction site-associated DNA sequencing, and for diploid Ipomoea trifida and autotetraploid potato utilizing contigs assembled from Illumina reads in combination with genotype data generated by single-nucleotide polymorphism arrays and genotyping by sequencing, respectively. We resolved 13 assembly errors for a published I. trifida genome assembly and anchored eight unplaced scaffolds in the published potato genome.
Affiliation(s)
- Chenxi Zhou
- Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia
- Department of Clinical Pathology, University of Melbourne, Melbourne, Victoria, Australia
- Bode Olukolu
- Department of Horticultural Science, North Carolina State University, Raleigh, NC, USA
- Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN, USA
- Dorcus C Gemenet
- International Potato Center, Lima, Peru
- CGIAR Excellence in Breeding Platform, International Maize and Wheat Improvement Center, Nairobi, Kenya
- Shan Wu
- Boyce Thompson Institute, Cornell University, Ithaca, NY, USA
- Minh Duc Cao
- Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia
- Zhangjun Fei
- Boyce Thompson Institute, Cornell University, Ithaca, NY, USA
- Zhao-Bang Zeng
- Department of Statistics, North Carolina State University, Raleigh, NC, USA
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
- Andrew W George
- Data61, Commonwealth Scientific and Industrial Research Organisation, Brisbane, Queensland, Australia
- Awais Khan
- International Potato Center, Lima, Peru
- Department of Plant Pathology and Plant-Microbe Biology, Cornell University, Geneva, NY, USA
- G Craig Yencho
- Department of Horticultural Science, North Carolina State University, Raleigh, NC, USA
- Lachlan J M Coin
- Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia
- Department of Clinical Pathology, University of Melbourne, Melbourne, Victoria, Australia
- The Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Victoria, Australia
6
Bourke PM, Voorrips RE, Visser RGF, Maliepaard C. Tools for Genetic Studies in Experimental Populations of Polyploids. Frontiers in Plant Science 2018; 9:513. [PMID: 29720992 PMCID: PMC5915555 DOI: 10.3389/fpls.2018.00513] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Received: 01/25/2018] [Accepted: 04/04/2018] [Indexed: 05/19/2023]
Abstract
Polyploid organisms carry more than two copies of each chromosome, a condition rarely tolerated in animals but which occurs relatively frequently in the plant kingdom. One of the principal challenges faced by polyploid organisms is to evolve stable meiotic mechanisms to faithfully transmit genetic information to the next generation upon which the study of inheritance is based. In this review we look at the tools available to the research community to better understand polyploid inheritance, many of which have only recently been developed. Most of these tools are intended for experimental populations (rather than natural populations), facilitating genomics-assisted crop improvement and plant breeding. This is hardly surprising given that a large proportion of domesticated plant species are polyploid. We focus on three main areas: (1) polyploid genotyping; (2) genetic and physical mapping; and (3) quantitative trait analysis and genomic selection. We also briefly review some miscellaneous topics such as the mode of inheritance and the availability of polyploid simulation software. The current polyploid analytic toolbox includes software for assigning marker genotypes (and in particular, estimating the dosage of marker alleles in the heterozygous condition), establishing chromosome-scale linkage phase among marker alleles, constructing (short-range) haplotypes, generating linkage maps, performing genome-wide association studies (GWAS) and quantitative trait locus (QTL) analyses, and simulating polyploid populations. These tools can also help elucidate the mode of inheritance (disomic, polysomic or a mixture of both as in segmental allopolyploids) or reveal whether double reduction and multivalent chromosomal pairing occur. An increasing number of polyploids (or associated diploids) are being sequenced, leading to publicly available reference genome assemblies. Much work remains in order to keep pace with developments in genomic technologies. 
However, such technologies also offer the promise of understanding polyploid genomes at a level which hitherto has remained elusive.
Affiliation(s)
- Chris Maliepaard
- Plant Breeding, Wageningen University & Research, Wageningen, Netherlands
7
Eriksson JS, de Sousa F, Bertrand YJK, Antonelli A, Oxelman B, Pfeil BE. Allele phasing is critical to revealing a shared allopolyploid origin of Medicago arborea and M. strasseri (Fabaceae). BMC Evol Biol 2018; 18:9. [PMID: 29374461 PMCID: PMC5787288 DOI: 10.1186/s12862-018-1127-z] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 05/18/2017] [Accepted: 01/22/2018] [Indexed: 01/07/2023]
Abstract
Background: Whole genome duplication plays a central role in plant evolution. There are two main classes of polyploids: autopolyploids, which arise within one species by doubling of similar homologous genomes, and allopolyploids (hybrid polyploids), which arise via hybridization and subsequent doubling of nonhomologous (homoeologous) genomes. The distinction between polyploid origins can be made using gene phylogenies, if alleles from each genome can be correctly retrieved. We examined whether two closely related tetraploid Mediterranean shrubs (Medicago arborea and M. strasseri) have an allopolyploid origin, a question that has remained unsolved despite substantial previous research. We sequenced and analyzed ten low-copy nuclear genes from these and related species, phasing all alleles. To test the effect of allele phasing on the ability to recover the evolutionary origin of polyploids, we compared these results to analyses using unphased sequences. Results: In eight of the gene trees, the alleles inferred from the tetraploids formed two clades in a non-sister relationship. Each of these clades was more closely related to alleles sampled from other species of Medicago, a pattern typical of allopolyploids. However, we also observed that alleles from one of the remaining genes formed two clades that were sister to one another, as is expected for autopolyploids. Trees inferred from unphased sequences were very different, with the tetraploids often placed in poorly supported and different positions compared to results obtained using phased alleles. Conclusions: The complex phylogenetic history of M. arborea and M. strasseri is explained predominantly by shared allotetraploidy. We also observed that an increase in woodiness is correlated with polyploidy in this group of species, and we present a new possibility that woodiness could be a transgressive phenotype. Correctly phased homoeologues are likely to be critical for inferring the hybrid origin of allopolyploid species when most genes retain more than one homoeologue. Ignoring homoeologous variation by merging the homoeologues can obscure the signal of hybrid polyploid origins and produce inaccurate results.
Affiliation(s)
- Jonna S Eriksson
- Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, 40530, Gothenburg, Sweden; Gothenburg Global Biodiversity Centre, Box 461, SE-405 30, Göteborg, Sweden
- Filipe de Sousa
- Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, 40530, Gothenburg, Sweden
- Yann J K Bertrand
- Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, 40530, Gothenburg, Sweden
- Alexandre Antonelli
- Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, 40530, Gothenburg, Sweden; Gothenburg Global Biodiversity Centre, Box 461, SE-405 30, Göteborg, Sweden; Gothenburg Botanical Garden, SE-41319, Göteborg, Sweden
- Bengt Oxelman
- Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, 40530, Gothenburg, Sweden; Gothenburg Global Biodiversity Centre, Box 461, SE-405 30, Göteborg, Sweden
- Bernard E Pfeil
- Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, 40530, Gothenburg, Sweden; Gothenburg Global Biodiversity Centre, Box 461, SE-405 30, Göteborg, Sweden
8
Ventura RV, Miller SP, Dodds KG, Auvray B, Lee M, Bixley M, Clarke SM, McEwan JC. Assessing accuracy of imputation using different SNP panel densities in a multi-breed sheep population. Genet Sel Evol 2016; 48:71. [PMID: 27663120 PMCID: PMC5035503 DOI: 10.1186/s12711-016-0244-7] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 10/19/2015] [Accepted: 08/31/2016] [Indexed: 11/10/2022]
Abstract
Background: Genotype imputation is a key element of the implementation of genomic selection within the New Zealand sheep industry, but many factors can influence imputation accuracy. Our objective was to provide practical directions on the implementation of imputation strategies in a multi-breed sheep population genotyped with three single nucleotide polymorphism (SNP) panels: 5K, 50K and HD (600K SNPs). Results: Imputation from 5K to HD was slightly better (0.6 %) than imputation from 5K to 50K. Two-step imputation from 5K to 50K and then from 50K to HD outperformed direct imputation from 5K to HD. A slight loss in imputation accuracy was observed when a large fixed reference population was used compared to a smaller within-breed reference (including all 50K genotypes on animals from different breeds excluding those in the validation set, i.e. those to be imputed), but only for a few animals across all imputation scenarios from 5K to 50K. However, a major gain in imputation accuracy for a large proportion of animals (purebred and crossbred) justified the use of a fixed and large reference dataset for all situations. This study also investigated the loss in imputation accuracy specifically for SNPs located at the ends of each chromosome, and showed that only chromosome 26 had an overall imputation (5K to 50K) accuracy for 100 SNPs at each end higher than 60 % (r²). Most of the chromosomes displayed reduced imputation accuracy at least at one of their ends. Prediction of imputation accuracy based on the relatedness of low-density genotypes to those of the reference dataset, before imputation (without running imputation software), was also investigated. FIMPUTE V2.2 outperformed BEAGLE 3.3.2 across all imputation scenarios. Conclusions: Imputation accuracy in sheep breeds can be improved by following a set of recommendations on SNP panels, software, strategies of imputation (one- or two-step imputation), and choice of the animals to be genotyped using both high- and low-density SNP panels. We present a method that predicts imputation accuracy for individual animals at the low-density level, before running imputation, which can be used to restrict genomic prediction only to the animals that can be imputed with sufficient accuracy.
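For readers unfamiliar with how imputation accuracy is usually scored, the sketch below computes two common per-animal measures from masked genotypes: the concordance rate and the squared correlation (r²) between true and imputed allele counts. It is an illustration under assumed conventions, not the evaluation code of the study; the function names and the toy genotypes are hypothetical.

```python
def concordance(true_gt, imputed_gt):
    """Fraction of masked genotypes that were imputed exactly right."""
    if len(true_gt) != len(imputed_gt) or not true_gt:
        raise ValueError("genotype vectors must be non-empty and equally long")
    return sum(t == i for t, i in zip(true_gt, imputed_gt)) / len(true_gt)


def squared_correlation(true_gt, imputed_gt):
    """Squared Pearson correlation between true and imputed allele counts."""
    n = len(true_gt)
    mt, mi = sum(true_gt) / n, sum(imputed_gt) / n
    cov = sum((t - mt) * (i - mi) for t, i in zip(true_gt, imputed_gt))
    var_t = sum((t - mt) ** 2 for t in true_gt)
    var_i = sum((i - mi) ** 2 for i in imputed_gt)
    if var_t == 0 or var_i == 0:
        raise ValueError("genotypes must vary for r2 to be defined")
    return cov ** 2 / (var_t * var_i)


# Hypothetical animal: six masked SNPs coded as 0/1/2 copies of one allele.
true_gt = [0, 1, 2, 1, 0, 2]
imputed_gt = [0, 1, 2, 2, 0, 2]   # one heterozygote imputed as homozygote

conc = concordance(true_gt, imputed_gt)
acc_r2 = squared_correlation(true_gt, imputed_gt)
print(conc, acc_r2)
```

Correlation-based accuracy is less sensitive to allele frequency than concordance, which is one reason both are often reported together.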
Affiliation(s)
- Ricardo V Ventura
- Centre for Genetic Improvement of Livestock, University of Guelph, Guelph, ON, N1G2W1, Canada; Beef Improvement Opportunities, Guelph, ON, N1K1E5, Canada
- Stephen P Miller
- Centre for Genetic Improvement of Livestock, University of Guelph, Guelph, ON, N1G2W1, Canada; Invermay Agricultural Centre, AgResearch Limited, Mosgiel, 9053, New Zealand
- Ken G Dodds
- Invermay Agricultural Centre, AgResearch Limited, Mosgiel, 9053, New Zealand
- Benoit Auvray
- Department of Mathematics and Statistics, University of Otago, Dunedin, 9016, New Zealand
- Michael Lee
- Department of Mathematics and Statistics, University of Otago, Dunedin, 9016, New Zealand
- Matthew Bixley
- Invermay Agricultural Centre, AgResearch Limited, Mosgiel, 9053, New Zealand
- Shannon M Clarke
- Invermay Agricultural Centre, AgResearch Limited, Mosgiel, 9053, New Zealand
- John C McEwan
- Invermay Agricultural Centre, AgResearch Limited, Mosgiel, 9053, New Zealand
9
Shen J, Li Z, Chen J, Song Z, Zhou Z, Shi Y. SHEsisPlus, a toolset for genetic studies on polyploid species. Sci Rep 2016; 6:24095. [PMID: 27048905 PMCID: PMC4822172 DOI: 10.1038/srep24095] [Citation(s) in RCA: 65] [Impact Index Per Article: 8.1] [Received: 10/28/2015] [Accepted: 03/17/2016] [Indexed: 11/09/2022]
Abstract
Currently, algorithms and software for genetic analysis of diploid organisms with bi-allelic markers are well established, while those for polyploids are limited. Here, we present SHEsisPlus, an online algorithm toolset for both dichotomous and quantitative trait genetic analysis in polyploid species (also compatible with haploids and diploids). SHEsisPlus is also optimized for handling multiple-allele datasets. It is free, open source, and designed to perform a range of analyses, including haplotype inference, linkage disequilibrium analysis, epistasis detection, Hardy-Weinberg equilibrium and single-locus association tests. We also developed an accurate and efficient haplotype inference algorithm for polyploids and proposed an entropy-based algorithm to detect epistasis in the context of quantitative traits. A study of both simulated and real datasets showed that our haplotype inference algorithm was much faster and more accurate than existing ones. Our epistasis detection algorithm was the first attempt to apply information theory to characterizing gene interactions in quantitative trait datasets. Results showed that its statistical power was significantly higher than that of conventional approaches. SHEsisPlus is freely available on the web at http://shesisplus.bio-x.cn/. Source code is freely available for download at https://github.com/celaoforever/SHEsisPlus.
Affiliation(s)
- Jiawei Shen
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education) and the Collaborative Innovation Center for Brain Science, Shanghai Jiao Tong University, Shanghai 200030, P.R. China; School of Bio-medical Engineering, Shanghai Jiao Tong University, Shanghai 200230, P.R. China; Institute of Social Cognitive and Behavioral Sciences, Shanghai Jiao Tong University, Shanghai 200240, P.R. China
- Zhiqiang Li
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education) and the Collaborative Innovation Center for Brain Science, Shanghai Jiao Tong University, Shanghai 200030, P.R. China; Institute of Social Cognitive and Behavioral Sciences, Shanghai Jiao Tong University, Shanghai 200240, P.R. China
- Jianhua Chen
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education) and the Collaborative Innovation Center for Brain Science, Shanghai Jiao Tong University, Shanghai 200030, P.R. China; Institute of Social Cognitive and Behavioral Sciences, Shanghai Jiao Tong University, Shanghai 200240, P.R. China
- Zhijian Song
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education) and the Collaborative Innovation Center for Brain Science, Shanghai Jiao Tong University, Shanghai 200030, P.R. China; Institute of Social Cognitive and Behavioral Sciences, Shanghai Jiao Tong University, Shanghai 200240, P.R. China
- Zhaowei Zhou
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education) and the Collaborative Innovation Center for Brain Science, Shanghai Jiao Tong University, Shanghai 200030, P.R. China; Shandong Provincial Key Laboratory of Metabolic Disease, the Affiliated Hospital of Qingdao University, 16 Jiangsu Road, Qingdao 266003, China; Institute of Clinical Research, the Affiliated Hospital of Qingdao University, 16 Jiangsu Road, Qingdao 266003, China
- Yongyong Shi
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education) and the Collaborative Innovation Center for Brain Science, Shanghai Jiao Tong University, Shanghai 200030, P.R. China; School of Bio-medical Engineering, Shanghai Jiao Tong University, Shanghai 200230, P.R. China; Shanghai Changning Mental Health Center, Shanghai 200042, P.R. China; Department of Psychiatry, the First Teaching Hospital of Xinjiang Medical University, Urumqi 830054, P.R. China
10
Iliadis A, Anastassiou D, Wang X. A sequential Monte Carlo framework for haplotype inference in CNV/SNP genotype data. EURASIP Journal on Bioinformatics & Systems Biology 2014; 2014:7. [PMID: 24868199 PMCID: PMC4017783 DOI: 10.1186/1687-4153-2014-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/08/2013] [Accepted: 03/26/2014] [Indexed: 11/25/2022]
Abstract
Copy number variations (CNVs) are abundant in the human genome. They have been associated with complex traits in genome-wide association studies (GWAS) and are expected to continue playing an important role in identifying the etiology of disease phenotypes. As a result of current high-throughput whole-genome single-nucleotide polymorphism (SNP) arrays, we now have datasets that simultaneously contain integer copy numbers in CNV regions and SNP genotypes. At the same time, haplotypes, which have been shown to offer advantages over genotypes in identifying disease traits, are available for SNP genotypes but largely unavailable for CNV/SNP data because of insufficient computational tools. We introduce a new framework for inferring haplotypes in CNV/SNP data using a sequential Monte Carlo sampling scheme, ‘Tree-Based Deterministic Sampling CNV’ (TDSCNV). We compare our method with polyHap (v2.0), the only currently available software able to perform inference in CNV/SNP genotypes, on datasets with varying numbers of markers. We found that both algorithms show similar accuracy, but TDSCNV is an order of magnitude faster while scaling linearly with the number of markers and the number of individuals, and thus could be the method of choice for haplotype inference in such datasets. Our method is implemented in the TDSCNV package, which is available for download at http://www.ee.columbia.edu/~anastas/tdscnv.
Affiliation(s)
- Alexandros Iliadis
- Department of Electrical Engineering, Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10027, USA
- Dimitris Anastassiou
- Department of Electrical Engineering, Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10027, USA
- Xiaodong Wang
- Department of Electrical Engineering, Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10027, USA
11
Shao H, Bellos E, Yin H, Liu X, Zou J, Li Y, Wang J, Coin LJM. A population model for genotyping indels from next-generation sequence data. Nucleic Acids Res 2012; 41:e46. [PMID: 23221639 PMCID: PMC3562001 DOI: 10.1093/nar/gks1143] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022]
Abstract
Insertion and deletion polymorphisms (indels) are an important source of genomic variation in plant and animal genomes, but accurate genotyping from low-coverage and exome next-generation sequence data remains challenging. We introduce an efficient population clustering algorithm for diploids and polyploids which was tested on a dataset of 2000 exomes. Compared with existing methods, we report a 4-fold reduction in overall indel genotype error rates with a 9-fold reduction in low coverage regions.
|
12
|
Feder AF, Petrov DA, Bergland AO. LDx: estimation of linkage disequilibrium from high-throughput pooled resequencing data. PLoS One 2012; 7:e48588. [PMID: 23152785 PMCID: PMC3494690 DOI: 10.1371/journal.pone.0048588] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.0] [Received: 07/06/2012] [Accepted: 10/03/2012] [Indexed: 12/14/2022]
Abstract
High-throughput pooled resequencing offers significant potential for whole-genome population sequencing. However, its main drawback is the loss of haplotype information. To regain some of this information, we present LDx, a computational tool for estimating linkage disequilibrium (LD) from pooled resequencing data. LDx uses an approximate maximum likelihood approach to estimate LD (r²) between pairs of SNPs that can be observed within and among single reads. LDx also reports r² estimates derived solely from observed genotype counts. We demonstrate that the LDx estimates are highly correlated with r² estimated from individually resequenced strains. We discuss the performance of LDx under more stringent quality conditions and infer via simulation the degree to which performance can improve with read depth. Finally, we demonstrate two possible uses of LDx with real and simulated pooled resequencing data. First, we use LDx to infer genome-wide patterns of decay of LD with physical distance in D. melanogaster population resequencing data. Second, we demonstrate that r² estimates from LDx are capable of distinguishing alternative demographic models representing plausible demographic histories of D. melanogaster.
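As a rough illustration of the count-based r² mentioned in this abstract, the sketch below computes r² for one SNP pair from the number of reads carrying each two-SNP combination. It is not LDx's implementation and does not reproduce its approximate maximum likelihood estimator; the function name `r_squared` and the count arguments are hypothetical names chosen for this example.

```python
def r_squared(n_AB, n_Ab, n_aB, n_ab):
    """Direct r^2 estimate for two biallelic SNPs.

    Each argument is the number of reads spanning both SNPs that carry
    the given allele combination (hypothetical input layout).
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no reads cover both SNPs")
    p_A = (n_AB + n_Ab) / n          # frequency of allele A at SNP 1
    p_B = (n_AB + n_aB) / n          # frequency of allele B at SNP 2
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        return float("nan")          # r^2 is undefined if either SNP is monomorphic
    D = n_AB / n - p_A * p_B         # linkage disequilibrium coefficient
    return D * D / denom


if __name__ == "__main__":
    print(r_squared(50, 0, 0, 50))    # complete association between the two SNPs
    print(r_squared(25, 25, 25, 25))  # no association
```

Only reads that cover both sites contribute, which is why the abstract restricts the estimate to SNP pairs observable within and among single reads.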
Affiliation(s)
- Alison F Feder
- Department of Biology, Stanford University, Stanford, California, United States of America.
|
13
|
Li Z, Gopal V, Li X, Davis JM, Casella G. Simultaneous SNP identification in association studies with missing data. Ann Appl Stat 2012. [DOI: 10.1214/11-aoas516] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 11/19/2022]
|
14
|
Inferring haplotypes of copy number variations from high-throughput data with uncertainty. G3-GENES GENOMES GENETICS 2011; 1:35-42. [PMID: 22384316 PMCID: PMC3276117 DOI: 10.1534/g3.111.000174] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 12/31/2010] [Accepted: 03/14/2011] [Indexed: 11/18/2022]
Abstract
Accurate information on haplotypes and diplotypes (haplotype pairs) is required for population-genetic analyses; however, microarrays do not provide data on a haplotype or diplotype at a copy number variation (CNV) locus; they only provide data on the total number of copies over a diplotype or an unphased sequence genotype (e.g., AAB, unlike AB of single nucleotide polymorphism). Moreover, such copy numbers or genotypes are often incorrectly determined when microarray signal intensities derived from different copy numbers or genotypes are not clearly separated due to noise. Here we report an algorithm to infer CNV haplotypes and individuals' diplotypes at multiple loci from noisy microarray data, utilizing the probability that a signal intensity may be derived from different underlying copy numbers or genotypes. Performing simulation studies based on known diplotypes and an error model obtained from real microarray data, we demonstrate that this probabilistic approach succeeds in accurate inference (error rate: 1-2%) from noisy data, whereas previous deterministic approaches failed (error rate: 12-18%). Applying this algorithm to real microarray data, we estimated haplotype frequencies and diplotypes in 1486 CNV regions for 100 individuals. Our algorithm will facilitate accurate population-genetic analyses and powerful disease association studies of CNVs.
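The probabilistic idea in this abstract, that a signal intensity may derive from several underlying copy numbers, can be sketched as a posterior over copy numbers. The example below assumes a Gaussian intensity model with a shared standard deviation; that model, the function name `copy_number_posterior`, and all numeric values are illustrative assumptions, not the error model the authors obtained from real microarray data.

```python
import math


def copy_number_posterior(intensity, means, sd, prior):
    """Posterior P(copy number | intensity) under an assumed Gaussian model.

    means: dict mapping copy number -> expected signal intensity (assumed values)
    sd:    shared standard deviation of the intensity noise (assumed)
    prior: dict mapping copy number -> prior probability
    """
    weights = {}
    for c, mu in means.items():
        z = (intensity - mu) / sd
        likelihood = math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
        weights[c] = prior[c] * likelihood
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("intensity is too far from every expected mean")
    return {c: w / total for c, w in weights.items()}


if __name__ == "__main__":
    means = {1: 1.0, 2: 2.0, 3: 3.0}
    prior = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}
    # An intensity midway between the means for 1 and 2 copies is ambiguous.
    print(copy_number_posterior(1.5, means, sd=0.5, prior=prior))
```

A deterministic caller would commit to a single copy number here, whereas keeping the full posterior lets downstream haplotype inference weigh both possibilities.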
|
15
|
Coin LJM, Asher JE, Walters RG, El-Sayed Moustafa JS, de Smith AJ, Sladek R, Balding DJ, Froguel P, Blakemore AIF. cnvHap: an integrative population and haplotype-based multiplatform model of SNPs and CNVs. Nat Methods 2010; 7:541-6. [DOI: 10.1038/nmeth.1466] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Received: 03/01/2010] [Accepted: 05/05/2010] [Indexed: 11/09/2022]
|
16
|
Su SY, Asher JE, Jarvelin MR, Froguel P, Blakemore AIF, Balding DJ, Coin LJM. Inferring combined CNV/SNP haplotypes from genotype data. Bioinformatics 2010; 26:1437-45. [PMID: 20406911 DOI: 10.1093/bioinformatics/btq157] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Indexed: 01/17/2023]
Abstract
MOTIVATION: Copy number variations (CNVs) are increasingly recognized as a substantial source of individual genetic variation, and hence there is a growing interest in investigating the evolutionary history of CNVs as well as their impact on complex disease susceptibility. CNV/SNP haplotypes are critical for this research, but although many methods have been proposed for inferring integer copy number, few have been designed for inferring CNV haplotypic phase, and none of these is applicable at genome-wide scale. Here, we present a method for inferring missing CNV genotypes, predicting CNV allelic configuration and inferring CNV haplotypic phase from SNP/CNV genotype data. Our method, implemented in the software polyHap v2.0, is based on a hidden Markov model, which models the joint haplotype structure between CNVs and SNPs. Thus, the haplotypic phases of CNVs and SNPs are inferred simultaneously. A sampling algorithm is employed to obtain a measure of confidence/credibility of each estimate. RESULTS: We generated diploid phase-known CNV-SNP genotype datasets by pairing male X chromosome CNV-SNP haplotypes. We show that polyHap provides accurate estimates of missing CNV genotypes, allelic configuration and CNV haplotypic phase on these datasets. We applied our method to a non-simulated dataset: a region on Chromosome 2 encompassing a short deletion. The results confirm that polyHap's accuracy extends to real-life datasets. AVAILABILITY: Our method is implemented in version 2.0 of the polyHap software package and can be downloaded from http://www.imperial.ac.uk/medicine/people/l.coin.
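The benchmark construction described under RESULTS, pairing male X chromosome haplotypes to obtain diploid genotypes with known phase, can be sketched as follows. The data layout is an assumption made for illustration: each haplotype is a list with one string per site holding the alleles carried on that chromosome (an empty string for a deletion, two letters for a duplication). The function name `pair_haplotypes` is hypothetical and this is not code from polyHap.

```python
def pair_haplotypes(hap1, hap2):
    """Combine two phase-known haplotypes into an unphased genotype per site.

    The genotype at a site is the sorted multiset of alleles from both
    haplotypes, so its length equals the total copy number at that site.
    """
    if len(hap1) != len(hap2):
        raise ValueError("haplotypes must cover the same sites")
    return ["".join(sorted(a + b)) for a, b in zip(hap1, hap2)]


if __name__ == "__main__":
    # Site 1: plain SNP; site 2: deletion on the first haplotype;
    # site 3: duplication on the first haplotype.
    hap_x1 = ["A", "", "AB"]
    hap_x2 = ["B", "A", "A"]
    genotypes = pair_haplotypes(hap_x1, hap_x2)
    print(genotypes)                    # unphased genotypes per site
    print([len(g) for g in genotypes])  # total copy number per site
```

Because the two input haplotypes are known, any phasing method run on the combined genotypes can be scored against the truth.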
Affiliation(s)
- Shu-Yi Su
- Department of Epidemiology and Biostatistics, School of Public Health, Imperial College, London W2 1PG, UK
|
17
|
Boyd LK, Mao X, Lu YJ. Use of SNPs in cancer predisposition analysis, diagnosis and prognosis: tools and prospects. Expert Opin Med Diagn 2009; 3:313-26. [PMID: 23488466 DOI: 10.1517/17530050902828325] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/05/2022]
Abstract
BACKGROUND: The development of cancer is accompanied by several genetic alterations. Single nucleotide polymorphisms (SNPs) are the most common form of genetic variation found in the human population. SNP arrays offer a high-resolution, high-throughput technology for genome-wide analysis, allowing the simultaneous detection of genotype and copy number changes. The power of SNP arrays as a research tool has accelerated our understanding of the genetic alterations in cancer, providing potential clinical applications. OBJECTIVE: This manuscript reviews the use of SNPs in cancer research and discusses the potential clinical application of analysing SNPs for cancer predisposition analysis, diagnosis and prognosis. We also discuss potential future applications for the analysis of SNPs. METHODS: In writing this review, we have reflected on our own extensive experience in the field of cancer genomics and have surveyed peer-reviewed articles focussing on the application of SNPs in cancer research. In addition, we have referred to product websites. CONCLUSION: Since its development, SNP array technology has been extensively applied in cancer research, and SNP array analysis has generated valuable information. With a full understanding of the rich resource of SNPs and their effects on cellular function, SNP arrays will revolutionise clinical practice in cancer risk assessment, diagnosis and prognosis, making the concept of personalised medicine a reality.
Collapse
Affiliation(s)
- Lara K Boyd
- Queen Mary University of London, Barts and the London School of Medicine and Dentistry, Institute of Cancer, Centre for Molecular Oncology and Imaging, John Vane Science Centre, Charterhouse Square, London EC1M 6BQ, UK; +44 20 7882 6140; +44 20 7014 0431
|