1
Sivabharathi RC, Rajagopalan VR, Suresh R, Sudha M, Karthikeyan G, Jayakanthan M, Raveendran M. Haplotype-based breeding: A new insight in crop improvement. PLANT SCIENCE : AN INTERNATIONAL JOURNAL OF EXPERIMENTAL PLANT BIOLOGY 2024; 346:112129. [PMID: 38763472 DOI: 10.1016/j.plantsci.2024.112129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Revised: 05/09/2024] [Accepted: 05/15/2024] [Indexed: 05/21/2024]
Abstract
Haplotype-based breeding (HBB) has become a cutting-edge approach to crop improvement owing to the increasing availability of single nucleotide polymorphisms (SNPs) identified by next-generation sequencing technologies. Combining thousands of SNPs into a few hundred haplotype blocks reduces data complexity, requiring fewer statistical tests and lowering the probability of spurious associations. The presence of strong genomic regions in breeding lines of most crop species facilitates the use of haplotypes to improve the efficiency of genomic and marker-assisted selection. As a genomics-assisted breeding (GAB) approach, HBB harnesses genome sequence data to pinpoint allelic variation that can be used to hasten the breeding cycle and circumvent the challenges associated with linkage drag. This review describes approaches for candidate gene identification, superior haplotype identification, haplo-pheno analysis, and haplotype-based marker-assisted selection. Crop improvement strategies that use superior haplotypes will hasten breeding progress and help safeguard global food security.
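To make the data-reduction argument concrete, the sketch below collapses per-SNP alleles into per-block haplotype alleles. It assumes phased haplotypes are already available as 0/1 strings and that block boundaries were defined beforehand (for example by an LD-based method not shown here); the function `block_haplotypes` and the toy data are hypothetical.

```python
from collections import Counter

def block_haplotypes(haplotypes, blocks):
    """Summarise phased haplotypes as per-block haplotype alleles.

    haplotypes: equal-length strings, one per phased chromosome, one
                character per SNP (e.g. "0"/"1").
    blocks: (start, end) SNP index ranges, end exclusive (assumed given).
    Returns one Counter per block: observed block haplotype -> count.
    """
    return [Counter(h[start:end] for h in haplotypes) for start, end in blocks]

# Hypothetical toy data: 4 phased chromosomes, 6 SNPs, 2 blocks.
haps = ["001101", "001100", "110011", "001101"]
blocks = [(0, 3), (3, 6)]
per_block = block_haplotypes(haps, blocks)
n_snp_tests = len(haps[0])   # one single-marker test per SNP
n_block_tests = len(blocks)  # one (multi-allelic) test per block
print(per_block, n_snp_tests, n_block_tests)
```

Here six single-marker tests become two block-level tests; the saving grows with the number of SNPs per block.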
Affiliation(s)
- R C Sivabharathi
- Department of Genetics and Plant breeding, CPBG, Tamil Nadu Agricultural University, Coimbatore 641003, India
- Veera Ranjani Rajagopalan
- Department of Plant Biotechnology, Centre for Plant Molecular Biology and Biotechnology, Tamil Nadu Agricultural University, Coimbatore, 641003, India
- R Suresh
- Department of Rice, CPBG, Tamil Nadu Agricultural University, Coimbatore 641003, India
- M Sudha
- Department of Plant Biotechnology, Centre for Plant Molecular Biology and Biotechnology, Tamil Nadu Agricultural University, Coimbatore, 641003, India
- G Karthikeyan
- Department of Plant Pathology, CPPS, Tamil Nadu Agricultural University, Coimbatore 641003, India
- M Jayakanthan
- Department of Plant Molecular Biology and Bioinformatics, Centre for Plant Molecular Biology and Biotechnology, Tamil Nadu Agricultural University, Coimbatore 641003, India
- M Raveendran
- Directorate of Research, Tamil Nadu Agricultural University, Coimbatore 641003, India
2
Montero-Tena JA, Abdollahi Sisi N, Kox T, Abbadi A, Snowdon RJ, Golicz AA. haploMAGIC: accurate phasing and detection of recombination in multiparental populations despite genotyping errors. G3 (BETHESDA, MD.) 2024; 14:jkae109. [PMID: 38808682 PMCID: PMC11304941 DOI: 10.1093/g3journal/jkae109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2024] [Revised: 02/12/2024] [Accepted: 05/08/2024] [Indexed: 05/30/2024]
Abstract
Recombination is a key mechanism in breeding for promoting genetic variability. Multiparental populations (MPPs) constitute an excellent platform for precise genotype phasing, identification of genome-wide crossovers (COs), estimation of recombination frequencies, and construction of recombination maps. Here, we introduce haploMAGIC, a pipeline to detect COs in MPPs with single-nucleotide polymorphism (SNP) data by exploiting the pedigree relationships for accurate genotype phasing and inference of grandparental haplotypes. haploMAGIC applies filtering to prevent false-positive COs due to genotyping errors (GEs), a common problem in high-throughput SNP analysis of complex plant genomes. Specifically, it discards haploblocks that do not reach a specified minimum number of informative alleles. A performance analysis using populations simulated with AlphaSimR revealed that haploMAGIC improves upon existing methods of CO detection in terms of recall and precision, most notably when GE rates are high. Furthermore, we constructed recombination maps using haploMAGIC with high-resolution genotype data from 2 large multiparental populations of winter rapeseed (Brassica napus). The results demonstrate the applicability of the pipeline in real-world scenarios and show good correlation in recombination frequency with alternative software. Therefore, we propose haploMAGIC as an accurate tool for CO detection in MPPs that is robust to GEs.
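The general idea of discarding short haploblocks before calling crossovers can be sketched as follows. This is not haploMAGIC's code: it assumes each informative marker has already been assigned a grandparental origin label ("A" or "B") after pedigree-based phasing, and the function `detect_crossovers`, the `min_informative` threshold, and the merge rule are illustrative assumptions.

```python
from itertools import groupby

def detect_crossovers(origins, min_informative=3):
    """Count crossovers from per-marker grandparental origin calls.

    origins: sequence of origin labels along one chromosome (e.g. "A"/"B").
    min_informative: hypothetical threshold; haploblocks supported by fewer
                     markers are treated as likely genotyping errors.
    Returns (crossover_count, retained_blocks), each block (label, n_markers).
    """
    # Collapse consecutive identical calls into haploblocks.
    blocks = [(label, len(list(run))) for label, run in groupby(origins)]
    # Drop haploblocks below the minimum number of informative alleles.
    kept = [b for b in blocks if b[1] >= min_informative]
    # Merge neighbours that now share the same origin.
    merged = []
    for label, n in kept:
        if merged and merged[-1][0] == label:
            merged[-1] = (label, merged[-1][1] + n)
        else:
            merged.append((label, n))
    # Each remaining change of origin is one inferred crossover.
    return (len(merged) - 1 if merged else 0), merged

calls = "AAAAABAAAABBBBBB"  # hypothetical calls with one isolated "B"
print(detect_crossovers(calls, min_informative=3))  # (1, [('A', 9), ('B', 6)])
print(detect_crossovers(calls, min_informative=1))  # (3, [('A', 5), ('B', 1), ('A', 4), ('B', 6)])
```

Without filtering, the single discordant call inflates one true crossover into three.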
Affiliation(s)
- Jose A Montero-Tena
- Department of Agrobioinformatics, IFZ Research Center for Biosystems, Land Use and Nutrition, Justus Liebig University, Heinrich Buff Ring 26, 35392 Giessen, Germany
- Nayyer Abdollahi Sisi
- Department of Plant Breeding, IFZ Research Center for Biosystems, Land Use and Nutrition, Justus Liebig University, Heinrich Buff Ring 26, 35392 Giessen, Germany
- Tobias Kox
- NPZ Innovation GmbH, Hohenlieth-Hof, 24363 Holtsee, Germany
- Amine Abbadi
- NPZ Innovation GmbH, Hohenlieth-Hof, 24363 Holtsee, Germany
- Rod J Snowdon
- Department of Plant Breeding, IFZ Research Center for Biosystems, Land Use and Nutrition, Justus Liebig University, Heinrich Buff Ring 26, 35392 Giessen, Germany
- Agnieszka A Golicz
- Department of Agrobioinformatics, IFZ Research Center for Biosystems, Land Use and Nutrition, Justus Liebig University, Heinrich Buff Ring 26, 35392 Giessen, Germany
3
van der Burg LLJ, de Wreede LC, Baldauf H, Sauter J, Schetelig J, Putter H, Böhringer S. Haplotype reconstruction for genetically complex regions with ambiguous genotype calls: Illustration by the KIR gene region. Genet Epidemiol 2024; 48:3-26. [PMID: 37830494 DOI: 10.1002/gepi.22538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Revised: 09/06/2023] [Accepted: 09/25/2023] [Indexed: 10/14/2023]
Abstract
Advances in DNA sequencing technologies have enabled genotyping of complex genetic regions exhibiting copy number variation and high allelic diversity, yet it is impossible to derive exact genotypes in all cases, often resulting in ambiguous genotype calls, that is, partially missing data. The killer-cell immunoglobulin-like receptor (KIR) gene region is one example. These genes are of special interest in the context of allogeneic hematopoietic stem cell transplantation. For such complex gene regions, current haplotype reconstruction methods are not feasible, as they cannot cope with the complexity of the data. We present an expectation-maximization (EM) algorithm to estimate haplotype frequencies (HTFs) that deals with the missing data components and takes into account linkage disequilibrium (LD) between genes. To cope with the exponential increase in the number of haplotypes as genes are added, we add three components to a standard EM-algorithm implementation. First, reconstruction is performed iteratively, adding one gene at a time. Second, after each step, haplotypes with frequencies below a threshold are collapsed into a rare haplotype group. Third, the HTF of the rare haplotype group is profiled in subsequent iterations to improve estimates. A simulation study evaluates the effect of combining information of multiple genes on the estimates of these frequencies. We show that estimated HTFs are approximately unbiased. Our simulation study shows that the EM algorithm is able to combine information from multiple genes when LD is high, whereas increased ambiguity levels increase bias. Linear regression models based on this EM algorithm show that a large number of haplotypes can be problematic for unbiased effect size estimation and that models need to be sparse. In a real data analysis of KIR genotypes, we compare HTFs to those obtained in an independent study.
Our new EM-algorithm-based method is the first to account for the full genetic architecture of complex gene regions, such as the KIR gene region. This algorithm can handle the numerous observed ambiguities and allows haplotypes to be collapsed, performing implicit dimension reduction. Combining information from multiple genes improves haplotype reconstruction.
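For readers unfamiliar with the underlying machinery, the sketch below shows the textbook EM for haplotype frequencies from unphased biallelic genotypes under Hardy-Weinberg equilibrium, plus a separate step that pools haplotypes below a threshold into a "rare" group. It is a simplified illustration under those assumptions, not the authors' method: it does not handle multi-allelic KIR genes, gene-by-gene addition, or profiling of the rare group. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

def compatible_pairs(genotype):
    """Unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple of alt-allele dosages (0, 1, 2) at biallelic loci.
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]
    pairs = set()
    for assignment in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for site, allele in zip(het, assignment):
            h1[site], h2[site] = allele, 1 - allele
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)

def em_haplotype_frequencies(genotypes, max_iter=500, tol=1e-10):
    """Standard EM estimate of haplotype frequencies (HWE assumed)."""
    pairs_per_ind = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in pairs_per_ind for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}
    n_chrom = 2 * len(genotypes)
    for _ in range(max_iter):
        counts = defaultdict(float)
        for pairs in pairs_per_ind:
            # E-step: weight each compatible pair by its HWE probability.
            weights = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts become new frequencies.
        new = {h: counts[h] / n_chrom for h in haps}
        converged = max(abs(new[h] - freq[h]) for h in haps) < tol
        freq = new
        if converged:
            break
    return freq

def collapse_rare(freq, threshold):
    """Pool haplotypes with frequency below `threshold` into one 'rare' group."""
    kept = {h: f for h, f in freq.items() if f >= threshold}
    rare = sum(f for f in freq.values() if f < threshold)
    if rare > 0:
        kept["rare"] = rare
    return kept

# Hypothetical genotypes at two loci; (1, 1) is phase-ambiguous.
freqs = em_haplotype_frequencies([(0, 0), (2, 2), (1, 0), (1, 1)])
print(collapse_rare(freqs, threshold=0.05))
```

Only individuals heterozygous at two or more loci are phase-ambiguous; for the others, the E-step simply counts their haplotypes.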
Affiliation(s)
- Liesbeth C de Wreede
- Biomedical Data Sciences, LUMC, Leiden, The Netherlands
- DKMS, Dresden/Tübingen, Germany
- Johannes Schetelig
- DKMS, Dresden/Tübingen, Germany
- Department of Internal Medicine I, University Hospital Carl Gustav Carus, Dresden, Germany
- Hein Putter
- Biomedical Data Sciences, LUMC, Leiden, The Netherlands
4
Wragg D, Zhang W, Peterson S, Yerramilli M, Mellanby R, Schoenebeck JJ, Clements DN. A cautionary tale of low-pass sequencing and imputation with respect to haplotype accuracy. Genet Sel Evol 2024; 56:6. [PMID: 38216889 PMCID: PMC10785484 DOI: 10.1186/s12711-024-00875-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Accepted: 01/03/2024] [Indexed: 01/14/2024] [Open access]
Abstract
BACKGROUND Low-pass whole-genome sequencing and imputation offer significant cost savings, enabling substantial increases in sample size and statistical power. This approach is particularly promising in livestock breeding, providing an affordable means of screening individuals for deleterious alleles or calculating genomic breeding values. Consequently, it may also be of value in companion animal genomics to support pedigree breeding. We sought to evaluate in dogs the impact of low coverage sequencing and reference-guided imputation on genotype concordance and association analyses. RESULTS DNA isolated from saliva of 30 Labrador retrievers was sequenced at low (0.9X and 3.8X) and high (43.5X) coverage, and down-sampled from 43.5X to 9.6X and 17.4X. Genotype imputation was performed using a diverse reference panel (1021 dogs), and two subsets of the former panel (256 dogs each) where one had an excess of Labrador retrievers relative to other breeds. We observed little difference in imputed genotype concordance between reference panels. Association analyses for a locus acting as a disease proxy were performed using single-marker (GEMMA) and haplotype-based (XP-EHH) tests. GEMMA results were highly correlated (r ≥ 0.97) between 43.5X and ≥ 3.8X depths of coverage, while for 0.9X the correlation was lower (r ≤ 0.8). XP-EHH results were less well correlated, with r ranging from 0.58 (0.9X) to 0.88 (17.4X). Across a random sample of 10,000 genomic regions averaging 17 kb in size, we observed a median of three haplotypes per dog across the sequencing depths, with 5% of the regions returning more than eight haplotypes. Inspection of one such region revealed genotype and phasing inconsistencies across sequencing depths. CONCLUSIONS We demonstrate that saliva-derived canine DNA is suitable for whole-genome sequencing, highlighting the feasibility of client-based sampling. 
Low-pass sequencing and imputation require caution, as incorrect allele assignments result when the subject carries alleles that are absent from the reference panel. Larger panels have the capacity for greater allelic diversity, which should reduce the potential for imputation error. Although low-pass sequencing can accurately impute allele dosage, we highlight issues with phasing accuracy that impact haplotype-based analyses. Consequently, if accurately phased genotypes are required for analyses, we advocate sequencing at high depth (> 20X).
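The distinction drawn above between accurate dosage and inaccurate phase can be illustrated with two simple metrics: unphased genotype concordance and a basic switch-error count. The sketch assumes biallelic sites coded as (hap1_allele, hap2_allele) tuples of 0/1; the function names and toy data are hypothetical, and the paper's own concordance pipeline is not reproduced here.

```python
def genotype_concordance(calls_a, calls_b):
    """Fraction of sites with identical unphased genotypes (alt-allele dosage)."""
    matches = sum(sum(a) == sum(b) for a, b in zip(calls_a, calls_b))
    return matches / len(calls_a)

def switch_errors(truth, estimate):
    """Count phase switches between two phasings of the same individual.

    Only sites heterozygous in both phasings are informative.
    """
    orientation = [t[0] == e[0]  # True if hap1 labels agree at this site
                   for t, e in zip(truth, estimate)
                   if t[0] != t[1] and e[0] != e[1]]
    return sum(o1 != o2 for o1, o2 in zip(orientation, orientation[1:]))

# Hypothetical high-depth "truth" vs. a low-pass phasing of one dog.
truth = [(0, 1), (1, 0), (0, 1), (1, 0), (1, 1)]
low_pass = [(0, 1), (1, 0), (1, 0), (0, 1), (1, 1)]
print(genotype_concordance(truth, low_pass))  # 1.0: dosages all agree
print(switch_errors(truth, low_pass))         # 1: phase flips mid-region
```

Perfect dosage concordance can therefore coexist with a phasing error that changes the haplotypes seen by haplotype-based tests.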
Affiliation(s)
- David Wragg
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- Wengang Zhang
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- Sarah Peterson
- IDEXX Laboratories Inc, One IDEXX Drive, Westbrook, ME, 04092, USA
- Richard Mellanby
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- IDEXX Laboratories Inc, One IDEXX Drive, Westbrook, ME, 04092, USA
- Jeffrey J Schoenebeck
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- Dylan N Clements
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
5
Herzig AF, Velo-Suárez L, Dina C, Redon R, Deleuze JF, Génin E. How local reference panels improve imputation in French populations. Sci Rep 2024; 14:370. [PMID: 38172507 PMCID: PMC10764714 DOI: 10.1038/s41598-023-49931-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 12/13/2023] [Indexed: 01/05/2024] [Open access]
Abstract
Imputation servers offer the only means of harnessing the largest public reference panels, which have been shown to deliver very high precision in the imputation of European genomes. Many studies have nonetheless stressed the importance of 'study specific panels' (SSPs) as an alternative and have shown the benefits of combining public reference panels with SSPs. However, such combined approaches are not attainable when using external imputation servers. To investigate how to confront this challenge, we imputed 550 French individuals using either the University of Michigan imputation server with the Haplotype Reference Consortium (HRC) panel or an in-house SSP of 850 whole-genome sequenced French individuals. With approximate geo-localization of both our target and SSP individuals, we are able to pinpoint different scenarios where SSP-based imputation would be preferred over server-based imputation or vice versa. This is achieved by showing, at a high degree of resolution, the importance of the proximity of the reference panel to target individuals, with a focus on the clear added value of SSPs for estimating haplotype phase and for the imputation of rare variants (minor allele frequency below 0.01). Such benefits were most evident for individuals from the same geographical regions in France as the SSP individuals. Overall, only 42.3% of all 125,442 variants evaluated were better imputed with an SSP from France than with an external reference panel; however, this rises to 58.1% for individuals from geographic regions well covered by the SSP. By investigating haplotype sharing and population fine structure in France, we show the importance of including SSP haplotypes for imputation, but also that they should ideally be combined with large public panels.
In the absence of the unattainable results from a combined panel of the HRC and our French SSP, we put forward a pragmatic solution where server-based and SSP-based imputation outcomes can be combined based on comparing posterior genotype probabilities. We show that such an approach can give a level of imputation accuracy in excess of what could be achieved with either strategy alone. The results presented provide detailed insights into the accuracy of imputation that should be expected from different strategies for European populations.
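The abstract does not spell out the exact comparison rule, so the sketch below assumes a simple one: for each variant, keep the call whose most likely genotype has the higher posterior probability, breaking ties in favour of the server result. The function name, input layout, and tie rule are hypothetical.

```python
def combine_calls(server_probs, ssp_probs):
    """Per variant, keep the genotype call from the more confident source.

    Each argument: list of (P(0), P(1), P(2)) posterior genotype
    probabilities, one tuple per variant, for the same individual.
    Returns a list of (genotype, max_probability, source) tuples.
    """
    combined = []
    for p_server, p_ssp in zip(server_probs, ssp_probs):
        best_server = max(range(3), key=lambda g: p_server[g])
        best_ssp = max(range(3), key=lambda g: p_ssp[g])
        if p_ssp[best_ssp] > p_server[best_server]:
            combined.append((best_ssp, p_ssp[best_ssp], "SSP"))
        else:  # ties go to the server-based call (an arbitrary assumption)
            combined.append((best_server, p_server[best_server], "server"))
    return combined

# Hypothetical posteriors for two variants in one individual.
server = [(0.90, 0.08, 0.02), (0.40, 0.50, 0.10)]
ssp = [(0.60, 0.30, 0.10), (0.05, 0.15, 0.80)]
print(combine_calls(server, ssp))
```

A per-variant rule like this needs only the two sets of posteriors, which is why it remains feasible when the two panels cannot be merged.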
Affiliation(s)
- Lourdes Velo-Suárez
- Univ Brest, Inserm, EFS, UMR 1078, GGB, Brest, France
- CHRU Brest, Brest, France
- Christian Dina
- Nantes Université, CHU Nantes, CNRS, INSERM, l'institut du thorax, Nantes, France
- Richard Redon
- Nantes Université, CHU Nantes, CNRS, INSERM, l'institut du thorax, Nantes, France
- Jean-François Deleuze
- Centre National de Recherche en Génomique Humaine (CNRGH), Université Paris-Saclay, CEA, Evry, France
- Fondation Jean Dausset - Centre d'Etude du Polymorphisme Humain (CEPH), Paris, France
- Emmanuelle Génin
- Univ Brest, Inserm, EFS, UMR 1078, GGB, Brest, France
- CHRU Brest, Brest, France
6
Xiao N, Cao X, Liu Z, Han Y. Two germline mutations can serve as genetic susceptibility screening makers for a lung adenocarcinoma family. J Cancer Res Clin Oncol 2023; 149:6541-6548. [PMID: 36781503 DOI: 10.1007/s00432-023-04616-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Accepted: 01/27/2023] [Indexed: 02/15/2023]
Abstract
OBJECTIVES Lung cancer is the most common form of cancer and the leading cause of cancer death. For familial lung cancer, identifying the causal genetic factors is essential for the prevention and control of lung cancer in carriers who have not yet developed the disease. MATERIALS AND METHODS We studied two generations of a family with suspected inherited lung cancer susceptibility. Four individuals in this family had lung adenocarcinoma. To identify the gene(s) causing lung cancer in this pedigree, we extracted DNA from the peripheral blood of the four affected individuals and of three cancer-free family members (controls) and performed whole-genome sequencing (WGS). Our filtering strategy included assessment of allele frequency, functional effects on amino acids, mutation accumulation, phased blocks, and evolutionary analysis of the alterations. RESULTS We identified two candidate mutations, PLEKHM2 (D134N) and MCC (R448Q), that were present in all affected family members but absent from the control group. We then performed genetic susceptibility screening of 10 relatives without lung cancer and found two individuals carrying the PLEKHM2 (D134N) mutation, two carrying the MCC (R448Q) mutation, and one carrying both mutations. Three carriers underwent low-dose computed tomography (LDCT) scans, and two of them, who carried MCC (R448Q), also had ground-glass opacity (GGO) lesions in their lungs. CONCLUSION Our data suggest that WGS together with our filtering strategy successfully identified PLEKHM2 (D134N) and MCC (R448Q) as the possible driver mutations in this family. Genetic susceptibility screening of carriers without lung cancer will be a useful approach to prevent and control lung cancer in families at high risk for the disease.
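Two steps of a family-based filtering strategy like the one described (population allele frequency, and segregation with disease) can be sketched as below. The data layout, variant IDs, and the `max_af` threshold are hypothetical; the functional, phased-block, and evolutionary filters mentioned in the abstract are not shown.

```python
def candidate_variants(variants, cases, controls, max_af=0.01):
    """Keep rare variants carried by every affected and no unaffected relative.

    variants: dict mapping a variant ID to {"af": population allele
              frequency, "carriers": set of sample IDs carrying it}.
    cases, controls: sets of sample IDs.
    max_af: hypothetical rarity threshold.
    """
    kept = []
    for vid, info in variants.items():
        if info["af"] > max_af:
            continue  # too common in the population to be a likely cause
        carriers = info["carriers"]
        if cases <= carriers and not (controls & carriers):
            kept.append(vid)
    return sorted(kept)

# Hypothetical pedigree: four affected (c1..c4), three unaffected (k1..k3).
cases = {"c1", "c2", "c3", "c4"}
controls = {"k1", "k2", "k3"}
variants = {
    "v1": {"af": 0.001, "carriers": {"c1", "c2", "c3", "c4"}},
    "v2": {"af": 0.2, "carriers": {"c1", "c2", "c3", "c4"}},
    "v3": {"af": 0.001, "carriers": {"c1", "c2", "c3", "c4", "k1"}},
    "v4": {"af": 0.0, "carriers": {"c1", "c2"}},
}
print(candidate_variants(variants, cases, controls))  # ['v1']
```

Requiring absence from every control is a strict choice; incomplete penetrance would argue for a softer rule.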
Affiliation(s)
- Ning Xiao
- Second Department of Thoracic Surgery, Beijing Chest Hospital, Capital Medical University, Beijing, China
- Xiaoqing Cao
- Second Department of Thoracic Surgery, Beijing Chest Hospital, Capital Medical University, Beijing, China
- Zhidong Liu
- Second Department of Thoracic Surgery, Beijing Chest Hospital, Capital Medical University, Beijing, China
- Yi Han
- Third Department of Thoracic Surgery, Beijing Chest Hospital, Capital Medical University, Beijing, China
7
Shipilina D, Pal A, Stankowski S, Chan YF, Barton NH. On the origin and structure of haplotype blocks. Mol Ecol 2023; 32:1441-1457. [PMID: 36433653 PMCID: PMC10946714 DOI: 10.1111/mec.16793] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 01/13/2022] [Revised: 11/16/2022] [Accepted: 11/18/2022] [Indexed: 11/27/2022]
Abstract
The term "haplotype block" is commonly used in the developing field of haplotype-based inference methods. We argue that the term should be defined based on the structure of the Ancestral Recombination Graph (ARG), which contains complete information on the ancestry of a sample. We use simulated examples to demonstrate key features of the relationship between haplotype blocks and ancestral structure, emphasizing the stochasticity of the processes that generate them. Even the simplest cases of neutrality or of a "hard" selective sweep produce a rich structure, often missed by commonly used statistics. We highlight a number of novel methods for inferring haplotype structure, based on the full ARG, or on a sequence of trees, and illustrate how they can be used to define haplotype blocks using an empirical data set. While the advent of new, computationally efficient methods makes it possible to apply these concepts broadly, they (and additional new methods) could benefit from adding features to explore haplotype blocks, as we define them. Understanding and applying the concept of the haplotype block will be essential to fully exploit long and linked-read sequencing technologies.
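One simple way to operationalise an ARG-based block is to assume the ARG has already been summarised as a sequence of marginal trees along the genome (as tree-sequence tools do) and to find contiguous intervals over which the same ancestral node has exactly a given set of samples as descendants. The data layout and the function `ancestral_blocks` are hypothetical, and this is a deliberate simplification rather than the authors' definition or code.

```python
def ancestral_blocks(marginal_trees, samples):
    """Intervals over which one ancestral node has exactly `samples` below it.

    marginal_trees: list of (start, end, clades) in genome order, where
        clades maps a node ID to the frozenset of sample IDs below it.
    samples: set of sample IDs.
    Returns (start, end, node) tuples; adjacent intervals are merged only
    when the same node persists, i.e. no recombination replaced the ancestor.
    """
    target = frozenset(samples)
    blocks = []
    for start, end, clades in marginal_trees:
        node = next((n for n, leaves in clades.items() if leaves == target), None)
        if node is None:
            continue
        if blocks and blocks[-1][1] == start and blocks[-1][2] == node:
            blocks[-1] = (blocks[-1][0], end, node)  # same ancestor continues
        else:
            blocks.append((start, end, node))
    return blocks

# Hypothetical marginal trees for samples a, b, c.
trees = [
    (0, 100, {"n1": frozenset({"a", "b"}), "n2": frozenset({"a", "b", "c"})}),
    (100, 250, {"n1": frozenset({"a", "b"}), "n3": frozenset({"b", "c"})}),
    (250, 400, {"n4": frozenset({"a", "c"})}),
    (400, 500, {"n5": frozenset({"a", "b"})}),
    (500, 600, {"n6": frozenset({"a", "b"})}),
]
print(ancestral_blocks(trees, {"a", "b"}))
```

The last two intervals stay separate: {a, b} remains a clade, but through a different ancestor, a distinction that a purely statistical block definition would miss.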
Affiliation(s)
- Daria Shipilina
- Evolutionary Biology Program, Department of Ecology and Genetics (IEG), Uppsala University, Uppsala, Sweden
- Institute of Science and Technology Austria, Klosterneuburg, Austria
- Swedish Collegium for Advanced Study, Uppsala, Sweden
- Arka Pal
- Institute of Science and Technology Austria, Klosterneuburg, Austria
- Sean Stankowski
- Institute of Science and Technology Austria, Klosterneuburg, Austria
- Nicholas H Barton
- Institute of Science and Technology Austria, Klosterneuburg, Austria
8
Liu D, Peter BM, Schiefenhövel W, Kayser M, Stoneking M. Assessing human genome-wide variation in the Massim region of Papua New Guinea and implications for the Kula trading tradition. Mol Biol Evol 2022; 39:6653776. [PMID: 35920169 PMCID: PMC9372566 DOI: 10.1093/molbev/msac165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022] [Open access]
Abstract
The Massim, a cultural region that includes the southeastern tip of mainland Papua New Guinea (PNG) and nearby PNG offshore islands, is renowned for a trading network called Kula, in which different valuable items circulate in different directions among some of the islands. Although the Massim has been a focus of anthropological investigation since the pioneering work of Malinowski in 1922, the genetic background of its inhabitants remains relatively unexplored. To characterize the Massim genomically, we generated genome-wide SNP data from 192 individuals from 15 groups spanning the entire region. Analyzing these together with comparative data, we found that all Massim individuals have variable Papuan-related (indigenous) and Austronesian-related (arriving ∼3,000 years ago) ancestries. Individuals from Rossel Island in southern Massim, speaking an isolate Papuan language, have the highest amount of a distinct Papuan ancestry. We also investigated recent contact via the sharing of identical-by-descent (IBD) genomic segments and found that Austronesian-related IBD tracts are widely distributed geographically, but Papuan-related tracts are shared exclusively between the PNG mainland and Massim, and between the Bismarck and Solomon Archipelagoes. Moreover, the Kula-practicing groups of the Massim show higher IBD sharing among themselves than do groups that do not participate in Kula. This higher sharing predates the formation of Kula, suggesting that extensive contact between these groups since the Austronesian settlement may have facilitated the formation of Kula. Our study provides the first comprehensive genome-wide assessment of Massim inhabitants and new insights into the fascinating Kula system.
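Group-level comparisons of IBD sharing, such as Kula versus non-Kula groups, rest on summaries like the average total IBD length shared per pair of individuals. The sketch below computes that quantity within and between groups, assuming IBD segments have already been called (e.g. as (ind1, ind2, length_cM) records); the function name, units, and toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def mean_ibd_sharing(segments, groups):
    """Average total IBD length (cM) per pair of individuals, by group pair.

    segments: list of (ind1, ind2, length_cM) IBD segments.
    groups: dict mapping individual -> group label.
    Pairs sharing no segment contribute 0 to the average.
    """
    total = defaultdict(float)
    for a, b, length in segments:
        total[frozenset((a, b))] += length
    members = defaultdict(list)
    for ind, g in groups.items():
        members[g].append(ind)
    labels = sorted(members)
    result = {}
    for i, g1 in enumerate(labels):
        for g2 in labels[i:]:
            if g1 == g2:
                pairs = list(combinations(members[g1], 2))
            else:
                pairs = [(x, y) for x in members[g1] for y in members[g2]]
            if pairs:
                shared = sum(total[frozenset(p)] for p in pairs)
                result[(g1, g2)] = shared / len(pairs)
    return result

# Hypothetical example: two individuals in group X, one in group Y.
groups = {"i1": "X", "i2": "X", "i3": "Y"}
segments = [("i1", "i2", 10.0), ("i1", "i2", 5.0), ("i1", "i3", 4.0)]
print(mean_ibd_sharing(segments, groups))
```

Averaging over all pairs, including those sharing nothing, keeps groups of different sizes comparable.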
Affiliation(s)
- Dang Liu
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Human Evolutionary Genetics Unit, Institut Pasteur, UMR 2000, CNRS, Paris, France
- Benjamin M Peter
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Wulf Schiefenhövel
- Human Ethology Group, Max Planck Institute for Ornithology, Seewiesen, Germany
- Manfred Kayser
- Department of Genetic Identification, Erasmus MC, University Medical Center Rotterdam, Rotterdam, the Netherlands
- Mark Stoneking
- Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
- Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive, UMR 5558, Villeurbanne, France
9
Jighly A. When do autopolyploids need poly-sequencing data? Mol Ecol 2021; 31:1021-1027. [PMID: 34875138 DOI: 10.1111/mec.16313] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/08/2021] [Revised: 11/23/2021] [Accepted: 12/01/2021] [Indexed: 12/17/2022]
Abstract
The sequencing depth required to genotype autopolyploid populations is a very controversial topic. Different studies have adopted different depth values without clear guidance on the optimal sequencing depth. Many studies suggest high depth thresholds for different ploidy levels; these may not be practical and can substantially increase the overall genotyping cost of a project. However, such conservative thresholds may not be required to achieve the most common research goals. In fact, some recent reports in the field of quantitative genetics found that much lower sequencing depth thresholds could achieve the same accuracy as high depth thresholds. In this manuscript, I discuss when researchers need to use stringent sequencing depth thresholds and when they can use more relaxed ones. I support my argument by calculating the probabilities of sampling different homologues at a given sequencing depth. I also discuss the uses, and the uncertainty, of calculating a continuous allelic dosage as the proportion of sequencing reads that carry the alternative allele, an approach that is increasingly common in quantitative genetics as a replacement for discrete dosage estimation.
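The sampling argument can be sketched under a simple assumption: each read comes independently and uniformly at random from one of the k homologues. Under that model, a particular homologue is seen at least once in d reads with probability 1 - (1 - 1/k)^d, and the probability that all k homologues are seen follows from inclusion-exclusion. This is an illustrative model, not necessarily the exact calculation in the manuscript; the function names are hypothetical.

```python
from math import comb

def prob_homologue_seen(ploidy, depth):
    """P(a specific homologue is covered by at least one of `depth` reads)."""
    return 1 - (1 - 1 / ploidy) ** depth

def prob_all_homologues_seen(ploidy, depth):
    """P(every homologue is covered by at least one read), by inclusion-exclusion.

    Sums over j homologues forced to be missed:
    (-1)^j * C(ploidy, j) * ((ploidy - j) / ploidy) ** depth.
    """
    return sum(
        (-1) ** j * comb(ploidy, j) * ((ploidy - j) / ploidy) ** depth
        for j in range(ploidy + 1)
    )

def continuous_dosage(alt_reads, total_reads):
    """Proportion of reads carrying the alternative allele (NaN if no reads)."""
    return alt_reads / total_reads if total_reads else float("nan")

# How fast does full homologue coverage approach 1 as depth grows?
for ploidy in (2, 4, 6):
    print(ploidy, [round(prob_all_homologues_seen(ploidy, d), 3) for d in (5, 10, 20, 40)])
```

The required depth for seeing every homologue rises steeply with ploidy, whereas a continuous dosage needs no such threshold but carries binomial sampling noise at low depth.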
Affiliation(s)
- Abdulqader Jighly
- AgriBio, Centre for AgriBiosciences, Agriculture Victoria, Bundoora, Victoria, Australia
10
Bhat JA, Yu D, Bohra A, Ganie SA, Varshney RK. Features and applications of haplotypes in crop breeding. Commun Biol 2021; 4:1266. [PMID: 34737387 PMCID: PMC8568931 DOI: 10.1038/s42003-021-02782-y] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 05/21/2021] [Accepted: 10/09/2021] [Indexed: 12/17/2022] [Open access]
Abstract
Climate change with altered pest-disease dynamics and rising abiotic stresses threatens resource-constrained agricultural production systems worldwide. Genomics-assisted breeding (GAB) approaches have greatly contributed to enhancing crop breeding efficiency and delivering better varieties. The fast-growing capacity and affordability of DNA sequencing have motivated large-scale germplasm sequencing projects, opening exciting avenues for mining haplotypes for breeding applications. This review article highlights ways to mine haplotypes and apply them for complex trait dissection and in GAB approaches, including haplotype-GWAS, haplotype-based breeding, and haplotype-assisted genomic selection. Improvement strategies that efficiently deploy superior haplotypes to hasten breeding progress will be key to safeguarding global food security.
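A minimal haplo-pheno summary, as a first step toward identifying a superior haplotype at one block, might look like the sketch below. It assumes accessions have already been assigned haplotype labels at the block and that a higher trait value is better; the function name, `min_count` threshold, and data are hypothetical. A real analysis would also test the differences statistically (for example with ANOVA and post hoc comparisons), which is not shown.

```python
from collections import defaultdict
from statistics import mean

def haplo_pheno(haplotypes, phenotypes, min_count=2):
    """Mean phenotype per haplotype at one block.

    haplotypes: haplotype label per accession (e.g. "H1").
    phenotypes: trait value per accession, same order.
    min_count: hypothetical minimum number of accessions per haplotype.
    Returns (table, superior): haplotype -> mean, and the best label.
    """
    groups = defaultdict(list)
    for hap, value in zip(haplotypes, phenotypes):
        groups[hap].append(value)
    table = {hap: mean(vals) for hap, vals in groups.items() if len(vals) >= min_count}
    superior = max(table, key=table.get) if table else None
    return table, superior

haps = ["H1", "H1", "H2", "H2", "H2", "H3"]
pheno = [4.0, 5.0, 6.0, 7.0, 8.0, 10.0]
print(haplo_pheno(haps, pheno))
```

H3 has the highest single value but is excluded for being carried by only one accession, illustrating why rare haplotypes need a minimum support threshold.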
Affiliation(s)
- Javaid Akhter Bhat
- National Center for Soybean Improvement, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
- Deyue Yu
- National Center for Soybean Improvement, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
- Abhishek Bohra
- Crop Improvement Division, ICAR- Indian Institute of Pulses Research (ICAR- IIPR), Kanpur, India
- Showkat Ahmad Ganie
- Department of Biotechnology, Visva-Bharati, Santiniketan, 731235, WB, India
- Rajeev K Varshney
- Center of Excellence in Genomics & Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, 502324, India
- State Agricultural Biotechnology Centre, Centre for Crop & Food Research Innovation, Food Futures Institute, Murdoch University, Murdoch, WA, Australia
11
Kulski JK, Suzuki S, Shiina T. Haplotype Shuffling and Dimorphic Transposable Elements in the Human Extended Major Histocompatibility Complex Class II Region. Front Genet 2021; 12:665899. [PMID: 34122517 PMCID: PMC8193847 DOI: 10.3389/fgene.2021.665899] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/09/2021] [Accepted: 04/12/2021] [Indexed: 12/26/2022] [Open access]
Abstract
The major histocompatibility complex (MHC) on chromosome 6p21 is one of the most single-nucleotide polymorphism (SNP)-dense regions of the human genome and a prime model for the study and understanding of conserved sequence polymorphisms and structural diversity of ancestral haplotypes/conserved extended haplotypes. This study aimed to follow up on a previous analysis of the MHC class I region by using the same set of 95 MHC haplotype sequences downloaded from a publicly available BioProject database at the National Center for Biotechnology Information to identify and characterize the polymorphic human leukocyte antigen (HLA)-class II genes, the MTCO3P1 pseudogene alleles, the indels of transposable elements as haplotypic lineage markers, and SNP-density crossover (XO) loci at haplotype junctions in DNA sequence alignments of different haplotypes across the extended class II region (∼1 Mb) from the telomeric PRRT1 gene in class III to the COL11A2 gene at the centromeric end of class II. We identified 42 haplotypic indels (20 Alu, 7 SVA, 13 LTR or MERs, and 2 indels composed of a mosaic of different transposable elements) linked to particular HLA-class II alleles. Comparative sequence analyses of 136 haplotype pairs revealed 98 unique XO sites between SNP-poor and SNP-rich genomic segments, with considerable haplotype shuffling located in the proximity of putative recombination hotspots. Most XO sites occurred in several regions, including the vicinity of MTCO3P1 between HLA-DQB1 and HLA-DQB3, between HLA-DQB2 and HLA-DOB, between DOB and TAP2, and between HLA-DOA and HLA-DPA1, where most XOs were within a HERVK22 sequence. We also determined the genomic positions of the PRDM9 recombination suppression sequence motif ATCCATG/CATGGAT and the PRDM9 recombination activation partial binding motif CCTCCCCT/AGGGGAG in the class II region of the human reference genome (NC_000006) relative to published meiotic recombination positions.
Both the recombination and anti-recombination PRDM9 binding motifs were widely distributed throughout the class II genomic regions with 50% or more found within repeat elements; the anti-recombination motifs were found mostly in L1 fragmented repeats. This study shows substantial haplotype shuffling between different polymorphic blocks and confirms the presence of numerous putative ancestral recombination sites across the class II region between various HLA class II genes.
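Locating a short motif and its reverse complement on both strands of a reference sequence, as done for the PRDM9-related motifs above, can be sketched with a regular-expression lookahead so that overlapping hits are reported. The function names and the toy sequence are hypothetical; a real analysis would read the chromosome from a FASTA file and map positions to reference coordinates.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def motif_positions(seq, motif):
    """0-based start positions of `motif` and of its reverse complement.

    A zero-width lookahead lets overlapping occurrences be reported.
    Palindromic motifs will appear in both lists.
    """
    seq = seq.upper()

    def find(m):
        return [hit.start() for hit in re.finditer(f"(?={m})", seq)]

    return {"forward": find(motif), "reverse": find(reverse_complement(motif))}

toy = "GATCCATGCCCATGGATA"  # hypothetical sequence
print(reverse_complement("ATCCATG"))  # CATGGAT
print(motif_positions(toy, "ATCCATG"))  # {'forward': [1], 'reverse': [10]}
```

Searching for the reverse complement on the given strand is equivalent to scanning the opposite strand, so a single pass covers both orientations.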
Affiliation(s)
- Jerzy K Kulski
- Faculty of Health and Medical Sciences, The University of Western Australia, Crawley, WA, Australia
- Department of Molecular Life Sciences, Division of Basic Medical Science and Molecular Medicine, Tokai University School of Medicine, Isehara, Japan
- Shingo Suzuki
- Department of Molecular Life Sciences, Division of Basic Medical Science and Molecular Medicine, Tokai University School of Medicine, Isehara, Japan
- Takashi Shiina
- Department of Molecular Life Sciences, Division of Basic Medical Science and Molecular Medicine, Tokai University School of Medicine, Isehara, Japan
12
Srivastava K, Fratzscher AS, Lan B, Flegel WA. Cataloguing experimentally confirmed 80.7 kb-long ACKR1 haplotypes from the 1000 Genomes Project database. BMC Bioinformatics 2021; 22:273. [PMID: 34039276 PMCID: PMC8150616 DOI: 10.1186/s12859-021-04169-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/18/2020] [Accepted: 05/04/2021] [Indexed: 12/18/2022]
Abstract
Background: Clinically effective and safe genotyping relies on correct reference sequences, often represented by haplotypes. The 1000 Genomes Project recorded individual genotypes across 26 populations and reported haplotype data derived by computerized genotype phasing. In contrast, we identified long reference sequences by analyzing the homozygous genomic regions in this online database, an approach that has rarely been reported since next-generation sequencing data became available. Study design and methods: Phased genotype data for an 80.6 kb region of chromosome 1 were downloaded for all 2,504 unrelated individuals of the 1000 Genomes Project Phase 3 cohort. The region was centered on the ACKR1 gene and bordered by the CADM3 and FCER1A genes. Individuals who were heterozygous at a single site or completely homozygous allowed unambiguous assignment of an ACKR1 haplotype. A computer algorithm was developed to extract these haplotypes from the 1000 Genomes Project automatically, and a manual analysis validated the data extracted by the algorithm. Results: We confirmed 902 ACKR1 haplotypes of varying lengths, the longest 80,584 nucleotides and the shortest 1,901 nucleotides. The haplotype sequences totaled 19,895,388 nucleotides, with a median length of 16,014 nucleotides. Because of this approach, all haplotypes can be considered experimentally confirmed and unaffected by the known errors of computerized genotype phasing. Conclusions: Tracts of homozygosity can provide definitive reference sequences for any gene. They are particularly useful when observed in unrelated individuals of large-scale sequence databases. As a proof of principle, we explored the 1000 Genomes Project database for ACKR1 gene data and mined long haplotypes. These haplotypes are useful for high-throughput analysis with next-generation sequencing. Our approach is scalable, uses automated bioinformatics tools, and can be applied to any gene.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04169-6.
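The assignment rule in the study design (heterozygosity at a single site, or complete homozygosity) can be illustrated with a short Python sketch. It assumes genotypes have already been parsed into unphased allele pairs for the region of interest; the function name and input format are hypothetical, and this is not the authors' algorithm.

```python
from typing import List, Optional, Tuple

# One unphased genotype per variant site, e.g. ("C", "T").
Genotype = Tuple[str, str]


def unambiguous_haplotypes(genotypes: List[Genotype]) -> Optional[List[str]]:
    """Return the haplotype(s) an individual's genotypes fix without phasing.

    - no heterozygous site: both chromosomes carry the same sequence;
    - exactly one heterozygous site: the two haplotypes differ only there;
    - two or more heterozygous sites: phase is ambiguous, so return None.
    """
    het_sites = [i for i, (a, b) in enumerate(genotypes) if a != b]
    if len(het_sites) > 1:
        return None
    first = "".join(a for a, _ in genotypes)
    second = "".join(b for _, b in genotypes)
    # A set collapses the fully homozygous case to a single haplotype.
    return sorted({first, second})
```

Here `[("A", "A"), ("C", "T"), ("G", "G")]` yields `["ACG", "ATG"]`, while adding a second heterozygous site returns `None`. Choosing the window boundaries, which in the study produced haplotypes of varying lengths, is left out of this sketch.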
Affiliation(s)
- Kshitij Srivastava
- Laboratory Services Section, Department of Transfusion Medicine, NIH Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
- Anne-Sophie Fratzscher
- Laboratory Services Section, Department of Transfusion Medicine, NIH Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
- Bo Lan
- Laboratory Services Section, Department of Transfusion Medicine, NIH Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
- Willy Albert Flegel
- Laboratory Services Section, Department of Transfusion Medicine, NIH Clinical Center, National Institutes of Health, Bethesda, MD, 20892, USA
13
Al Bkhetan Z, Chana G, Soon Ong C, Goudey B, Ramamohanarao K. eQTLHap: a tool for comprehensive eQTL analysis considering haplotypic and genotypic effects. Brief Bioinform 2021; 22:6214641. [PMID: 33834181 DOI: 10.1093/bib/bbab093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2020] [Revised: 03/01/2021] [Accepted: 03/03/2021] [Indexed: 11/13/2022]
Abstract
MOTIVATION: The high accuracy of recent haplotype phasing tools is enabling wider integration of haplotype (or phase) information into genetic investigations. One such possibility is phase-aware expression quantitative trait loci (eQTL) analysis, where haplotype-based analysis has the potential to detect associations that standard SNP-based approaches may otherwise miss. RESULTS: We present eQTLHap, a novel method to investigate associations between gene expression and genetic variants that considers their haplotypic and genotypic effects. Using multiple simulations based on real data, we demonstrate that phase-aware eQTL analysis significantly outperforms typical SNP-based methods when the causal genetic architecture involves multiple SNPs. We show that phase-aware eQTL analysis is robust to phasing errors, which have only a minor impact (<4%) on sensitivity. Applying eQTLHap to real GEUVADIS and GTEx datasets detects numerous novel eQTLs missed by a single-SNP approach, with 22 eQTLs replicating across studies or tissue types, highlighting the utility of phase-aware eQTL analysis. AVAILABILITY AND IMPLEMENTATION: https://github.com/ziadbkh/eQTLHap. CONTACT: ziad.albkhetan@gmail.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Briefings in Bioinformatics online.
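Why phase can matter for eQTL detection can be shown with a toy example: if expression depends on two alternate alleles lying on the same chromosome, individuals with identical genotypes but different phase have different expression. The Python sketch below compares a haplotype-dosage predictor with a single-SNP dosage predictor using Pearson correlation. The data and function names are hypothetical, and this is not eQTLHap's statistical model, which the abstract does not detail.

```python
from statistics import mean
from typing import List, Sequence, Tuple

# Each individual is a pair of phased haplotypes over the same SNPs,
# written as strings of '0' (reference) and '1' (alternate) alleles.
Diplotype = Tuple[str, str]


def haplotype_dosage(diplotypes: List[Diplotype], target: str) -> List[int]:
    """Copies (0, 1 or 2) of the `target` haplotype carried by each individual."""
    return [int(h1 == target) + int(h2 == target) for h1, h2 in diplotypes]


def snp_dosage(diplotypes: List[Diplotype], index: int) -> List[int]:
    """Alternate-allele count at one SNP; phase is discarded."""
    return [int(h1[index] == "1") + int(h2[index] == "1") for h1, h2 in diplotypes]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either variable is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


# Toy data (hypothetical): expression rises only with copies of haplotype "11".
# Individuals 1 and 2 have identical genotypes but opposite phase.
diplotypes = [("11", "00"), ("10", "01"), ("00", "00"), ("11", "11")]
expression = [2.0, 0.0, 0.0, 4.0]

r_hap = pearson_r(haplotype_dosage(diplotypes, "11"), expression)
r_snp = pearson_r(snp_dosage(diplotypes, 0), expression)
```

On these four individuals `r_hap` is 1.0 and `r_snp` is about 0.85, because single-SNP dosage cannot distinguish individuals 1 and 2. A real analysis would add covariates, significance testing and multiple-testing correction.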
Affiliation(s)
- Ziad Al Bkhetan
- School of Computing and Information Systems, The University of Melbourne, Parkville, 3010, Australia
- Gursharan Chana
- Department of Medicine, Royal Melbourne Hospital, The University of Melbourne, Parkville, 3010, Australia
- Benjamin Goudey
- School of Computing and Information Systems, The University of Melbourne, Parkville, 3010, Australia; IBM Australia Research, Southgate, Victoria, Australia
- Kotagiri Ramamohanarao
- School of Computing and Information Systems, The University of Melbourne, Parkville, 3010, Australia
14
A SNaPshot Assay for Determination of the Mannose-Binding Lectin Gene Variants and an Algorithm for Calculation of Haplogenotype Combinations. Diagnostics (Basel) 2021; 11:diagnostics11020301. [PMID: 33668563 PMCID: PMC7918147 DOI: 10.3390/diagnostics11020301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2020] [Revised: 02/07/2021] [Accepted: 02/11/2021] [Indexed: 11/16/2022]
Abstract
Mannose-binding lectin (MBL) deficiency caused by variability in the MBL2 gene is responsible for susceptibility to, and severity of, various infectious and autoimmune diseases. A combination of six single nucleotide polymorphisms (SNPs) has a major impact on circulating MBL levels. The aim of this study was to design and validate a sensitive and economical method for determining MBL2 haplogenotypes. The SNaPshot assay was designed and optimized to genotype six SNPs (rs1800451, rs1800450, rs5030737, rs7095891, rs7096206, rs11003125) and validated by comparing its results with Sanger sequencing. Additionally, an algorithm for online calculation of haplogenotype combinations from the determined genotypes was developed. Three hundred and twenty-eight DNA samples from healthy individuals from the Czech population were genotyped. Minor allele frequencies (MAFs) in the Czech population were consistent with those of the European population. The SNaPshot assay for MBL2 genotyping is a high-throughput, cost-effective technique that can be used in further genetic association studies or in clinical practice. Moreover, a freely available online application for calculating haplogenotypes from SNPs was developed within the scope of this project.
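Calculating haplogenotype combinations from genotypes means enumerating the haplotype pairs consistent with the unphased calls; with six SNPs, an individual heterozygous at all six sites has 2^5 = 32 candidate pairs. The Python sketch below enumerates them and optionally keeps only pairs built from a supplied set of known haplotypes. It is an illustration under stated assumptions, not the authors' online application, and the allele letters in the example are hypothetical rather than the actual MBL2 alleles.

```python
from itertools import product
from typing import List, Optional, Set, Tuple

Genotype = Tuple[str, str]   # unphased alleles at one SNP, e.g. ("C", "T")
Pair = Tuple[str, str]       # unordered haplotype pair, stored sorted


def compatible_haplotype_pairs(genotypes: List[Genotype],
                               known: Optional[Set[str]] = None) -> Set[Pair]:
    """All haplotype pairs consistent with unphased genotypes.

    Each heterozygous site can put either allele on the first haplotype, so
    k heterozygous sites give 2**k assignments, i.e. 2**(k-1) distinct
    unordered pairs when k >= 1. If `known` is given, pairs containing a
    haplotype outside that set are dropped.
    """
    het = [i for i, (a, b) in enumerate(genotypes) if a != b]
    pairs: Set[Pair] = set()
    for swaps in product((False, True), repeat=len(het)):
        h1 = [a for a, _ in genotypes]
        h2 = [b for _, b in genotypes]
        for i, swap in zip(het, swaps):
            if swap:  # move the other allele onto the first haplotype
                h1[i], h2[i] = h2[i], h1[i]
        first, second = sorted(("".join(h1), "".join(h2)))
        if known is None or (first in known and second in known):
            pairs.add((first, second))
    return pairs
```

For `[("G", "G"), ("C", "T"), ("A", "G"), ("C", "C")]` the two heterozygous sites give `{("GCAC", "GTGC"), ("GCGC", "GTAC")}`; restricting to a known set containing GCAC and GTGC leaves only the first pair.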
15
Al Bkhetan Z, Chana G, Ramamohanarao K, Verspoor K, Goudey B. Evaluation of consensus strategies for haplotype phasing. Brief Bioinform 2020; 22:5998997. [PMID: 33236761 DOI: 10.1093/bib/bbaa280] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/13/2020] [Revised: 09/22/2020] [Accepted: 09/22/2020] [Indexed: 01/05/2023]
Abstract
Haplotype phasing is a critical step for many genetic applications, but incorrect phase estimates can negatively impact downstream analyses. One proposed strategy to improve phasing accuracy is to combine multiple independent phasing estimates to overcome the limitations of any individual estimate. However, this strategy has yet to be thoroughly explored. This study provides a comprehensive evaluation of consensus strategies for haplotype phasing. We explore the performance of different consensus paradigms, and the effect of specific constituent tools, across several datasets with different characteristics, as well as their impact on the downstream task of genotype imputation. Based on the outputs of existing phasing tools, we explore two strategies to construct haplotype consensus estimators: voting across outputs from multiple phasing tools, and voting across multiple outputs of a single non-deterministic tool. We find that the consensus approach from multiple tools reduces switch error (SE) by an average of 10% compared with any constituent tool when applied to European populations, and has the highest accuracy regardless of population ethnicity, sample size, variant density or variant frequency. Furthermore, the consensus estimator improves the accuracy of the downstream task of genotype imputation carried out by the widely used Minimac3, pbwt and BEAGLE5 tools. Our results provide guidance on how to produce the most accurate phasing estimates and on the trade-offs that a consensus approach may involve. Our implementation of consensus haplotype phasing, consHap, is freely available at https://github.com/ziadbkh/consHap. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.
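Voting across phasing outputs has one subtlety: haplotype labels are arbitrary, so two tools can agree completely while reporting opposite orientations. One way around this, assumed here, is to vote on the relative phase of each pair of adjacent heterozygous sites, which is also the quantity that switch error counts. The Python sketch below is a minimal illustration of that idea; it is not consHap's implementation, and the input format and tie-breaking rule are assumptions.

```python
from typing import List

# A phasing estimate for one individual: for each heterozygous site, which
# allele (0 = reference, 1 = alternate) sits on "haplotype 1". Labels of the
# two haplotypes are arbitrary, so [0, 1, 1] and [1, 0, 0] are equivalent.
Phase = List[int]


def relative_phase(h: Phase) -> List[int]:
    """0 if consecutive heterozygous sites keep the same orientation, 1 if it flips."""
    return [a ^ b for a, b in zip(h, h[1:])]


def switch_errors(estimate: Phase, truth: Phase) -> int:
    """Number of adjacent site pairs whose relative phase disagrees with the truth."""
    return sum(e != t for e, t in zip(relative_phase(estimate), relative_phase(truth)))


def consensus(estimates: List[Phase]) -> Phase:
    """Majority vote on the relative phase of each adjacent pair of sites.

    Assumes every estimate covers the same heterozygous sites. Ties
    (possible with an even number of estimates) fall back to the first one.
    """
    rels = [relative_phase(h) for h in estimates]
    n = len(estimates)
    result = [estimates[0][0]]
    for i in range(len(rels[0])):
        votes = sum(r[i] for r in rels)
        if 2 * votes > n:
            flip = 1
        elif 2 * votes < n:
            flip = 0
        else:
            flip = rels[0][i]
        result.append(result[-1] ^ flip)
    return result
```

For example, with truth `[0, 0, 1, 1, 0]` and estimates `[0, 0, 1, 0, 1]`, `[1, 0, 1, 1, 0]` and `[0, 0, 1, 1, 1]`, each estimate has one switch error at a different position, and `consensus` returns `[0, 0, 1, 1, 0]` with none. The same vote could be applied to repeated runs of a single non-deterministic tool.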
Affiliation(s)
- Ziad Al Bkhetan
- School of Computing and Information Systems at the University of Melbourne
- Karin Verspoor
- School of Computing and Information Systems at the University of Melbourne
- Benjamin Goudey
- IBM Research Australia and an Honorary Research Fellow at the School of Computing and Information Systems, University of Melbourne
16
Lutgen D, Ritter R, Olsen R, Schielzeth H, Gruselius J, Ewels P, García JT, Shirihai H, Schweizer M, Suh A, Burri R. Linked‐read sequencing enables haplotype‐resolved resequencing at population scale. Mol Ecol Resour 2020; 20:1311-1322. [DOI: 10.1111/1755-0998.13192] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/16/2020] [Revised: 04/25/2020] [Accepted: 05/06/2020] [Indexed: 11/28/2022]
Affiliation(s)
- Dave Lutgen
- Department of Population Ecology, Institute of Ecology and Evolution, Friedrich Schiller University Jena, Jena, Germany
- Raphael Ritter
- Department of Population Ecology, Institute of Ecology and Evolution, Friedrich Schiller University Jena, Jena, Germany
- Remi‐André Olsen
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
- Holger Schielzeth
- Department of Population Ecology, Institute of Ecology and Evolution, Friedrich Schiller University Jena, Jena, Germany
- Joel Gruselius
- Science for Life Laboratory, Department of Biosciences and Nutrition, Karolinska Institutet, Stockholm, Sweden
- Philip Ewels
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
- Jesús T. García
- Instituto de Investigación en Recursos Cinegéticos (IREC), CSIC-UCLM-JCCM, Ciudad Real, Spain
- Manuel Schweizer
- Natural History Museum Bern, Bern, Switzerland
- Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
- Alexander Suh
- Department of Organismal Biology, Systematic Biology, Evolutionary Biology Centre (EBC), Uppsala University, Uppsala, Sweden
- Reto Burri
- Department of Population Ecology, Institute of Ecology and Evolution, Friedrich Schiller University Jena, Jena, Germany