1
Zhao W, Qadri QR, Zhang Z, Wang Z, Pan Y, Wang Q, Zhang Z. PyAGH: a python package to fast construct kinship matrices based on different levels of omic data. BMC Bioinformatics 2023; 24:153. [PMID: 37072709 PMCID: PMC10111838 DOI: 10.1186/s12859-023-05280-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2022] [Accepted: 04/10/2023] [Indexed: 04/20/2023] [Open access]
Abstract
BACKGROUND Construction of kinship matrices among individuals is an important step in both association and prediction studies based on different levels of omic data. Methods for constructing kinship matrices are becoming diverse, and each method has its own appropriate scenarios. However, software that can comprehensively calculate kinship matrices for a variety of scenarios is still urgently needed. RESULTS In this study, we developed an efficient and user-friendly Python module, PyAGH, that can accomplish (1) construction of conventional additive kinship matrices based on pedigree, genotypes, or abundance data from the transcriptome or microbiome; (2) construction of genomic kinship matrices in a combined population; (3) construction of dominance and epistatic kinship matrices; (4) pedigree selection, tracing, detection and visualization; and (5) visualization of clustering, heatmaps and PCA based on kinship matrices. The output from PyAGH can be easily integrated into other mainstream software according to users' purposes. Compared with other software, PyAGH integrates multiple methods for calculating kinship matrices and has advantages in speed and in the size of data it can handle. PyAGH is developed in Python and C++ and can be easily installed with the pip tool. Installation instructions and a user manual are freely available at https://github.com/zhaow-01/PyAGH. CONCLUSION PyAGH is a fast and user-friendly Python package for calculating kinship matrices from pedigree, genotype, microbiome and transcriptome data, as well as for processing, analyzing and visualizing data and results. This package makes it easier to carry out prediction and association studies based on different levels of omic data.
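PyAGH's abstract lists additive genomic kinship and epistatic kinship matrices but gives no formulas. As an illustration only, the sketch below implements one widely used additive estimator, VanRaden's (2008) first method, G = ZZ'/(2 sum p(1 - p)) with Z = M - 2p. It also implements the common additive-by-additive epistatic kernel, formed as the element-wise (Hadamard) product of G with itself. Whether PyAGH uses exactly these estimators is an assumption. The function names are hypothetical and are not part of PyAGH's API.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def vanraden_g(genotypes: Sequence[Sequence[int]]) -> Matrix:
    """Additive genomic relationship matrix (VanRaden 2008, method 1).

    genotypes: n individuals x m markers, coded 0/1/2 copies of the counted allele.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Frequency of the counted allele at each marker.
    p = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    denom = 2 * sum(pj * (1 - pj) for pj in p)
    if denom == 0:
        raise ValueError("all markers are monomorphic")
    # Centre every genotype by twice its allele frequency: Z = M - 2p.
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]
    # G = Z Z' / (2 * sum p(1 - p)).
    return [
        [sum(a * b for a, b in zip(z[i], z[k])) / denom for k in range(n)]
        for i in range(n)
    ]


def additive_by_additive(g: Matrix) -> Matrix:
    """Additive-by-additive epistatic kinship as the Hadamard product G # G."""
    return [[x * x for x in row] for row in g]
```

For example, vanraden_g([[0, 1, 2], [2, 1, 0], [1, 1, 1]]) gives 4/3 on the first two diagonal entries. It gives 0 for the third individual, because every one of its genotypes sits exactly at the population mean.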
Affiliation(s)
- Wei Zhao
- Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, 800# Dongchuan Road, Shanghai, China
- Qamar Raza Qadri
- Department of Animal Science, School of Agriculture and Biology, Shanghai Jiao Tong University, 800# Dongchuan Road, Shanghai, China
- Zhenyang Zhang
- Department of Animal Science, College of Animal Sciences, Zhejiang University, 866# Yuhangtang Road, Hangzhou, 310058, China
- Zhen Wang
- Department of Animal Science, College of Animal Sciences, Zhejiang University, 866# Yuhangtang Road, Hangzhou, 310058, China
- Yuchun Pan
- Department of Animal Science, College of Animal Sciences, Zhejiang University, 866# Yuhangtang Road, Hangzhou, 310058, China
- Hainan Research Institute, Zhejiang University, 11# Yonyou Industrial Park, Yazhou Bay Science and Technology City, Sanya, 572025, China
- Qishan Wang
- Department of Animal Science, College of Animal Sciences, Zhejiang University, 866# Yuhangtang Road, Hangzhou, 310058, China
- Zhe Zhang
- Department of Animal Science, College of Animal Sciences, Zhejiang University, 866# Yuhangtang Road, Hangzhou, 310058, China
2
Araujo AC, Carneiro PLS, Oliveira HR, Lewis RM, Brito LF. SNP- and haplotype-based single-step genomic predictions for body weight, wool, and reproductive traits in North American Rambouillet sheep. J Anim Breed Genet 2023; 140:216-234. [PMID: 36408677 PMCID: PMC10099590 DOI: 10.1111/jbg.12748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2022] [Accepted: 10/23/2022] [Indexed: 11/22/2022]
Abstract
Rambouillet sheep are commonly raised in extensive grazing systems in the US, mainly for wool and meat production. Genomic evaluations in US sheep breeds, including Rambouillet, are still incipient. Therefore, we aimed to evaluate the feasibility of performing genomic prediction of breeding values for various traits in Rambouillet sheep based on single nucleotide polymorphisms (SNP) or haplotypes (fitted as pseudo-SNP) under a single-step GBLUP approach. A total of 28,834 records for birth weight (BWT), 23,306 for postweaning weight (PWT), 5,832 for yearling weight (YWT), 9,880 for yearling fibre diameter (YFD), 11,872 for yearling greasy fleece weight (YGFW), and 15,984 for number of lambs born (NLB) were used in this study. Seven hundred forty-one individuals were genotyped using a moderate- (50 K; n = 677) or high-density (600 K; n = 64) SNP panel, and the 32 K SNPs common to both panels (after genotype quality control) were used for further analyses. Single-step genomic predictions using SNP (H-BLUP) or haplotypes (HAP-BLUP) from blocks with different linkage disequilibrium (LD) thresholds (0.15, 0.35, 0.50, 0.65, and 0.80) were evaluated. We also considered different blending parameters when constructing the genomic relationship matrix used to predict the genomic-enhanced estimated breeding values (GEBV), with alpha equal to 0.95 or 0.50. The GEBV were compared to the estimated breeding values (EBV) obtained from traditional pedigree-based evaluations (A-BLUP). The mean theoretical accuracy ranged from 0.499 (A-BLUP for PWT) to 0.795 (HAP-BLUP using haplotypes from blocks with LD threshold of 0.35 and alpha equal to 0.95 for YFD). The prediction accuracies ranged from 0.143 (A-BLUP for PWT) to 0.330 (A-BLUP for YGFW), while the prediction bias ranged from -0.104 (H-BLUP for PWT) to 0.087 (HAP-BLUP using haplotypes from blocks with LD threshold of 0.15 and alpha equal to 0.95 for YGFW).
The GEBV dispersion ranged from 0.428 (A-BLUP for PWT) to 1.035 (A-BLUP for YGFW). Similar results were observed for H-BLUP or HAP-BLUP, independently of the LD threshold to create the haplotypes, alpha value, or trait analysed. Using genomic information (fitting individual SNP or haplotypes) provided similar or higher prediction and theoretical accuracies and reduced the dispersion of the GEBV for body weight, wool, and reproductive traits in Rambouillet sheep. However, there were no clear improvements in the prediction bias when compared to pedigree-based predictions. The next step will be to enlarge the training populations for this breed to increase the benefits of genomic predictions.
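The abstract mentions blending the genomic relationship matrix with alpha = 0.95 or 0.50 but does not spell out the formula. A common form of blending, assumed here, is G_w = alpha*G + (1 - alpha)*A22, where A22 holds the pedigree relationships among the genotyped animals. The minimal sketch below applies that formula element-wise. blend_g is a hypothetical helper and is not a function from any particular single-step package.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def blend_g(
    g: Sequence[Sequence[float]],
    a22: Sequence[Sequence[float]],
    alpha: float,
) -> Matrix:
    """Return G_w = alpha * G + (1 - alpha) * A22 for the genotyped animals.

    g:   genomic relationship matrix of the genotyped animals
    a22: pedigree relationships among the same animals, in the same order
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(g) != len(a22) or any(len(r1) != len(r2) for r1, r2 in zip(g, a22)):
        raise ValueError("G and A22 must have the same shape")
    # Element-wise weighted average of genomic and pedigree relationships.
    return [
        [alpha * gij + (1.0 - alpha) * aij for gij, aij in zip(g_row, a_row)]
        for g_row, a_row in zip(g, a22)
    ]
```

With alpha = 0.95, G = [[1.2, 0.1], [0.1, 0.9]] and an identity A22 give [[1.19, 0.095], [0.095, 0.905]]. Mixing in some A22 is also a practical safeguard. A centred G is often singular, whereas blending with a positive definite A22 at alpha < 1 yields a positive definite matrix, which single-step methods need to invert.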
Affiliation(s)
- Andre C. Araujo
- Graduate Program in Animal Sciences, State University of Southwestern Bahia, Itapetinga, Bahia, Brazil
- Department of Animal Sciences, Purdue University, West Lafayette, Indiana, USA
- Ronald M. Lewis
- Department of Animal Sciences, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
- Luiz F. Brito
- Department of Animal Sciences, Purdue University, West Lafayette, Indiana, USA
3
Alamin M, Sultana MH, Lou X, Jin W, Xu H. Dissecting Complex Traits Using Omics Data: A Review on the Linear Mixed Models and Their Application in GWAS. PLANTS (BASEL, SWITZERLAND) 2022; 11:3277. [PMID: 36501317 PMCID: PMC9739826 DOI: 10.3390/plants11233277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Revised: 11/23/2022] [Accepted: 11/25/2022] [Indexed: 06/17/2023]
Abstract
Genome-wide association study (GWAS) is the most popular approach to dissecting complex traits in plants, humans, and animals. Numerous methods and tools have been proposed to discover causal variants in GWAS data analysis. Among them, linear mixed models (LMMs) are widely used statistical methods for controlling confounding factors, including population structure, resulting in increased computational efficiency and statistical power in GWAS. Recently, with the growing availability of large-scale GWAS data and relevant phenotype samples, more attention has been paid to pleiotropy, multi-trait, gene-gene interaction, gene-environment interaction, and multi-locus methods. In this review, we survey the LMM-based methods available in the literature for GWAS. We briefly discuss the different LMM methods, software packages, and available open-source applications in GWAS. Then, we discuss the advantages and weaknesses of LMMs in GWAS. Finally, we present future perspectives and conclusions. This review should help researchers quickly select appropriate LMMs and methods for GWAS data analysis and benefit the scientific community.
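Most of the LMM-based GWAS methods surveyed in this review build on the same single-locus model. It is written below in standard notation as a reference point, not taken from the review itself. Each marker k is tested in turn, while a kinship matrix K absorbs relatedness and population structure:

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \mathbf{x}_k b_k + \mathbf{Z}\mathbf{u} + \mathbf{e},\\
\mathbf{u} &\sim N\!\left(\mathbf{0},\ \sigma_g^{2}\mathbf{K}\right), \qquad
\mathbf{e} \sim N\!\left(\mathbf{0},\ \sigma_e^{2}\mathbf{I}\right),\\
\operatorname{Var}(\mathbf{y}) &= \sigma_g^{2}\,\mathbf{Z}\mathbf{K}\mathbf{Z}^{\top} + \sigma_e^{2}\mathbf{I},
\qquad H_0:\ b_k = 0 .
\end{aligned}
```

Here X holds fixed covariates (for example, principal components), x_k is the genotype vector of marker k, and Z maps records to individuals. Methods such as EMMAX and GEMMA differ mainly in how and when they estimate the variance components sigma_g^2 and sigma_e^2.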
Affiliation(s)
- Md. Alamin
- Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Xiangyang Lou
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Wenfei Jin
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
- Haiming Xu
- Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
4
Ma Y, Li D, Xu Z, Gu R, Wang P, Fu J, Wang J, Du W, Zhang H. Dissection of the Genetic Basis of Yield Traits in Line per se and Testcross Populations and Identification of Candidate Genes for Hybrid Performance in Maize. Int J Mol Sci 2022; 23:5074. [PMID: 35563470 PMCID: PMC9102962 DOI: 10.3390/ijms23095074] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/29/2022] [Revised: 04/22/2022] [Accepted: 04/25/2022] [Indexed: 12/31/2022] [Open access]
Abstract
Dissecting the genetic basis of yield traits in hybrid populations and identifying the candidate genes are important for molecular crop breeding. In this study, a BC1F3:4 population, the line per se (LPS) population, was constructed by using elite inbred lines Zheng58 and PH4CV as the parental lines. The population was genotyped with 55,000 SNPs and testcrossed to Chang7-2 and PH6WC (two testers) to construct two testcross (TC) populations. The three populations were evaluated for hundred kernel weight (HKW) and yield per plant (YPP) in multiple environments. Marker-trait association analysis (MTA) identified 24 to 151 significant SNPs in the three populations. Comparison of the significant SNPs identified common and specific quantitative trait locus/loci (QTL) in the LPS and TC populations. Genetic feature analysis of these significant SNPs showed that they were associated with the tested traits and could be used to predict trait performance of both LPS and TC populations. RNA-seq analysis was performed using maize hybrid varieties and their parental lines, and differentially expressed genes (DEGs) between hybrid varieties and parental lines were identified. Comparison of the chromosome positions of DEGs with those of significant SNPs detected in the TC population identified potential candidate genes that might be related to hybrid performance. Combining RNA-seq analysis and MTA results identified candidate genes for hybrid performance, providing information that could be useful for maize hybrid breeding.
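The abstract does not say how DEG positions were compared with significant SNPs. A simple, assumed approach is to flag a differentially expressed gene as a candidate when it lies on the same chromosome within a fixed physical window of a significant SNP. The sketch below implements that approach. The Snp and Gene records, candidate_genes and the window size are all hypothetical.

```python
from typing import Dict, List, NamedTuple


class Snp(NamedTuple):
    name: str
    chrom: str
    pos: int  # base-pair position


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


def candidate_genes(
    snps: List[Snp], degs: List[Gene], window_bp: int
) -> Dict[str, List[str]]:
    """Map each significant SNP to the DEGs lying within window_bp of it."""
    hits: Dict[str, List[str]] = {}
    for snp in snps:
        near = [
            gene.name
            for gene in degs
            # Same chromosome, and the SNP falls inside the gene
            # interval padded by window_bp on both sides.
            if gene.chrom == snp.chrom
            and gene.start - window_bp <= snp.pos <= gene.end + window_bp
        ]
        if near:
            hits[snp.name] = near
    return hits
```

SNPs with no nearby DEG are simply omitted from the result. The window size controls the trade-off between missing true candidates and listing too many.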
Affiliation(s)
- Yuting Ma
- Agronomy College, Shenyang Agricultural University, Shenyang 110866, China
- National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Dongdong Li
- National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Zhenxiang Xu
- Center for Seed Science and Technology, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100193, China
- Riliang Gu
- Center for Seed Science and Technology, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100193, China
- Pingxi Wang
- National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Junjie Fu
- National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Jianhua Wang
- Center for Seed Science and Technology, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100193, China
- Wanli Du
- Agronomy College, Shenyang Agricultural University, Shenyang 110866, China
- Hongwei Zhang
- National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
5
Zhang W, Kang Y, Dai X, Xu S, Zhao PX. PIP-SNP: a pipeline for processing SNP data featured as linkage disequilibrium bin mapping, genotype imputing and marker synthesizing. NAR Genom Bioinform 2021; 3:lqab060. [PMID: 34235432 PMCID: PMC8256826 DOI: 10.1093/nargab/lqab060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2020] [Revised: 05/15/2021] [Accepted: 06/14/2021] [Indexed: 11/12/2022] [Open access]
Abstract
Genome-wide association study data analyses often face two significant challenges: (i) the high dimensionality of single-nucleotide polymorphism (SNP) genotypes and (ii) the imputation of missing values. SNPs are not independent, owing to physical linkage and natural selection. The correlation among nearby SNPs, known as linkage disequilibrium (LD), can be exploited for LD-based SNP bin mapping, missing-genotype inference and SNP dimension reduction. We used a stochastic process to describe the SNP signals and proposed two types of autocorrelation to measure the information redundancy of nearby SNPs. Based on the calculated autocorrelation coefficients, we constructed LD bins. We adopted a k-nearest neighbors (kNN) algorithm to impute missing genotypes. We proposed several novel methods to find the optimal synthetic marker to represent each SNP bin. We also proposed methods to evaluate the information loss, or information conservation, when dimension-reduced synthetic markers are used instead of the original genome-wide markers. Our performance assessments on real SNP data from a rice recombinant inbred line (RIL) population and a rice HapMap project show that the new methods produce satisfactory results. We implemented these functional modules in C/C++ and streamlined them into a web-based pipeline named PIP-SNP (https://bioinfo.noble.org/PIP_SNP/) for processing SNP data.
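The abstract states only that a k-nearest-neighbours algorithm imputes missing genotypes. The sketch below shows one straightforward version, built on several assumptions:

- genotypes are coded 0/1/2, with None marking a missing call;
- distance is the mismatch rate over markers observed in both individuals;
- each missing call is filled by a majority vote among the k closest individuals that have an observed call at that marker.

PIP-SNP's actual distance, choice of k and tie handling may differ.

```python
from collections import Counter
from typing import List, Optional, Sequence

Genotype = Optional[int]  # 0/1/2; None = missing


def distance(a: Sequence[Genotype], b: Sequence[Genotype]) -> float:
    """Mismatch rate over markers observed in both individuals."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    return sum(x != y for x, y in shared) / len(shared)


def knn_impute(geno: Sequence[Sequence[Genotype]], k: int = 3) -> List[List[Genotype]]:
    """Fill each missing call with the majority genotype of its k nearest donors.

    Donors must have an observed call at the missing marker, and distances are
    always computed on the original (unimputed) data.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    imputed = [list(row) for row in geno]
    for i, row in enumerate(geno):
        for j, call in enumerate(row):
            if call is not None:
                continue
            # (distance, donor genotype) pairs, closest first.
            donors = sorted(
                (distance(row, other), other[j])
                for r, other in enumerate(geno)
                if r != i and other[j] is not None
            )
            if donors:  # leave the call missing if no donor has it observed
                votes = Counter(value for _, value in donors[:k])
                imputed[i][j] = votes.most_common(1)[0][0]
    return imputed
```

For example, in [[0, 0, 2, None], [0, 0, 2, 2], [0, 0, 2, 2], [2, 2, 0, 0]] with k = 2, the first individual's missing call becomes 2. Its two nearest donors match it at every shared marker and both carry a 2.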
Affiliation(s)
- Wenchao Zhang
- Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
- Yun Kang
- Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
- Xinbin Dai
- Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
- Shizhong Xu
- Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA
- Patrick X Zhao
- Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
6
Genome-wide association studies: assessing trait characteristics in model and crop plants. Cell Mol Life Sci 2021; 78:5743-5754. [PMID: 34196733 PMCID: PMC8316211 DOI: 10.1007/s00018-021-03868-w] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 02/06/2021] [Revised: 05/28/2021] [Accepted: 05/29/2021] [Indexed: 01/19/2023]
Abstract
Genome-wide association study (GWAS) involves testing genetic variants across the genomes of many individuals of a population to identify genotype–phenotype associations. It was initially developed for, and has proven highly successful in, human disease genetics. In plants, GWAS initially focused on single-feature polymorphisms, recombination and linkage disequilibrium, but it has now been embraced by a plethora of disciplines, with several thousand studies published in model and crop species within the last decade or so. Here we provide a comprehensive review of these studies, with case studies on biotic resistance, abiotic tolerance, yield-associated traits, and metabolic composition. We also detail current strategies for candidate gene validation as well as the functional study of haplotypes. Furthermore, we provide a critical evaluation of the GWAS strategy and its alternatives, as well as future perspectives arising from the emergence of pan-genomic datasets.
7
Zhang W, Kang Y, Cheng X, Wen J, Zhang H, Torres-Jerez I, Krom N, Udvardi MK, Scheible WR, Zhao PX. Distinguishing HapMap Accessions Through Recursive Set Partitioning in Hierarchical Decision Trees. FRONTIERS IN PLANT SCIENCE 2021; 12:628421. [PMID: 33613609 PMCID: PMC7886675 DOI: 10.3389/fpls.2021.628421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2020] [Accepted: 01/11/2021] [Indexed: 06/12/2023]
Abstract
The HapMap (haplotype map) projects have produced valuable genetic resources for the life science research community, allowing researchers to investigate sequence variations and conduct genome-wide association study (GWAS) analyses. A typical HapMap project may require sequencing hundreds, even thousands, of individual lines or accessions within a species. Owing to limitations in current sequencing technology, the genotypes of some accessions cannot be called clearly. Additionally, allelic heterozygosity can be very high in some lines, causing genetic and sometimes phenotypic segregation in their descendants. Such segregation degrades the original accession's specificity and makes it difficult to distinguish one accession from another. Therefore, it is vitally important to determine and validate HapMap accessions before conducting a GWAS analysis. However, to the best of our knowledge, there are no prior methodologies or tools that can readily distinguish or validate multiple accessions in a HapMap population. We devised a bioinformatics approach to distinguish multiple HapMap accessions using only a minimum number of genetic markers. First, we assign each candidate marker a distinguishing score (DS), which measures its capability to distinguish accessions. The DS prioritizes markers with higher percentages of homozygous genotypes (allele combinations), as these can be stably passed on to offspring. Next, we apply the "set-partitioning" concept to select optimal markers by recursively partitioning accession sets. Subsequently, we build a hierarchical decision tree in which a specific path represents the selected markers and the homozygous genotypes that can be used to distinguish one accession from the others in the HapMap population.
Based on these algorithms, we developed a web tool named MAD-HiDTree (Multiple Accession Distinguishment-Hierarchical Decision Tree), designed to analyze a user-input genotype matrix and construct a hierarchical decision tree. Using genetic marker data extracted from the Medicago truncatula HapMap population, we successfully constructed hierarchical decision trees by which the original 262 M. truncatula accessions could be efficiently distinguished. PCR experiments verified our proposed method, confirming that MAD-HiDTree can be used for the identification of a specific accession. MAD-HiDTree was developed in C/C++ on Linux. Both the source code and test data are publicly available at https://bioinfo.noble.org/MAD-HiDTree/.
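The abstract does not give the DS formula, so the sketch below substitutes a deliberately simple, hypothetical score: the fraction of homozygous calls times the number of distinct homozygous calls among the accessions still to be separated. It then applies the recursive set-partitioning idea. At each step it picks the best-scoring marker, splits the current accession set by homozygous genotype, and recurses on each subset. Heterozygous or missing calls go to an "other" branch, which is also an assumption. Leaves are sorted lists of accessions, and a leaf with more than one name means the remaining markers cannot separate those accessions.

```python
from typing import Dict, List, Optional, Sequence, Tuple, Union

Call = Optional[str]  # two-letter call such as "AA" or "AB"; None = missing
Tree = Union[List[str], Tuple[int, Dict[str, "Tree"]]]


def is_homozygous(call: Call) -> bool:
    return call is not None and len(call) == 2 and call[0] == call[1]


def split(accessions: Sequence[str], geno: Dict[str, Sequence[Call]],
          marker: int) -> Dict[str, List[str]]:
    """Group accessions by homozygous call; heterozygous/missing go to "other"."""
    groups: Dict[str, List[str]] = {}
    for acc in accessions:
        call = geno[acc][marker]
        groups.setdefault(call if is_homozygous(call) else "other", []).append(acc)
    return groups


def toy_score(accessions: Sequence[str], geno: Dict[str, Sequence[Call]],
              marker: int) -> float:
    """Hypothetical stand-in for the DS: homozygous fraction x distinct homozygous calls."""
    calls = [geno[a][marker] for a in accessions]
    homo = [c for c in calls if is_homozygous(c)]
    return len(homo) / len(calls) * len(set(homo))


def build_tree(accessions: Sequence[str], geno: Dict[str, Sequence[Call]],
               markers: Sequence[int]) -> Tree:
    """Recursively partition the accession set into a hierarchical decision tree."""
    if len(accessions) <= 1:
        return sorted(accessions)
    usable = [m for m in markers if len(split(accessions, geno, m)) >= 2]
    if not usable:  # these accessions cannot be told apart with the remaining markers
        return sorted(accessions)
    best = max(usable, key=lambda m: toy_score(accessions, geno, m))
    rest = [m for m in markers if m != best]
    return (best, {key: build_tree(group, geno, rest)
                   for key, group in split(accessions, geno, best).items()})
```

Reading a tree from the root to a single-name leaf gives the marker/genotype path that identifies that accession. For example, marker 0 = "AA", then marker 1 = "BB" picks out acc1 in the test data below.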
8
Liu JY, Zhang YW, Han X, Zuo JF, Zhang Z, Shang H, Song Q, Zhang YM. An evolutionary population structure model reveals pleiotropic effects of GmPDAT for traits related to seed size and oil content in soybean. JOURNAL OF EXPERIMENTAL BOTANY 2020; 71:6988-7002. [PMID: 32926130 DOI: 10.1093/jxb/eraa426] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 09/10/2020] [Accepted: 09/10/2020] [Indexed: 05/20/2023]
Abstract
Seed oil traits in soybean that are of benefit to human nutrition and health have been selected for during crop domestication. However, these domesticated traits differ significantly across evolutionary types. In this study, we found that integrating evolutionary population structure (evolutionary types) into genome-wide association studies increased the power of gene detection, and it identified one locus for traits related to seed size and oil content on chromosome 13. This domestication locus, together with another one in a 200-kb region, was confirmed using the GEMMA and EMMAX software. The candidate gene, GmPDAT, had higher expression levels in high-oil and large-seed accessions than in low-oil and small-seed accessions. Overexpression lines had increased seed size and oil content, whereas RNAi lines had decreased seed size and oil content. The molecular mechanism of GmPDAT was deduced from the results of linkage analysis for triacylglycerols and from histocytological comparisons of transgenic soybean seeds. Our results illustrate a new approach for identifying domestication genes with pleiotropic effects.
Affiliation(s)
- Jin-Yang Liu
- Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, China
- Institute of Industrial Crops, Jiangsu Academy of Agricultural Sciences, Nanjing, China
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China
- Ya-Wen Zhang
- Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, China
- Xu Han
- Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, China
- Jian-Fang Zuo
- Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, China
- Zhibin Zhang
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou, China
- Haihong Shang
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, Zhengzhou University, Zhengzhou, China
- Qijian Song
- Soybean Genomics and Improvement Laboratory, Agricultural Research Service, United States Department of Agriculture, Beltsville, Maryland, USA
- Yuan-Ming Zhang
- Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, China
9
Liu JY, Li P, Zhang YW, Zuo JF, Li G, Han X, Dunwell JM, Zhang YM. Three-dimensional genetic networks among seed oil-related traits, metabolites and genes reveal the genetic foundations of oil synthesis in soybean. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2020; 103:1103-1124. [PMID: 32344462 DOI: 10.1111/tpj.14788] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 12/11/2019] [Accepted: 04/21/2020] [Indexed: 05/11/2023]
Abstract
Although the biochemical and genetic basis of lipid metabolism is clear in Arabidopsis, there is limited information concerning the relevant genes in Glycine max (soybean). To address this issue, we constructed three-dimensional genetic networks using six seed oil-related traits, 52 lipid metabolism-related metabolites and 54 294 SNPs in a total of 286 soybean accessions. As a result, 284 and 279 candidate genes were found to be significantly associated with seed oil-related traits and metabolites, respectively, by phenotypic and metabolic genome-wide association studies and multi-omics analyses. Using minimax concave penalty (MCP) and smoothly clipped absolute deviation (SCAD) analyses, six seed oil-related traits were found to be significantly related to 31 metabolites. Among the above candidate genes, 36 genes were found to be associated with oil synthesis (27 genes), amino acid synthesis (four genes) and the tricarboxylic acid (TCA) cycle (five genes), and four genes (GmFATB1a, GmPDAT, GmPLDα1 and GmDAGAT1) are already known to be related to oil synthesis. Using this information, 133 three-dimensional genetic networks were constructed, 24 of which are known, e.g. pyruvate-GmPDAT-GmFATA2-oil content. Using these networks, GmPDAT, GmAGT and GmACP4 reveal the genetic relationships between pyruvate and the three major nutrients, and GmPDAT, GmZF351 and GmPgs1 reveal the genetic relationships between amino acids and seed oil content. In addition, GmCds1, along with the average temperature in July and the rainfall from June to September, influences seed oil content across years. This study provides a new approach for the construction of three-dimensional genetic networks and reveals new information for soybean seed oil improvement and the identification of gene function.
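The MCP and SCAD analyses mentioned in the abstract are penalized regressions. As a sketch, assume an oil-related trait y was regressed on a metabolite matrix X; the abstract does not give the exact setup. The estimator then minimizes a least-squares loss plus one of the two standard penalties below, written for t = |beta_j| >= 0, with a > 2 for SCAD and gamma > 1 for MCP:

```latex
\begin{aligned}
\hat{\boldsymbol{\beta}} &= \arg\min_{\boldsymbol{\beta}}\;
  \frac{1}{2n}\,\lVert \mathbf{y} - \mathbf{X}\boldsymbol{\beta} \rVert_2^{2}
  + \sum_{j=1}^{p} P\!\left(|\beta_j|\right),\\[4pt]
P_{\mathrm{SCAD}}(t) &=
\begin{cases}
\lambda t, & t \le \lambda,\\[2pt]
\dfrac{2a\lambda t - t^{2} - \lambda^{2}}{2(a-1)}, & \lambda < t \le a\lambda,\\[2pt]
\dfrac{(a+1)\lambda^{2}}{2}, & t > a\lambda,
\end{cases}\\[4pt]
P_{\mathrm{MCP}}(t) &=
\begin{cases}
\lambda t - \dfrac{t^{2}}{2\gamma}, & t \le \gamma\lambda,\\[2pt]
\dfrac{\gamma\lambda^{2}}{2}, & t > \gamma\lambda .
\end{cases}
\end{aligned}
```

Near zero, both penalties behave like the lasso penalty lambda*t. For large coefficients they level off: beyond a*lambda for SCAD (a = 3.7 is a common default) and beyond gamma*lambda for MCP. As a result, strong metabolite effects are shrunk less than they would be under the lasso.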
Affiliation(s)
- Jin-Yang Liu
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, China
- Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China
- Pei Li
- Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China
- Ya-Wen Zhang
- Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China
- Jian-Fang Zuo
- Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China
- Guo Li
- Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China
- Xu Han
- Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China
- Jim M Dunwell
- School of Agriculture, Policy and Development, University of Reading, Reading, RG6 6AR, UK
- Yuan-Ming Zhang
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, 210095, China
- Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China
10
Kim B, Dai X, Zhang W, Zhuang Z, Sanchez DL, Lübberstedt T, Kang Y, Udvardi MK, Beavis WD, Xu S, Zhao PX. GWASpro: a high-performance genome-wide association analysis server. Bioinformatics 2020; 35:2512-2514. [PMID: 30508039 PMCID: PMC6612817 DOI: 10.1093/bioinformatics/bty989] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 06/29/2018] [Revised: 11/14/2018] [Accepted: 11/30/2018] [Indexed: 12/25/2022] [Open access]
Abstract
Summary We present GWASpro, a high-performance web server for the analysis of large-scale genome-wide association studies (GWAS). GWASpro was developed to analyze large-scale molecular genetic data coupled with complex replicated experimental designs, such as those found in plant science investigations, and to overcome the steep learning curves of existing GWAS software tools. GWASpro supports building complex design matrices, so that complex experimental designs, which may include replications, treatments, locations and times, can be accounted for in the linear mixed model. GWASpro is optimized to handle GWAS data that may consist of up to 10 million markers and 10 000 samples from replicable lines or hybrids. GWASpro provides an interface that significantly reduces the learning curve for new GWAS investigators. Availability and implementation GWASpro is freely available at https://bioinfo.noble.org/GWASPRO. Supplementary information Supplementary data are available at Bioinformatics online.
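GWASpro's abstract says it builds design matrices covering replications, treatments, locations and times. The sketch below builds one such fixed-effect incidence matrix (the X of a linear mixed model). It assumes that every design factor is categorical and uses reference (treatment) coding: an intercept column, then one indicator column per non-reference level of each factor. The function design_matrix and the record layout are hypothetical and are not GWASpro's code.

```python
from typing import Dict, List, Sequence


def design_matrix(records: Sequence[Dict[str, str]],
                  factors: Sequence[str]) -> List[List[int]]:
    """Fixed-effect incidence matrix with an intercept and reference-coded factors.

    The first level of each factor (in sorted order) is the reference level and is
    absorbed into the intercept. This avoids the exact collinearity that an
    intercept plus a full set of indicators would create.
    """
    levels = {f: sorted({rec[f] for rec in records}) for f in factors}
    rows: List[List[int]] = []
    for rec in records:
        row = [1]  # intercept
        for f in factors:
            # One 0/1 indicator per non-reference level of factor f.
            row.extend(1 if rec[f] == lvl else 0 for lvl in levels[f][1:])
        rows.append(row)
    return rows
```

For three plots recorded as (location L1, rep 1), (L2, rep 1) and (L1, rep 2), the columns are intercept, location L2 and rep 2. Interactions such as replicate-within-location would need additional columns, and this sketch omits them.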
Affiliation(s)
- Xinbin Dai
- Noble Research Institute, Ardmore, OK, USA
- Yun Kang
- Noble Research Institute, Ardmore, OK, USA
- Shizhong Xu
- Department of Botany and Plant Sciences, University of California, Riverside, CA, USA
11
Moura EG, Pamplona AKA, Balestre M. Functional models in genome-wide selection. PLoS One 2019; 14:e0222699. [PMID: 31644532 PMCID: PMC6808424 DOI: 10.1371/journal.pone.0222699] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/14/2018] [Accepted: 09/05/2019] [Indexed: 11/29/2022] [Open access]
Abstract
The development of sequencing technologies has enabled the discovery of markers that are abundantly distributed over the whole genome. Knowledge of marker locations in reference genomes provides further insight into the search for causal regions and the prediction of genomic values. The present study proposes a Bayesian functional approach for incorporating marker locations into genomic analysis, using stochastic methods to search for causal regions and predict genotypic values. For this, three scenarios were analyzed: an F2 population with 300 individuals and three different heritability levels (0.2, 0.5, and 0.8), along with 12,150 SNP markers distributed across ten linkage groups; F∞ populations with 320 individuals and three different heritability levels (0.2, 0.5, and 0.8), along with 10,020 SNP markers distributed across ten linkage groups; and data from Eucalyptus spp. to measure model performance in a real LD setting, with 611 individuals whose phenotypes were simulated from QTLs distributed across a panel of 36,812 SNPs with known positions. The performance of the proposed method was compared with that of other genomic selection models, namely RR-BLUP, Bayes B and Bayesian Lasso. The Bayesian functional model showed higher or similar predictive ability compared with these classical regression methods in simulated and real scenarios with different LD structures. In general, the Bayesian functional model also achieved higher computational efficiency, using 12 SNPs per MCMC round. The model was efficient in identifying causal regions and showed high analytical flexibility, as it is easily adaptable to any genomic selection model.
Affiliation(s)
- Ernandes Guedes Moura
- Federal Institute of Maranhão - Campus São João dos Patos, São João dos Patos, Maranhão, Brazil
- Marcio Balestre
- Department of Statistics - Federal University of Lavras, Lavras, Minas Gerais, Brazil
12
Zhang W, Dai X, Xu S, Zhao PX. GPU empowered pipelines for calculating genome-wide kinship matrices with ultra-high dimensional genetic variants and facilitating 1D and 2D GWAS. NAR Genom Bioinform 2019; 2:lqz009. [PMID: 33575561 PMCID: PMC7671369 DOI: 10.1093/nargab/lqz009] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/31/2019] [Revised: 08/22/2019] [Accepted: 09/25/2019] [Indexed: 12/13/2022] [Open access]
Abstract
Genome-wide association study (GWAS) is a powerful approach that has revolutionized the field of quantitative genetics. Unlike one-dimensional (1D) GWAS, which tests individual genetic variants, two-dimensional (2D) GWAS accounts for epistatic genetic effects and must therefore consider marker pairs, whose number grows quadratically with the number of variants. Calculating the genome-wide kinship matrices that GWAS uses to account for relationships among individuals is computationally challenging when those individuals are represented by ultra-high-dimensional genetic variants. Fortunately, kinship matrix calculation involves pure matrix operations, and the algorithms can be parallelized, particularly on graphics processing unit (GPU)-empowered high-performance computing (HPC) architectures. We have devised a new method and two pipelines, KMC1D and KMC2D, for kinship matrix calculation with high-dimensional genetic variants, facilitating 1D and 2D GWAS, respectively. We first divide the ultra-high-dimensional markers and marker pairs into successive blocks. We then calculate the kinship matrix for each block and merge the block-wise kinship matrices to form the genome-wide kinship matrix. All the matrix operations have been parallelized using GPU kernels on our NVIDIA GPU-accelerated server platform. Performance analyses show that KMC1D and KMC2D calculate kinship matrices 100–400 times faster than conventional CPU-based computing.
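The block-then-merge idea can be sketched in a few lines. This minimal sketch assumes the simple kinship definition K = XX'/f over f centered feature columns (not necessarily the scaling KMC1D or KMC2D use): cross-products are accumulated one block of columns at a time, so only one block needs to be held in memory. For the 2D case, each marker pair (j, l) contributes a product feature z_ij·z_il, generated lazily because the number of pairs grows quadratically. Plain Python stands in for the GPU kernels, and all function names are hypothetical.

```python
from itertools import combinations, islice
from typing import Iterable, Iterator, List

Matrix = List[List[float]]
Column = List[float]


def center(genotypes: Matrix) -> Matrix:
    """Center each marker (e.g. 0/1/2 allele counts) on its column mean."""
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in genotypes]


def marker_columns(Z: Matrix) -> Iterator[Column]:
    """1D features: one column per marker."""
    for j in range(len(Z[0])):
        yield [row[j] for row in Z]


def pair_columns(Z: Matrix) -> Iterator[Column]:
    """2D features: one column per marker pair (j, l), j < l, holding z_ij * z_il."""
    for j, l in combinations(range(len(Z[0])), 2):
        yield [row[j] * row[l] for row in Z]


def blocked_kinship(columns: Iterable[Column], n: int, block_size: int) -> Matrix:
    """K = (sum over blocks of X_b X_b') / (number of feature columns)."""
    K = [[0.0] * n for _ in range(n)]
    count = 0
    it = iter(columns)
    while True:
        block = list(islice(it, block_size))  # next block of feature columns
        if not block:
            break
        count += len(block)
        for i in range(n):
            for k in range(i, n):
                s = sum(col[i] * col[k] for col in block)
                K[i][k] += s
                if k != i:
                    K[k][i] += s  # keep K symmetric
    return [[v / count for v in row] for row in K]
```

Because block-wise cross-products simply add, the result does not depend on the block size. In a GPU or NumPy implementation, the inner loops over individuals would typically be replaced by one matrix product per block.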
Affiliation(s)
- Wenchao Zhang
- Noble Research Institute, LLC, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
- Xinbin Dai
- Noble Research Institute, LLC, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
- Shizhong Xu
- Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA
- Patrick X Zhao
- Noble Research Institute, LLC, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
13
Rapp JP, Joe B. Dissecting Epistatic QTL for Blood Pressure in Rats: Congenic Strains versus Heterogeneous Stocks, a Reality Check. Compr Physiol 2019; 9:1305-1337. [PMID: 31688958 DOI: 10.1002/cphy.c180038] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/27/2022]
Abstract
Advances in molecular genetics have provided well-defined physical genetic maps and large numbers of genetic markers for both model organisms and humans. It is now possible to gain a fundamental understanding of the genetic architecture underlying quantitative traits, of which blood pressure (BP) is an important example. This review emphasizes analytical techniques and results obtained using the Dahl salt-sensitive (S) rat as a model of hypertension, presenting detailed results for three chromosomal regions that harbor BP-controlling genetic elements of increasing complexity. These results highlight the critical importance of genetic interactions (epistasis) affecting BP at all structural levels: intragenic, intergenic, intrachromosomal, interchromosomal, and across whole genomes. In two of the three examples presented, the specific DNA structural variations and the biochemical, physiological, and pathological mechanisms they lead to are well defined. This demonstrates the usefulness of interval mapping followed by substitution mapping using congenic strains. These classic techniques are compared with newer approaches that apply sophisticated statistical analysis to various segregating or outbred model-organism populations, which in some cases are uniquely useful in demonstrating the existence of higher-order interactions. It is speculated that hypertension, as an outlier quantitative phenotype, depends on higher-order genetic interactions. The obstacle to identifying the genetic elements and the biochemical/physiological mechanisms involved in higher-order interactions is not theoretical or technical but a lack of future resources to finish the job of identifying the individual genetic elements underlying the quantitative trait loci for BP and ascertaining their molecular functions. © 2019 American Physiological Society. Compr Physiol 9:1305-1337, 2019.
Affiliation(s)
- John P Rapp
- Physiological Genomics Laboratory, Department of Physiology and Pharmacology, Center for Hypertension and Precision Medicine, University of Toledo College of Medicine and Life Sciences, Toledo, OH, USA
- Bina Joe
- Physiological Genomics Laboratory, Department of Physiology and Pharmacology, Center for Hypertension and Precision Medicine, University of Toledo College of Medicine and Life Sciences, Toledo, OH, USA
14
Liu HJ, Yan J. Crop genome-wide association study: a harvest of biological relevance. Plant J 2019; 97:8-18. [PMID: 30368955 DOI: 10.1111/tpj.14139] [Citation(s) in RCA: 113] [Impact Index Per Article: 22.6] [Received: 07/23/2018] [Revised: 10/13/2018] [Accepted: 10/22/2018] [Indexed: 05/20/2023]
Abstract
With the advent of rapid genotyping and next-generation sequencing technologies, genome-wide association studies (GWAS) have become a routine strategy for decoding genotype-phenotype associations in many species. More than 1000 such studies over the last decade have revealed substantial genotype-phenotype associations in crops and provided unparalleled opportunities to probe functional genomics. Beyond the many 'hits' obtained, this review summarizes recent efforts to better understand the genetic architecture of complex traits by focusing on non-main effects, including epistasis, pleiotropy, and phenotypic plasticity. We also discuss how these achievements and the remaining gaps in our knowledge will guide future studies. Synthetic association, which can lead to false inferences of causality, is highlighted as prevalent but largely underestimated. Furthermore, validation evidence is desirable for future GWAS, especially in the context of emerging genome-editing technologies.
Affiliation(s)
- Hai-Jun Liu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Jianbing Yan
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
15
Zhang W, Dai X, Xu S, Zhao PX. 2D association and integrative omics analysis in rice provides systems biology view in trait analysis. Commun Biol 2018; 1:153. [PMID: 30272029 PMCID: PMC6160469 DOI: 10.1038/s42003-018-0159-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 02/19/2018] [Accepted: 08/30/2018] [Indexed: 12/28/2022]
Abstract
Interactions among genes and between genes and the environment contribute significantly to the phenotypic variation of complex traits and may help explain missing heritability. However, to our knowledge, no existing tool can address both kinds of interaction. Here we propose a novel linear mixed model that considers not only the additive effects of biological markers but also the interaction effects of marker pairs; each interaction effect is represented as a 2D association. Based on this linear mixed model, we developed a pipeline named PATOWAS, which can be used to study transcriptome-wide and metabolome-wide associations in addition to genome-wide associations. Our case analysis of real rice recombinant inbred lines (RILs) at three omics levels demonstrates that 2D association mapping and integrative omics can provide a systems biology view of the analyzed traits, helping to answer how genes, transcripts, proteins, and metabolites work together to produce an observable phenotype. Wenchao Zhang et al. developed a tool for analyzing traits using data generated from genome-wide, transcriptome-wide, and metabolome-wide association studies. They test their approach in rice, providing a systems biology view of the identified traits.
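As a minimal sketch of what a 2D association test for one pair of features involves, the code below fits y = b0 + b1·x_j + b2·x_l + b3·x_j·x_l + e by ordinary least squares, reports the interaction estimate b3 with its t statistic, and then scans all pairs. This is a deliberate simplification, not PATOWAS: it omits the random polygenic (kinship) effects of the linear mixed model, and the function names are hypothetical. Because features are just numeric columns, the same code applies to genotypes, transcript abundances, or metabolite levels.

```python
from itertools import combinations
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def solve(A: Matrix, b: Vector) -> Vector:
    """Gaussian elimination with partial pivoting (A must be non-singular)."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for cc in range(c, n + 1):
                M[r][cc] -= f * M[c][cc]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][cc] * x[cc] for cc in range(r + 1, n))) / M[r][r]
    return x


def pair_interaction(y: Vector, xj: Vector, xl: Vector) -> Tuple[float, float]:
    """OLS fit of y = b0 + b1*xj + b2*xl + b3*xj*xl + e; returns (b3, t statistic of b3)."""
    X = [[1.0, a, b, a * b] for a, b in zip(xj, xl)]
    n, p = len(X), 4
    XtX = [[sum(row[r] * row[c] for row in X) for c in range(p)] for r in range(p)]
    Xty = [sum(row[r] * v for row, v in zip(X, y)) for r in range(p)]
    beta = solve(XtX, Xty)
    resid = [v - sum(b * x for b, x in zip(beta, row)) for row, v in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - p)
    # Var(b3) = sigma2 * [(X'X)^-1]_33; obtain that entry by solving X'X u = e_3
    u = solve(XtX, [0.0, 0.0, 0.0, 1.0])
    return beta[3], beta[3] / (sigma2 * u[3]) ** 0.5


def scan_pairs(y: Vector, features: Matrix) -> List[Tuple[int, int, float, float]]:
    """2D scan: (j, l, b3, t) for every pair of feature columns j < l."""
    m = len(features[0])
    cols = [[row[j] for row in features] for j in range(m)]
    return [(j, l, *pair_interaction(y, cols[j], cols[l]))
            for j, l in combinations(range(m), 2)]
```

A mixed-model version would replace the ordinary least-squares step with generalized least squares using a covariance built from kinship matrices. The m(m−1)/2 pairs to scan are what make 2D analyses expensive.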
Affiliation(s)
- Wenchao Zhang
- Computational Biology and Bioinformatics Lab, Noble Research Institute, Ardmore, OK, 73401, USA
- Xinbin Dai
- Computational Biology and Bioinformatics Lab, Noble Research Institute, Ardmore, OK, 73401, USA
- Shizhong Xu
- Department of Botany and Plant Sciences, University of California, Riverside, CA, 92521, USA
- Patrick X Zhao
- Computational Biology and Bioinformatics Lab, Noble Research Institute, Ardmore, OK, 73401, USA
16
Bazakos C, Hanemian M, Trontin C, Jiménez-Gómez JM, Loudet O. New Strategies and Tools in Quantitative Genetics: How to Go from the Phenotype to the Genotype. Annu Rev Plant Biol 2017; 68:435-455. [PMID: 28226236 DOI: 10.1146/annurev-arplant-042916-040820] [Citation(s) in RCA: 54] [Impact Index Per Article: 7.7] [Indexed: 05/20/2023]
Abstract
Quantitative genetics has a long history in plants: it has been used to study specific biological processes, identify the factors important for trait evolution, and breed new crop varieties. Classical approaches to quantitative trait locus mapping have naturally improved with technology. In this review, we show how quantitative genetics has evolved recently in plants and how new developments in phenotyping, population generation, sequencing, gene manipulation, and statistics are rejuvenating both classical linkage mapping approaches (for example, through nested association mapping) and the more recently developed genome-wide association studies. These strategies are complementary in most instances; indeed, one is often used to confirm the results of the other. Despite significant advances, an emerging trend is that the outcome and efficiency of the different approaches depend greatly on the genetic architecture of the trait in the genetic material under study.
Affiliation(s)
- Christos Bazakos
- Institut Jean-Pierre Bourgin, INRA, AgroParisTech, CNRS, Université Paris-Saclay, 78026 Versailles Cedex, France
- Mathieu Hanemian
- Institut Jean-Pierre Bourgin, INRA, AgroParisTech, CNRS, Université Paris-Saclay, 78026 Versailles Cedex, France
- Charlotte Trontin
- Institut Jean-Pierre Bourgin, INRA, AgroParisTech, CNRS, Université Paris-Saclay, 78026 Versailles Cedex, France
- José M Jiménez-Gómez
- Institut Jean-Pierre Bourgin, INRA, AgroParisTech, CNRS, Université Paris-Saclay, 78026 Versailles Cedex, France
- Olivier Loudet
- Institut Jean-Pierre Bourgin, INRA, AgroParisTech, CNRS, Université Paris-Saclay, 78026 Versailles Cedex, France
17
Development of a multiple-hybrid population for genome-wide association studies: theoretical consideration and genetic mapping of flowering traits in maize. Sci Rep 2017; 7:40239. [PMID: 28071695 PMCID: PMC5223130 DOI: 10.1038/srep40239] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 08/11/2016] [Accepted: 12/05/2016] [Indexed: 12/18/2022]
Abstract
Various types of populations have been used in genetics, genomics, and crop improvement, including bi- and multi-parental populations and natural ones. Natural populations have been widely used in genome-wide association studies (GWAS). However, inbred-based GWAS cannot reveal the mechanisms involved in hybrid performance. We developed a novel maize population, the multiple-hybrid population (MHP), consisting of 724 hybrids produced from 28 temperate and 23 tropical inbreds. The hybrids can be divided into three subpopulations: two diallels and a North Carolina Design II (NC II). Significant genetic differences were identified among parents, hybrids, and heterotic groups. A cluster analysis revealed heterotic groups among the parental lines, and the results showed that MHPs are well suited to GWAS in hybrid crops. MHP-based GWAS was performed using a 55K SNP array for the flowering-time traits days to tassel, days to silk, days to anthesis, and anthesis-silking interval. Two independent methods, PEPIS (developed for hybrids) and TASSEL (designed for inbred-line populations), produced highly consistent results, identifying five overlapping chromosomal regions that were used to discover candidate genes and quantitative trait nucleotides. Our results indicate that MHPs are powerful for GWAS of hybrid-related traits, with great potential for application in molecular breeding.
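Because inbred parents are (nearly) homozygous, an F1 hybrid's genotype at each SNP can be inferred from its two parents; the abstract does not say whether this study genotyped hybrids directly or inferred them this way. A minimal sketch, assuming parents are coded 0 or 2 copies of the counted allele: the hybrid's additive code is the average of the parental codes, and its dominance code flags SNPs where the hybrid is heterozygous. The names and coding scheme are illustrative assumptions, not the PEPIS or TASSEL input format.

```python
from typing import Dict, List, Sequence, Tuple

Genotypes = Dict[str, List[int]]  # inbred name -> per-SNP allele count (0 or 2)


def hybrid_codes(parents: Genotypes, female: str, male: str) -> Tuple[List[int], List[int]]:
    """Infer additive (0/1/2) and dominance (0/1) codes of an F1 from homozygous parents."""
    gf, gm = parents[female], parents[male]
    if len(gf) != len(gm):
        raise ValueError("parents must be genotyped at the same SNPs")
    additive, dominance = [], []
    for a, b in zip(gf, gm):
        if a not in (0, 2) or b not in (0, 2):
            raise ValueError("inbred parents are expected to be homozygous (0 or 2)")
        additive.append((a + b) // 2)         # one gamete from each parent
        dominance.append(1 if a != b else 0)  # heterozygous where parents differ
    return additive, dominance


def hybrid_design(parents: Genotypes, crosses: Sequence[Tuple[str, str]]):
    """Build additive and dominance matrices (one row per hybrid) for a list of crosses."""
    A, D = [], []
    for female, male in crosses:
        a, d = hybrid_codes(parents, female, male)
        A.append(a)
        D.append(d)
    return A, D
```

Diallel and NC II subpopulations differ only in which (female, male) pairs are listed in `crosses`; the coding step is the same for both.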