1. Wang XY, Ren CX, Fan QW, Xu YP, Wang LW, Mao ZL, Cai XZ. Integrated Assays of Genome-Wide Association Study, Multi-Omics Co-Localization, and Machine Learning Associated Calcium Signaling Genes with Oilseed Rape Resistance to Sclerotinia sclerotiorum. Int J Mol Sci 2024; 25:6932. [PMID: 39000053 PMCID: PMC11240920 DOI: 10.3390/ijms25136932] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2024] [Revised: 06/20/2024] [Accepted: 06/20/2024] [Indexed: 07/14/2024]
Abstract
Sclerotinia sclerotiorum (Ss) is one of the most devastating fungal pathogens, causing heavy yield losses in multiple economically important crops, including oilseed rape. Plant resistance to Ss is a form of quantitative disease resistance (QDR) controlled by multiple minor genes. Genome-wide identification of genes involved in QDR to Ss has yet to be conducted. In this study, we integrated several assays, including genome-wide association study (GWAS), multi-omics co-localization, and machine learning prediction, to identify on a genome-wide scale the genes involved in oilseed rape QDR to Ss. Using GWAS and multi-omics co-localization, we identified seven resistance-associated loci (RALs) associated with oilseed rape resistance to Ss. Furthermore, we developed a machine learning algorithm named Integrative Multi-Omics Analysis and Machine Learning for Target Gene Prediction (iMAP), which integrates multi-omics data to rapidly predict disease resistance-related genes within a broad chromosomal region. Applying iMAP to the identified RALs, we revealed multiple calcium signaling genes related to QDR to Ss. Population-level analysis of selective sweeps and variant haplotypes confirmed positive selection of the predicted calcium signaling genes during evolution. Overall, this study has developed an algorithm that integrates multi-omics data and machine learning methods, providing a powerful tool for predicting target genes associated with specific traits. It also lays a foundation for further understanding the roles and mechanisms of calcium signaling genes in QDR to Ss.
Affiliation(s)
- Xin-Yao Wang
- Key Laboratory of Biology and Ecological Control of Crop Pathogens and Insects of Zhejiang Province, Institute of Biotechnology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China
- Chun-Xiu Ren
- Key Laboratory of Biology and Ecological Control of Crop Pathogens and Insects of Zhejiang Province, Institute of Biotechnology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China
- Qing-Wen Fan
- Key Laboratory of Biology and Ecological Control of Crop Pathogens and Insects of Zhejiang Province, Institute of Biotechnology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China
- You-Ping Xu
- Centre of Analysis and Measurement, Zhejiang University, 866 Yu Hang Tang Road, Hangzhou 310058, China
- Lu-Wen Wang
- Key Laboratory of Biology and Ecological Control of Crop Pathogens and Insects of Zhejiang Province, Institute of Biotechnology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China
- Zhou-Lu Mao
- Key Laboratory of Biology and Ecological Control of Crop Pathogens and Insects of Zhejiang Province, Institute of Biotechnology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China
- Xin-Zhong Cai
- Key Laboratory of Biology and Ecological Control of Crop Pathogens and Insects of Zhejiang Province, Institute of Biotechnology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China
- Hainan Institute, Zhejiang University, Sanya 572025, China
2. Liang X, Sun H. Weighted Selection Probability to Prioritize Susceptible Rare Variants in Multi-Phenotype Association Studies with Application to a Soybean Genetic Data Set. J Comput Biol 2023; 30:1075-1088. [PMID: 37871292 DOI: 10.1089/cmb.2022.0487] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Rare variant association studies with multiple traits or diseases have drawn much attention because association signals of rare variants can be boosted if more than one phenotype is associated with the same rare variants. Most existing statistical methods for identifying rare variants associated with multiple phenotypes are based on a group test, in which a pre-specified genetic region is tested one at a time. However, these methods are not designed to locate susceptible rare variants within the genetic region. In this article, we propose new statistical methods to prioritize rare variants within a genetic region once a group test for that region identifies a statistical association with multiple phenotypes. The method computes the weighted selection probability (WSP) of individual rare variants and ranks them from largest to smallest WSP. In simulation studies, we demonstrated that the proposed method outperforms other statistical methods in terms of true positive selection when multiple phenotypes are correlated with each other. We also applied it to our soybean single nucleotide polymorphism (SNP) data with 13 highly correlated amino acids, where we identified some potentially susceptible rare variants on chromosome 19.
Affiliation(s)
- Xianglong Liang
- Department of Statistics, Pusan National University, Busan, Korea
- Hokeun Sun
- Department of Statistics, Pusan National University, Busan, Korea
3. Ehn M, Michel S, Morales L, Gordon T, Dallinger HG, Buerstmayr H. Genome-wide association mapping identifies common bunt (Tilletia caries) resistance loci in bread wheat (Triticum aestivum) accessions of the USDA National Small Grains Collection. Theor Appl Genet 2022; 135:3103-3115. [PMID: 35896689 PMCID: PMC9668943 DOI: 10.1007/s00122-022-04171-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Accepted: 07/05/2022] [Indexed: 06/15/2023]
Abstract
Association mapping and phenotypic analysis of a diversity panel of 238 bread wheat accessions highlight differences in resistance against common vs. dwarf bunt and identify genotypes valuable for bi-parental crosses. Common bunt, caused by Tilletia caries and T. laevis, was successfully controlled by seed dressings with systemic fungicides for decades, but has become a renewed threat to wheat yield and quality in organic agriculture, where such treatments are forbidden. As the most efficient way to address this problem is the use of resistant cultivars, this study aims to broaden the spectrum of resistance sources available to breeders by identifying resistance loci against common bunt in bread wheat accessions of the USDA National Small Grains Collection. We conducted three years of artificially inoculated field trials to assess common bunt infection levels in a diversity panel of 238 wheat accessions for which data on resistance against the closely related pathogen Tilletia controversa, which causes dwarf bunt, were already available. Resistance levels against common bunt were higher than against dwarf bunt, with 99 accessions showing ≤ 1% incidence. Genome-wide association mapping identified six markers significantly associated with common bunt incidence, in regions already known to confer resistance on chromosomes 1A and 1B and at novel loci on 2B and 7A. Our results show that resistance against common and dwarf bunt is not necessarily controlled by the same loci, but we identified twenty accessions with high resistance against both diseases. These represent valuable new resources for research and breeding programs, since several bunt races have already been reported to overcome known resistance genes.
Affiliation(s)
- Magdalena Ehn
- Institute of Biotechnology in Plant Production, University of Natural Resources and Life Sciences, Konrad-Lorenz-Strasse 20, 3430, Tulln, Austria
- Sebastian Michel
- Institute of Biotechnology in Plant Production, University of Natural Resources and Life Sciences, Konrad-Lorenz-Strasse 20, 3430, Tulln, Austria
- Laura Morales
- Institute of Biotechnology in Plant Production, University of Natural Resources and Life Sciences, Konrad-Lorenz-Strasse 20, 3430, Tulln, Austria
- Tyler Gordon
- Small Grains and Potato Germplasm Research Unit, USDA-ARS, 1691 S. 2700 W., Aberdeen, ID, 83210, USA
- Hermann Gregor Dallinger
- Institute of Biotechnology in Plant Production, University of Natural Resources and Life Sciences, Konrad-Lorenz-Strasse 20, 3430, Tulln, Austria
- Hermann Buerstmayr
- Institute of Biotechnology in Plant Production, University of Natural Resources and Life Sciences, Konrad-Lorenz-Strasse 20, 3430, Tulln, Austria
4. Genome-wide association studies: assessing trait characteristics in model and crop plants. Cell Mol Life Sci 2021; 78:5743-5754. [PMID: 34196733 PMCID: PMC8316211 DOI: 10.1007/s00018-021-03868-w] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 02/06/2021] [Revised: 05/28/2021] [Accepted: 05/29/2021] [Indexed: 01/19/2023]
Abstract
Genome-wide association studies (GWAS) test genetic variants across the genomes of many individuals of a population to identify genotype–phenotype associations. The approach was initially developed for, and has proven highly successful in, human disease genetics. In plants, GWAS initially focused on single-feature polymorphisms, recombination, and linkage disequilibrium, but has since been embraced by a plethora of disciplines, with several thousand studies published in model and crop species within the last decade or so. Here, we provide a comprehensive review of these studies, with case studies on biotic resistance, abiotic tolerance, yield-associated traits, and metabolic composition. We also detail current strategies for candidate gene validation as well as the functional study of haplotypes. Furthermore, we provide a critical evaluation of the GWAS strategy and its alternatives, as well as future perspectives opened up by the emergence of pan-genomic datasets.
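At its simplest, the per-variant testing described above can be sketched as one simple linear regression per SNP. The sketch below assumes biallelic SNPs coded as 0/1/2 allele dosages and a quantitative phenotype, and it reports a t statistic per marker. It is the naive single-marker test only: it ignores population structure and kinship, which mixed-model GWAS methods are designed to control. The function name `single_marker_scan` is hypothetical.

```python
import math


def single_marker_scan(genotypes, phenotype):
    """Naive single-marker GWAS: one simple linear regression per SNP.

    genotypes: list of SNPs, each a list of 0/1/2 allele dosages (one per individual)
    phenotype: list of trait values in the same individual order
    Returns one t statistic per SNP (0.0 for monomorphic SNPs).
    """
    n = len(phenotype)
    if n < 3:
        raise ValueError("need at least 3 individuals")
    y_mean = sum(phenotype) / n
    s_yy = sum((y - y_mean) ** 2 for y in phenotype)
    t_stats = []
    for snp in genotypes:
        if len(snp) != n:
            raise ValueError("each SNP needs one dosage per individual")
        x_mean = sum(snp) / n
        s_xx = sum((x - x_mean) ** 2 for x in snp)
        if s_xx == 0 or s_yy == 0:
            # No variation in genotype (or phenotype): nothing to test.
            t_stats.append(0.0)
            continue
        s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(snp, phenotype))
        r = s_xy / math.sqrt(s_xx * s_yy)  # Pearson correlation
        if abs(r) >= 1.0:
            # Perfect linear fit: the t statistic is unbounded.
            t_stats.append(math.copysign(math.inf, r))
            continue
        # Slope t-test with n - 2 degrees of freedom, written in terms of r.
        t_stats.append(r * math.sqrt((n - 2) / (1 - r * r)))
    return t_stats
```

To turn each t into a p-value, one would compare it with a t distribution on n − 2 degrees of freedom (for example via `scipy.stats.t.sf`) and then apply a multiple-testing threshold across all SNPs.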
5. Kumar R, Sharma V, Suresh S, Ramrao DP, Veershetty A, Kumar S, Priscilla K, Hangargi B, Narasanna R, Pandey MK, Naik GR, Thomas S, Kumar A. Understanding Omics Driven Plant Improvement and de novo Crop Domestication: Some Examples. Front Genet 2021; 12:637141. [PMID: 33889179 PMCID: PMC8055929 DOI: 10.3389/fgene.2021.637141] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/02/2020] [Accepted: 03/02/2021] [Indexed: 01/07/2023]
Abstract
In the current era, one of the biggest challenges is to shorten the breeding cycle for the rapid generation of new crop varieties with high yield capacity, disease resistance, high nutrient content, and other desirable traits. Advances in "omics" technologies have revolutionized the discovery of genes and biomolecules with remarkable precision, resulting in significant development of plant-focused metabolic databases and resources. Metabolomics has been widely used in several model plants and crop species to examine metabolic drift and changes in metabolic composition during various developmental stages and in response to stimuli. Over the last few decades, these efforts have significantly improved our understanding of plant metabolic pathways through the identification of several previously unknown intermediates. This has assisted in developing several important metabolically engineered crops with desirable agronomic traits and has facilitated the de novo domestication of new crops for sustainable agriculture and food security. In this review, we discuss how "omics" technologies, particularly metabolomics, have enhanced our understanding of important traits and enabled the speedy domestication of novel crop plants.
Affiliation(s)
- Rakesh Kumar
- Department of Life Science, Central University of Karnataka, Kalaburagi, India
- Vinay Sharma
- International Crops Research Institute for the Semi-Arid Tropics, Hyderabad, India
- Srinivas Suresh
- Department of Life Science, Central University of Karnataka, Kalaburagi, India
- Akash Veershetty
- Department of Life Science, Central University of Karnataka, Kalaburagi, India
- Sharan Kumar
- Department of Life Science, Central University of Karnataka, Kalaburagi, India
- Kagolla Priscilla
- Department of Life Science, Central University of Karnataka, Kalaburagi, India
- Rahul Narasanna
- Department of Life Science, Central University of Karnataka, Kalaburagi, India
- Manish Kumar Pandey
- International Crops Research Institute for the Semi-Arid Tropics, Hyderabad, India
- Sherinmol Thomas
- Department of Biosciences & Bioengineering, Indian Institute of Technology Bombay, Mumbai, India
- Anirudh Kumar
- Department of Botany, Indira Gandhi National Tribal University, Amarkantak, India
6. Tibbs Cortes L, Zhang Z, Yu J. Status and prospects of genome-wide association studies in plants. Plant Genome 2021; 14:e20077. [PMID: 33442955 DOI: 10.1002/tpg2.20077] [Citation(s) in RCA: 127] [Impact Index Per Article: 42.3] [Received: 08/05/2020] [Accepted: 11/18/2020] [Indexed: 05/22/2023]
Abstract
Genome-wide association studies (GWAS) have developed into a powerful and ubiquitous tool for the investigation of complex traits. In large part, this was fueled by advances in genomic technology, enabling us to examine genome-wide genetic variants across diverse genetic materials. The development of the mixed model framework for GWAS dramatically reduced the number of false positives compared with naïve methods. Building on this foundation, many methods have since been developed to increase computational speed or improve statistical power in GWAS. These methods have allowed the detection of genomic variants associated with either traditional agronomic phenotypes or biochemical and molecular phenotypes. In turn, these associations enable applications in gene cloning and in accelerated crop breeding through marker-assisted selection or genetic engineering. Current topics of investigation include rare-variant analysis, synthetic associations, optimizing the choice of GWAS model, and using GWAS results to advance knowledge of biological processes. Ongoing research in these areas will facilitate further advances in GWAS methods and their applications.
Affiliation(s)
- Zhiwu Zhang
- Department of Crop and Soil Sciences, Washington State University, Pullman, WA, 99164, USA
- Jianming Yu
- Department of Agronomy, Iowa State University, Ames, IA, 50010, USA
7. Genome-Wide Association Studies in Arabidopsis thaliana: Statistical Analysis and Network-Based Augmentation of Signals. Methods Mol Biol 2020; 2200:187-210. [PMID: 33175379 DOI: 10.1007/978-1-0716-0880-7_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/09/2023]
Abstract
Genome-wide association studies (GWAS) have proven effective at identifying genetic variants and genes that are associated with phenotypes in humans, animals, and plants. Since most phenotypes of plant species are complex traits regulated by many genes and their functional interactions, GWAS are increasing in popularity for genetic dissections of plant phenotypes. For the reference plant, Arabidopsis thaliana, detailed information on genetic variations became available with the completion of the 1001 Genomes Project, enabling highly resolved association mapping between chromosomal loci and complex traits. Improvements have been made in the statistical analysis methods for testing the significance of genotype-to-phenotype associations, thereby substantially reducing the confounding effects of population structures. Furthermore, there have been large efforts toward post-GWAS augmentation of signals via integration with other types of information to overcome the limited statistical power of GWAS. This chapter describes the stepwise procedure of GWAS in Arabidopsis, focusing on data analysis processes including preprocessing of genotype and phenotype data, statistical analysis to identify phenotype-associated chromosomal loci, identification of phenotype-associated genes based on the phenotype-associated loci, and finally network-based augmentation of GWAS signals to identify additional candidate genes for the phenotype.
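The chapter lists preprocessing of genotype data as the first step but, in this abstract, does not say which filters it applies. A common choice is to drop SNPs with a low minor allele frequency (MAF) or a high rate of missing calls. The sketch below assumes 0/1/2 allele dosages with `None` for a missing call; the thresholds (MAF ≥ 0.05, missingness ≤ 10%) and the function names are illustrative assumptions, not values taken from the chapter. Because most Arabidopsis accessions are inbred, real dosages will mostly be 0 or 2, which the sketch handles unchanged.

```python
def minor_allele_frequency(dosages):
    """dosages: 0/1/2 copies of one allele per diploid individual; None marks a missing call."""
    called = [d for d in dosages if d is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))  # frequency of the counted allele
    return min(p, 1 - p)


def filter_snps(snps, maf_min=0.05, max_missing=0.10):
    """Return the ids of SNPs whose missing-call rate and MAF pass both thresholds.

    snps: dict mapping a SNP id to its list of per-individual dosages.
    """
    kept = []
    for snp_id, dosages in snps.items():
        missing_rate = sum(d is None for d in dosages) / len(dosages)
        if missing_rate <= max_missing and minor_allele_frequency(dosages) >= maf_min:
            kept.append(snp_id)
    return kept
```

Rare SNPs are usually removed before association testing because a handful of carriers gives little power and unstable test statistics.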
8. Liu HJ, Yan J. Crop genome-wide association study: a harvest of biological relevance. Plant J 2019; 97:8-18. [PMID: 30368955 DOI: 10.1111/tpj.14139] [Citation(s) in RCA: 112] [Impact Index Per Article: 22.4] [Received: 07/23/2018] [Revised: 10/13/2018] [Accepted: 10/22/2018] [Indexed: 05/20/2023]
Abstract
With the advent of rapid genotyping and next-generation sequencing technologies, genome-wide association study (GWAS) has become a routine strategy for decoding genotype-phenotype associations in many species. More than 1000 such studies over the last decade have revealed substantial genotype-phenotype associations in crops and provided unparalleled opportunities to probe functional genomics. Beyond the many 'hits' obtained, this review summarizes recent efforts to increase our understanding of the genetic architecture of complex traits by focusing on non-main effects, including epistasis, pleiotropy, and phenotypic plasticity. We also discuss how these achievements and the remaining gaps in our knowledge will guide future studies. Synthetic association is highlighted as a source of false causality that is prevalent but largely underestimated. Furthermore, validation evidence is called for in future GWAS, especially in the context of emerging genome-editing technologies.
Affiliation(s)
- Hai-Jun Liu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Jianbing Yan
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
9. Cubry P, Vigouroux Y, François O. The Empirical Distribution of Singletons for Geographic Samples of DNA Sequences. Front Genet 2017; 8:139. [PMID: 29033977 PMCID: PMC5627571 DOI: 10.3389/fgene.2017.00139] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 03/27/2017] [Accepted: 09/14/2017] [Indexed: 12/31/2022]
Abstract
Rare variants are important for drawing inference about past demographic events in a species' history. A singleton is a rare variant for which genetic variation is carried by a unique chromosome in a sample. How singletons are distributed across geographic space provides a local measure of genetic diversity that can be measured at the individual level. Here, we define the empirical distribution of singletons in a sample of chromosomes as the proportion of the total number of singletons that each chromosome carries, and we present a theoretical background for studying this distribution. Next, we use computer simulations to evaluate the potential for the empirical distribution of singletons to provide a description of genetic diversity across geographic space. In a Bayesian framework, we show that the empirical distribution of singletons leads to accurate estimates of the geographic origin of range expansions. We apply the Bayesian approach to estimating the origin of the cultivated plant species Pennisetum glaucum [L.] R. Br. (pearl millet) in Africa, and find support for range expansion having started from Northern Mali. Overall, we report that the empirical distribution of singletons is a useful measure to analyze results of sequencing projects based on large-scale sampling of individuals across geographic space.
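The definition above (each chromosome's share of all singletons in the sample) translates directly into code. This is a minimal sketch, assuming aligned, equal-length allele strings with no missing data; it calls a site a singleton when it is biallelic and one allele appears on exactly one chromosome. That allele is not polarized into ancestral and derived states here, which the authors' pipeline may do differently. The function name `singleton_distribution` is hypothetical.

```python
from collections import Counter


def singleton_distribution(chromosomes):
    """Empirical distribution of singletons.

    chromosomes: equal-length allele strings, one per sampled chromosome.
    Returns, for each chromosome, the fraction of all singletons it carries.
    """
    n = len(chromosomes)
    if n < 3:
        # With two chromosomes, both alleles at a variable site occur once.
        raise ValueError("need at least 3 chromosomes for singletons to be unambiguous")
    length = len(chromosomes[0])
    if any(len(c) != length for c in chromosomes):
        raise ValueError("chromosomes must be aligned to equal length")
    carried = [0] * n
    for site in range(length):
        column = [c[site] for c in chromosomes]
        counts = Counter(column)
        # Biallelic site where one allele is present on a single chromosome.
        if len(counts) == 2 and min(counts.values()) == 1:
            rare_allele = min(counts, key=counts.get)
            carried[column.index(rare_allele)] += 1
    total = sum(carried)
    if total == 0:
        return [0.0] * n
    return [k / total for k in carried]
```

For the four chromosomes `["100", "000", "011", "000"]`, the first chromosome carries one of the three singletons and the third carries two, giving shares of 1/3, 0, 2/3, and 0.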
Affiliation(s)
- Philippe Cubry
- UMR DIADE, University of Montpellier, Montpellier, France
- Yves Vigouroux
- UMR DIADE, University of Montpellier, Montpellier, France
- Olivier François
- TIMC-IMAG UMR 5525, Centre National de la Recherche Scientifique (CNRS), Université Grenoble-Alpes, Grenoble, France
10. Memon S, Jia X, Gu L, Zhang X. Genomic variations and distinct evolutionary rate of rare alleles in Arabidopsis thaliana. BMC Evol Biol 2016; 16:25. [PMID: 26817829 PMCID: PMC4728917 DOI: 10.1186/s12862-016-0590-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 08/30/2015] [Accepted: 01/12/2016] [Indexed: 01/24/2023]
Abstract
BACKGROUND: The rate of variation in genomic regions associated with different alleles gives rise to distinct evolutionary patterns involving rare alleles. Rare alleles bias genome-wide association studies (GWASs), which aim to detect variants at genomic loci associated with single-nucleotide polymorphisms (SNPs) that tend to produce different haplotypes. Here, we sequenced Arabidopsis thaliana and compared its coding and non-coding genomic regions with those of its closest outgroup relative, Arabidopsis lyrata, to account for ancestral misinference. Genome-wide SNPs were used to interpret the genetic architecture of rare alleles in Arabidopsis thaliana, revealing a significant departure from a neutral evolutionary model, and the pattern of polymorphisms around a selected locus will be influenced exclusively by natural selection. RESULTS: We found that 23.4% of the rare alleles occurred randomly in the genome. Notably, χ² tests revealed significant differences (P < 0.01) in the relative rates between rare versus intermediate alleles, between fixed versus non-fixed mutations, and between type I versus type II rare mutations. However, the negative values of Tajima's D generated by the rare alleles suggest that they arose under selective sweeps. Relative to polymorphic sites, including SNPs, fixed mutations accounted for 67.5%, indicating that they are major contributors to speciation. Evolution in the rare alleles was 1.42 times faster than in the major haplotype. CONCLUSION: Our results indicate that rare alleles fit a random-occurrence model, meaning that rare alleles can occur at any locus in the genome and in any accession of a species. Based on the higher relative rate of derived to ancient mutations and the higher average Dxy, we conclude that rare alleles evolve faster than higher-frequency alleles.
The rapid evolution of rare alleles indicates that they must have been newly generated with fixed mutations, compared with the other alleles. Finally, PCR and sequencing of the flanking regions of rare-allele loci confirmed that the rare haplotypes extend over only short regions, indicating the absence of a genome-wide pattern for rare haplotypes. The indel-associated model for rare alleles assumes that indel-associated mutations occur only in indel heterozygotes.
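The abstract interprets negative values of Tajima's D as a sign of selective sweeps. Tajima's D compares the mean number of pairwise differences (π) with Watterson's estimator S/a1, scaled by an estimate of the standard deviation of their difference. A minimal sketch of the standard calculation (Tajima 1989) follows, assuming aligned, equal-length sequences without missing data and counting any site with more than one allele as segregating. It illustrates the statistic only and is not the authors' pipeline; the function name `tajimas_d` is hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for aligned sequences given as equal-length strings (no missing data)."""
    n = len(seqs)
    if n < 4:
        # The variance constants below are zero or undefined for fewer than 4 sequences.
        raise ValueError("Tajima's D needs at least 4 sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    # S: number of segregating (polymorphic) sites.
    s = sum(1 for j in range(length) if len({seq[j] for seq in seqs}) > 1)
    if s == 0:
        return float("nan")  # D is undefined without polymorphism

    # pi: mean number of pairwise differences between sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # D compares pi with Watterson's estimator S / a1.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

An excess of rare variants (for example, every polymorphism being a singleton, as in `["1000", "0100", "0010", "0001"]`) pulls π below S/a1 and makes D negative. Population expansion produces the same sign, so D alone does not distinguish a sweep from demographic growth.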
Affiliation(s)
- Shabana Memon
- School of Life Sciences, Nanjing University, Nanjing, 210093, China; Lecturer, Department of Plant Breeding and Genetics, Sindh Agriculture University, Tando Jam, Hyderabad, 70060, Pakistan
- Xianqing Jia
- School of Life Sciences, Nanjing University, Nanjing, 210093, China
- Longjiang Gu
- School of Life Sciences, Nanjing University, Nanjing, 210093, China
- Xiaohui Zhang
- School of Life Sciences, Nanjing University, Nanjing, 210093, China
11. Zhai J, Tang Y, Yuan H, Wang L, Shang H, Ma C. A Meta-Analysis Based Method for Prioritizing Candidate Genes Involved in a Pre-specific Function. Front Plant Sci 2016; 7:1914. [PMID: 28018423 PMCID: PMC5156684 DOI: 10.3389/fpls.2016.01914] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/01/2016] [Accepted: 12/02/2016] [Indexed: 05/10/2023]
Abstract
The identification of genes associated with a given biological function in plants remains a challenge, although network-based gene prioritization algorithms have been developed for Arabidopsis thaliana and many non-model plant species. Nevertheless, these network-based gene prioritization algorithms have encountered several problems; in particular, prediction accuracy is unsatisfactory owing to limited network coverage, varying link quality, and/or uncertain network connectivity. Thus, a model that integrates complementary biological data may be expected to increase the prediction accuracy of gene prioritization. Toward this goal, we developed a novel gene prioritization method named RafSee, which ranks candidate genes using a random forest algorithm that integrates sequence, evolutionary, and epigenetic features of plants. Subsequently, we proposed an integrative approach named RAP (Rank Aggregation-based data fusion for gene Prioritization), in which an order statistics-based meta-analysis aggregates the ranks from the network-based gene prioritization method and RafSee to accurately prioritize candidate genes involved in a pre-specific biological function. Finally, we showcased the utility of RAP by prioritizing 380 flowering-time genes in Arabidopsis. A "leave-one-out" cross-validation experiment showed that RafSee could work as a complement to a current state-of-the-art network-based gene prioritization system (AraNet v2). Moreover, RAP ranked 53.68% (204/380) of the flowering-time genes higher than AraNet v2 did, resulting in a 39.46% improvement in terms of the first-quartile rank. Further evaluations also showed that RAP was effective in prioritizing genes related to different abiotic stresses. To enhance the usability of RAP for Arabidopsis and non-model plant species, an R package implementing the method is freely available at http://bioinfo.nwafu.edu.cn/software.
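RAP fuses rankings with "an order statistics-based meta-analysis", but the abstract does not give the formula. One widely used formulation scores each gene by the joint probability, under a null where every source ranks it uniformly at random, of obtaining order statistics no larger than its observed rank ratios (rank divided by the number of ranked genes). The sketch below implements that recursive Q score; whether RAP's R package uses exactly this statistic is an assumption, and `order_statistic_q` and `aggregate_ranks` are hypothetical names.

```python
import math


def order_statistic_q(rank_ratios):
    """Joint null probability of order statistics at or below the observed rank ratios.

    rank_ratios: one value in (0, 1] per data source (rank / number of ranked genes).
    With the ratios sorted ascending as r_1..r_N, uses V_0 = 1,
    V_k = sum_{i=1..k} (-1)^(i-1) * V_{k-i} / i! * r_{N-k+1}^i, and Q = N! * V_N.
    """
    r = sorted(rank_ratios)
    n = len(r)
    v = [1.0] + [0.0] * n
    for k in range(1, n + 1):
        # r[n - k] is r_{N-k+1} in 1-based notation.
        v[k] = sum(
            (-1) ** (i - 1) * v[k - i] / math.factorial(i) * r[n - k] ** i
            for i in range(1, k + 1)
        )
    return math.factorial(n) * v[n]


def aggregate_ranks(rankings):
    """Fuse best-first rankings of the same genes; a smaller Q ranks a gene higher."""
    genes = rankings[0]
    if any(sorted(rk) != sorted(genes) for rk in rankings):
        raise ValueError("every ranking must contain the same genes")
    n_genes = len(genes)
    positions = [{g: i + 1 for i, g in enumerate(rk)} for rk in rankings]
    q = {g: order_statistic_q([pos[g] / n_genes for pos in positions]) for g in genes}
    return sorted(genes, key=lambda g: q[g])
```

For two sources, Q reduces to 2·r1·r2 − r1² (with r1 ≤ r2), so a gene ranked near the top by both methods receives a much smaller score than one ranked highly by only one. The alternating sum loses precision with many sources, but it is adequate for fusing a small number of rankings, such as the two that RAP combines.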
12. Lipka AE, Kandianis CB, Hudson ME, Yu J, Drnevich J, Bradbury PJ, Gore MA. From association to prediction: statistical methods for the dissection and selection of complex traits in plants. Curr Opin Plant Biol 2015; 24:110-8. [PMID: 25795170 DOI: 10.1016/j.pbi.2015.02.010] [Citation(s) in RCA: 82] [Impact Index Per Article: 9.1] [Received: 11/15/2014] [Revised: 02/24/2015] [Accepted: 02/27/2015] [Indexed: 05/02/2023]
Abstract
Quantification of genotype-to-phenotype associations is central to many scientific investigations, yet the ability to obtain consistent results may be thwarted without appropriate statistical analyses. Models for association can consider confounding effects in the materials and complex genetic interactions. Selecting optimal models enables accurate evaluation of associations between marker loci and numerous phenotypes including gene expression. Significant improvements in QTL discovery via association mapping and acceleration of breeding cycles through genomic selection are two successful applications of models using genome-wide markers. Given recent advances in genotyping and phenotyping technologies, further refinement of these approaches is needed to model genetic architecture more accurately and run analyses in a computationally efficient manner, all while accounting for false positives and maximizing statistical power.
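Genomic selection, one of the two applications named above, predicts breeding values from all genome-wide markers at once. One standard model, assumed here because the abstract does not name one, is ridge-regression BLUP (RR-BLUP). It fits every marker effect jointly with a common shrinkage penalty λ, which corresponds to σe²/σβ² in the mixed-model formulation. The sketch below solves the ridge normal equations with a small Gaussian elimination so that it needs no external libraries. The λ values are arbitrary illustrations; in practice λ is estimated (for example by REML or cross-validation). The names `ridge_marker_effects` and `predict` are hypothetical.

```python
def _solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (aug[r][m] - sum(aug[r][c] * x[c] for c in range(r + 1, m))) / aug[r][r]
    return x


def ridge_marker_effects(X, y, lam):
    """Fit all marker effects jointly with ridge shrinkage (RR-BLUP-style).

    X: one row of marker dosages per individual; y: phenotypes; lam > 0: shrinkage.
    Returns (intercept, list of marker effects).
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered markers
    yc = [v - y_mean for v in y]
    # Normal equations: (Xc'Xc + lam * I) beta = Xc'yc
    a = [[sum(xc[i][j] * xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    b = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    beta = _solve(a, b)
    intercept = y_mean - sum(m * bj for m, bj in zip(x_mean, beta))
    return intercept, beta


def predict(intercept, beta, X):
    """Genomic estimated breeding values: intercept plus summed marker effects."""
    return [intercept + sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
```

Because λ > 0 makes the system matrix positive definite, elimination never meets a zero pivot. A larger λ shrinks all effects toward zero, which is what keeps the model stable when markers far outnumber individuals. At that scale, a real analysis would use an optimized linear-algebra library rather than this explicit p × p solve.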
Affiliation(s)
- Alexander E Lipka
- University of Illinois, Department of Crop Sciences, Urbana, IL 61801, USA
- Catherine B Kandianis
- Michigan State University, Department of Biochemistry and Molecular Biology, East Lansing, MI 48824, USA; Cornell University, Plant Breeding and Genetics Section, School of Integrative Plant Science, Ithaca, NY 14853, USA
- Matthew E Hudson
- University of Illinois, Department of Crop Sciences, Urbana, IL 61801, USA
- Jianming Yu
- Iowa State University, Department of Agronomy, Ames, IA 50011, USA
- Jenny Drnevich
- University of Illinois, High Performance Biological Computing Group and the Carver Biotechnology Center, Urbana, IL 61801, USA
- Peter J Bradbury
- United States Department of Agriculture (USDA) - Agricultural Research Service (ARS), Robert W. Holley Center for Agriculture and Health, Ithaca, NY 14853, USA
- Michael A Gore
- Cornell University, Plant Breeding and Genetics Section, School of Integrative Plant Science, Ithaca, NY 14853, USA