1
|
Berruto CA, Demirer GS. Engineering agricultural soil microbiomes and predicting plant phenotypes. Trends Microbiol 2024:S0966-842X(24)00043-X. [PMID: 38429182 DOI: 10.1016/j.tim.2024.02.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2023] [Revised: 02/02/2024] [Accepted: 02/06/2024] [Indexed: 03/03/2024]
Abstract
Plant growth-promoting rhizobacteria (PGPR) can improve crop yields, nutrient use efficiency, plant tolerance to stressors, and confer benefits to future generations of crops grown in the same soil. Unlocking the potential of microbial communities in the rhizosphere and endosphere is therefore of great interest for sustainable agriculture advancements. Before plant microbiomes can be engineered to confer desirable phenotypic effects on their plant hosts, a deeper understanding of the interacting factors influencing rhizosphere community structure and function is needed. Dealing with this complexity is becoming more feasible using computational approaches. In this review, we discuss recent advances at the intersection of experimental and computational strategies for the investigation of plant-microbiome interactions and the engineering of desirable soil microbiomes.
Affiliation(s)
- Chiara A Berruto
- Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Gozde S Demirer
- Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, USA
|
2
|
Corut AK, Wallace JG. kGWASflow: a modular, flexible, and reproducible Snakemake workflow for k-mers-based GWAS. G3 (Bethesda) 2023; 14:jkad246. [PMID: 37976215 PMCID: PMC10755180 DOI: 10.1093/g3journal/jkad246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 10/15/2023] [Indexed: 11/19/2023]
Abstract
Genome-wide association studies (GWAS) have been widely used to identify genetic variation associated with complex traits. Despite its success and popularity, the traditional GWAS approach comes with a variety of limitations. For this reason, newer methods for GWAS have been developed, including the use of pan-genomes instead of a reference genome and the utilization of markers beyond single-nucleotide polymorphisms, such as structural variations and k-mers. The k-mers-based GWAS approach has especially gained attention from researchers in recent years. However, these new methodologies can be complicated and challenging to implement. Here, we present kGWASflow, a modular, user-friendly, and scalable workflow to perform GWAS using k-mers. We adopted an existing kmersGWAS method into an easier and more accessible workflow using management tools like Snakemake and Conda and eliminated the challenges caused by missing dependencies and version conflicts. kGWASflow increases the reproducibility of the kmersGWAS method by automating each step with Snakemake and using containerization tools like Docker. The workflow encompasses supplemental components such as quality control, read-trimming procedures, and generating summary statistics. kGWASflow also offers post-GWAS analysis options to identify the genomic location and context of trait-associated k-mers. kGWASflow can be applied to any organism and requires minimal programming skills. kGWASflow is freely available on GitHub (https://github.com/akcorut/kGWASflow) and Bioconda (https://anaconda.org/bioconda/kgwasflow).
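As a rough illustration of the k-mers-based idea described in the abstract above, the sketch below builds a presence/absence table of canonical k-mers across samples, the kind of marker matrix a k-mer GWAS tests against a phenotype. It is a simplified toy, not code from kGWASflow or kmersGWAS; the function names, sample names, and reads are hypothetical.

```python
from collections import defaultdict

# Hypothetical helper names; this is a toy sketch, not kGWASflow code.

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def canonical_kmers(seq: str, k: int) -> set[str]:
    """Collect canonical k-mers: the lexicographically smaller of a
    k-mer and its reverse complement, so both strands count as one marker."""
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows containing N or other symbols
            kmers.add(min(kmer, reverse_complement(kmer)))
    return kmers


def presence_absence(samples: dict[str, list[str]], k: int) -> dict[str, set[str]]:
    """Map each canonical k-mer to the set of samples whose reads contain it."""
    table = defaultdict(set)
    for name, reads in samples.items():
        for read in reads:
            for kmer in canonical_kmers(read, k):
                table[kmer].add(name)
    return dict(table)


# Two made-up samples that differ by a single base.
samples = {"s1": ["ACGTAC"], "s2": ["ACGTTC"]}
table = presence_absence(samples, k=4)
for kmer in sorted(table):
    print(kmer, sorted(table[kmer]))
```

K-mers shared by every sample carry no association signal; only k-mers present in some samples and absent in others would be informative markers in a downstream test.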
Affiliation(s)
- Adnan Kivanc Corut
- Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Jason G Wallace
- Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Institute of Plant Breeding, Genetics, and Genomics, University of Georgia, Athens, GA 30602, USA
- Department of Crop and Soil Sciences, University of Georgia, Athens, GA 30602, USA
|
3
|
Filippenkov IB, Khrunin AV, Mozgovoy IV, Dergunova LV, Limborska SA. Are Ischemic Stroke and Alzheimer's Disease Genetically Consecutive Pathologies? Biomedicines 2023; 11:2727. [PMID: 37893101 PMCID: PMC10604604 DOI: 10.3390/biomedicines11102727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Revised: 10/01/2023] [Accepted: 10/03/2023] [Indexed: 10/29/2023] Open
Abstract
Complex diseases that affect the functioning of the central nervous system pose a major problem for modern society. Among these, ischemic stroke (IS) holds a special place as one of the most common causes of disability and mortality worldwide. Furthermore, Alzheimer's disease (AD) ranks first among neurodegenerative diseases, drastically reducing brain activity and overall life quality and duration. Recent studies have shown that AD and IS share several common risk and pathogenic factors, such as an overlapping genomic architecture and molecular signature. In this review, we will summarize the genomics and RNA biology studies of IS and AD, discussing the interconnected nature of these pathologies. Additionally, we highlight specific genomic points and RNA molecules that can serve as potential tools in predicting the risks of diseases and developing effective therapies in the future.
Affiliation(s)
- Svetlana A. Limborska
- Laboratory of Human Molecular Genetics, National Research Center “Kurchatov Institute”, Kurchatov Sq. 2, 123182 Moscow, Russia (A.V.K.); (I.V.M.); (L.V.D.)
|
4
|
Afrasiabi A, Ahlenstiel C, Swaminathan S, Parnell GP. The interaction between Epstein-Barr virus and multiple sclerosis genetic risk loci: insights into disease pathogenesis and therapeutic opportunities. Clin Transl Immunology 2023; 12:e1454. [PMID: 37337612 PMCID: PMC10276892 DOI: 10.1002/cti2.1454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2023] [Revised: 05/30/2023] [Accepted: 06/05/2023] [Indexed: 06/21/2023] Open
Abstract
Multiple sclerosis (MS) is a chronic neurodegenerative autoimmune disease, characterised by the demyelination of neurons in the central nervous system. Whilst it is unclear what precisely leads to MS, it is believed that genetic predisposition combined with environmental factors plays a pivotal role. It is estimated that close to half the disease risk is determined by genetic factors. However, the risk of developing MS cannot be attributed to genetic factors alone, and environmental factors are likely to play a significant role by themselves or in concert with host genetics. Epstein-Barr virus (EBV) infection is the strongest known environmental risk factor for MS. There has been increasing evidence that leaves little doubt that EBV is necessary, but not sufficient, for developing MS. One plausible explanation is EBV may alter the host immune response in the presence of MS risk alleles and this contributes to the pathogenesis of MS. In this review, we discuss recent findings regarding how EBV infection may contribute to MS pathogenesis via interactions with genetic risk loci and discuss possible therapeutic interventions.
Affiliation(s)
- Ali Afrasiabi
- EBV Molecular Lab, Centre for Immunology and Allergy Research, Westmead Institute for Medical Research, University of Sydney, Sydney, NSW, Australia
- The Graduate School of Biomedical Engineering, University of New South Wales, Sydney, NSW, Australia
- Chantelle Ahlenstiel
- Kirby Institute, University of New South Wales, Sydney, NSW, Australia
- RNA Institute, University of New South Wales, Sydney, NSW, Australia
- Sanjay Swaminathan
- EBV Molecular Lab, Centre for Immunology and Allergy Research, Westmead Institute for Medical Research, University of Sydney, Sydney, NSW, Australia
- Department of Medicine, Western Sydney University, Sydney, NSW, Australia
- Grant P Parnell
- EBV Molecular Lab, Centre for Immunology and Allergy Research, Westmead Institute for Medical Research, University of Sydney, Sydney, NSW, Australia
- Biomedical Informatics and Digital Health, School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, Australia
|
5
|
Yan J, Wang X. Machine learning bridges omics sciences and plant breeding. Trends Plant Sci 2023; 28:199-210. [PMID: 36153276 DOI: 10.1016/j.tplants.2022.08.018] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Received: 03/08/2022] [Revised: 08/15/2022] [Accepted: 08/23/2022] [Indexed: 06/16/2023]
Abstract
Some of the biological knowledge obtained from fundamental research will be implemented in applied plant breeding. To bridge basic research and breeding practice, machine learning (ML) holds great promise to translate biological knowledge and omics data into precision-designed plant breeding. Here, we review ML for multi-omics analysis in plants, including data dimensionality reduction, inference of gene-regulation networks, and gene discovery and prioritization. These applications will facilitate understanding trait regulation mechanisms and identifying target genes potentially applicable to knowledge-driven molecular design breeding. We also highlight applications of deep learning in plant phenomics and ML in genomic selection-assisted breeding, such as various ML algorithms that model the correlations among genotypes (genes), phenotypes (traits), and environments, to ultimately achieve data-driven genomic design breeding.
Affiliation(s)
- Jun Yan
- National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100094, China; Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing 100094, China
- Xiangfeng Wang
- National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100094, China; Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing 100094, China
|
6
|
Tuo S, Li C, Liu F, Li A, He L, Geem ZW, Shang J, Liu H, Zhu Y, Feng Z, Chen T. MTHSA-DHEI: multitasking harmony search algorithm for detecting high-order SNP epistatic interactions. Complex Intell Syst 2022. [DOI: 10.1007/s40747-022-00813-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Genome-wide association studies have succeeded in identifying genetic variants associated with complex diseases, but the findings have not been well interpreted biologically. Although it is widely accepted that epistatic interactions of high-order single nucleotide polymorphisms (SNPs) [(1) Single nucleotide polymorphisms (SNPs) are mainly deoxyribonucleic acid (DNA) sequence polymorphisms caused by variants at a single nucleotide at the genome level. They are the most common type of heritable variation in humans.] are important causes of complex diseases, the combinatorial explosion of millions of SNPs and multiple tests imposes a large computational burden. Moreover, it is extremely challenging to correctly distinguish high-order SNP epistatic interactions from other high-order SNP combinations due to small sample sizes. In this study, a multitasking harmony search algorithm (MTHSA-DHEI) is proposed for detecting high-order epistatic interactions [(2) In classical genetics, if genes X1 and X2 are mutated and each mutation by itself produces a unique disease status (phenotype) but the mutations together cause the same disease status as the gene X1 mutation, gene X1 is epistatic and gene X2 is hypostatic, and gene X1 has an epistatic effect (main effect) on disease status. In this work, a high-order epistatic interaction occurs when two or more SNP loci have a joint influence on disease status.], with the goal of simultaneously detecting multiple types of high-order (k1-order, k2-order, …, kn-order) SNP epistatic interactions. Unified coding is adopted for multiple tasks, and four complementary association evaluation functions are employed to improve the capability of discriminating the high-order SNP epistatic interactions.
We compare the proposed MTHSA-DHEI method with four excellent methods for detecting high-order SNP interactions for 8 high-order epistatic interaction models with no marginal effect (EINMEs) and 12 epistatic interaction models with marginal effects (EIMEs) and implement the MTHSA-DHEI algorithm with a real dataset: age-related macular degeneration (AMD). The experimental results indicate that MTHSA-DHEI has power and an F1-score exceeding 90% for all EIMEs and five EINMEs and reduces the computational time by more than 90%. It can efficiently perform multiple high-order detection tasks for high-order epistatic interactions and improve the discrimination ability for diverse epistasis models.
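To make the notion of an epistatic interaction with no marginal effect concrete, here is a deliberately tiny sketch. It assumes two binary loci with equally frequent genotypes and an XOR-style disease rule, which is a textbook toy and not one of the EINME models evaluated in the paper; all names are hypothetical.

```python
from itertools import product

# Toy assumption: disease status is 1 exactly when the two loci differ.
def disease(a: int, b: int) -> int:
    return a ^ b


# All joint genotypes of two binary loci, assumed equally frequent.
genotypes = list(product([0, 1], repeat=2))


def risk(cases) -> float:
    """Fraction of the given genotypes that are affected."""
    cases = list(cases)
    return sum(disease(a, b) for a, b in cases) / len(cases)


# Marginal view: disease risk conditioned on one locus at a time.
marginal_a = {v: risk(g for g in genotypes if g[0] == v) for v in (0, 1)}
marginal_b = {v: risk(g for g in genotypes if g[1] == v) for v in (0, 1)}

# Joint view: disease risk conditioned on both loci together.
joint = {g: risk([g]) for g in genotypes}

print("marginal risk by locus A:", marginal_a)
print("marginal risk by locus B:", marginal_b)
print("joint risk:", joint)
```

Each locus on its own leaves the risk unchanged, so a single-SNP test sees nothing, while the pair determines the outcome completely. This is why such models require searching over SNP combinations, and why the search space grows combinatorially with the interaction order.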
|
7
|
Chen Y, Li Z, Li Z. Prediction of Plant Resistance Proteins Based on Pairwise Energy Content and Stacking Framework. Front Plant Sci 2022; 13:912599. [PMID: 35712582 PMCID: PMC9194944 DOI: 10.3389/fpls.2022.912599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Accepted: 05/10/2022] [Indexed: 06/15/2023]
Abstract
Plant resistance proteins (R proteins) recognize effector proteins secreted by pathogenic microorganisms and trigger an immune response against pathogenic microbial infestation. Accurate identification of plant R proteins is an important research topic in plant pathology. Plant R protein prediction has achieved many research results. Recently, some machine learning-based methods have emerged to identify plant R proteins. Still, most of them only rely on protein sequence features, which ignore inter-amino acid features, thus limiting the further improvement of plant R protein prediction performance. In this manuscript, we propose a method called StackRPred to predict plant R proteins. Specifically, the StackRPred first obtains plant R protein feature information from the pairwise energy content of residues; then, the obtained feature information is fed into the stacking framework for training to construct a prediction model for plant R proteins. The results of both the five-fold cross-validation and independent test validation show that our proposed method outperforms other state-of-the-art methods, indicating that StackRPred is an effective tool for predicting plant R proteins. It is expected to bring some favorable contribution to the study of plant R proteins.
Affiliation(s)
- Yifan Chen
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Zejun Li
- School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, China
- Zhiyong Li
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
|
8
|
Thong EP, Ghelani DP, Manoleehakul P, Yesmin A, Slater K, Taylor R, Collins C, Hutchesson M, Lim SS, Teede HJ, Harrison CL, Moran L, Enticott J. Optimising Cardiometabolic Risk Factors in Pregnancy: A Review of Risk Prediction Models Targeting Gestational Diabetes and Hypertensive Disorders. J Cardiovasc Dev Dis 2022; 9:jcdd9020055. [PMID: 35200708 PMCID: PMC8874392 DOI: 10.3390/jcdd9020055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2021] [Revised: 01/30/2022] [Accepted: 02/07/2022] [Indexed: 11/16/2022] Open
Abstract
Cardiovascular disease, especially coronary heart disease and cerebrovascular disease, is a leading cause of mortality and morbidity in women globally. The development of cardiometabolic conditions in pregnancy, such as gestational diabetes mellitus and hypertensive disorders of pregnancy, portends an increased risk of future cardiovascular disease in women. Pregnancy therefore represents a unique opportunity to detect and manage risk factors prior to the development of cardiovascular sequelae. Risk prediction models for gestational diabetes mellitus and hypertensive disorders of pregnancy can help identify at-risk women in early pregnancy, allowing timely intervention to mitigate both short- and long-term adverse outcomes. In this narrative review, we outline the shared pathophysiological pathways for gestational diabetes mellitus and hypertensive disorders of pregnancy, summarise contemporary risk prediction models and candidate predictors for these conditions, and discuss the utility of these models in clinical application.
Affiliation(s)
- Eleanor P. Thong
- Monash Centre for Health Research and Implementation, School of Public Health and Preventive Medicine, Monash University, Clayton, VIC 3168, Australia; (E.P.T.); (D.P.G.); (S.S.L.); (H.J.T.); (C.L.H.); (L.M.)
- Drishti P. Ghelani
- Monash Centre for Health Research and Implementation, School of Public Health and Preventive Medicine, Monash University, Clayton, VIC 3168, Australia; (E.P.T.); (D.P.G.); (S.S.L.); (H.J.T.); (C.L.H.); (L.M.)
- Pamada Manoleehakul
- Faculty of Medicine, Nursing and Health Sciences, Monash University, Clayton, VIC 3168, Australia; (P.M.); (A.Y.)
- Anika Yesmin
- Faculty of Medicine, Nursing and Health Sciences, Monash University, Clayton, VIC 3168, Australia; (P.M.); (A.Y.)
- Kaylee Slater
- School of Health Sciences, College of Health, Medicine and Wellbeing, and Priority Research Centre for Physical Activity and Nutrition, University of Newcastle, Callaghan, NSW 2308, Australia; (K.S.); (R.T.); (C.C.); (M.H.)
- Rachael Taylor
- School of Health Sciences, College of Health, Medicine and Wellbeing, and Priority Research Centre for Physical Activity and Nutrition, University of Newcastle, Callaghan, NSW 2308, Australia; (K.S.); (R.T.); (C.C.); (M.H.)
- Clare Collins
- School of Health Sciences, College of Health, Medicine and Wellbeing, and Priority Research Centre for Physical Activity and Nutrition, University of Newcastle, Callaghan, NSW 2308, Australia; (K.S.); (R.T.); (C.C.); (M.H.)
- Melinda Hutchesson
- School of Health Sciences, College of Health, Medicine and Wellbeing, and Priority Research Centre for Physical Activity and Nutrition, University of Newcastle, Callaghan, NSW 2308, Australia; (K.S.); (R.T.); (C.C.); (M.H.)
- Siew S. Lim
- Monash Centre for Health Research and Implementation, School of Public Health and Preventive Medicine, Monash University, Clayton, VIC 3168, Australia; (E.P.T.); (D.P.G.); (S.S.L.); (H.J.T.); (C.L.H.); (L.M.)
- Helena J. Teede
- Monash Centre for Health Research and Implementation, School of Public Health and Preventive Medicine, Monash University, Clayton, VIC 3168, Australia; (E.P.T.); (D.P.G.); (S.S.L.); (H.J.T.); (C.L.H.); (L.M.)
- Cheryce L. Harrison
- Monash Centre for Health Research and Implementation, School of Public Health and Preventive Medicine, Monash University, Clayton, VIC 3168, Australia; (E.P.T.); (D.P.G.); (S.S.L.); (H.J.T.); (C.L.H.); (L.M.)
- Lisa Moran
- Monash Centre for Health Research and Implementation, School of Public Health and Preventive Medicine, Monash University, Clayton, VIC 3168, Australia; (E.P.T.); (D.P.G.); (S.S.L.); (H.J.T.); (C.L.H.); (L.M.)
- Joanne Enticott
- Monash Centre for Health Research and Implementation, School of Public Health and Preventive Medicine, Monash University, Clayton, VIC 3168, Australia; (E.P.T.); (D.P.G.); (S.S.L.); (H.J.T.); (C.L.H.); (L.M.)
|
9
|
Lim AJW, Lim LJ, Ooi BNS, Koh ET, Tan JWL, Chong SS, Khor CC, Tucker-Kellogg L, Leong KP, Lee CG. Functional coding haplotypes and machine-learning feature elimination identifies predictors of Methotrexate Response in Rheumatoid Arthritis patients. EBioMedicine 2022; 75:103800. [PMID: 35022146 PMCID: PMC8808170 DOI: 10.1016/j.ebiom.2021.103800] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/26/2021] [Revised: 12/19/2021] [Accepted: 12/20/2021] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Major challenges in large-scale genetic association studies include not only the identification of causative single nucleotide polymorphisms (SNPs), but also accounting for SNP-SNP interactions. This study thus proposes a novel feature engineering approach integrating potentially functional coding haplotypes (pfcHap) with machine-learning (ML) feature selection to identify biologically meaningful, possibly causative genetic factors that take into consideration potential SNP-SNP interactions within the pfcHap, to best predict methotrexate (MTX) response in rheumatoid arthritis (RA) patients. METHODS Exome sequencing data from 349 RA patients were analysed and split into a training set and an unseen test set. Inferred pfcHaps were combined with 30 non-genetic features to undergo ML recursive feature elimination with cross-validation using the training set. Predictive capacity and robustness of the selected features were assessed using six popular machine learning models through training-set cross-validation and evaluated in an unseen test set. FINDINGS Significantly, 100 features (95 pfcHaps, 5 non-genetic factors) were identified to have good predictive performance (AUC: 0.776-0.828; Sensitivity: 0.656-0.813; Specificity: 0.684-0.868) across all six ML models in an unseen test dataset for the prediction of MTX response in RA patients. INTERPRETATION The majority of the predictive pfcHap SNPs were predicted to be potentially functional, and some of the genes in which the pfcHaps reside were identified to be associated with previously reported MTX/RA pathways. FUNDING Singapore Ministry of Health's National Medical Research Council (NMRC) [NMRC/CBRG/0095/2015; CG12Aug17; CGAug16M012; NMRC/CG/017/2013]; National Cancer Center Research Fund and block funding Duke-NUS Medical School; Singapore Ministry of Education Academic Research Fund Tier 2 grant MOE2019-T2-1-138.
Affiliation(s)
- Ashley J W Lim
- Dept of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Lee Jin Lim
- Dept of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Brandon N S Ooi
- Dept of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Ee Tzun Koh
- Department of Rheumatology, Allergy and Immunology, Tan Tock Seng Hospital, Singapore
- Justina Wei Lynn Tan
- Department of Rheumatology, Allergy and Immunology, Tan Tock Seng Hospital, Singapore
- Samuel S Chong
- Dept of Pediatrics, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Chiea Chuen Khor
- Division of Human Genetics, Genome Institute of Singapore, Singapore
- Lisa Tucker-Kellogg
- Centre for Computational Biology, and Cancer and Stem Cell Biology, Duke-NUS Medical School, Singapore
- Khai Pang Leong
- Department of Rheumatology, Allergy and Immunology, Tan Tock Seng Hospital, Singapore; Clinical Research & Innovation Office, Tan Tock Seng Hospital, Singapore
- Caroline G Lee
- Dept of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Div of Cellular & Molecular Research, Humphrey Oei Institute of Cancer Research, National Cancer Centre Singapore, Singapore; Duke-NUS Medical School, Singapore; NUS Graduate School, National University of Singapore, Singapore
|
10
|
Yoosefzadeh-Najafabadi M, Eskandari M, Belzile F, Torkamaneh D. Genome-Wide Association Study Statistical Models: A Review. Methods Mol Biol 2022; 2481:43-62. [PMID: 35641758 DOI: 10.1007/978-1-0716-2237-7_4] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 06/15/2023]
Abstract
Statistical models are at the core of the genome-wide association study (GWAS). In this chapter, we provide an overview of single- and multilocus statistical models, Bayesian, and machine learning approaches for association studies in plants. These models are discussed based on their basic methodology, cofactors adjustment accounted for, statistical power and computational efficiency. New statistical models and machine learning algorithms are both showing improved performance in detecting missed signals, rare mutations and prioritizing causal genetic variants; nevertheless, further optimization and validation studies are required to maximize the power of GWAS.
Affiliation(s)
- Milad Eskandari
- Department of Plant Agriculture, University of Guelph, Guelph, ON, Canada
- François Belzile
- Département de Phytologie, Université Laval, Quebec City, QC, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC, Canada
- Davoud Torkamaneh
- Département de Phytologie, Université Laval, Quebec City, QC, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC, Canada
|
11
|
Yoosefzadeh-Najafabadi M, Torabi S, Tulpan D, Rajcan I, Eskandari M. Genome-Wide Association Studies of Soybean Yield-Related Hyperspectral Reflectance Bands Using Machine Learning-Mediated Data Integration Methods. Front Plant Sci 2021; 12:777028. [PMID: 34880894 PMCID: PMC8647880 DOI: 10.3389/fpls.2021.777028] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 09/14/2021] [Accepted: 10/18/2021] [Indexed: 05/12/2023]
Abstract
In conjunction with big data analysis methods, plant omics technologies have provided scientists with cost-effective and promising tools for discovering genetic architectures of complex agronomic traits using large breeding populations. In recent years, there has been significant progress in plant phenomics and genomics approaches for generating reliable large datasets. However, selecting an appropriate data integration and analysis method to improve the efficiency of phenome-phenome and phenome-genome association studies is still a bottleneck. This study proposes a hyperspectral wide association study (HypWAS) approach as a phenome-phenome association analysis through a hierarchical data integration strategy to estimate the prediction power of hyperspectral reflectance bands in predicting soybean seed yield. Using HypWAS, five important hyperspectral reflectance bands in visible, red-edge, and near-infrared regions were identified significantly associated with seed yield. The phenome-genome association analysis of each tested hyperspectral reflectance band was performed using two conventional genome-wide association studies (GWAS) methods and a machine learning mediated GWAS based on the support vector regression (SVR) method. Using SVR-mediated GWAS, more relevant QTL with the physiological background of the tested hyperspectral reflectance bands were detected, supported by the functional annotation of candidate gene analyses. The results of this study have indicated the advantages of using hierarchical data integration strategy and advanced mathematical methods coupled with phenome-phenome and phenome-genome association analyses for a better understanding of the biology and genetic backgrounds of hyperspectral reflectance bands affecting soybean yield formation. 
The identified yield-related hyperspectral reflectance bands using HypWAS can be used as indirect selection criteria for selecting superior genotypes with improved yield genetic gains in large breeding populations.
Affiliation(s)
- Sepideh Torabi
- Department of Plant Agriculture, University of Guelph, Guelph, ON, Canada
- Dan Tulpan
- Department of Animal Biosciences, University of Guelph, Guelph, ON, Canada
- Istvan Rajcan
- Department of Plant Agriculture, University of Guelph, Guelph, ON, Canada
- Milad Eskandari
- Department of Plant Agriculture, University of Guelph, Guelph, ON, Canada
|
12
|
Kekäläinen J. Genetic incompatibility of the reproductive partners: an evolutionary perspective on infertility. Hum Reprod 2021; 36:3028-3035. [PMID: 34580729 PMCID: PMC8600657 DOI: 10.1093/humrep/deab221] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/01/2021] [Revised: 08/22/2021] [Indexed: 12/18/2022] Open
Abstract
In natural fertilisation, the female reproductive tract allows only a strictly selected sperm subpopulation to proceed in the vicinity of an unfertilised oocyte. Female-mediated sperm selection (also known as cryptic female choice (CFC)) is far from a random process, which frequently biases paternity towards particular males over others. Earlier studies have shown that CFC is a ubiquitous phenomenon in the animal kingdom and often promotes assortative fertilisation between genetically compatible mates. Here, I demonstrate that CFC for genetic compatibility likely also occurs in humans and is mediated by a complex network of interacting male and female genes. I also show that the relative contribution of genetic compatibility (i.e. the male-female interaction effect) to reproductive success is generally high and frequently outweighs the effects of individual males and females. Together, these facts indicate that, along with male- and female-dependent pathological factors, reproductive failure can also result from gamete-level incompatibility of the reproductive partners. Therefore, I argue that a deeper understanding of these evolutionary mechanisms of sperm selection can pave the way towards a more inclusive view of infertility and open novel possibilities for the development of more personalised infertility diagnostics and treatments.
Affiliation(s)
- Jukka Kekäläinen
- Department of Environmental and Biological Sciences, University of Eastern Finland, Joensuu, Finland
|
13
|
Wu Y, Guo Y, Ma J, Sa Y, Li Q, Zhang N. Research Progress of Gliomas in Machine Learning. Cells 2021; 10:cells10113169. [PMID: 34831392 PMCID: PMC8622230 DOI: 10.3390/cells10113169] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/04/2021] [Revised: 11/04/2021] [Accepted: 11/05/2021] [Indexed: 12/29/2022] Open
Abstract
In the field of glioma research, the broad availability of genetic and image information generated by computer technologies and the boom in biomedical publications have led to the advent of the big-data era. Machine learning methods were applied as possible approaches to speed up the data mining processes. In this article, we reviewed the present situation and future orientations of machine learning applications in gliomas within the context of workflows to integrate analysis for precision cancer care. Publicly available tools or algorithms for key machine learning technologies in literature mining for glioma clinical research were reviewed and compared. Further, the existing machine learning solutions and their limitations in glioma prediction and diagnostics, such as overfitting and class imbalance, were critically analyzed.
|