1
Moon J, Maqsood M, So D, Baik SW, Rho S, Nam Y. Advancing ensemble learning techniques for residential building electricity consumption forecasting: Insight from explainable artificial intelligence. PLoS One 2024; 19:e0307654. [PMID: 39541326 PMCID: PMC11563398 DOI: 10.1371/journal.pone.0307654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Accepted: 07/09/2024] [Indexed: 11/16/2024] Open
Abstract
Accurate electricity consumption forecasting in residential buildings has a direct impact on energy efficiency and cost management, making it a critical component of sustainable energy practices. Decision tree-based ensemble learning techniques are particularly effective for this task due to their ability to process complex datasets with high accuracy. Furthermore, incorporating explainable artificial intelligence into these predictions provides clarity and interpretability, allowing energy managers and homeowners to make informed decisions that optimize usage and reduce costs. This study comparatively analyzes decision tree-ensemble learning techniques augmented with explainable artificial intelligence for transparency and interpretability in residential building energy consumption forecasting. This approach employs the University Residential Complex and Appliances Energy Prediction datasets, data preprocessing, and decision-tree bagging and boosting methods. The superior model is evaluated using the Shapley additive explanations method within the explainable artificial intelligence framework, explaining the influence of input variables and decision-making processes. The analysis reveals the significant influence of the temperature-humidity index and wind chill temperature on short-term load forecasting, transcending traditional parameters, such as temperature, humidity, and wind speed. The complete study and source code have been made available on our GitHub repository at https://github.com/sodayeong for the purpose of enhancing precision and interpretability in energy system management, thereby promoting transparency and enabling replication.
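The attribution idea behind Shapley additive explanations can be illustrated with an exact Shapley computation on a toy model. This is only a sketch: the feature names `thi` and `wind_chill`, the model `toy_load_model`, and its coefficients are hypothetical and are not taken from the study, and the brute-force enumeration below is not the algorithm used by any SHAP library.

```python
from itertools import permutations
from typing import Callable, Dict


def shapley_values(
    predict: Callable[[Dict[str, float]], float],
    x: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley values by averaging marginal contributions over all feature orderings."""
    features = list(x)
    totals = {name: 0.0 for name in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)          # start from the reference input
        previous = predict(current)
        for name in order:
            current[name] = x[name]       # switch one feature to its observed value
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_load_model(f: Dict[str, float]) -> float:
    # Hypothetical load model with an interaction term (assumption, not from the paper).
    return 2.0 * f["thi"] + 3.0 * f["wind_chill"] + f["thi"] * f["wind_chill"]


observed = {"thi": 1.0, "wind_chill": 2.0}
baseline = {"thi": 0.0, "wind_chill": 0.0}
phi = shapley_values(toy_load_model, observed, baseline)
# The contributions add up to toy_load_model(observed) - toy_load_model(baseline).
```

Exhaustive enumeration grows factorially with the number of features, so it is only practical for very small examples like this one.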
Affiliation(s)
- Jihoon Moon
  - Department of AI and Big Data, Soonchunhyang University, Asan, Republic of Korea
  - Department of ICT Convergence, Soonchunhyang University, Asan, Republic of Korea
- Muazzam Maqsood
  - Department of Computer Science, COMSATS University Islamabad, Attock Campus, Attock, Pakistan
- Dayeong So
  - Department of ICT Convergence, Soonchunhyang University, Asan, Republic of Korea
- Seungmin Rho
  - Department of Industrial Security, Chung-Ang University, Seoul, Republic of Korea
- Yunyoung Nam
  - Department of ICT Convergence, Soonchunhyang University, Asan, Republic of Korea
  - Department of Computer Science and Engineering, Soonchunhyang University, Asan, Republic of Korea
2
Hay EH. Machine Learning for the Genomic Prediction of Growth Traits in a Composite Beef Cattle Population. Animals (Basel) 2024; 14:3014. [PMID: 39457945 PMCID: PMC11505319 DOI: 10.3390/ani14203014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2024] [Revised: 10/14/2024] [Accepted: 10/17/2024] [Indexed: 10/28/2024] Open
Abstract
The adoption of genomic selection is prevalent across various plant and livestock species, yet existing models for predicting genomic breeding values often remain suboptimal. Machine learning models present a promising avenue to enhance prediction accuracy due to their ability to accommodate both linear and non-linear relationships. In this study, we evaluated four machine learning models (Random Forest, Support Vector Machine, Convolutional Neural Networks, and Multi-Layer Perceptrons) for predicting genomic values related to birth weight (BW), weaning weight (WW), and yearling weight (YW), and compared them with three conventional models: GBLUP (Genomic Best Linear Unbiased Prediction), Bayes A, and Bayes B. The results demonstrated that the GBLUP model achieved the highest prediction accuracy for both BW and YW, whereas the Random Forest model exhibited a superior prediction accuracy for WW. Furthermore, GBLUP outperformed the other models in terms of model fit, as evidenced by the lower mean square error values and regression coefficients of the corrected phenotypes on predicted values. Overall, the GBLUP model delivered a superior prediction accuracy and model fit compared to the machine learning models tested.
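The two model-fit measures named in the abstract, mean square error and the regression coefficient of corrected phenotypes on predicted values, can be sketched as follows. The function names and the four toy values are assumptions made for illustration; they are not data or code from the study. In genomic prediction a slope near 1 is commonly read as predictions that are neither inflated nor deflated; how the study itself interpreted the coefficient is not stated in the abstract.

```python
from statistics import fmean
from typing import Sequence


def mean_squared_error(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Average squared difference between corrected phenotypes and predictions."""
    return fmean((obs - pred) ** 2 for obs, pred in zip(y, y_hat))


def regression_slope(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Slope of the simple regression of y (corrected phenotype) on y_hat (prediction)."""
    mean_y, mean_pred = fmean(y), fmean(y_hat)
    covariance = sum((obs - mean_y) * (pred - mean_pred) for obs, pred in zip(y, y_hat))
    variance = sum((pred - mean_pred) ** 2 for pred in y_hat)
    return covariance / variance


# Toy values (hypothetical): predictions are exactly twice the phenotypes.
corrected_phenotypes = [1.0, 2.0, 3.0, 4.0]
predicted_values = [2.0, 4.0, 6.0, 8.0]

mse = mean_squared_error(corrected_phenotypes, predicted_values)
slope = regression_slope(corrected_phenotypes, predicted_values)
```

In this toy case the slope is 0.5, reflecting predictions whose spread is twice that of the phenotypes.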
Affiliation(s)
- El Hamidi Hay
  - USDA Agricultural Research Service, Fort Keogh Livestock and Range Research Laboratory, Miles City, MT 59301, USA
3
Ghavi Hossein-Zadeh N. An overview of recent technological developments in bovine genomics. Vet Anim Sci 2024; 25:100382. [PMID: 39166173 PMCID: PMC11334705 DOI: 10.1016/j.vas.2024.100382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/22/2024] Open
Abstract
Cattle are regarded as highly valuable animals because of their milk, beef, dung, fur, and ability to draft. The scientific community has tried a number of strategies to improve the genetic makeup of bovine germplasm. To ensure higher returns for the dairy and beef industries, researchers face their greatest challenge in improving commercially important traits. One of the biggest developments in the last few decades in the creation of instruments for cattle genetic improvement is the discovery of the genome. Breeding livestock is being revolutionized by genomic selection made possible by the availability of medium- and high-density single nucleotide polymorphism (SNP) arrays coupled with sophisticated statistical techniques. It is becoming easier to access high-dimensional genomic data in cattle. Continuously declining genotyping costs and an increase in services that use genomic data to increase return on investment have both made a significant contribution to this. The field of genomics has come a long way thanks to groundbreaking discoveries such as radiation-hybrid mapping, in situ hybridization, synteny analysis, somatic cell genetics, cytogenetic maps, molecular markers, association studies for quantitative trait loci, high-throughput SNP genotyping, whole-genome shotgun sequencing to whole-genome mapping, and genome editing. These advancements have had a significant positive impact on the field of cattle genomics. This manuscript aimed to review recent advances in genomic technologies for cattle breeding and future prospects in this field.
Affiliation(s)
- Navid Ghavi Hossein-Zadeh
  - Department of Animal Science, Faculty of Agricultural Sciences, University of Guilan, Rasht, 41635-1314, Iran
4
Haque MA, Iqbal A, Alam MZ, Lee YM, Ha JJ, Kim JJ. Estimation of genetic correlations and genomic prediction accuracy for reproductive and carcass traits in Hanwoo cows. J Anim Sci Technol 2024; 66:682-701. [PMID: 39165742 PMCID: PMC11331368 DOI: 10.5187/jast.2024.e75] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Revised: 07/04/2023] [Accepted: 07/18/2023] [Indexed: 08/22/2024]
Abstract
This study estimated the heritabilities (h²) and genetic and phenotypic correlations between reproductive traits, including calving interval (CI), age at first calving (AFC), gestation length (GL), number of artificial inseminations per conception (NAIPC), and carcass traits, including carcass weight (CWT), eye muscle area (EMA), backfat thickness (BF), and marbling score (MS) in Korean Hanwoo cows. In addition, the accuracy of genomic predictions of breeding values was evaluated by applying the genomic best linear unbiased prediction (GBLUP) and the weighted GBLUP (WGBLUP) methods. The phenotypic data for reproductive and carcass traits were collected from 1,544 Hanwoo cows, and all animals were genotyped using the Illumina Bovine 50K single nucleotide polymorphism (SNP) chip. The genetic parameters were estimated with a multi-trait animal model using the MTG2 program. The estimated h² for CI, AFC, GL, NAIPC, CWT, EMA, BF, and MS were 0.10, 0.13, 0.17, 0.11, 0.37, 0.35, 0.27, and 0.45, respectively, according to the GBLUP model. The GBLUP accuracy estimates ranged from 0.51 to 0.74, while the WGBLUP accuracy estimates for the traits under study ranged from 0.51 to 0.79. Strong and favorable genetic correlations were observed between GL and NAIPC (0.61), CWT and EMA (0.60), NAIPC and CWT (0.49), AFC and CWT (0.48), CI and GL (0.36), BF and MS (0.35), NAIPC and EMA (0.35), CI and BF (0.30), EMA and MS (0.28), CI and AFC (0.26), AFC and EMA (0.24), and AFC and BF (0.21). The present study identified low to moderate positive genetic correlations between reproductive and CWT traits, suggesting that a heavier body weight may lead to a longer CI, AFC, GL, and NAIPC. The moderately positive genetic correlations of CWT with AFC and NAIPC, together with phenotypic correlations of nearly zero, suggest that genotype-environment interactions are more likely to be responsible for the phenotypic manifestation of these traits. As a result, the inclusion of these traits by breeders as selection criteria may present a good opportunity for developing a selection index to increase the response to selection and the identification of candidate animals, which can result in significantly increased profitability of production systems.
Affiliation(s)
- Md Azizul Haque
  - Department of Biotechnology, Yeungnam University, Gyeongsan 38541, Korea
- Asif Iqbal
  - Department of Biotechnology, Yeungnam University, Gyeongsan 38541, Korea
- Yun-Mi Lee
  - Department of Biotechnology, Yeungnam University, Gyeongsan 38541, Korea
- Jae-Jung Ha
  - Gyeongbuk Livestock Research Institute, Yeongju 36052, Korea
- Jong-Joo Kim
  - Department of Biotechnology, Yeungnam University, Gyeongsan 38541, Korea
5
Mora M, González P, Quevedo JR, Montañés E, Tusell L, Bergsma R, Piles M. Impact of multi-output and stacking methods on feed efficiency prediction from genotype using machine learning algorithms. J Anim Breed Genet 2023; 140:638-652. [PMID: 37403756 DOI: 10.1111/jbg.12815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Revised: 05/23/2023] [Accepted: 06/23/2023] [Indexed: 07/06/2023]
Abstract
Feeding represents the largest economic cost in meat production; therefore, selection to improve traits related to feed efficiency is a goal in most livestock breeding programs. Residual feed intake (RFI), that is, the difference between the actual and the expected feed intake based on the animal's requirements, has been used as a selection criterion to improve feed efficiency since it was proposed by Koch in 1963. In growing pigs, it is computed as the residual of the multiple regression model of daily feed intake (DFI) on average daily gain (ADG), backfat thickness (BFT), and metabolic body weight (MW). Recently, prediction using single-output machine learning algorithms and information from SNPs as predictor variables has been proposed for genomic selection in growing pigs, but like in other species, the prediction quality achieved for RFI has been generally poor. However, it has been suggested that it could be improved through multi-output or stacking methods. For this purpose, four strategies were implemented to predict RFI. Two of them correspond to the computation of RFI in an indirect way using the predicted values of its components obtained from (i) individual (multiple single-output strategy) or (ii) simultaneous predictions (multi-output strategy). The other two correspond to the direct prediction of RFI using (iii) the individual predictions of its components as predictor variables jointly with the genotype (stacking strategy), or (iv) using only the genotypes as predictors of RFI (single-output strategy). The single-output strategy was considered the benchmark. This research aimed to test the former three hypotheses using data recorded from 5828 growing pigs and 45,610 SNPs. For all the strategies two different learning methods were fitted: random forest (RF) and support vector regression (SVR). A nested cross-validation (CV) with an outer 10-fold CV and an inner threefold CV for hyperparameter tuning was implemented to test all strategies. This scheme was repeated using as predictor variables different subsets with an increasing number (from 200 to 3000) of the most informative SNPs identified with RF. Results showed that the highest prediction performance was achieved with 1000 SNPs, although the stability of feature selection was poor (0.13 points out of 1). For all SNP subsets, the benchmark showed the best prediction performance. Using the RF as a learner and the 1000 most informative SNPs as predictors, the mean (SD) of the 10 values obtained in the test sets were: 0.23 (0.04) for the Spearman correlation, 0.83 (0.04) for the zero-one loss, and 0.33 (0.03) for the rank distance loss. We conclude that the information on predicted components of RFI (DFI, ADG, MW, and BFT) does not improve the quality of the prediction of this trait in relation to the one obtained with the single-output strategy.
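The nested cross-validation layout described in the abstract (an outer 10-fold split for testing, with an inner threefold split of each outer training set for hyperparameter tuning) can be sketched as plain index bookkeeping. The helper names `k_fold_indices` and `nested_cv_splits` are hypothetical, the round-robin fold assignment without shuffling is an assumption, and the sample count of 60 is a toy value rather than the 5828 animals of the study.

```python
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]


def k_fold_indices(indices: List[int], k: int) -> Iterator[Split]:
    """Yield (train, held_out) index lists for a plain k-fold split."""
    folds = [indices[i::k] for i in range(k)]  # round-robin assignment, no shuffling
    for i, held_out in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out


def nested_cv_splits(
    n_samples: int, outer_k: int = 10, inner_k: int = 3
) -> Iterator[Tuple[List[int], List[int], List[Split]]]:
    """Yield (outer_train, outer_test, inner_splits) for nested cross-validation.

    Inner splits are drawn only from outer_train, so the outer test fold
    is never seen while hyperparameters are being tuned.
    """
    all_indices = list(range(n_samples))
    for outer_train, outer_test in k_fold_indices(all_indices, outer_k):
        inner_splits = list(k_fold_indices(outer_train, inner_k))
        yield outer_train, outer_test, inner_splits


splits = list(nested_cv_splits(n_samples=60))
```

Each element of `splits` would be used by tuning a learner on the inner splits, refitting on `outer_train`, and scoring once on `outer_test`, which gives one test value per outer fold.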
Affiliation(s)
- Mónica Mora
  - Departamento de Ciencia Animal, Universidad Politècnica de València, Valencia, Spain
  - Animal Breeding and Genetics, Institute of Agrifood Research and Technology (IRTA), Barcelona, Spain
- Pablo González
  - Artificial Intelligence Centre, University of Oviedo, Gijón, Spain
- Elena Montañés
  - Artificial Intelligence Centre, University of Oviedo, Gijón, Spain
- Llibertat Tusell
  - Animal Breeding and Genetics, Institute of Agrifood Research and Technology (IRTA), Barcelona, Spain
- Rob Bergsma
  - Topigs Norsvin Research Center, Beuningen, Netherlands
- Miriam Piles
  - Animal Breeding and Genetics, Institute of Agrifood Research and Technology (IRTA), Barcelona, Spain
6
Chafai N, Hayah I, Houaga I, Badaoui B. A review of machine learning models applied to genomic prediction in animal breeding. Front Genet 2023; 14:1150596. [PMID: 37745853 PMCID: PMC10516561 DOI: 10.3389/fgene.2023.1150596] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/24/2023] [Accepted: 08/22/2023] [Indexed: 09/26/2023] Open
Abstract
The advent of modern genotyping technologies has revolutionized genomic selection in animal breeding. Large marker datasets have shown several drawbacks for traditional genomic prediction methods in terms of flexibility, accuracy, and computational power. Recently, the application of machine learning models in animal breeding has gained a lot of interest due to their tremendous flexibility and their ability to capture patterns in large noisy datasets. Here, we present a general overview of a handful of machine learning algorithms and their application in genomic prediction to provide a meta-picture of their performance in genomic estimated breeding values estimation, genotype imputation, and feature selection. Finally, we discuss a potential adoption of machine learning models in genomic prediction in developing countries. The results of the reviewed studies showed that machine learning models have indeed performed well in fitting large noisy data sets and modeling minor nonadditive effects in some of the studies. However, sometimes conventional methods outperformed machine learning models, which confirms that there's no universal method for genomic prediction. In summary, machine learning models have great potential for extracting patterns from single nucleotide polymorphism datasets. Nonetheless, the level of their adoption in animal breeding is still low due to data limitations, complex genetic interactions, a lack of standardization and reproducibility, and the lack of interpretability of machine learning models when trained with biological data. Consequently, there is no remarkable outperformance of machine learning methods compared to traditional methods in genomic prediction. Therefore, more research should be conducted to discover new insights that could enhance livestock breeding programs.
Affiliation(s)
- Narjice Chafai
  - Laboratory of Biodiversity, Ecology, and Genome, Department of Biology, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco
- Ichrak Hayah
  - Laboratory of Biodiversity, Ecology, and Genome, Department of Biology, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco
- Isidore Houaga
  - Centre for Tropical Livestock Genetics and Health, The Roslin Institute, Royal (Dick) School of Veterinary Medicine, The University of Edinburgh, Edinburgh, United Kingdom
  - The Roslin Institute, Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, United Kingdom
- Bouabid Badaoui
  - Laboratory of Biodiversity, Ecology, and Genome, Department of Biology, Faculty of Sciences, Mohammed V University in Rabat, Rabat, Morocco
  - African Sustainable Agriculture Research Institute (ASARI), Mohammed VI Polytechnic University (UM6P), Laayoune, Morocco
7
Neshat M, Lee S, Momin MM, Truong B, van der Werf JHJ, Lee SH. An effective hyper-parameter can increase the prediction accuracy in a single-step genetic evaluation. Front Genet 2023; 14:1104906. [PMID: 37359380 PMCID: PMC10285379 DOI: 10.3389/fgene.2023.1104906] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/22/2022] [Accepted: 05/23/2023] [Indexed: 06/28/2023] Open
Abstract
The H-matrix best linear unbiased prediction (HBLUP) method has been widely used in livestock breeding programs. It can integrate all information, including pedigree, genotypes, and phenotypes on both genotyped and non-genotyped individuals into one single evaluation that can provide reliable predictions of breeding values. The existing HBLUP method requires hyper-parameters that should be adequately optimised as otherwise the genomic prediction accuracy may decrease. In this study, we assess the performance of HBLUP using various hyper-parameters such as blending, tuning, and scale factor in simulated and real data on Hanwoo cattle. In both simulated and cattle data, we show that blending is not necessary, indicating that the prediction accuracy decreases when using a blending hyper-parameter <1. The tuning process (adjusting genomic relationships accounting for base allele frequencies) improves prediction accuracy in the simulated data, confirming previous studies, although the improvement is not statistically significant in the Hanwoo cattle data. We also demonstrate that a scale factor, α, which determines the relationship between allele frequency and per-allele effect size, can improve the HBLUP accuracy in both simulated and real data. Our findings suggest that an optimal scale factor should be considered to increase prediction accuracy, in addition to blending and tuning processes, when using HBLUP.
Affiliation(s)
- Mehdi Neshat
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, Australia
  - UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, Australia
  - South Australian Health and Medical Research Institute (SAHMRI), Adelaide, SA, Australia
- Soohyun Lee
  - Division of Animal Breeding and Genetics, National Institute of Animal Science (NIAS), Cheonan, Republic of Korea
- Md. Moksedul Momin
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, Australia
  - UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, Australia
  - South Australian Health and Medical Research Institute (SAHMRI), Adelaide, SA, Australia
  - Department of Genetics and Animal Breeding, Faculty of Veterinary Medicine, Chattogram Veterinary and Animal Sciences University (CVASU), Chattogram, Bangladesh
- Buu Truong
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, Australia
  - Cardiovascular Research Centre, Massachusetts General Hospital, Boston, MA, United States
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, United States
  - Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, MA, United States
- S. Hong Lee
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, Australia
  - UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, Australia
  - South Australian Health and Medical Research Institute (SAHMRI), Adelaide, SA, Australia
8
Perez BC, Bink MCAM, Svenson KL, Churchill GA, Calus MPL. Prediction performance of linear models and gradient boosting machine on complex phenotypes in outbred mice. G3 (Bethesda) 2022; 12:6528848. [PMID: 35166767 PMCID: PMC8982369 DOI: 10.1093/g3journal/jkac039] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/15/2021] [Accepted: 01/29/2022] [Indexed: 12/14/2022]
Abstract
We compared the performance of linear (GBLUP, BayesB, and elastic net) methods to a nonparametric tree-based ensemble (gradient boosting machine) method for genomic prediction of complex traits in mice. The dataset used contained genotypes for 50,112 SNP markers and phenotypes for 835 animals from 6 generations. Traits analyzed were bone mineral density, body weight at 10, 15, and 20 weeks, fat percentage, circulating cholesterol, glucose, insulin, triglycerides, and urine creatinine. The youngest generation was used as a validation subset, and predictions were based on all older generations. Model performance was evaluated by comparing predictions for animals in the validation subset against their adjusted phenotypes. Linear models outperformed gradient boosting machine for 7 out of 10 traits. For bone mineral density, cholesterol, and glucose, the gradient boosting machine model showed better prediction accuracy and lower relative root mean squared error than the linear models. Interestingly, for these 3 traits, there is evidence of a relevant portion of phenotypic variance being explained by epistatic effects. Using a subset of top markers selected from a gradient boosting machine model helped for some of the traits to improve the accuracy of prediction when these were fitted into linear and gradient boosting machine models. Our results indicate that gradient boosting machine is more strongly affected by data size and decreased connectedness between reference and validation sets than the linear models. Although the linear models outperformed gradient boosting machine for the polygenic traits, our results suggest that gradient boosting machine is a competitive method to predict complex traits with assumed epistatic effects.
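The validation design in the abstract, with the youngest generation held out and all older generations used for training, can be sketched as a simple split on a generation label. The function name `split_by_generation`, the animal identifiers, and the convention that a larger generation number means a younger animal are all assumptions for illustration.

```python
from typing import Dict, List, Tuple


def split_by_generation(generation: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Return (reference_ids, validation_ids).

    Assumes a larger generation number means a younger animal: the youngest
    generation becomes the validation subset, all older ones the reference set.
    """
    youngest = max(generation.values())
    reference = sorted(a for a, g in generation.items() if g < youngest)
    validation = sorted(a for a, g in generation.items() if g == youngest)
    return reference, validation


# Hypothetical animals spread over three generations.
generation = {"m1": 1, "m2": 1, "m3": 2, "m4": 3, "m5": 3}
reference, validation = split_by_generation(generation)
```

Splitting by generation rather than at random mimics predicting future animals from their ancestors, which is why connectedness between reference and validation sets matters in this kind of design.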
Affiliation(s)
- Bruno C Perez
  - Hendrix Genetics B.V., Research and Technology Center (RTC), 5830 AC Boxmeer, The Netherlands
- Marco C A M Bink
  - Hendrix Genetics B.V., Research and Technology Center (RTC), 5830 AC Boxmeer, The Netherlands
- Mario P L Calus
  - Wageningen University & Research, Animal Breeding and Genomics, 6700 AH Wageningen, The Netherlands