1. Wen Y, Liu X, He F, Shi Y, Chen F, Li W, Song Y, Li L, Jiang H, Zhou L, Wu L. Machine learning prediction of stalk lignin content using Fourier transform infrared spectroscopy in large scale maize germplasm. Int J Biol Macromol 2024; 280:136140. [PMID: 39349086 DOI: 10.1016/j.ijbiomac.2024.136140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2024] [Revised: 09/24/2024] [Accepted: 09/27/2024] [Indexed: 10/02/2024]
Abstract
Lignin is recognized as a major contributor to lignocellulosic recalcitrance in biofuel production and has attracted attention as a high-value product in the biorefinery field. Because traditional wet chemical methods for determining lignin content are labor-intensive, time-consuming, and environmentally toxic, there is an urgent need for high-throughput, environmentally friendly techniques for large-scale screening of crop germplasm. In this study, we conducted a Fourier transform infrared (FTIR) assay on 150 maize germplasms with diverse lignin composition to build predictive models for lignin content in maize stalk. Principal component analysis (PCA) was applied to the FTIR spectra for use as model inputs. Classification and advanced gradient boosting machine (GBM) algorithms demonstrated higher predictive accuracy (0.82-0.96) than traditional linear and regularization algorithms (0.03-0.04) in the training set. Notably, two optimal models, built using the extreme gradient boosting (XGBoost) and light gradient boosting machine (LightGBM) algorithms, achieved R² values of over 0.91 in the training set and over 0.82 in the test set. Overall, the combination of FTIR and machine learning (ML) algorithms offers a high-throughput and efficient method for predicting lignin content. This approach holds significant potential for genetic breeding and the effective utilization of maize in industrial production.
Affiliation(s)
- Yujing Wen
- The National Engineering Laboratory of Crop Resistance Breeding, School of Life Sciences, Anhui Agricultural University, Hefei 230036, China
- Xing Liu
- School of Materials and Chemistry, Anhui Agricultural University, Hefei, Anhui 230036, China
- Feng He
- The National Engineering Laboratory of Crop Resistance Breeding, School of Life Sciences, Anhui Agricultural University, Hefei 230036, China
- Yanli Shi
- The National Engineering Laboratory of Crop Resistance Breeding, School of Life Sciences, Anhui Agricultural University, Hefei 230036, China
- Fanghui Chen
- The National Engineering Laboratory of Crop Resistance Breeding, School of Life Sciences, Anhui Agricultural University, Hefei 230036, China
- Wenfei Li
- The National Engineering Laboratory of Crop Resistance Breeding, School of Life Sciences, Anhui Agricultural University, Hefei 230036, China
- Youhong Song
- School of Agronomy, Anhui Agricultural University, Hefei 230036, China
- Lin Li
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Haiyang Jiang
- The National Engineering Laboratory of Crop Resistance Breeding, School of Life Sciences, Anhui Agricultural University, Hefei 230036, China
- Liang Zhou
- School of Materials and Chemistry, Anhui Agricultural University, Hefei, Anhui 230036, China.
- Leiming Wu
- The National Engineering Laboratory of Crop Resistance Breeding, School of Life Sciences, Anhui Agricultural University, Hefei 230036, China.

2. Cheng Q, Wang X. Machine Learning for AI Breeding in Plants. GENOMICS, PROTEOMICS & BIOINFORMATICS 2024; 22:qzae051. [PMID: 38954837 PMCID: PMC11479635 DOI: 10.1093/gpbjnl/qzae051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2024] [Revised: 06/21/2024] [Accepted: 06/25/2024] [Indexed: 07/04/2024]
Affiliation(s)
- Qian Cheng
- State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing 100094, China
- Xiangfeng Wang
- State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing 100094, China

3. Peng S, Rajjou L. Advancing plant biology through deep learning-powered natural language processing. PLANT CELL REPORTS 2024; 43:208. [PMID: 39102077 DOI: 10.1007/s00299-024-03294-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2024] [Accepted: 07/19/2024] [Indexed: 08/06/2024]
Abstract
The application of deep learning methods, specifically Large Language Models (LLMs), in plant biology holds significant promise for generating novel knowledge on plant cell systems. The LLM framework exhibits exceptional potential, particularly with the development of Protein Language Models (PLMs), allowing for in-depth analyses of nucleic acid and protein sequences. This analytical capacity facilitates the discernment of intricate patterns and relationships within biological data, encompassing multi-scale information within DNA or protein sequences. The contribution of PLMs extends beyond mere sequence patterns and structure-function recognition; it also supports advancements in genetic improvements for agriculture. The integration of deep learning approaches into the plant sciences offers opportunities for major breakthroughs in basic research across multi-scale plant traits. Consequently, the strategic application of deep learning methodologies, particularly those leveraging LLMs, will undoubtedly play a pivotal role in advancing plant sciences, plant production, and plant uses, and in propelling sustainable agroecological and agro-food transitions.
Affiliation(s)
- Shuang Peng
- Université Paris-Saclay, INRAE, AgroParisTech, Institut Jean-Pierre Bourgin for Plant Sciences (IJPB), 78000, Versailles, France
- Loïc Rajjou
- Université Paris-Saclay, INRAE, AgroParisTech, Institut Jean-Pierre Bourgin for Plant Sciences (IJPB), 78000, Versailles, France.

4. Wang XY, Ren CX, Fan QW, Xu YP, Wang LW, Mao ZL, Cai XZ. Integrated Assays of Genome-Wide Association Study, Multi-Omics Co-Localization, and Machine Learning Associated Calcium Signaling Genes with Oilseed Rape Resistance to Sclerotinia sclerotiorum. Int J Mol Sci 2024; 25:6932. [PMID: 39000053 PMCID: PMC11240920 DOI: 10.3390/ijms25136932] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2024] [Revised: 06/20/2024] [Accepted: 06/20/2024] [Indexed: 07/14/2024] Open
Abstract
Sclerotinia sclerotiorum (Ss) is one of the most devastating fungal pathogens, causing huge yield losses in multiple economically important crops including oilseed rape. Plant resistance to Ss is a form of quantitative disease resistance (QDR) controlled by multiple minor genes. Genome-wide identification of genes involved in QDR to Ss is yet to be conducted. In this study, we integrated several assays including genome-wide association study (GWAS), multi-omics co-localization, and machine learning prediction to identify, on a genome-wide scale, genes involved in the oilseed rape QDR to Ss. Employing GWAS and multi-omics co-localization, we identified seven resistance-associated loci (RALs) associated with oilseed rape resistance to Ss. Furthermore, we developed a machine learning algorithm and named it Integrative Multi-Omics Analysis and Machine Learning for Target Gene Prediction (iMAP), which integrates multi-omics data to rapidly predict disease resistance-related genes within a broad chromosomal region. Through iMAP based on the identified RALs, we revealed multiple calcium signaling genes related to the QDR to Ss. Population-level analysis of selective sweeps and haplotypes of variants confirmed the positive selection of the predicted calcium signaling genes during evolution. Overall, this study has developed an algorithm that integrates multi-omics data and machine learning methods, providing a powerful tool for predicting target genes associated with specific traits. Furthermore, it lays a foundation for further understanding the role and mechanisms of calcium signaling genes in the QDR to Ss.
Affiliation(s)
- Xin-Yao Wang
- Key Laboratory of Biology and Ecological Control of Crop Pathogens and Insects of Zhejiang Province, Institute of Biotechnology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China; (X.-Y.W.); (C.-X.R.); (Q.-W.F.); (L.-W.W.); (Z.-L.M.)
- Chun-Xiu Ren
- Key Laboratory of Biology and Ecological Control of Crop Pathogens and Insects of Zhejiang Province, Institute of Biotechnology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China; (X.-Y.W.); (C.-X.R.); (Q.-W.F.); (L.-W.W.); (Z.-L.M.)
- Qing-Wen Fan
- Key Laboratory of Biology and Ecological Control of Crop Pathogens and Insects of Zhejiang Province, Institute of Biotechnology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China; (X.-Y.W.); (C.-X.R.); (Q.-W.F.); (L.-W.W.); (Z.-L.M.)
- You-Ping Xu
- Centre of Analysis and Measurement, Zhejiang University, 866 Yu Hang Tang Road, Hangzhou 310058, China;
- Lu-Wen Wang
- Key Laboratory of Biology and Ecological Control of Crop Pathogens and Insects of Zhejiang Province, Institute of Biotechnology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China; (X.-Y.W.); (C.-X.R.); (Q.-W.F.); (L.-W.W.); (Z.-L.M.)
- Zhou-Lu Mao
- Key Laboratory of Biology and Ecological Control of Crop Pathogens and Insects of Zhejiang Province, Institute of Biotechnology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China; (X.-Y.W.); (C.-X.R.); (Q.-W.F.); (L.-W.W.); (Z.-L.M.)
- Xin-Zhong Cai
- Key Laboratory of Biology and Ecological Control of Crop Pathogens and Insects of Zhejiang Province, Institute of Biotechnology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China; (X.-Y.W.); (C.-X.R.); (Q.-W.F.); (L.-W.W.); (Z.-L.M.)
- Hainan Institute, Zhejiang University, Sanya 572025, China

5. Wu C, Luo J, Xiao Y. Multi-omics assists genomic prediction of maize yield with machine learning approaches. MOLECULAR BREEDING : NEW STRATEGIES IN PLANT IMPROVEMENT 2024; 44:14. [PMID: 38343399 PMCID: PMC10853138 DOI: 10.1007/s11032-024-01454-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Accepted: 01/19/2024] [Indexed: 02/28/2024]
Abstract
With the improvement of high-throughput technologies in recent years, large multi-dimensional plant omics data have been produced, and big-data-driven yield prediction research has received increasing attention. Machine learning offers promising computational and analytical solutions to interpret the biological meaning of large amounts of data in crops. In this study, we utilized multi-omics datasets from 156 maize recombinant inbred lines, containing 2496 single nucleotide polymorphisms (SNPs), 46 image traits (i-traits) from 16 developmental stages obtained through an automatic phenotyping platform, and 133 primary metabolites. Based on benchmark tests with different types of prediction models, some machine learning methods, such as Partial Least Squares (PLS), Random Forest (RF), and Gaussian process with Radial basis function kernel (GaussprRadial), achieved better prediction for maize yield, albeit with slight differences in method preference among i-trait, genomic, and metabolic data. We found that better yield prediction may stem from differing capabilities in ranking and filtering data features, which were linked to biological meaning such as photosynthesis-related or kernel development-related regulation. Finally, by integrating multiple omics data with the RF machine learning approach, we further improved the prediction accuracy of grain yield from 0.32 to 0.43. Our research provides new ideas for the application of plant omics data and artificial intelligence approaches to facilitate crop genetic improvements. Supplementary Information: The online version contains supplementary material available at 10.1007/s11032-024-01454-z.
Affiliation(s)
- Chengxiu Wu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070 China
- Jingyun Luo
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070 China
- Yingjie Xiao
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070 China
- Hubei Hongshan Laboratory, Wuhan, 430070 China

6. Lyu K, Xiao J, Lyu S, Liu R. Comparative Analysis of Transposable Elements in Strawberry Genomes of Different Ploidy Levels. Int J Mol Sci 2023; 24:16935. [PMID: 38069258 PMCID: PMC10706760 DOI: 10.3390/ijms242316935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 11/25/2023] [Accepted: 11/27/2023] [Indexed: 12/18/2023] Open
Abstract
Transposable elements (TEs) make up a large portion of plant genomes and play a vital role in genome structure, function, and evolution. Cultivated strawberry (Fragaria × ananassa) is one of the most important fruit crops, and its octoploid genome was formed through several rounds of genome duplications from diploid ancestors. Here, we built a pan-genome TE library for the Fragaria genus using ten published strawberry genomes at different ploidy levels, including seven diploids, one tetraploid, and two octoploids, and performed comparative analysis of TE content in these genomes. The TEs comprise 51.83% (F. viridis) to 60.07% (F. nilgerrensis) of the genomes. Long terminal repeat retrotransposons (LTR-RTs) are the predominant TE type in the Fragaria genomes (20.16% to 34.94%), particularly in F. iinumae (34.94%). Estimating TE content and LTR-RT insertion times revealed that species-specific TEs have shaped each strawberry genome. Additionally, the copy number of different LTR-RT families inserted in the last one million years reflects the genetic distance between Fragaria species. Comparing cultivated strawberry subgenomes to extant diploid ancestors showed that F. vesca and F. iinumae are likely the diploid ancestors of the cultivated strawberry, but not F. viridis. These findings provide new insights into the TE variations in the strawberry genomes and their roles in strawberry genome evolution.
Affiliation(s)
- Keliang Lyu
- College of Horticulture, Fujian Agriculture and Forestry University, Fuzhou 350002, China; (K.L.); (S.L.)
- Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou 350002, China;
- Jiajing Xiao
- Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou 350002, China;
- Shiheng Lyu
- College of Horticulture, Fujian Agriculture and Forestry University, Fuzhou 350002, China; (K.L.); (S.L.)
- Renyi Liu
- Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou 350002, China;

7. Sharma N, Raman H, Wheeler D, Kalenahalli Y, Sharma R. Data-driven approaches to improve water-use efficiency and drought resistance in crop plants. PLANT SCIENCE : AN INTERNATIONAL JOURNAL OF EXPERIMENTAL PLANT BIOLOGY 2023; 336:111852. [PMID: 37659733 DOI: 10.1016/j.plantsci.2023.111852] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/11/2022] [Revised: 08/23/2023] [Accepted: 08/29/2023] [Indexed: 09/04/2023]
Abstract
With the increasing population, there lies a pressing demand for food, feed and fibre, while the changing climatic conditions pose severe challenges for agricultural production worldwide. Water is the lifeline for crop production; thus, enhancing crop water-use efficiency (WUE) and improving drought resistance in crop varieties are crucial for overcoming these challenges. Genetically-driven improvements in yield, WUE and drought tolerance traits can buffer the worst effects of climate change on crop production in dry areas. While traditional crop breeding approaches have delivered impressive results in increasing yield, the methods remain time-consuming and are often limited by the existing allelic variation present in the germplasm. Significant advances in breeding and high-throughput omics technologies in parallel with smart agriculture practices have created avenues to dramatically speed up the process of trait improvement by leveraging the vast volumes of genomic and phenotypic data. For example, individual genome and pan-genome assemblies, along with transcriptomic, metabolomic and proteomic data from germplasm collections, characterised at phenotypic levels, could be utilised to identify marker-trait associations and superior haplotypes for crop genetic improvement. In addition, these omics approaches enable the identification of genes involved in pathways leading to the expression of a trait, thereby providing an understanding of the genetic, physiological and biochemical basis of trait variation. These data-driven gene discoveries and validation approaches are essential for crop improvement pipelines, including genomic breeding, speed breeding and gene editing. Herein, we provide an overview of prospects presented using big data-driven approaches (including artificial intelligence and machine learning) to harness new genetic gains for breeding programs and develop drought-tolerant crop varieties with favourable WUE and high-yield potential traits.
Affiliation(s)
- Niharika Sharma
- NSW Department of Primary Industries, Orange Agricultural Institute, Orange, NSW 2800, Australia.
- Harsh Raman
- NSW Department of Primary Industries, Wagga Wagga Agricultural Institute, Wagga Wagga, NSW 2650, Australia
- David Wheeler
- NSW Department of Primary Industries, Orange Agricultural Institute, Orange, NSW 2800, Australia
- Yogendra Kalenahalli
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, Hyderabad, Telangana 502324, India
- Rita Sharma
- Department of Biological Sciences, BITS Pilani, Pilani Campus, Rajasthan 333031, India

8. Zhao T, Wu H, Wang X, Zhao Y, Wang L, Pan J, Mei H, Han J, Wang S, Lu K, Li M, Gao M, Cao Z, Zhang H, Wan K, Li J, Fang L, Zhang T, Guan X. Integration of eQTL and machine learning to dissect causal genes with pleiotropic effects in genetic regulation networks of seed cotton yield. Cell Rep 2023; 42:113111. [PMID: 37676770 DOI: 10.1016/j.celrep.2023.113111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 06/19/2023] [Accepted: 08/24/2023] [Indexed: 09/09/2023] Open
Abstract
The dissection of a gene regulatory network (GRN) that complements the genome-wide association study (GWAS) locus and the crosstalk underlying multiple agronomical traits remains a major challenge. In this study, we generate 558 transcriptional profiles of lint-bearing ovules at one day post-anthesis from a selective core cotton germplasm, from which 12,207 expression quantitative trait loci (eQTLs) are identified. Sixty-six known phenotypic GWAS loci are colocalized with 1,090 eQTLs, forming 38 functional GRNs associated predominantly with seed yield. Of the eGenes, 34 exhibit pleiotropic effects. Combining the eQTLs within the seed yield GRNs significantly increases the portion of narrow-sense heritability. The extreme gradient boosting (XGBoost) machine learning approach is applied to predict seed cotton yield phenotypes on the basis of gene expression. Top-ranking eGenes (NF-YB3, FLA2, and GRDP1) derived with pleiotropic effects on yield traits are validated, along with their potential roles by correlation analysis, domestication selection analysis, and transgenic plants.
Affiliation(s)
- Ting Zhao
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China; Hainan Institute of Zhejiang University, Building 11, Yonyou Industrial Park, Yazhou Bay Science and Technology City, Yazhou District, Sanya 572025, China
- Hongyu Wu
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China
- Xutong Wang
- Hubei Hongshan Laboratory, Wuhan 430070, China
- Yongyan Zhao
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China; Hainan Institute of Zhejiang University, Building 11, Yonyou Industrial Park, Yazhou Bay Science and Technology City, Yazhou District, Sanya 572025, China
- Luyao Wang
- Hainan Institute of Zhejiang University, Building 11, Yonyou Industrial Park, Yazhou Bay Science and Technology City, Yazhou District, Sanya 572025, China
- Jiaying Pan
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China; Hainan Institute of Zhejiang University, Building 11, Yonyou Industrial Park, Yazhou Bay Science and Technology City, Yazhou District, Sanya 572025, China
- Huan Mei
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China
- Jin Han
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China
- Siyuan Wang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China
- Kening Lu
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Cotton Hybrid R & D Engineering Center (the Ministry of Education), College of Agriculture, Nanjing Agricultural University, Nanjing 210095, China
- Menglin Li
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Cotton Hybrid R & D Engineering Center (the Ministry of Education), College of Agriculture, Nanjing Agricultural University, Nanjing 210095, China
- Mengtao Gao
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Cotton Hybrid R & D Engineering Center (the Ministry of Education), College of Agriculture, Nanjing Agricultural University, Nanjing 210095, China
- Zeyi Cao
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China
- Hailin Zhang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China
- Ke Wan
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Cotton Hybrid R & D Engineering Center (the Ministry of Education), College of Agriculture, Nanjing Agricultural University, Nanjing 210095, China
- Jie Li
- State Key Laboratory of Crop Genetics and Germplasm Enhancement, Cotton Hybrid R & D Engineering Center (the Ministry of Education), College of Agriculture, Nanjing Agricultural University, Nanjing 210095, China
- Lei Fang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China; Hainan Institute of Zhejiang University, Building 11, Yonyou Industrial Park, Yazhou Bay Science and Technology City, Yazhou District, Sanya 572025, China
- Tianzhen Zhang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China; Hainan Institute of Zhejiang University, Building 11, Yonyou Industrial Park, Yazhou Bay Science and Technology City, Yazhou District, Sanya 572025, China
- Xueying Guan
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, The Advanced Seed Institute, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 300058, China; Hainan Institute of Zhejiang University, Building 11, Yonyou Industrial Park, Yazhou Bay Science and Technology City, Yazhou District, Sanya 572025, China.

9. Wang X, Han L, Li J, Shang X, Liu Q, Li L, Zhang H. Next-generation bulked segregant analysis for Breeding 4.0. Cell Rep 2023; 42:113039. [PMID: 37651230 DOI: 10.1016/j.celrep.2023.113039] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 03/20/2023] [Revised: 07/11/2023] [Accepted: 08/10/2023] [Indexed: 09/02/2023] Open
Abstract
Functional cloning and manipulation of genes controlling various agronomic traits are important for boosting crop production. Although bulked segregant analysis (BSA) is an efficient method for functional cloning, its low throughput cannot satisfy the current need for crop breeding and food security. Here, we review the rationale and development of conventional BSA and discuss its strengths and drawbacks. We then propose next-generation BSA (NG-BSA) integrating multiple cutting-edge technologies, including high-throughput phenotyping, biological big data, and the use of machine learning. NG-BSA increases the resolution of genetic mapping and throughput for cloning quantitative trait genes (QTGs) and optimizes candidate gene selection while providing a means to elucidate the interaction network of QTGs. The ability of NG-BSA to efficiently batch-clone QTGs makes it an important tool for dissecting molecular mechanisms underlying various traits, as well as for the improvement of Breeding 4.0 strategy, especially in targeted improvement and population improvement of crops.
Affiliation(s)
- Xi Wang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Hongshan Laboratory, Wuhan 430070, China
- Linqian Han
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Hongshan Laboratory, Wuhan 430070, China
- Juan Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Hongshan Laboratory, Wuhan 430070, China
- Xiaoyang Shang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Hongshan Laboratory, Wuhan 430070, China
- Qian Liu
- Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Lin Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Hongshan Laboratory, Wuhan 430070, China.
- Hongwei Zhang
- Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, China.

10. Rehman S, Ahmad Z, Ramakrishnan M, Kalendar R, Zhuge Q. Regulation of plant epigenetic memory in response to cold and heat stress: towards climate resilient agriculture. Funct Integr Genomics 2023; 23:298. [PMID: 37700098 DOI: 10.1007/s10142-023-01219-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/03/2023] [Revised: 08/18/2023] [Accepted: 08/23/2023] [Indexed: 09/14/2023]
Abstract
Plants have evolved to adapt and grow in hot and cold climatic conditions. Some also adapt to daily and seasonal temperature changes. Epigenetic modifications play an important role in regulating plant tolerance under such conditions. DNA methylation and post-translational modifications of histone proteins influence gene expression during plant developmental stages and under stress conditions, including cold and heat stress. While short-term modifications are common, some modifications may persist and result in stress memory that can be inherited by subsequent generations. Understanding the mechanisms of epigenomes responding to stress and the factors that trigger stress memory is crucial for developing climate-resilient agriculture, but such an integrated view is currently limited. This review focuses on the plant epigenetic stress memory during cold and heat stress. It also discusses the potential of machine learning to modify stress memory through epigenetics to develop climate-resilient crops.
Affiliation(s)
- Shamsur Rehman
- Co-Innovation Center for Sustainable Forestry in Southern China, Key Laboratory of Forest Genetics and Biotechnology, College of Biology and the Environment, Nanjing Forestry University, Ministry of Education, Nanjing, China
- Zishan Ahmad
- Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing, 210037, China
- Bamboo Research Institute, Nanjing Forestry University, Nanjing, 210037, China
- Muthusamy Ramakrishnan
- Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing, 210037, China
- Bamboo Research Institute, Nanjing Forestry University, Nanjing, 210037, China
- Ruslan Kalendar
- Helsinki Institute of Life Science HiLIFE, Biocenter 3, Viikinkaari 1, FI-00014 University of Helsinki, Helsinki, Finland.
- Center for Life Sciences, National Laboratory Astana, Nazarbayev University, Astana, Kazakhstan.
- Qiang Zhuge
- Co-Innovation Center for Sustainable Forestry in Southern China, Key Laboratory of Forest Genetics and Biotechnology, College of Biology and the Environment, Nanjing Forestry University, Ministry of Education, Nanjing, China.

11. Shamloo-Dashtpagerdi R, Lindlöf A, Nouripour-Sisakht J. Unraveling the regulatory role of MYC2 on ASMT gene expression in wheat: Implications for melatonin biosynthesis and drought tolerance. PHYSIOLOGIA PLANTARUM 2023; 175:e14015. [PMID: 37882265 DOI: 10.1111/ppl.14015] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/09/2023] [Revised: 08/08/2023] [Accepted: 08/21/2023] [Indexed: 10/27/2023]
Abstract
Recognized for its multifaceted functions, melatonin is a hormone found in both animals and plants. In the plant kingdom, it plays diverse roles, regulating growth, development, and stress responses. Notably, melatonin demonstrates its significance by mitigating the effects of abiotic stresses like drought. However, understanding the precise regulatory mechanisms controlling melatonin biosynthesis genes, especially during monocots' response to stresses, requires further exploration. Seeking to understand the molecular basis of drought stress tolerance in wheat, we analyzed RNA-Seq libraries of wheat exposed to drought stress using bioinformatics methods. We identified the Myelocytomatosis oncogenes 2 (MYC2) transcription factor as a hub gene upstream of a main melatonin biosynthesis gene, N-acetylserotonin methyltransferase (ASMT), in the wheat drought response-gene network. Promoter analysis of the ASMT gene suggested that it might be a target gene of MYC2. We conducted a set of molecular and physiochemical assays along with robust machine learning approaches to extend those findings. Co-regulation of MYC2 and ASMT was detected in the leaf tissues of wheat under jasmonate, drought, and a combination of the two. A meaningful correlation was observed among gene expression profiles, melatonin contents, photosynthetic activities, antioxidant enzyme activities, H2O2 levels, and plasma membrane damage. The results indicated an evident relationship between jasmonic acid and the melatonin biosynthesis pathway. Moreover, it seems that the MYC2-ASMT module might contribute to wheat drought tolerance by regulating melatonin contents.
Affiliation(s)
- Javad Nouripour-Sisakht
  - Department of Plant Production and Genetics, College of Agricultural Engineering, Isfahan University of Technology, Isfahan, Iran
12
Shakeri R, Amini H, Fakheri F, Ketabchi H. Assessment of drought conditions and prediction by machine learning algorithms using Standardized Precipitation Index and Standardized Water-Level Index (case study: Yazd province, Iran). ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2023; 30:101744-101760. [PMID: 37656297 DOI: 10.1007/s11356-023-29522-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Accepted: 08/22/2023] [Indexed: 09/02/2023]
Abstract
Drought as a natural phenomenon has always been a serious threat to regions with hot and dry climates. One of the major effects of drought is the drop in groundwater level. This paper focused on the SPI (Standardized Precipitation Index) and SWI (Standardized Water-Level Index) to assess meteorological and hydrological drought, respectively. In the first part, we used different time frames of SPI (3, 6, 12, and 24 months) to investigate drought in Yazd, a dry province in the center of Iran, for 29 years (1990-2018). Then, in the second part, the relationship between SPI and SWI was investigated in the three aquifers of Yazd using rain gauge stations and the observation wells closest to them. In addition to using SPI and SWI, we also used different machine learning (ML) algorithms to predict drought conditions, including a linear model and six non-linear models: K_Nearest_Neighbors, Gradient_Boosting, Decision_Tree, XGBoost, Random_Forest, and Neural_Net. To evaluate the accuracy of these models, three statistical indicators (Score, RMSE, and MAE) were used. Based on the results of the first part, Yazd province has changed from mild wet to mild drought in terms of meteorological drought (the amount of rainfall according to SPI), and this condition can worsen due to climate change. The ML models predicted the indices with the following accuracy, from highest to lowest: SPI-6 (score ave = 0.977), SPI-3 (score ave = 0.936), SPI-24 (score ave = 0.571), and SPI-12 (score ave = 0.413). Neural_Net (score ave = 0.964, RMSE ave = 0.020, MAE ave = 0.077) and Gradient_Boosting (score ave = 0.551, RMSE ave = 0.124, MAE ave = 0.248) had the highest and lowest accuracy, respectively, in predicting the SPI across all four time scales. Based on the results of the second part, for the SWI, the Random_Forest model (score = 0.929, RMSE = 0.052, MAE = 0.150) and the Neural_Net model (score = 0.755, RMSE = 0.235, MAE = 0.456) had the highest and lowest accuracy, respectively. Also, hydrological drought (reduction of the groundwater level) in the region has been much more severe, and given the low correlation between average SPI and SWI (R2 = 0.14), we found that uncontrolled pumping from wells, rather than a shortage of rainfall, has been the main factor aggravating the hydrological drought, and this region is at risk of becoming more arid in the future.
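The three indicators named in this abstract can be illustrated with a minimal Python sketch. It assumes that "Score" refers to the coefficient of determination (R2), which the abstract does not state. The function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: this is what the abstract calls "Score".
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical observed and predicted index values, for illustration only.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.0, 2.0, 3.0, 5.0]
    print("RMSE:", rmse(observed, predicted))
    print("MAE:", mae(observed, predicted))
    print("R2:", r2_score(observed, predicted))
```

Because RMSE squares each error before averaging, it penalizes a few large misses more heavily than MAE does, which is one reason the two are usually reported together.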
Affiliation(s)
- Reza Shakeri
  - Department of Civil and Environmental Engineering, Amirkabir University of Technology, Tehran, Iran
- Hossein Amini
  - Engineering Department, Cardiff University, Cardiff, UK
- Farshid Fakheri
  - Department of Civil and Environmental Engineering, Amirkabir University of Technology, Tehran, Iran
- Hamed Ketabchi
  - Department of Water Engineering and Management, Tarbiat Modares University, Tehran, Iran.
13
Wang Z, Zhu Y, Liu Z, Li H, Tang X, Jiang Y. Comparative analysis of tissue-specific genes in maize based on machine learning models: CNN performs technically best, LightGBM performs biologically soundest. Front Genet 2023; 14:1190887. [PMID: 37229198 PMCID: PMC10203421 DOI: 10.3389/fgene.2023.1190887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Accepted: 04/17/2023] [Indexed: 05/27/2023]
Abstract
Introduction: With the advancement of RNA-seq technology and machine learning, training machine learning models on large-scale RNA-seq data from databases can generally identify genes with important regulatory roles that were previously missed by standard linear analytic methodologies. Finding tissue-specific genes could improve our comprehension of the relationship between tissues and genes. However, few machine learning models for transcriptome data have been deployed and compared to identify tissue-specific genes, particularly for plants. Methods: In this study, an expression matrix based on 1,548 maize multi-tissue RNA-seq data obtained from a public database was processed with linear models (Limma), machine learning models (LightGBM), and deep learning models (CNN), together with information gain and the SHAP strategy, to identify tissue-specific genes. In terms of validation, V-measure values were computed based on k-means clustering of the gene sets to evaluate their technical complementarity. Furthermore, GO analysis and literature retrieval were used to validate the functions and research status of these genes. Results: Based on clustering validation, the convolutional neural network outperformed the others with a higher V-measure value of 0.647, indicating that its gene set could cover as many specific properties of various tissues as possible, whereas LightGBM discovered key transcription factors. The combination of three gene sets produced 78 core tissue-specific genes that had previously been shown in the literature to be biologically significant. Discussion: Different tissue-specific gene sets were identified because of the distinct interpretation strategies of the machine learning models, and researchers may use multiple methodologies and strategies for tissue-specific gene sets based on their goals, types of data, and computational resources. This study provided comparative insight for large-scale data mining of transcriptome datasets, shedding light on resolving high-dimensionality and bias difficulties in bioinformatics data processing.
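The V-measure used for clustering validation in this abstract can be sketched in a few lines of Python. This is a minimal illustration of the standard definition (the harmonic mean of homogeneity and completeness), not the authors' pipeline. The tissue labels and cluster assignments in the usage example are hypothetical, and the k-means step is assumed to have been run already.

```python
import math
from collections import Counter


def _entropy(labels):
    """Shannon entropy H(labels) in nats."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def _conditional_entropy(a, b):
    """Conditional entropy H(a | b) in nats."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    return -sum(
        (n_ab / n) * math.log(n_ab / b_counts[b_val])
        for (_, b_val), n_ab in joint.items()
    )


def v_measure(classes, clusters):
    """Harmonic mean of homogeneity and completeness, in [0, 1]."""
    if len(classes) != len(clusters) or not classes:
        raise ValueError("classes and clusters must be non-empty and equal in length")
    h_classes = _entropy(classes)
    h_clusters = _entropy(clusters)
    # Homogeneity: each cluster contains members of a single class.
    homogeneity = (
        1.0 if h_classes == 0
        else 1.0 - _conditional_entropy(classes, clusters) / h_classes
    )
    # Completeness: all members of a class fall in the same cluster.
    completeness = (
        1.0 if h_clusters == 0
        else 1.0 - _conditional_entropy(clusters, classes) / h_clusters
    )
    if homogeneity + completeness == 0:
        return 0.0
    return 2.0 * homogeneity * completeness / (homogeneity + completeness)


if __name__ == "__main__":
    # Hypothetical tissue labels and k-means cluster ids for six samples.
    tissues = ["leaf", "leaf", "root", "root", "ear", "ear"]
    cluster_ids = [0, 0, 1, 1, 1, 1]
    print("V-measure:", v_measure(tissues, cluster_ids))
```

The score does not depend on how cluster ids are numbered, so it can compare gene sets whose clusterings use different labels.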
Affiliation(s)
- Zijie Wang
  - School of Agriculture, Sun Yat-sen University, Shenzhen, China
- Yuzhi Zhu
  - School of Agriculture, Sun Yat-sen University, Shenzhen, China
- Zhule Liu
  - School of Agriculture, Sun Yat-sen University, Shenzhen, China
- Hongfu Li
  - School of Agriculture, Sun Yat-sen University, Shenzhen, China
- Xinqiang Tang
  - School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, China
- Yi Jiang
  - School of Agriculture, Sun Yat-sen University, Shenzhen, China
14
Kisiel A, Krzemińska A, Cembrowska-Lech D, Miller T. Data Science and Plant Metabolomics. Metabolites 2023; 13:metabo13030454. [PMID: 36984894 PMCID: PMC10054611 DOI: 10.3390/metabo13030454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 03/16/2023] [Accepted: 03/17/2023] [Indexed: 03/30/2023]
Abstract
The study of plant metabolism is one of the most complex tasks, mainly due to the huge amount and structural diversity of metabolites, as well as the fact that they react to changes in the environment and ultimately influence each other. Metabolic profiling is most often carried out using tools that include mass spectrometry (MS), which is one of the most powerful analytical methods. All this means that even when analyzing a single sample, we can obtain thousands of data points. Data science has the potential to revolutionize our understanding of plant metabolism. This review demonstrates that machine learning, network analysis, and statistical modeling are among the techniques being used to analyze large quantities of complex data that provide insights into plant development, growth, and how plants interact with their environment. These findings could be key to improving crop yields, developing new forms of plant biotechnology, and understanding the relationship between plants and microbes. It is also necessary to consider the constraints that come with data science such as quality and availability of data, model complexity, and the need for deep knowledge of the subject in order to achieve reliable outcomes.
Affiliation(s)
- Anna Kisiel
  - Institute of Marine and Environmental Sciences, University of Szczecin, Wąska 13, 71-415 Szczecin, Poland
  - Polish Society of Bioinformatics and Data Science BIODATA, Popiełuszki 4c, 71-214 Szczecin, Poland
- Adrianna Krzemińska
  - Polish Society of Bioinformatics and Data Science BIODATA, Popiełuszki 4c, 71-214 Szczecin, Poland
- Danuta Cembrowska-Lech
  - Polish Society of Bioinformatics and Data Science BIODATA, Popiełuszki 4c, 71-214 Szczecin, Poland
  - Department of Physiology and Biochemistry, Institute of Biology, University of Szczecin, Felczaka 3c, 71-412 Szczecin, Poland
- Tymoteusz Miller
  - Institute of Marine and Environmental Sciences, University of Szczecin, Wąska 13, 71-415 Szczecin, Poland
  - Polish Society of Bioinformatics and Data Science BIODATA, Popiełuszki 4c, 71-214 Szczecin, Poland
15
Wang W, Guo W, Le L, Yu J, Wu Y, Li D, Wang Y, Wang H, Lu X, Qiao H, Gu X, Tian J, Zhang C, Pu L. Integration of high-throughput phenotyping, GWAS, and predictive models reveals the genetic architecture of plant height in maize. MOLECULAR PLANT 2023; 16:354-373. [PMID: 36447436 DOI: 10.1016/j.molp.2022.11.016] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 05/12/2022] [Revised: 09/05/2022] [Accepted: 11/27/2022] [Indexed: 06/16/2023]
Abstract
Plant height (PH) is an essential trait in maize (Zea mays) that is tightly associated with planting density, biomass, lodging resistance, and grain yield in the field. Dissecting the dynamics of maize plant architecture will be beneficial for ideotype-based maize breeding and prediction, as the genetic basis controlling PH in maize remains largely unknown. In this study, we developed an automated high-throughput phenotyping platform (HTP) to systematically and noninvasively quantify 77 image-based traits (i-traits) and 20 field traits (f-traits) for 228 maize inbred lines across all developmental stages. Time-resolved i-traits with novel digital phenotypes and complex correlations with agronomic traits were characterized to reveal the dynamics of maize growth. An i-trait-based genome-wide association study identified 4945 trait-associated SNPs, 2603 genetic loci, and 1974 corresponding candidate genes. We found that rapid growth of maize plants occurs mainly at two developmental stages, stage 2 (S2) to S3 and S5 to S6, accounting for the final PH indicators. By integrating the PH-association network with the transcriptome profiles of specific internodes, we revealed 13 hub genes that may play vital roles during rapid growth. The candidate genes and novel i-traits identified at multiple growth stages may be used as potential indicators for final PH in maize. One candidate gene, ZmVATE, was functionally validated and shown to regulate PH-related traits in maize using genetic mutation. Furthermore, machine learning was used to build predictive models for final PH based on i-traits, and their performance was assessed across developmental stages. Moderate, strong, and very strong correlations between predictions and experimental datasets were achieved from the early S4 (tenth-leaf) stage. 
Collectively, our study provides a valuable tool for dissecting the spatiotemporal formation of specific internodes and the genetic architecture of PH, as well as resources and predictive models that are useful for molecular design breeding and predicting maize varieties with ideal plant architectures.
Affiliation(s)
- Weixuan Wang
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China; National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya 572024, China
- Weijun Guo
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Liang Le
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Jia Yu
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Yue Wu
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Dongwei Li
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Yifan Wang
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Huan Wang
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Xiaoduo Lu
  - Institute of Molecular Breeding for Maize, Qilu Normal University, Jinan 250200, China
- Hong Qiao
  - Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX 78712, USA; Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Xiaofeng Gu
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Jian Tian
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Chunyi Zhang
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China; Sanya Institute, Hainan Academy of Agricultural Sciences, Sanya 572000, China.
- Li Pu
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China; National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya 572024, China.
16
Yan J, Wang X. Machine learning bridges omics sciences and plant breeding. TRENDS IN PLANT SCIENCE 2023; 28:199-210. [PMID: 36153276 DOI: 10.1016/j.tplants.2022.08.018] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Received: 03/08/2022] [Revised: 08/15/2022] [Accepted: 08/23/2022] [Indexed: 06/16/2023]
Abstract
Some of the biological knowledge obtained from fundamental research will be implemented in applied plant breeding. To bridge basic research and breeding practice, machine learning (ML) holds great promise to translate biological knowledge and omics data into precision-designed plant breeding. Here, we review ML for multi-omics analysis in plants, including data dimensionality reduction, inference of gene-regulation networks, and gene discovery and prioritization. These applications will facilitate understanding trait regulation mechanisms and identifying target genes potentially applicable to knowledge-driven molecular design breeding. We also highlight applications of deep learning in plant phenomics and ML in genomic selection-assisted breeding, such as various ML algorithms that model the correlations among genotypes (genes), phenotypes (traits), and environments, to ultimately achieve data-driven genomic design breeding.
Affiliation(s)
- Jun Yan
  - National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100094, China; Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing 100094, China
- Xiangfeng Wang
  - National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing 100094, China; Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing 100094, China.
17
Artificial Intelligence-Based Robust Hybrid Algorithm Design and Implementation for Real-Time Detection of Plant Diseases in Agricultural Environments. BIOLOGY 2022; 11:biology11121732. [PMID: 36552243 PMCID: PMC9775035 DOI: 10.3390/biology11121732] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 10/29/2022] [Revised: 11/24/2022] [Accepted: 11/25/2022] [Indexed: 12/02/2022]
Abstract
The early detection and prevention of plant diseases that are an important cause of famine and food insecurity worldwide are very important for increasing agricultural product productivity. Not only the early detection of the plant disease but also the determination of its type play a critical role in determining the appropriate treatment. The fact that visual inspection, which is frequently used in determining plant disease and types, is tiring and prone to human error, necessitated the development of algorithms that can automatically classify plant disease with high accuracy and low computational cost. In this study, a new hybrid plant leaf disease classification model with high accuracy and low computational complexity, consisting of the wrapper approach, including the flower pollination algorithm (FPA) and support vector machine (SVM), and a convolutional neural network (CNN) classifier, is developed with a wrapper-based feature selection approach using metaheuristic optimization techniques. The features of the image dataset consisting of apple, grape, and tomato plants have been extracted by a two-dimensional discrete wavelet transform (2D-DWT) using wavelet families such as biorthogonal, Coiflets, Daubechies, Fejer-Korovkin, and symlets. Features that keep classifier performance high for each family are selected by the wrapper approach, consisting of the population-based metaheuristics FPA and SVM. The performance of the proposed optimization algorithm is compared with the particle swarm optimization (PSO) algorithm. Afterwards, the classification performance is obtained by using the lowest number of features that can keep the classification performance high for the CNN classifier. The CNN classifier with a single layer of classification without a feature extraction layer is used to minimize the complexity of the model and to deal with the model hyperparameter problem. 
The obtained model is embedded in the NVIDIA Jetson Nano developer kit on the unmanned aerial vehicle (UAV), and real-time classification tests are performed on apple, grape, and tomato plants. The experimental results obtained show that the proposed model classifies the specified plant leaf diseases in real time with high accuracy. Moreover, it is concluded that the robust hybrid classification model, which is created by selecting the lowest number of features with the optimization algorithm with low computational complexity, can classify plant leaf diseases in real time with precision.
18
Feng H, Tang Q, Yu Z, Tang H, Yin M, Wei A. A Machine Learning Applied Diagnosis Method for Subcutaneous Cyst by Ultrasonography. OXIDATIVE MEDICINE AND CELLULAR LONGEVITY 2022; 2022:1526540. [PMID: 36299601 PMCID: PMC9592196 DOI: 10.1155/2022/1526540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Revised: 09/19/2022] [Accepted: 09/28/2022] [Indexed: 11/18/2022]
Abstract
For decades, ultrasound images have been widely used in the detection of various diseases due to their high safety and efficiency. However, reading ultrasound images requires years of experience and training. In order to support the diagnosis of clinicians and reduce the workload of doctors, many ultrasonic computer-aided diagnostic systems have been proposed. In recent years, the success of deep learning in image classification and segmentation has made more and more scholars realize the potential performance improvement brought by the application of deep learning in ultrasonic computer-aided diagnosis systems. This study aims to apply several machine learning algorithms and develop a machine learning method to diagnose subcutaneous cysts. Clinical features are extracted from datasets and ultrasonography images of 132 patients from Hunan Provincial People's Hospital in China. All datasets are separated into 70% training and 30% testing. Four kinds of machine learning algorithms, including decision tree (DT), support vector machine (SVM), K-nearest neighbors (KNN), and neural networks (NN), were evaluated to determine which performed best. Compared across the results for each feature, SVM achieved the best performance, from 91.7% to 100%. Results show that SVM achieved the highest accuracy in the diagnosis of subcutaneous cysts by ultrasonography, which provides a good reference for further application to clinical practice of ultrasonography of subcutaneous cysts.
Affiliation(s)
- Hao Feng
  - Department of Dermatology, Hunan Provincial People's Hospital (The First Affiliated Hospital of Hunan Normal University), Changsha 410005, China
- Qian Tang
  - Department of Dermatology, Hunan Provincial People's Hospital (The First Affiliated Hospital of Hunan Normal University), Changsha 410005, China
- Zhengyu Yu
  - Faculty of Engineering and IT, University of Technology, Sydney, Sydney, NSW 2007, Australia
- Hua Tang
  - Department of Dermatology, Hunan Provincial People's Hospital (The First Affiliated Hospital of Hunan Normal University), Changsha 410005, China
- Ming Yin
  - Department of Dermatology, Hunan Provincial People's Hospital (The First Affiliated Hospital of Hunan Normal University), Changsha 410005, China
- An Wei
  - Department of Ultrasound, Hunan Provincial People's Hospital (The First Affiliated Hospital of Hunan Normal University), Changsha 410005, China
19
Jia Z, Sun M, Ou C, Sun S, Mao C, Hong L, Wang J, Li M, Jia S, Mao P. Single Seed Identification in Three Medicago Species via Multispectral Imaging Combined with Stacking Ensemble Learning. SENSORS (BASEL, SWITZERLAND) 2022; 22:s22197521. [PMID: 36236620 PMCID: PMC9572871 DOI: 10.3390/s22197521] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/06/2022] [Revised: 09/23/2022] [Accepted: 09/26/2022] [Indexed: 05/24/2023]
Abstract
Multispectral imaging (MSI) has become a new fast and non-destructive detection method in seed identification. Previous research has usually focused on single models in MSI data analysis, which always employed all features and thereby put efficiency at risk and raised system cost. In this study, we developed a stacking ensemble learning (SEL) model for successfully identifying a single seed of sickle alfalfa (Medicago falcata), hybrid alfalfa (M. varia), and alfalfa (M. sativa). SEL adopted a three-layer structure, i.e., level 0 with principal component analysis (PCA), linear discriminant analysis (LDA), and quadratic discriminant analysis (QDA) as models of dimensionality reduction and feature extraction (DRFE); level 1 with support vector machine (SVM), multiple logistic regression (MLR), generalized linear models with elastic net regularization (GLMNET), and eXtreme Gradient Boosting (XGBoost) as basic learners; and level 3 with XGBoost as meta-learner. We confirmed that the values of overall accuracy, kappa, precision, sensitivity, and specificity in the SEL model were all significantly higher than those in basic models alone, based on both spectral features and a combination of morphological and spectral features. Furthermore, we also developed a feature filtering process and successfully selected 5 optimal features out of 33, which corresponded to the contents of chlorophyll, anthocyanin, fat, and moisture in seeds. Our SEL model in MSI data analysis provided a new way for seed identification, and the feature filtering process could potentially be used widely for the development of a low-cost and narrow-channel sensor.
20
Yan J, Wang X. Unsupervised and semi-supervised learning: the next frontier in machine learning for plant systems biology. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2022; 111:1527-1538. [PMID: 35821601 DOI: 10.1111/tpj.15905] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 04/20/2022] [Revised: 07/05/2022] [Accepted: 07/07/2022] [Indexed: 06/15/2023]
Abstract
Advances in high-throughput omics technologies are leading plant biology research into the era of big data. Machine learning (ML) performs an important role in plant systems biology because of its excellent performance and wide application in the analysis of big data. However, to achieve ideal performance, supervised ML algorithms require large numbers of labeled samples as training data. In some cases, it is impossible or prohibitively expensive to obtain enough labeled training data; here, the paradigms of unsupervised learning (UL) and semi-supervised learning (SSL) play an indispensable role. In this review, we first introduce the basic concepts of ML techniques, as well as some representative UL and SSL algorithms, including clustering, dimensionality reduction, self-supervised learning (self-SL), positive-unlabeled (PU) learning and transfer learning. We then review recent advances and applications of UL and SSL paradigms in both plant systems biology and plant phenotyping research. Finally, we discuss the limitations and highlight the significance and challenges of UL and SSL strategies in plant systems biology.
Affiliation(s)
- Jun Yan
  - Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, 100094, China
  - National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, China
- Xiangfeng Wang
  - Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, 100094, China
  - National Maize Improvement Center, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, China
21
Design of metaheuristic rough set-based feature selection and rule-based medical data classification model on MapReduce framework. JOURNAL OF INTELLIGENT SYSTEMS 2022. [DOI: 10.1515/jisys-2022-0066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Recently, big data analytics have gained significant attention in healthcare industry due to generation of massive quantities of data in various forms such as electronic health records, sensors, medical imaging, and pharmaceutical details. However, the data gathered from various sources are intrinsically uncertain owing to noise, incompleteness, and inconsistency. The analysis of such huge data necessitates advanced analytical techniques using machine learning and computational intelligence for effective decision making. To handle data uncertainty in healthcare sector, this article presents a novel metaheuristic rough set-based feature selection with rule-based medical data classification (MRSFS-RMDC) technique on MapReduce framework. The proposed MRSFS-RMDC technique designs a butterfly optimization algorithm for minimal rough set selection. In addition, Hadoop MapReduce is applied to process massive quantity of data. Moreover, a rule-based classification approach named Repeated Incremental Pruning for Error Reduction (RIPPER) is used with the inclusion of a set of conditional rules. The RIPPER will scale in a linear way with the number of training records utilized and is suitable to build models with data uncertainty. The proposed MRSFS-RMDC technique is validated using benchmark dataset and the results are inspected under varying aspects. The experimental results highlighted the supremacy of the MRSFS-RMDC technique over the recent state of art methods in terms of different performance measures. The proposed methodology has achieved a higher F-score of 96.49%.
22
A Machine-Learning Method to Assess Growth Patterns in Plants of the Family Lemnaceae. PLANTS 2022; 11:plants11151910. [PMID: 35893614 PMCID: PMC9332063 DOI: 10.3390/plants11151910] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2022] [Revised: 07/19/2022] [Accepted: 07/20/2022] [Indexed: 11/17/2022]
Abstract
Numerous new technologies have been implemented in image analysis methods that help researchers draw scientific conclusions from biological phenomena. Plants of the family Lemnaceae (duckweeds) are the smallest flowering plants in the world, and biometric measurements of single plants and their growth rate are highly challenging. Although the use of software for digital image analysis has changed the way scientists extract phenomenological data (also for studies on duckweeds), the procedure is often not wholly automated and sometimes relies on the intervention of a human operator. Such a constraint can limit the objectivity of the measurements and generally slows down the time required to produce scientific data. Herein lies the need to implement image analysis software with artificial intelligence that can substitute the human operator. In this paper, we present a new method to study the growth rates of the plants of the Lemnaceae family based on the application of machine-learning procedures to digital image analysis. The method is compared to existing analogical and computer-operated procedures. The results showed that our method drastically reduces the time consumption of the human operator while retaining a high correlation in the growth rates measured with other procedures. As expected, machine-learning methods applied to digital image analysis can overcome the constraints of measuring growth rates of very small plants and might help duckweeds gain worldwide attention thanks to their strong nutritional qualities and biological plasticity.
23
Biswal B, Duncan A, Sun Z. ADA: Advanced data analytics methods for abnormal frequent episodes in the baseline data of ISD. NUCLEAR ENGINEERING AND TECHNOLOGY 2022. [DOI: 10.1016/j.net.2022.07.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
24
Gomes MAS, Kovaleski JL, Pagani RN, da Silva VL. Machine learning applied to healthcare: a conceptual review. J Med Eng Technol 2022; 46:608-616. [PMID: 35678368 DOI: 10.1080/03091902.2022.2080885] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023]
Abstract
The technological inference in procedures applied to healthcare is frequently investigated in order to understand the real contribution to decision-making and clinical improvement. In this context, the theoretical field of machine learning has suitably presented itself. The objective of this research is to identify the main machine learning algorithms used in healthcare through the methodology of a systematic literature review. Considering the time frame of the last twenty years, 173 studies were mined based on established criteria, which allowed the grouping of algorithms into typologies. Supervised Learning, Unsupervised Learning, and Deep Learning were the groups derived from the studies mined, establishing 59 works employed. We expect that this research will stimulate investigations towards machine learning applications in healthcare.
Affiliation(s)
- João Luiz Kovaleski
- Department of Production Engineering, Federal University of Technology of Paraná, Ponta Grossa, Brazil
- Regina Negri Pagani
- Department of Production Engineering, Federal University of Technology of Paraná, Ponta Grossa, Brazil
- Vander Luiz da Silva
- Department of Production Engineering, Federal University of Technology of Paraná, Ponta Grossa, Brazil
25
Zhu M, Wang J, Yang X, Zhang Y, Zhang L, Ren H, Wu B, Ye L. A review of the application of machine learning in water quality evaluation. ECO-ENVIRONMENT & HEALTH (ONLINE) 2022; 1:107-116. [PMID: 38075524 PMCID: PMC10702893 DOI: 10.1016/j.eehl.2022.06.001] [Citation(s) in RCA: 45] [Impact Index Per Article: 22.5] [Received: 02/01/2022] [Revised: 05/19/2022] [Accepted: 06/01/2022] [Indexed: 12/31/2023]
Abstract
With the rapid increase in the volume of data on the aquatic environment, machine learning has become an important tool for data analysis, classification, and prediction. Unlike traditional models used in water-related research, data-driven models based on machine learning can efficiently solve more complex nonlinear problems. In water environment research, models and conclusions derived from machine learning have been applied to the construction, monitoring, simulation, evaluation, and optimization of various water treatment and management systems. Additionally, machine learning can provide solutions for water pollution control, water quality improvement, and watershed ecosystem security management. In this review, we describe the cases in which machine learning algorithms have been applied to evaluate the water quality in different water environments, such as surface water, groundwater, drinking water, sewage, and seawater. Furthermore, we propose possible future applications of machine learning approaches to water environments.
Affiliation(s)
- Mengyuan Zhu
- State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
- Jiawei Wang
- State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
- Xiao Yang
- State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
- Yu Zhang
- State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
- Linyu Zhang
- State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
- Hongqiang Ren
- State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
- Bing Wu
- State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
- Lin Ye
- State Key Laboratory of Pollution Control and Resource Reuse, School of Environment, Nanjing University, Nanjing 210023, China
26
A Review of Integrative Omic Approaches for Understanding Rice Salt Response Mechanisms. PLANTS 2022; 11:plants11111430. [PMID: 35684203 PMCID: PMC9182744 DOI: 10.3390/plants11111430] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 04/04/2022] [Revised: 05/20/2022] [Accepted: 05/24/2022] [Indexed: 01/04/2023]
Abstract
Soil salinity is one of the most serious environmental challenges, posing a growing threat to agriculture across the world. Soil salinity has a significant impact on rice growth, development, and production. Hence, improving rice varieties’ resistance to salt stress is a viable solution for meeting global food demand. Adaptation to salt stress is a multifaceted process that involves interacting physiological traits, biochemical or metabolic pathways, and molecular mechanisms. The integration of multi-omics approaches contributes to a better understanding of molecular mechanisms as well as the improvement of salt-resistant and tolerant rice varieties. We first present a thorough review of current knowledge about salt stress effects on rice and mechanisms behind rice salt tolerance and salt stress signalling. This review focuses on the use of multi-omics approaches to improve next-generation rice breeding for salinity resistance and tolerance, including genomics, transcriptomics, proteomics, metabolomics and phenomics. Integrating multi-omics data effectively is critical to gaining a more comprehensive and in-depth understanding of the molecular pathways, enzyme activity and interacting networks of genes controlling salinity tolerance in rice. Key data mining strategies within artificial intelligence for analysing big and complex data sets will allow more accurate prediction of outcomes, modernise traditional breeding programmes, and expedite precision rice breeding approaches such as genetic engineering and genome editing.
27
Sales CRG, Molero G, Evans JR, Taylor SH, Joynson R, Furbank RT, Hall A, Carmo-Silva E. Phenotypic variation in photosynthetic traits in wheat grown under field versus glasshouse conditions. JOURNAL OF EXPERIMENTAL BOTANY 2022; 73:3221-3237. [PMID: 35271722 PMCID: PMC9126738 DOI: 10.1093/jxb/erac096] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/05/2021] [Accepted: 03/08/2022] [Indexed: 05/19/2023]
Abstract
Recognition of the untapped potential of photosynthesis to improve crop yields has spurred research to identify targets for breeding. The CO2-fixing enzyme Rubisco is characterized by a number of inefficiencies, and frequently limits carbon assimilation at the top of the canopy, representing a clear target for wheat improvement. Two bread wheat lines with similar genetic backgrounds and contrasting in vivo maximum carboxylation activity of Rubisco per unit leaf nitrogen (Vc,max,25/Narea) determined using high-throughput phenotyping methods were selected for detailed study from a panel of 80 spring wheat lines. Detailed phenotyping of photosynthetic traits in the two lines using glasshouse-grown plants showed no difference in Vc,max,25/Narea determined directly via in vivo and in vitro methods. Detailed phenotyping of glasshouse-grown plants of the 80 wheat lines also showed no correlation between photosynthetic traits measured via high-throughput phenotyping of field-grown plants. Our findings suggest that the complex interplay between traits determining crop productivity and the dynamic environments experienced by field-grown plants needs to be considered in designing strategies for effective wheat crop yield improvement when breeding for particular environments.
Affiliation(s)
- Cristina R G Sales
- Lancaster Environment Centre, Lancaster University, Library Avenue, Lancaster LA1 4YQ, UK
- Department of Plant Sciences, University of Cambridge, Downing Street, Cambridge CB2 3EA, UK
- Corresponding author
- Gemma Molero
- International Maize and Wheat Improvement Centre (CIMMYT), Int. Apdo. Postal 6-641, 06600 Mexico, DF, Mexico
- KWS Momont Recherche, 7 rue de Martinval, 59246 Mons-en-Pévèle, France
- John R Evans
- ARC Centre of Excellence for Translational Photosynthesis, Research School of Biology, The Australian National University, Canberra ACT 2601, Australia
- Samuel H Taylor
- Lancaster Environment Centre, Lancaster University, Library Avenue, Lancaster LA1 4YQ, UK
- Ryan Joynson
- Organisms and Ecosystems, Earlham Institute, Norwich Research Park, Norwich NR4 7UG, UK
- Limagrain Europe, CS 3911, 63720 Chappes, France
- Robert T Furbank
- ARC Centre of Excellence for Translational Photosynthesis, Research School of Biology, The Australian National University, Canberra ACT 2601, Australia
- Anthony Hall
- Organisms and Ecosystems, Earlham Institute, Norwich Research Park, Norwich NR4 7UG, UK
- Elizabete Carmo-Silva
- Lancaster Environment Centre, Lancaster University, Library Avenue, Lancaster LA1 4YQ, UK
- Corresponding author
28
Hesami M, Alizadeh M, Jones AMP, Torkamaneh D. Machine learning: its challenges and opportunities in plant system biology. Appl Microbiol Biotechnol 2022; 106:3507-3530. [PMID: 35575915 DOI: 10.1007/s00253-022-11963-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 03/14/2022] [Revised: 03/14/2022] [Accepted: 05/07/2022] [Indexed: 12/25/2022]
Abstract
Sequencing technologies are evolving at a rapid pace, enabling the generation of massive amounts of data in multiple dimensions (e.g., genomics, epigenomics, transcriptomics, metabolomics, proteomics, and single-cell omics) in plants. To provide comprehensive insights into the complexity of plant biological systems, it is important to integrate different omics datasets. Although recent advances in computational analytical pipelines have enabled efficient and high-quality exploration and exploitation of single omics data, the integration of multidimensional, heterogeneous, and large datasets (i.e., multi-omics) remains a challenge. In this regard, machine learning (ML) offers promising approaches to integrate large datasets and to recognize fine-grained patterns and relationships. Nevertheless, ML approaches require rigorous optimization to process multi-omics-derived datasets. In this review, we discuss the main concepts of machine learning as well as the key challenges and solutions related to the big data derived from plant system biology. We also provide in-depth insight into the principles of data integration using ML, as well as challenges and opportunities in different contexts including multi-omics, single-cell omics, protein function, and protein-protein interaction. KEY POINTS: • The key challenges and solutions related to the big data derived from plant system biology have been highlighted. • Different methods of data integration have been discussed. • Challenges and opportunities of the application of machine learning in plant system biology have been highlighted and discussed.
Affiliation(s)
- Mohsen Hesami
- Department of Plant Agriculture, University of Guelph, Guelph, ON, N1G 2W1, Canada
- Milad Alizadeh
- Department of Botany, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Davoud Torkamaneh
- Département de Phytologie, Université Laval, Québec City, QC, G1V 0A6, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec City, QC, G1V 0A6, Canada
29
Haleem A, Klees S, Schmitt AO, Gültas M. Deciphering Pleiotropic Signatures of Regulatory SNPs in Zea mays L. Using Multi-Omics Data and Machine Learning Algorithms. Int J Mol Sci 2022; 23:5121. [PMID: 35563516 PMCID: PMC9100765 DOI: 10.3390/ijms23095121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2022] [Revised: 04/28/2022] [Accepted: 05/02/2022] [Indexed: 01/25/2023] Open
Abstract
Maize is one of the most widely grown cereals in the world. However, to address the challenges in maize breeding arising from climatic anomalies, there is a need for developing novel strategies to harness the power of multi-omics technologies. In this regard, pleiotropy is an important genetic phenomenon that can be utilized to simultaneously enhance multiple agronomic phenotypes in maize. In addition to pleiotropy, another aspect is the consideration of the regulatory SNPs (rSNPs) that are likely to have causal effects in phenotypic development. By incorporating both aspects in our study, we performed a systematic analysis based on multi-omics data to reveal the novel pleiotropic signatures of rSNPs in a global maize population. For this purpose, we first applied Random Forests and then Markov clustering algorithms to decipher the pleiotropic signatures of rSNPs, based on which hierarchical network models are constructed to elucidate the complex interplay among transcription factors, rSNPs, and phenotypes. The results obtained in our study could help to understand the genetic programs orchestrating multiple phenotypes and thus could provide novel breeding targets for the simultaneous improvement of several agronomic traits.
Affiliation(s)
- Ataul Haleem
- Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075 Göttingen, Germany
- Faculty of Agriculture, South Westphalia University of Applied Sciences, Lübecker Ring 2, 59494 Soest, Germany
- Selina Klees
- Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075 Göttingen, Germany
- Center for Integrated Breeding Research (CiBreed), Georg-August University, Carl-Sprengel-Weg 1, 37075 Göttingen, Germany
- Armin Otto Schmitt
- Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075 Göttingen, Germany
- Center for Integrated Breeding Research (CiBreed), Georg-August University, Carl-Sprengel-Weg 1, 37075 Göttingen, Germany
- Mehmet Gültas
- Faculty of Agriculture, South Westphalia University of Applied Sciences, Lübecker Ring 2, 59494 Soest, Germany
- Center for Integrated Breeding Research (CiBreed), Georg-August University, Carl-Sprengel-Weg 1, 37075 Göttingen, Germany
30
Banchhor C, Srinivasu N. A comprehensive study of data intelligence in the context of big data analytics. WEB INTELLIGENCE 2022. [DOI: 10.3233/web-210480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Modern systems like the Internet of Things, cloud computing, and sensor networks generate a huge data archive. Extracting knowledge from such archived data requires modified approaches to algorithm design. The field of study in which analysis of such huge data is carried out is called big data analytics, which helps to optimize performance at reduced cost and to retrieve information efficiently. Traditional data analytics needs to be modified to suit big data analytics because it may not manage huge amounts of data. The real question is how to design data mining algorithms suitable for big data analysis. This paper discusses data analytics at an introductory level, beginning with insights into the analysis process for big data. Big data analytics is at the current research edge of the knowledge extraction field. This paper highlights the challenges and problems associated with big data analysis and provides insights into several techniques and methods used.
Affiliation(s)
- Chitrakant Banchhor
- School of Computer Engineering and Technology, Dr. Vishwanath Karad World Peace University, Pune, M.S., India
- N. Srinivasu
- Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, AP, India
31
Ghosh D, Chakraborty S, Kodamana H, Chakraborty S. Application of machine learning in understanding plant virus pathogenesis: trends and perspectives on emergence, diagnosis, host-virus interplay and management. Virol J 2022; 19:42. [PMID: 35264189 PMCID: PMC8905280 DOI: 10.1186/s12985-022-01767-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/13/2021] [Accepted: 02/27/2022] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND: Inclusion of high throughput technologies in the field of biology has generated massive amounts of data in recent years. Now, transforming these huge volumes of data into knowledge is the primary challenge in computational biology. The traditional methods of data analysis have failed to carry out the task. Hence, researchers are turning to machine learning based approaches for the analysis of high-dimensional big data. In machine learning, once a model is trained with a training dataset, it can be applied to an independent testing dataset. At present, deep learning algorithms further promote the application of machine learning in several fields of biology, including plant virology.
MAIN BODY: Plant viruses have emerged as one of the principal global threats to food security due to their devastating impact on crops and vegetables. The emergence of new viral strains and species helps viruses to evade the concurrent preventive methods. According to a survey conducted in 2014, plant viruses are anticipated to cause a global yield loss of more than thirty billion USD per year. In order to design effective, durable and broad-spectrum management protocols, it is very important to understand the mechanistic details of viral pathogenesis. The application of machine learning enables precise diagnosis of plant viral diseases at an early stage. Furthermore, the development of several machine learning-guided bioinformatics platforms has primed plant virologists to understand the host-virus interplay better. In addition, machine learning has tremendous potential in deciphering the pattern of plant virus evolution and emergence as well as in developing viable control options.
CONCLUSIONS: Considering the significant progress in the application of machine learning in understanding plant virology, this review presents an introductory note on machine learning and comprehensively discusses the trends and prospects of machine learning in the diagnosis of viral diseases, understanding host-virus interplay and emergence of plant viruses.
Affiliation(s)
- Dibyendu Ghosh
- Molecular Virology Laboratory, School of Life Sciences, Jawaharlal Nehru University, New Delhi, 110067 India
- Srija Chakraborty
- Department of Chemical Engineering, Indian Institute of Technology Delhi, New Delhi, 110016 India
- Hariprasad Kodamana
- Department of Chemical Engineering, Indian Institute of Technology Delhi, New Delhi, 110016 India
- School of Artificial Intelligence, Indian Institute of Technology Delhi, New Delhi, 110016 India
- Supriya Chakraborty
- Molecular Virology Laboratory, School of Life Sciences, Jawaharlal Nehru University, New Delhi, 110067 India
32
Abstract
Population growth, climate change, and the worldwide COVID-19 pandemic are imposing increasing pressure on global agricultural production. The challenge of increasing crop yield while ensuring sustainable development of environmentally friendly agriculture is a common issue throughout the world. Autonomous systems, sensing technologies, and artificial intelligence offer great opportunities to tackle this issue. In precision agriculture (PA), non-destructive and non-invasive remote and proximal sensing methods have been widely used to observe crops in visible and invisible spectra. Nowadays, the integration of high-performance imagery sensors (e.g., RGB, multispectral, hyperspectral, thermal, and SAR) and unmanned mobile platforms (e.g., satellites, UAVs, and terrestrial agricultural robots) is yielding a huge number of high-resolution farmland images, in which rich crop information is compressed. However, this has been accompanied by challenges, namely how to swiftly and efficiently make full use of these images and then perform fine crop management based on information-supported decision making. In the past few years, deep learning (DL) has shown great potential to reshape many industries because of its powerful capabilities of feature learning from massive datasets, and the agriculture industry is no exception. More and more agricultural scientists are paying attention to applications of deep learning in image-based farmland observations, such as land mapping, crop classification, biotic/abiotic stress monitoring, and yield prediction. To provide an update on these studies, we conducted a comprehensive investigation with a special emphasis on deep learning in multiscale agricultural remote and proximal sensing. Specifically, the applications of convolutional neural network-based supervised learning (CNN-SL), transfer learning (TL), and few-shot learning (FSL) in crop sensing at land, field, canopy, and leaf scales are the focus of this review. We hope that this work can act as a reference for the global agricultural community regarding DL in PA and can inspire deeper and broader research to promote the evolution of modern agriculture.
33
Finding and Characterizing Repeats in Plant Genomes. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2022; 2443:327-385. [PMID: 35037215 DOI: 10.1007/978-1-0716-2067-0_18] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/01/2023]
Abstract
Plant genomes contain a particularly high proportion of repeated structures of various types. This chapter proposes a guided tour of the available software that can help biologists to scan automatically for these repeats in sequence data or check hypothetical models intended to characterize their structures. Since transposable elements (TEs) are a major source of repeats in plants, many methods have been used or developed for this broad class of sequences. They are representative of the range of tools available for other classes of repeats and we have provided two sections on this topic (for the analysis of genomes or directly of sequenced reads), as well as a selection of the main existing software. It may be hard to keep up with the profusion of proposals in this dynamic field and the rest of the chapter is devoted to the foundations of an efficient search for repeats and more complex patterns. We first introduce the key concepts of the art of indexing and mapping or querying sequences. We end the chapter with the more prospective issue of building models of repeat families. We present the Machine Learning approach first, seeking to build predictors automatically for some families of TEs, from a set of sequences known to belong to this family. A second approach, the linguistic (or syntactic) approach, allows biologists themselves to describe and check the validity of models of their favorite repeat family.
34
Ashraf MF, Hou D, Hussain Q, Imran M, Pei J, Ali M, Shehzad A, Anwar M, Noman A, Waseem M, Lin X. Entailing the Next-Generation Sequencing and Metabolome for Sustainable Agriculture by Improving Plant Tolerance. Int J Mol Sci 2022; 23:651. [PMID: 35054836 PMCID: PMC8775971 DOI: 10.3390/ijms23020651] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/26/2021] [Revised: 12/23/2021] [Accepted: 12/29/2021] [Indexed: 02/07/2023] Open
Abstract
Crop production is a serious challenge to provide food for the 10 billion individuals forecasted to live across the globe in 2050. Scientists emphasize establishing an equilibrium between diversity and quality of crops by enhancing yield to fulfill the increasing demand for food supply sustainably. The exploitation of genetic resources using genomics and metabolomics strategies can help generate resilient plants against stressors in the future. The innovation of the next-generation sequencing (NGS) strategies laid the foundation to unveil various plants' genetic potential and help us to understand the domestication process to unmask the genetic potential among wild-type plants to utilize for crop improvement. Nowadays, NGS is generating massive genomic resources using wild-type and domesticated plants grown under normal and harsh environments to explore the stress regulatory factors and determine the key metabolites. Improved food nutritional value is also the key to eradicating malnutrition problems around the globe, which could be attained by employing the knowledge gained through NGS and metabolomics to achieve suitability in crop yield. Advanced technologies can further enhance our understanding in defining the strategy to obtain a specific phenotype of a crop. Integration among bioinformatic tools and molecular techniques, such as marker-assisted, QTL mapping, creation of reference genome, de novo genome assembly, pan- and/or super-pan-genomes, etc., will boost breeding programs. The current article provides sequential progress in NGS technologies, a broad application of NGS, enhancement of genetic manipulation resources, and understanding the crop response to stress by producing plant metabolites. The NGS and metabolomics utilization in generating stress-tolerant plants/crops without deteriorating a natural ecosystem is considered a sustainable way to improve agriculture production. This highlighted knowledge also provides useful research that explores the suitable resources for agriculture sustainability.
Affiliation(s)
- Muhammad Furqan Ashraf
- State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, 666 Wusu Street, Lin’An, Hangzhou 311300, China
- Dan Hou
- State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, 666 Wusu Street, Lin’An, Hangzhou 311300, China
- Quaid Hussain
- State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, 666 Wusu Street, Lin’An, Hangzhou 311300, China
- Muhammad Imran
- Colleges of Agriculture and Horticulture, South China Agricultural University, Guangzhou 510642, China
- Jialong Pei
- State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, 666 Wusu Street, Lin’An, Hangzhou 311300, China
- Mohsin Ali
- State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
- Aamar Shehzad
- Maize Research Station, AARI, Faisalabad 38000, Pakistan
- Muhammad Anwar
- Guangdong Technology Research Center for Marine Algal Bioengineering, Guangdong Key Laboratory of Plant Epigenetics, College of Life Sciences and Oceanography, Shenzhen University, Shenzhen 518055, China
- Ali Noman
- Department of Botany, Government College University, Faisalabad 38000, Pakistan
- Muhammad Waseem
- Colleges of Agriculture and Horticulture, South China Agricultural University, Guangzhou 510642, China
- Xinchun Lin
- State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, 666 Wusu Street, Lin’An, Hangzhou 311300, China
35
Yan J, Xu Y, Cheng Q, Jiang S, Wang Q, Xiao Y, Ma C, Yan J, Wang X. LightGBM: accelerated genomically designed crop breeding through ensemble learning. Genome Biol 2021; 22:271. [PMID: 34544450 PMCID: PMC8451137 DOI: 10.1186/s13059-021-02492-y] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 04/25/2021] [Accepted: 09/09/2021] [Indexed: 11/10/2022] Open
Abstract
LightGBM is an ensemble model of decision trees for classification and regression prediction. We demonstrate its utility in genomic selection-assisted breeding with a large dataset of inbred and hybrid maize lines. LightGBM exhibits superior performance in terms of prediction precision, model stability, and computing efficiency through a series of benchmark tests. We also assess the factors that are essential to ensure the best performance of genomic selection prediction by taking complex scenarios in crop hybrid breeding into account. LightGBM has been implemented as a toolbox, CropGBM, encompassing multiple novel functions and analytical modules to facilitate genomically designed breeding in crops.
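To illustrate what "an ensemble model of decision trees" means for regression, here is a toy sketch of gradient boosting with one-split trees (stumps) on squared error. It is not LightGBM itself, which uses histogram-based, leaf-wise tree growth and many further optimisations, and it is not the CropGBM toolbox. All function names, data values, and hyperparameters below are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree on a single feature by minimising squared error."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, thr, left_mean, right_mean)
    return best[1], best[2], best[3]

def fit_boosted_stumps(xs, ys, n_rounds=50, learning_rate=0.3):
    """Each round fits a stump to the current residuals and adds a shrunken copy of it."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        thr, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((thr, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= thr else right_mean)
                 for x, p in zip(xs, preds)]
    return base, stumps, learning_rate

def predict(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    base, stumps, lr = model
    return base + sum(lr * (lm if x <= thr else rm) for thr, lm, rm in stumps)

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

# Hypothetical data: one numeric feature (e.g. a marker score) and a trait value.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 0.9, 1.0, 3.0, 3.2, 2.9]
model = fit_boosted_stumps(xs, ys)
mse_baseline = mse(ys, [sum(ys) / len(ys)] * len(ys))
mse_model = mse(ys, [predict(model, x) for x in xs])
print(f"baseline MSE = {mse_baseline:.3f}, boosted MSE = {mse_model:.3f}")
```

The sketch only evaluates fit on its own training points; a genomic selection benchmark such as the one described would assess prediction on held-out lines.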
Affiliation(s)
- Jun Yan
- National Maize Improvement Center, Department of Crop Genomics and Bioinformatics, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100193 China
- Yuetong Xu
- National Maize Improvement Center, Department of Crop Genomics and Bioinformatics, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100193 China
- Qian Cheng
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Shaanxi, China
- Shuqin Jiang
- National Maize Improvement Center, Department of Crop Genomics and Bioinformatics, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100193 China
- Qian Wang
- National Maize Improvement Center, Department of Crop Genomics and Bioinformatics, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100193 China
- Yingjie Xiao
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070 China
- Chuang Ma
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Shaanxi, China
- Jianbing Yan
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070 China
- Xiangfeng Wang
- National Maize Improvement Center, Department of Crop Genomics and Bioinformatics, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100193 China
36
Bioinformatics and Machine Learning Approaches to Understand the Regulation of Mobile Genetic Elements. BIOLOGY 2021; 10:biology10090896. [PMID: 34571773 PMCID: PMC8465862 DOI: 10.3390/biology10090896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2021] [Revised: 09/06/2021] [Accepted: 09/07/2021] [Indexed: 11/22/2022]
Abstract
Simple Summary
Transposable elements (TEs) are DNA sequences that are, or were, able to move (transpose) within the genome of a single cell. They were first discovered by Barbara McClintock while working on maize, and they make up a large fraction of the genome. Transpositions can result in mutations and they can alter the genome size. Cells regulate the activity of TEs using a variety of mechanisms, such as chemical modifications of DNA and small RNAs. Machine learning (ML) is an interdisciplinary subject that studies computer algorithms that can improve through experience and by the use of data. ML has been successfully applied to a variety of problems in bioinformatics and has exhibited favorable precision and speed. Here, we provide a systematic and guided review on the ML and bioinformatic methods and tools that are used for the analysis of the regulation of TEs.
Abstract
Transposable elements (TEs, or mobile genetic elements, MGEs) are ubiquitous genetic elements that make up a substantial proportion of the genome of many species. The recent growing interest in understanding the evolution and function of TEs has revealed that TEs play a dual role in genome evolution, development, disease, and drug resistance. Cells regulate TE expression against uncontrolled activity that can lead to developmental defects and disease, using multiple strategies, such as DNA chemical modification, small RNA (sRNA) silencing, chromatin modification, as well as sequence-specific repressors. Advancements in bioinformatics and machine learning approaches are increasingly contributing to the analysis of the regulation mechanisms. A plethora of tools and machine learning approaches have been developed for prediction, annotation, and expression profiling of sRNAs, for methylation analysis of TEs, as well as for genome-wide methylation analysis through bisulfite sequencing data. In this review, we provide a guided overview of the bioinformatic and machine learning state of the art of fields closely associated with TE regulation and function.
|
37
|
Gupta R, Kleinjans J, Caiment F. Identifying novel transcript biomarkers for hepatocellular carcinoma (HCC) using RNA-Seq datasets and machine learning. BMC Cancer 2021; 21:962. [PMID: 34445986 PMCID: PMC8394105 DOI: 10.1186/s12885-021-08704-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/16/2020] [Accepted: 08/09/2021] [Indexed: 11/26/2022] Open
Abstract
Background
Hepatocellular carcinoma (HCC) is one of the leading causes of cancer death in the world owing to limitations in its prognosis. The current prognosis approaches include radiological examination and detection of serum biomarkers; however, both have limited efficiency and are ineffective in early prognosis. Due to such limitations, we propose to use RNA-Seq data for evaluating putative higher accuracy biomarkers at the transcript level that could help in early prognosis.
Methods
To identify such potential transcript biomarkers, RNA-Seq data for healthy liver and various HCC cell models were subjected to five different machine learning algorithms: random forest, K-nearest neighbor, Naïve Bayes, support vector machine, and neural networks. Various metrics, namely sensitivity, specificity, MCC, informedness, and AUC-ROC (except for support vector machine), were evaluated. The algorithms that produced the highest values for all metrics were chosen to extract the top features, which were then subjected to recursive feature elimination. Through recursive feature elimination, the smallest set of features able to differentiate between the healthy and HCC cell models was obtained.
Results
The metrics show that the known protein biomarkers for HCC are less efficient than the complete transcriptomics data. Among the different machine learning algorithms, random forest and support vector machine demonstrated the best performance. Using recursive feature elimination on the top features of random forest and support vector machine, three transcripts were selected that had an accuracy of 0.97 and kappa of 0.93. Of the three transcripts, two were protein coding (PARP2-202 and SPON2-203) and one was a non-coding transcript (CYREN-211). Lastly, we demonstrated that these three selected transcripts outperformed three randomly chosen transcripts (15,000 combinations), hence were not chance findings, and could therefore be interesting candidates for new HCC biomarker development.
Conclusion
Using RNA-Seq data combined with machine learning approaches can aid in finding novel transcript biomarkers. The three biomarkers identified (PARP2-202, SPON2-203, and CYREN-211) presented the highest accuracy among all transcripts in differentiating the healthy and HCC cell models. The machine learning pipeline developed in this study can be used for any RNA-Seq dataset to find novel transcript biomarkers. Code: www.github.com/rajinder4489/ML_biomarkers.
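The evaluation metrics named in this abstract (sensitivity, specificity, MCC, informedness, accuracy, and kappa) can all be derived from a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' pipeline; the function name `binary_metrics` and the example counts are hypothetical, and the code assumes both classes are present so that no denominator is zero.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    Assumes at least one positive and one negative sample (hypothetical helper).
    """
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / total
    informedness = sensitivity + specificity - 1  # Youden's J

    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement p_e.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((tn + fn) / total) * ((tn + fp) / total)
    p_e = p_yes + p_no
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "informedness": informedness,
        "mcc": mcc,
        "kappa": kappa,
    }


# Illustrative counts only; these are not taken from the study.
example = binary_metrics(tp=45, fp=5, tn=45, fn=5)
```

For a balanced, symmetric confusion matrix such as the illustrative one, informedness, MCC, and kappa coincide, which is one reason the three are often reported together on imbalanced data, where they diverge.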
Affiliation(s)
- Rajinder Gupta
  - Department of Toxicogenomics, School of Oncology and Developmental Biology (GROW), Maastricht University, Maastricht, The Netherlands
- Jos Kleinjans
  - Department of Toxicogenomics, School of Oncology and Developmental Biology (GROW), Maastricht University, Maastricht, The Netherlands
- Florian Caiment
  - Department of Toxicogenomics, School of Oncology and Developmental Biology (GROW), Maastricht University, Maastricht, The Netherlands
|
38
|
The role of 3S in big data quality: a perspective on operational performance indicators using an integrated approach. TQM JOURNAL 2021. [DOI: 10.1108/tqm-02-2021-0062] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/17/2022]
Abstract
Purpose
This study aims to provide insight into the operational factors of big data. The operational indicators/factors are categorized into three functional parts, namely synthesis, speed and significance. Based on these factors, an organization can enhance its big data analytics (BDA) performance and then select the data quality dimensions that matter for its success.
Design/methodology/approach
A fuzzy analytic hierarchy process (AHP) based research methodology has been proposed and utilized to assign the criterion weights and to prioritize the identified speed, synthesis and significance (3S) indicators. Further, the PROMETHEE (Preference Ranking Organization METHod for Enrichment of Evaluations) technique has been used to measure the data quality dimensions considering 3S as criteria.
Findings
The effective indicators were identified from the past literature, and the model for measuring them was confirmed with industry experts. The results of the fuzzy AHP model show that synthesis is the top-ranked and most significant indicator, followed by speed and significance at the next level. These operational indicators contribute toward BDA and are examined together with the priorities of their sub-categories.
Research limitations/implications
The outcomes of this study will help businesses that are contemplating this technology as a breakthrough, although it is both a challenge and an opportunity for developers and experts. Big data has many risks and challenges related to economic, social, operational and political performance. Understanding data quality dimensions provides insightful guidance for forecasting demand accurately, solving complex problems and supporting collaboration in supply chain management.
Originality/value
Big data is one of the most popular technology concepts in the market today. People live in a world where every facet of life increasingly depends on big data and data science. This study creates awareness of the role of the 3S indicators in big data quality by prioritizing them using fuzzy AHP and PROMETHEE.
|
39
|
Verbyla AP, De Faveri J, Deery DM, Rebetzke GJ. Modelling temporal genetic and spatio‐temporal residual effects for high‐throughput phenotyping data*. AUST NZ J STAT 2021. [DOI: 10.1111/anzs.12336] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/30/2022]
Affiliation(s)
- A. P. Verbyla
  - Data61, CSIRO, 47 Maunds Rd., Atherton, QLD 4883, Australia
- J. De Faveri
  - Data61, CSIRO, 47 Maunds Rd., Atherton, QLD 4883, Australia
- D. M. Deery
  - Agriculture and Food, CSIRO, 2-40 Clunies Ross Street, Acton, ACT 2601, Australia
- G. J. Rebetzke
  - Agriculture and Food, CSIRO, 2-40 Clunies Ross Street, Acton, ACT 2601, Australia
|
40
|
Han Z, Shang X, Shao L, Wang Y, Zhu X, Fang W, Ma Y. Meta-analysis of the effect of expression of MYB transcription factor genes on abiotic stress. PeerJ 2021; 9:e11268. [PMID: 34164229 PMCID: PMC8194419 DOI: 10.7717/peerj.11268] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/05/2020] [Accepted: 03/23/2021] [Indexed: 01/06/2023] Open
Abstract
Background
MYB proteins are a large group of transcription factors. The overexpression of MYB genes has been reported to improve abiotic stress tolerance in plants. However, due to the variety of plant species studied and the types of gene donors/recipients, along with different experimental conditions, it is difficult to interpret the roles of MYB in abiotic stress tolerance from published data.
Methods
Using a meta-analysis approach, we investigated the plant characteristics involved in cold, drought, and salt stress in MYB-overexpressing plants and analyzed the degree to which experimental variables influence plant performance.
Results
The results show that two of the four measured plant parameters in cold-stressed plants, two of the six in drought-stressed plants, and four of the 13 in salt-stressed plants were significantly impacted by MYB overexpression by 22% or more, and that the treatment medium, donor/recipient species, and donor type significantly influence the effects of MYB overexpression on drought stress tolerance. Also, the donor/recipient species, donor type, and stress duration all significantly affected the extent of MYB-mediated salt stress tolerance. In summary, this study compiles and analyzes the data across studies to help us understand the complex interactions that dictate the efficacy of heterologous MYB expression designed for improved abiotic stress tolerance in plants.
Affiliation(s)
- Zhaolan Han
  - College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu, China
- Xiaowen Shang
  - College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu, China
- Lingxia Shao
  - College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu, China
- Ya Wang
  - College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu, China
- Xujun Zhu
  - College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu, China
- Wanping Fang
  - College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu, China
- Yuanchun Ma
  - College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu, China
|
41
|
Gupta C, Ramegowda V, Basu S, Pereira A. Using Network-Based Machine Learning to Predict Transcription Factors Involved in Drought Resistance. Front Genet 2021; 12:652189. [PMID: 34249082 PMCID: PMC8264776 DOI: 10.3389/fgene.2021.652189] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/11/2021] [Accepted: 05/13/2021] [Indexed: 12/13/2022] Open
Abstract
Gene regulatory networks underpin stress response pathways in plants. However, parsing these networks to prioritize key genes underlying a particular trait is challenging. Here, we have built the Gene Regulation and Association Network (GRAiN) of rice (Oryza sativa). GRAiN is an interactive query-based web-platform that allows users to study functional relationships between transcription factors (TFs) and genetic modules underlying abiotic-stress responses. We built GRAiN by applying a combination of different network inference algorithms to publicly available gene expression data. We propose a supervised machine learning framework that complements GRAiN in prioritizing genes that regulate stress signal transduction and modulate gene expression under drought conditions. Our framework converts intricate network connectivity patterns of 2160 TFs into a single drought score. We observed that TFs with the highest drought scores define the functional, structural, and evolutionary characteristics of drought resistance in rice. Our approach accurately predicted the function of the OsbHLH148 TF, which we validated using in vitro protein-DNA binding assays and mRNA sequencing of loss-of-function mutants grown under control and drought stress conditions. Our network and the complementary machine learning strategy lend themselves to predicting key regulatory genes underlying other agricultural traits and will assist in the genetic engineering of desirable rice varieties.
Affiliation(s)
- Chirag Gupta
  - Department of Crop, Soil, and Environmental Sciences, University of Arkansas, Fayetteville, AR, United States
- Venkategowda Ramegowda
  - Department of Crop, Soil, and Environmental Sciences, University of Arkansas, Fayetteville, AR, United States
- Supratim Basu
  - Department of Crop, Soil, and Environmental Sciences, University of Arkansas, Fayetteville, AR, United States
- Andy Pereira
  - Department of Crop, Soil, and Environmental Sciences, University of Arkansas, Fayetteville, AR, United States
|
42
|
Zenda T, Liu S, Dong A, Duan H. Advances in Cereal Crop Genomics for Resilience under Climate Change. Life (Basel) 2021; 11:502. [PMID: 34072447 PMCID: PMC8228855 DOI: 10.3390/life11060502] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 05/12/2021] [Revised: 05/21/2021] [Accepted: 05/25/2021] [Indexed: 12/12/2022] Open
Abstract
Adapting to climate change, providing sufficient human food and nutritional needs, and securing sufficient energy supplies will call for a radical transformation from the current conventional adaptation approaches to more broad-based and transformative alternatives. This entails diversifying the agricultural system and boosting productivity of major cereal crops through development of climate-resilient cultivars that can sustainably maintain higher yields under climate change conditions, expanding our focus to crop wild relatives, and better exploitation of underutilized crop species. This is facilitated by the recent developments in plant genomics, such as advances in genome sequencing, assembly, and annotation, as well as gene editing technologies, which have increased the availability of high-quality reference genomes for various model and non-model plant species. This has necessitated genomics-assisted breeding of crops, including underutilized species, consequently broadening genetic variation of the available germplasm; improving the discovery of novel alleles controlling important agronomic traits; and enhancing creation of new crop cultivars with improved tolerance to biotic and abiotic stresses and superior nutritive quality. Here, therefore, we summarize these recent developments in plant genomics and their application, with particular reference to cereal crops (including underutilized species). Particularly, we discuss genome sequencing approaches, quantitative trait loci (QTL) mapping and genome-wide association (GWAS) studies, directed mutagenesis, plant non-coding RNAs, precise gene editing technologies such as CRISPR-Cas9, and complementation of crop genotyping by crop phenotyping. We then conclude by providing an outlook that, as we step into the future, high-throughput phenotyping, pan-genomics, transposable elements analysis, and machine learning hold much promise for crop improvements related to climate resilience and nutritional superiority.
Affiliation(s)
- Tinashe Zenda
  - State Key Laboratory of North China Crop Improvement and Regulation, Hebei Agricultural University, Baoding 071001, China
  - North China Key Laboratory for Crop Germplasm Resources of the Education Ministry, Hebei Agricultural University, Baoding 071001, China
  - Department of Crop Genetics and Breeding, College of Agronomy, Hebei Agricultural University, Baoding 071001, China
  - Department of Crop Science, Faculty of Agriculture and Environmental Science, Bindura University of Science Education, Bindura P. Bag 1020, Zimbabwe
- Songtao Liu
  - State Key Laboratory of North China Crop Improvement and Regulation, Hebei Agricultural University, Baoding 071001, China
  - North China Key Laboratory for Crop Germplasm Resources of the Education Ministry, Hebei Agricultural University, Baoding 071001, China
  - Department of Crop Genetics and Breeding, College of Agronomy, Hebei Agricultural University, Baoding 071001, China
- Anyi Dong
  - State Key Laboratory of North China Crop Improvement and Regulation, Hebei Agricultural University, Baoding 071001, China
  - North China Key Laboratory for Crop Germplasm Resources of the Education Ministry, Hebei Agricultural University, Baoding 071001, China
  - Department of Crop Genetics and Breeding, College of Agronomy, Hebei Agricultural University, Baoding 071001, China
- Huijun Duan
  - State Key Laboratory of North China Crop Improvement and Regulation, Hebei Agricultural University, Baoding 071001, China
  - North China Key Laboratory for Crop Germplasm Resources of the Education Ministry, Hebei Agricultural University, Baoding 071001, China
  - Department of Crop Genetics and Breeding, College of Agronomy, Hebei Agricultural University, Baoding 071001, China
|
43
|
Cortés AJ, López-Hernández F. Harnessing Crop Wild Diversity for Climate Change Adaptation. Genes (Basel) 2021; 12:783. [PMID: 34065368 PMCID: PMC8161384 DOI: 10.3390/genes12050783] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 03/29/2021] [Revised: 04/28/2021] [Accepted: 05/19/2021] [Indexed: 12/20/2022] Open
Abstract
Warming and drought are reducing global crop production with a potential to substantially worsen global malnutrition. As with the green revolution in the last century, plant genetics may offer concrete opportunities to increase yield and crop adaptability. However, the rate at which the threat is happening requires powering new strategies in order to meet the global food demand. In this review, we highlight major recent 'big data' developments from both empirical and theoretical genomics that may speed up the identification, conservation, and breeding of exotic and elite crop varieties with the potential to feed humans. We first emphasize the major bottlenecks to capture and utilize novel sources of variation in abiotic stress (i.e., heat and drought) tolerance. We argue that adaptation of crop wild relatives to dry environments could be informative on how plant phenotypes may react to a drier climate because natural selection has already tested more options than humans ever will. Because isolated pockets of cryptic diversity may still persist in remote semi-arid regions, we encourage new habitat-based population-guided collections for genebanks. We continue discussing how to systematically study abiotic stress tolerance in these crop collections of wild and landraces using geo-referencing and extensive environmental data. By uncovering the genes that underlie the tolerance adaptive trait, natural variation has the potential to be introgressed into elite cultivars. However, unlocking adaptive genetic variation hidden in related wild species and early landraces remains a major challenge for complex traits that, as abiotic stress tolerance, are polygenic (i.e., regulated by many low-effect genes). Therefore, we finish prospecting modern analytical approaches that will serve to overcome this issue. 
Concretely, genomic prediction, machine learning, and multi-trait gene editing all offer innovative alternatives to speed up more accurate pre-breeding and breeding efforts toward the increase in crop adaptability and yield, while matching future global food demands in the face of increased heat and drought. In order for these 'big data' approaches to succeed, we advocate for a trans-disciplinary approach with open-source data and long-term funding. The recent developments and perspectives discussed throughout this review ultimately aim to contribute to increased crop adaptability and yield in the face of heat waves and drought events.
Affiliation(s)
- Andrés J. Cortés
  - Corporación Colombiana de Investigación Agropecuaria AGROSAVIA, C.I. La Selva, Km 7 Vía Rionegro, Las Palmas, Rionegro 054048, Colombia
  - Departamento de Ciencias Forestales, Facultad de Ciencias Agrarias, Universidad Nacional de Colombia, Sede Medellín, Medellín 050034, Colombia
- Felipe López-Hernández
  - Corporación Colombiana de Investigación Agropecuaria AGROSAVIA, C.I. La Selva, Km 7 Vía Rionegro, Las Palmas, Rionegro 054048, Colombia
|
44
|
Purugganan MD, Jackson SA. Advancing crop genomics from lab to field. Nat Genet 2021; 53:595-601. [PMID: 33958781 DOI: 10.1038/s41588-021-00866-3] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 10/27/2020] [Accepted: 03/22/2021] [Indexed: 01/23/2023]
Abstract
Crop genomics remains a key element in ensuring scientific progress to secure global food security. It has been two decades since the sequence of the first plant genome, that of Arabidopsis thaliana, was released, and soon after that the draft sequencing of the rice genome was completed. Since then, the genomes of more than 100 crops have been sequenced, plant genome research has expanded across multiple fronts and the next few years promise to bring further advances spurred by the advent of new technologies and approaches. We are likely to see continued innovations in crop genome sequencing, genetic mapping and the acquisition of multiple levels of biological data. There will be exciting opportunities to integrate genome-scale information across multiple scales of biological organization, leading to advances in our mechanistic understanding of crop biological processes, which will, in turn, provide greater impetus for translation of laboratory results to the field.
Affiliation(s)
- Michael D Purugganan
  - Center for Genomics and Systems Biology, New York University, New York, NY, USA
  - Center for Genomics and Systems Biology, New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
|
45
|
Enhancing Coffee Supply Chain towards Sustainable Growth with Big Data and Modern Agricultural Technologies. SUSTAINABILITY 2021. [DOI: 10.3390/su13084593] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 11/16/2022]
Abstract
Modern agricultural technology management is nowadays crucial in terms of the economy and the global market, while food safety, quality control, and environmentally friendly practices should not be neglected. This review aims to give perspectives on applying big data analytics and modern technologies to increase the efficacy and effectiveness of the coffee supply chain throughout the process. It found that several tools, such as wireless sensor networks, cloud computing, Internet of Things (IoT), image processing, convolutional neural networks (CNN), and remote sensing, could be implemented and used to improve the coffee supply chain. Those tools could help reduce cost as well as time for entrepreneurs and create a reliable service for the customer. In summary, in the long term these modern technologies will be able to assist coffee business management and ensure sustainable growth for the coffee industry.
|
46
|
Serra N, Di Carlo P, Rea T, Sergi CM. Diffusion modeling of COVID-19 under lockdown. PHYSICS OF FLUIDS (WOODBURY, N.Y. : 1994) 2021; 33:041903. [PMID: 33897246 PMCID: PMC8060971 DOI: 10.1063/5.0044061] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/13/2021] [Accepted: 03/16/2021] [Indexed: 05/26/2023]
Abstract
Viral immune evasion by sequence variation is a significant barrier to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) vaccine design, and the diffusion of coronavirus disease 2019 under lockdown is unpredictable across subsequent waves. Our group has developed a computational model rooted in physics to address this challenge, aiming to predict the fitness landscape of SARS-CoV-2 diffusion using a variant of the bidimensional Ising model (2DIMV) connected seasonally. The 2DIMV works in a closed system composed of limited interaction subjects and conditioned only by temperature changes. The Markov chain Monte Carlo method shows that an increase in temperature implies reduced virus diffusion, whereas increased mobility leads to increased virus diffusion.
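For readers unfamiliar with the underlying technique, the sketch below shows a textbook Metropolis (Markov chain Monte Carlo) simulation of the standard two-dimensional Ising model. It is not the 2DIMV variant described in the abstract, whose seasonal coupling and epidemiological interpretation are not specified here. The function names, lattice size, coupling J = 1, k_B = 1, zero external field, and periodic boundaries are all assumptions made for illustration.

```python
import math
import random


def metropolis_ising(size, temperature, sweeps, seed=0):
    """Run single-spin-flip Metropolis updates on a size x size Ising lattice.

    Assumptions (illustrative only): J = 1, k_B = 1, no external field,
    periodic boundaries, and an all-up initial configuration.
    """
    rng = random.Random(seed)
    spins = [[1] * size for _ in range(size)]
    for _ in range(sweeps * size * size):
        i = rng.randrange(size)
        j = rng.randrange(size)
        neighbours = (
            spins[(i + 1) % size][j]
            + spins[(i - 1) % size][j]
            + spins[i][(j + 1) % size]
            + spins[i][(j - 1) % size]
        )
        # Energy change if spin (i, j) is flipped.
        delta_e = 2 * spins[i][j] * neighbours
        # Metropolis rule: always accept downhill moves, accept uphill
        # moves with probability exp(-delta_e / T).
        if delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature):
            spins[i][j] = -spins[i][j]
    return spins


def magnetization(spins):
    """Mean spin per site, between -1 and 1."""
    n = len(spins)
    return sum(map(sum, spins)) / (n * n)


cold = magnetization(metropolis_ising(size=8, temperature=0.5, sweeps=20))
hot = magnetization(metropolis_ising(size=8, temperature=10.0, sweeps=20))
```

In the standard model, the ordered state persists at low temperature and is destroyed at high temperature; how that behaviour maps onto infection spread is specific to the authors' variant and is not reproduced here.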
Affiliation(s)
- Nicola Serra
  - Departments of Public Health, University Federico II of Naples, 80131 Naples, Italy
- Paola Di Carlo
  - Department of Health Promotion, Maternal-Childhood, Internal Medicine of Excellence “G. D'Alessandro,” PROMISE, University of Palermo, Palermo 90127, Italy
- Teresa Rea
  - Departments of Public Health, University Federico II of Naples, 80131 Naples, Italy
- Consolato M. Sergi
  - Pathology Laboratories, Children's Hospital of Eastern Ontario, University of Ottawa, 401 Smyth Rd., Ottawa, Ontario K1H 8L1, Canada
|
47
|
Wang Y, Zhou M, Zou Q, Xu L. Machine learning for phytopathology: from the molecular scale towards the network scale. Brief Bioinform 2021; 22:6204793. [PMID: 33787847 DOI: 10.1093/bib/bbab037] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/15/2020] [Revised: 01/09/2021] [Accepted: 01/26/2021] [Indexed: 01/16/2023] Open
Abstract
With the increasing volume of high-throughput sequencing data from a variety of omics techniques in the field of plant-pathogen interactions, sorting, retrieving, processing and visualizing biological information have become a great challenge. Within the explosion of data, machine learning offers powerful tools to process these complex omics data by various algorithms, such as Bayesian reasoning, support vector machine and random forest. Here, we introduce the basic frameworks of machine learning in dissecting plant-pathogen interactions and discuss the applications and advances of machine learning in plant-pathogen interactions from molecular to network biology, including the prediction of pathogen effectors, plant disease resistance protein monitoring and the discovery of protein-protein networks. The aim of this review is to provide a summary of advances in plant defense and pathogen infection and to indicate the important developments of machine learning in phytopathology.
Affiliation(s)
- Yansu Wang
  - Postdoctoral Innovation Practice Base, Shenzhen Polytechnic, China
- Quan Zou
  - University of Electronic Science and Technology of China
- Lei Xu
  - Shenzhen Polytechnic, China
|
48
|
Daley SK, Cordell GA. Natural Products, the Fourth Industrial Revolution, and the Quintuple Helix. Nat Prod Commun 2021. [DOI: 10.1177/1934578x211003029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/17/2023] Open
Abstract
The profound interconnectedness of the sciences and technologies embodied in the Fourth Industrial Revolution is discussed in terms of the global role of natural products, and how that interplays with the development of sustainable and climate-conscious practices of cyberecoethnopharmacolomics within the Quintuple Helix for the promotion of a healthier planet and society.
Affiliation(s)
- Geoffrey A. Cordell
  - Natural Products Inc., Evanston, IL, USA
  - Department of Pharmaceutics, College of Pharmacy, University of Florida, Gainesville, FL, USA
|
49
|
Volpato L, Pinto F, González-Pérez L, Thompson IG, Borém A, Reynolds M, Gérard B, Molero G, Rodrigues FA. High Throughput Field Phenotyping for Plant Height Using UAV-Based RGB Imagery in Wheat Breeding Lines: Feasibility and Validation. FRONTIERS IN PLANT SCIENCE 2021; 12:591587. [PMID: 33664755 PMCID: PMC7921806 DOI: 10.3389/fpls.2021.591587] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 08/04/2020] [Accepted: 01/25/2021] [Indexed: 05/07/2023]
Abstract
Plant height (PH) is an essential trait in the screening of most crops. While in crops such as wheat medium stature helps reduce lodging, tall plants are preferred to increase total above-ground biomass. PH is an easy trait to measure manually, although it can be labor-intensive depending on the number of plots. There is an increasing demand for alternative approaches to estimate PH in a higher throughput mode. Crop surface models (CSMs) derived from dense point clouds generated via aerial imagery could be used to estimate PH. This study evaluates PH estimation at different phenological stages using plot-level information from aerial imaging-derived 3D CSM in wheat inbred lines during two consecutive years. Multi-temporal and high spatial resolution images were collected by fixed-wing (PlatFW) and multi-rotor (PlatMR) unmanned aerial vehicle (UAV) platforms over two wheat populations (50 and 150 lines). The PH was measured and compared at four growth stages (GS) using ground-truth measurements (PHground) and UAV-based estimates (PHaerial). The CSMs generated from the aerial imagery were validated using ground control points (GCPs) as fixed reference targets at different heights. The results show that PH estimations using PlatFW were consistent with those obtained from PlatMR, showing some slight differences due to image processing settings. The GCP heights derived from CSM showed a high correlation and low error compared to their actual heights (R2 ≥ 0.90, RMSE ≤ 4 cm). The coefficient of determination (R2) between PHground and PHaerial at different GS ranged from 0.35 to 0.88, and the root mean square error (RMSE) from 0.39 to 4.02 cm for both platforms. In general, similar or higher heritability was obtained using PHaerial across different GS and years, and it varied with the variability and environmental error of the observed PHground (0.06-0.97). Finally, we also observed high Spearman rank correlations (0.47-0.91) and R2 (0.63-0.95) of PHaerial adjusted and predicted values against PHground values. This study provides an example of the use of UAV-based high-resolution RGB imagery to obtain time-series estimates of PH, scalable to tens of thousands of plots, and thus suitable to be applied in wheat breeding trials.
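The agreement statistics named in this abstract (R2, RMSE and Spearman rank correlation between PHground and PHaerial) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names and the plot heights are hypothetical, and R2 is taken here as the squared Pearson correlation, which matches the coefficient of determination only for a simple linear regression.

```python
import numpy as np

def rankdata_average(x):
    """Return ranks starting at 1; tied values share the average of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        mask = x == value
        ranks[mask] = ranks[mask].mean()
    return ranks

def agreement_metrics(ph_ground, ph_aerial):
    """Compare ground-truth and aerial plant heights given in the same unit."""
    g = np.asarray(ph_ground, dtype=float)
    a = np.asarray(ph_aerial, dtype=float)
    r = np.corrcoef(g, a)[0, 1]                # Pearson correlation
    rmse = np.sqrt(np.mean((a - g) ** 2))      # root mean square error
    # Spearman correlation is the Pearson correlation of the ranks.
    rho = np.corrcoef(rankdata_average(g), rankdata_average(a))[0, 1]
    return {"r2": r ** 2, "rmse": rmse, "spearman": rho}

# Hypothetical plot-level heights in cm (illustrative values, not data from the study).
ph_ground = [82.0, 90.5, 95.0, 101.2, 110.3]
ph_aerial = [80.1, 91.0, 93.2, 103.0, 108.9]
print(agreement_metrics(ph_ground, ph_aerial))
```

Spearman correlation depends only on the ordering of plots, so it can stay high even when RMSE reveals a systematic offset between the two measurement methods.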
Affiliation(s)
- Leonardo Volpato
- Department of Agronomy, Federal University of Viçosa, Viçosa, Brazil
- Francisco Pinto
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Aluízio Borém
- Department of Agronomy, Federal University of Viçosa, Viçosa, Brazil
- Matthew Reynolds
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Bruno Gérard
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Gemma Molero
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- KWS Momont Recherche, Mons-en-Pevele, France
|
50
|
Bonidia RP, Sampaio LDH, Domingues DS, Paschoal AR, Lopes FM, de Carvalho ACPLF, Sanches DS. Feature extraction approaches for biological sequences: a comparative study of mathematical features. Brief Bioinform 2021; 22:6135010. [PMID: 33585910 DOI: 10.1093/bib/bbab011] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 10/20/2020] [Revised: 12/13/2020] [Accepted: 01/07/2021] [Indexed: 11/14/2022]
Abstract
As a consequence of the various genomic sequencing projects, an increasing volume of biological sequence data is being produced. Although machine learning algorithms have been successfully applied to a large number of genomic sequence-related problems, the results are largely affected by the type and number of features extracted. This effect has motivated new algorithms and pipeline proposals, mainly involving feature extraction problems, in which extracting significant discriminatory information from a biological set is challenging. Considering this, our work proposes a new study of feature extraction approaches based on mathematical features (numerical mapping with Fourier, entropy and complex networks). As a case study, we analyze long non-coding RNA sequences. Moreover, we separated this work into three studies. First, we assessed our proposal with the most addressed problem in our review, e.g. lncRNA and mRNA; second, we validated the mathematical features in different classification problems, to predict the class of lncRNA, e.g. circular RNA sequences; third, we analyzed its robustness in scenarios with imbalanced data. The experimental results demonstrated three main contributions: first, an in-depth study of several mathematical features; second, a new feature extraction pipeline; and third, its high performance and robustness for distinct RNA sequence classification. Availability: https://github.com/Bonidia/FeatureExtraction_BiologicalSequences.
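Two of the feature families named in this abstract, numerical mapping followed by a Fourier transform and entropy, can be sketched as below. This is only an illustration under stated assumptions and not the pipeline from the linked repository: the integer mapping `INTEGER_MAP`, the function names and the chosen summary statistics are hypothetical, and Shannon entropy over k-mer frequencies is used as one possible entropy measure.

```python
from collections import Counter

import numpy as np

# Illustrative integer mapping of nucleotides (an assumption, not necessarily
# the mapping used in the paper).
INTEGER_MAP = {"A": 0, "C": 1, "G": 2, "T": 3}

def fourier_features(seq):
    """Summarize the power spectrum of a numerically mapped sequence (length >= 2)."""
    signal = np.array([INTEGER_MAP[base] for base in seq.upper()], dtype=float)
    signal -= signal.mean()                    # remove the constant component
    power = np.abs(np.fft.rfft(signal)) ** 2   # one-sided power spectrum
    power = power[1:]                          # drop the zero-frequency bin
    return {
        "mean_power": float(power.mean()),
        "max_power": float(power.max()),
        "peak_index": int(power.argmax()) + 1,  # frequency bin with the most power
    }

def kmer_shannon_entropy(seq, k=2):
    """Shannon entropy (in bits) of the k-mer frequency distribution of a sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    probs = np.array([count / total for count in counts.values()])
    return float(-(probs * np.log2(probs)).sum())

# Hypothetical toy sequence with a period of four bases.
example = "ACGTACGTACGT"
print(fourier_features(example))
print(kmer_shannon_entropy(example, k=2))
```

Each function returns a fixed-length summary regardless of sequence length, which is what lets such features feed a standard classifier.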
Affiliation(s)
- Robson P Bonidia
- Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, 86300-000, Brazil; Institute of Mathematics and Computer Sciences, University of São Paulo - USP, São Carlos, 13566-590, Brazil
- Lucas D H Sampaio
- Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, 86300-000, Brazil
- Douglas S Domingues
- Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, 86300-000, Brazil; Department of Botany, Institute of Biosciences, São Paulo State University (UNESP), Rio Claro 13506-900, Brazil
- Alexandre R Paschoal
- Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, 86300-000, Brazil
- Fabrício M Lopes
- Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, 86300-000, Brazil
- André C P L F de Carvalho
- Institute of Mathematics and Computer Sciences, University of São Paulo - USP, São Carlos, 13566-590, Brazil
- Danilo S Sanches
- Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, 86300-000, Brazil
|