1
Yao Z, Yao M, Wang C, Li K, Guo J, Xiao Y, Yan J, Liu J. GEFormer: A genotype-environment interaction-based genomic prediction method that integrates the gating multilayer perceptron and linear attention mechanisms. Mol Plant 2025; 18:527-549. [PMID: 39881541 DOI: 10.1016/j.molp.2025.01.020] [Received: 06/20/2024] [Revised: 12/08/2024] [Accepted: 01/25/2025] [Indexed: 01/31/2025]
Abstract
Integrating genotypic and environmental data can improve the accuracy of genomic prediction for crop field traits. Existing genomic prediction methods do not adequately consider environmental factors or the actual growth environments of crops, which limits their prediction accuracy. In this work, we developed GEFormer, a genotype-environment interaction genomic prediction method that integrates a gating multilayer perceptron (gMLP) and linear attention mechanisms. First, GEFormer uses gMLP to extract local and global features among SNPs. Next, Omni-dimensional Dynamic Convolution extracts dynamic, comprehensive features of multiple environmental factors within each day, reflecting the actual growth pattern of crops. A linear attention mechanism then captures the temporal features of environmental change. Finally, a gating mechanism fuses the genomic and environmental features. We examined the accuracy of GEFormer for predicting important agronomic traits of maize, rice, and wheat under three experimental scenarios: untested genotypes in tested environments, tested genotypes in untested environments, and untested genotypes in untested environments. GEFormer outperformed six state-of-the-art statistical learning methods and four machine learning methods, with the largest advantage in the untested-genotypes-in-untested-environments scenario. In addition, we applied GEFormer to three real-world breeding tasks: phenotype prediction in unknown environments, hybrid phenotype prediction using an inbred population, and cross-population phenotype prediction. GEFormer also showed better prediction performance in these practical breeding scenarios and could be used to assist crop breeding.
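The abstract's final step, fusing genomic and environmental features with a gating mechanism, can be illustrated with a generic elementwise gate. This is a minimal sketch under stated assumptions, not GEFormer's published layer: it assumes both feature vectors already have the same length, and the names `gated_fusion`, `gate_weights`, and `gate_bias` are hypothetical. A sigmoid gate computed from the concatenated features decides, per dimension, how much of each source to keep.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function mapping any real number into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_fusion(
    genomic: Sequence[float],
    environment: Sequence[float],
    gate_weights: Sequence[Sequence[float]],
    gate_bias: Sequence[float],
) -> List[float]:
    """Hypothetical elementwise gated fusion of two equal-length feature vectors.

    For each dimension i:
        g_i     = sigmoid(gate_weights[i] . [genomic; environment] + gate_bias[i])
        fused_i = g_i * genomic_i + (1 - g_i) * environment_i
    """
    if len(genomic) != len(environment):
        raise ValueError("genomic and environment vectors must have the same length")
    concat = list(genomic) + list(environment)  # gate sees both sources
    fused = []
    for i, (x_g, x_e) in enumerate(zip(genomic, environment)):
        z = sum(w * c for w, c in zip(gate_weights[i], concat)) + gate_bias[i]
        g = sigmoid(z)
        fused.append(g * x_g + (1.0 - g) * x_e)
    return fused
```

With all-zero gate weights and biases every gate equals 0.5, so `gated_fusion([1.0, 2.0], [3.0, 4.0], [[0.0] * 4] * 2, [0.0, 0.0])` returns the plain average `[2.0, 3.0]`; training would move the gates toward whichever source is more informative for each dimension.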
Affiliation(s)
- Zhou Yao
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan 430070, China; College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Mengting Yao
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Chuang Wang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan 430070, China; College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Ke Li
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Junhao Guo
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Yingjie Xiao
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Hongshan Laboratory, Wuhan 430070, China
- Jianbing Yan
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Hongshan Laboratory, Wuhan 430070, China
- Jianxiao Liu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan 430070, China; College of Informatics, Huazhong Agricultural University, Wuhan 430070, China; Hubei Hongshan Laboratory, Wuhan 430070, China.
2
Chang B, Geng Z, Guo T, Mei J, Xiong C, Chen P, Liu M, Niu C. Comprehensive clinical scale-based machine learning model for predicting subthalamic nucleus deep brain stimulation outcomes in Parkinson's disease. Neurosurg Rev 2025; 48:266. [PMID: 39994077 DOI: 10.1007/s10143-025-03424-1] [Received: 12/11/2024] [Revised: 02/11/2025] [Accepted: 02/18/2025] [Indexed: 02/26/2025]
Abstract
Parkinson's disease (PD) is a growing burden with varied clinical manifestations and variable responses to subthalamic nucleus deep brain stimulation (STN-DBS). At present, there is no effective, simple machine learning model based on comprehensive clinical scales to predict the improvement in motor symptoms of PD after DBS. A total of 647 PD patients from the First Affiliated Hospital of University of Science and Technology of China were retrospectively enrolled. The LightGBM machine learning algorithm was used for modeling, and 123 PD patients from Qingdao Municipal Hospital served as an external dataset to validate the model. The study was registered in the Chinese Clinical Trial Registry (registration number ChiCTR2300073955). The LightGBM model outperformed the other models, with an internal test set AUC of 0.874 (95% CI 0.822-0.927) and a mean cross-validation AUC of 0.921 ± 0.03. External validation yielded an AUC of 0.769 (95% CI 0.685-0.853). Key predictive variables included MMSE scores, HAMA scores, years of education, medication improvement rate, and preoperative UPDRS scores. These results indicate that a LightGBM model based on the top seven influencing factors is a promising tool for predicting the improvement in motor symptoms of PD 1 year after STN-DBS.
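The AUC values reported above (0.874 internal, 0.769 external) summarize how well predicted scores rank responders above non-responders. As a minimal sketch of what the metric measures, the function below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, counting ties as one half. It does not reproduce the authors' LightGBM model; the labels and scores in the usage example are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison (the Mann-Whitney formulation).

    labels: 1 for a positive outcome (e.g. good motor improvement), 0 otherwise.
    scores: model output; higher means "more likely positive".
    Runs in O(n_pos * n_neg), which is fine for cohorts of a few hundred.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` gives 0.75: three of the four positive/negative pairs are ranked correctly. A cross-validated "mean AUC ± SD" is simply this value computed on each held-out fold and then averaged.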
Affiliation(s)
- Bowen Chang
- Department of Neurosurgery, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, People's Republic of China
- Zhi Geng
- Department of Neurology, The First Affiliated Hospital of Anhui Medical University, Anhui Medical University, Hefei, People's Republic of China
- Tao Guo
- Center for Biomedical Imaging, University of Science and Technology of China, Hefei, People's Republic of China
- Jiaming Mei
- Department of Neurosurgery, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, People's Republic of China
- Chi Xiong
- Department of Neurosurgery, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, People's Republic of China
- Peng Chen
- Department of Neurosurgery, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, People's Republic of China
- Mingxing Liu
- Department of Neurosurgery, Qingdao Municipal Hospital (Headquarters), Qingdao, People's Republic of China
- Chaoshi Niu
- Department of Neurosurgery, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, People's Republic of China.
3
Gu W, Huang Z, Fan Y, Li T, Yu X, Chen Z, Hu Y, Li A, Zhang F, Fu Y. Peripheral blood microbiome signature and Mycobacterium tuberculosis-derived rsRNA as diagnostic biomarkers for tuberculosis in human. J Transl Med 2025; 23:204. [PMID: 39972378 PMCID: PMC11837313 DOI: 10.1186/s12967-025-06190-2] [Received: 09/25/2024] [Accepted: 01/29/2025] [Indexed: 02/21/2025]
Abstract
BACKGROUND Tuberculosis (TB) is a major global health issue, and early diagnosis of TB remains a challenge. Researchers are seeking non-sputum, biomarker-based TB tests. Emerging evidence suggests that blood microbiome signatures may be relevant to disease. However, blood microbiome RNA profiles in TB have not been characterized. We aimed to characterize the blood microbiome of TB patients and to identify small RNA molecules derived from the Mycobacterium tuberculosis (Mtb) genome as diagnostic biomarkers for TB. METHODS Blood RNA sequencing data from TB patients and healthy controls were retrieved from the NCBI-SRA database to analyze the blood microbiome and identify Mtb rRNA-derived small RNAs (rsRNAs). Small RNA-seq was performed on plasma exosomes from TB patients and healthy controls. Levels of the candidate Mtb rsRNAs were measured by real-time quantitative reverse transcription PCR (RT-qPCR) in plasma from a separate cohort of 73 TB patients and 62 healthy controls. RESULTS The blood microbiome of TB patients contained RNA signals from bacteria, fungi, archaea, and viruses, with bacteria accounting for more than 97% of the total. In the blood of TB patients, microbial diversity was reduced and the abundance of 6 Mycobacterium-associated bacterial genera (Mycobacterium, Priestia, Nocardioides, Agrobacterium, Bradyrhizobium, and Escherichia) was significantly altered. A diagnostic model for TB based on these 6 genera achieved an area under the curve (AUC) of 0.8945. rsRNAs mapping to the Mtb genome were identified in blood and plasma exosomes of TB patients. RT-qPCR showed that 2 Mtb-derived rsRNAs, 16 S-L1 and 16 S-L2, could serve as diagnostic biomarkers to distinguish TB patients from healthy controls, with a combined diagnostic AUC of 0.7197. CONCLUSIONS A panel of blood microbiome signatures and Mtb-derived rsRNAs can serve as blood biomarkers for TB diagnosis.
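The abstract reports reduced blood microbial diversity in TB patients but does not name the index used. As an illustration only, the sketch below assumes the Shannon index, H = -Σ pᵢ ln pᵢ, a common choice for genus-level read counts; the function name and the counts in the usage note are hypothetical. H is highest when reads are spread evenly across genera and falls as a few genera dominate.

```python
import math
from typing import Mapping


def shannon_index(counts: Mapping[str, int]) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero counts.

    counts: reads assigned to each taxon (e.g. bacterial genus) in one sample.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    h = 0.0
    for c in counts.values():
        if c > 0:                 # taxa with zero reads contribute nothing
            p = c / total         # relative abundance
            h -= p * math.log(p)
    return h
```

Four genera with equal counts give H = ln 4 ≈ 1.386, whereas a sample in which one genus holds 97 of 100 reads gives a much lower value, which is the direction of change the abstract describes for TB patients.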
Affiliation(s)
- Wei Gu
- Department of Microbiology, School of Basic Medical Sciences, WU Lien-Teh Institute, Harbin Medical University, Harbin, China
- Zhigang Huang
- Department of Microbiology, School of Basic Medical Sciences, WU Lien-Teh Institute, Harbin Medical University, Harbin, China
- Yunfan Fan
- Department of Microbiology, School of Basic Medical Sciences, WU Lien-Teh Institute, Harbin Medical University, Harbin, China
- Department of Clinical Laboratory, Chongqing Public Health Medical Center, Chongqing, China
- Ting Li
- Department of Microbiology, School of Basic Medical Sciences, WU Lien-Teh Institute, Harbin Medical University, Harbin, China
- Department of Clinical Laboratory, Chongqing Public Health Medical Center, Chongqing, China
- Xinyuan Yu
- Department of Microbiology, School of Basic Medical Sciences, WU Lien-Teh Institute, Harbin Medical University, Harbin, China
- Zhiyuan Chen
- Department of Microbiology, School of Basic Medical Sciences, WU Lien-Teh Institute, Harbin Medical University, Harbin, China
- Yan Hu
- Department of Microbiology, School of Basic Medical Sciences, WU Lien-Teh Institute, Harbin Medical University, Harbin, China
- Aimei Li
- Department of Microbiology, School of Basic Medical Sciences, WU Lien-Teh Institute, Harbin Medical University, Harbin, China
- Heilongjiang Provincial Key Laboratory of Infection and Immunity, Harbin Medical University, Harbin, China
- Fengmin Zhang
- Department of Microbiology, School of Basic Medical Sciences, WU Lien-Teh Institute, Harbin Medical University, Harbin, China
- Heilongjiang Provincial Key Laboratory of Infection and Immunity, Harbin Medical University, Harbin, China
- Yingmei Fu
- Department of Microbiology, School of Basic Medical Sciences, WU Lien-Teh Institute, Harbin Medical University, Harbin, China.
- Heilongjiang Provincial Key Laboratory of Infection and Immunity, Harbin Medical University, Harbin, China.
4
Wang J, Chai J, Chen L, Zhang T, Long X, Diao S, Chen D, Guo Z, Tang G, Wu P. Enhancing Genomic Prediction Accuracy of Reproduction Traits in Rongchang Pigs Through Machine Learning. Animals (Basel) 2025; 15:525. [PMID: 40003007 PMCID: PMC11852217 DOI: 10.3390/ani15040525] [Received: 01/17/2025] [Revised: 02/02/2025] [Accepted: 02/10/2025] [Indexed: 02/27/2025]
Abstract
The increasing volume of genome sequencing data challenges traditional genome-wide prediction methods, which struggle to handle large datasets. Machine learning (ML) techniques, which can process high-dimensional data, offer promising solutions. This study aimed to identify a suitable genome-wide prediction method for local pig breeds, using 10 datasets with varying SNP densities derived from imputed sequencing data of 515 Rongchang pigs and the Pig QTL database. Three reproduction traits (litter weight, total number of piglets born, and number of piglets born alive) were predicted using six traditional methods and five ML methods: kernel ridge regression, random forest, Gradient Boosting Decision Tree (GBDT), Light Gradient Boosting Machine (LightGBM), and AdaBoost. The methods were evaluated using fivefold cross-validation and independent tests. The predictive performance of both traditional and ML methods initially increased with SNP density, peaking at 800-900 k SNPs. ML methods outperformed traditional ones, with improvements of 0.4-4.1%. Integrating GWAS results and the Pig QTL database enhanced ML robustness. ML models exhibited superior generalizability, with high correlation coefficients (0.935-0.998) between cross-validation and independent test results. GBDT and random forest were also computationally efficient, making them promising methods for genomic prediction in livestock breeding.
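The evaluation above relies on fivefold cross-validation and on Pearson correlation coefficients. The sketch below shows both building blocks in plain Python: a shuffled k-fold index splitter and a Pearson correlation function. It is an illustration under assumptions, not the authors' pipeline: the function names and the seed are hypothetical, model fitting is omitted, and the abstract does not state exactly which quantities were correlated between cross-validation and the independent test (per-dataset accuracies are one plausible reading).

```python
import math
import random
from typing import List, Sequence, Tuple


def k_fold_splits(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle sample indices 0..n-1 and return one (train, test) pair per fold.

    Every index appears in exactly one test fold; fold sizes differ by at most one.
    """
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)           # reproducible shuffle
    folds = [idx[i::k] for i in range(k)]      # round-robin assignment to folds
    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
        splits.append((train, test))
    return splits


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, e.g. between predicted and observed phenotypes."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

For the 515 Rongchang pigs, `k_fold_splits(515, 5)` yields five test folds of 103 animals each. In genomic prediction, predictive ability is commonly reported as `pearson(predicted, observed)` on each held-out fold.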
Affiliation(s)
- Junge Wang
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China
- Jie Chai
- Chongqing Academy of Animal Sciences, Chongqing 402460, China
- National Center of Technology Innovation for Pigs, Chongqing 402460, China
- Li Chen
- Chongqing Academy of Animal Sciences, Chongqing 402460, China
- National Center of Technology Innovation for Pigs, Chongqing 402460, China
- Tinghuan Zhang
- Chongqing Academy of Animal Sciences, Chongqing 402460, China
- National Center of Technology Innovation for Pigs, Chongqing 402460, China
- Xi Long
- Chongqing Academy of Animal Sciences, Chongqing 402460, China
- National Center of Technology Innovation for Pigs, Chongqing 402460, China
- Shuqi Diao
- Chongqing Academy of Animal Sciences, Chongqing 402460, China
- National Center of Technology Innovation for Pigs, Chongqing 402460, China
- Dong Chen
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China
- Zongyi Guo
- Chongqing Academy of Animal Sciences, Chongqing 402460, China
- National Center of Technology Innovation for Pigs, Chongqing 402460, China
- Guoqing Tang
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China
- Pingxian Wu
- Chongqing Academy of Animal Sciences, Chongqing 402460, China
- National Center of Technology Innovation for Pigs, Chongqing 402460, China
5
Chao D, Wang H, Wan F, Yan S, Fang W, Yang Y. MtCro: multi-task deep learning framework improves multi-trait genomic prediction of crops. Plant Methods 2025; 21:12. [PMID: 39910577 DOI: 10.1186/s13007-024-01321-0] [Received: 05/07/2024] [Accepted: 12/26/2024] [Indexed: 02/07/2025]
Abstract
Genomic selection (GS) predicts traits from genome-wide markers, speeding up genetic progress and improving breeding efficiency. Recent work has emphasized deep learning models to improve prediction accuracy. However, current deep learning models learn a specific phenotype for each task, overlooking the correlations among different phenotypes. To address this, we introduce MtCro, a multi-task learning approach that learns multiple plant phenotypes simultaneously within a shared parameter space. Extensive experiments show that MtCro outperforms mainstream models, including DNNGP and SoyDNGP, with performance gains of 1-9% on the Wheat2000 dataset, 1-8% on Wheat599, and 1-3% on Maize8652. Comparative analysis further shows a consistent 2-3% improvement in multi-phenotype prediction, highlighting the contribution of inter-phenotype correlations to accuracy. By leveraging multi-task learning, MtCro improves both training efficiency and prediction accuracy, ultimately accelerating plant genetic breeding. Our code is available at https://github.com/chaodian12/mtcro .
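The key idea in the abstract, learning several phenotypes "within a shared parameter space", is usually called hard parameter sharing: one shared representation feeds a small head per trait, and the training loss sums over traits so every trait updates the shared weights. The sketch below illustrates only that idea with a deliberately tiny linear model trained by gradient descent. It is not MtCro's architecture (which is a deep network); the model form, learning rate, step count, and all function names are assumptions chosen for clarity.

```python
from typing import List, Sequence, Tuple

Heads = List[List[float]]  # one [slope, intercept] pair per trait


def predict(w: Sequence[float], heads: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Shared layer h = w . x, followed by one linear head (a_t, b_t) per trait."""
    h = sum(wi * xi for wi, xi in zip(w, x))
    return [a * h + b for a, b in heads]


def joint_loss(w, heads, X, Y) -> float:
    """Sum over traits of each trait's mean squared error."""
    n = len(X)
    total = 0.0
    for x, y in zip(X, Y):
        for y_hat, y_t in zip(predict(w, heads, x), y):
            total += (y_hat - y_t) ** 2 / n
    return total


def train(X, Y, n_traits: int, lr: float = 0.01, steps: int = 200) -> Tuple[List[float], Heads]:
    """Full-batch gradient descent on the joint loss (hypothetical toy setup)."""
    d, n = len(X[0]), len(X)
    w = [0.1] * d
    heads = [[0.1, 0.0] for _ in range(n_traits)]
    for _ in range(steps):
        grad_w = [0.0] * d
        grad_heads = [[0.0, 0.0] for _ in range(n_traits)]
        for x, y in zip(X, Y):
            h = sum(wi * xi for wi, xi in zip(w, x))
            for t, (a, b) in enumerate(heads):
                err = 2.0 * (a * h + b - y[t]) / n   # d(loss) / d(prediction)
                grad_heads[t][0] += err * h          # trait-specific slope
                grad_heads[t][1] += err              # trait-specific intercept
                for j in range(d):
                    grad_w[j] += err * a * x[j]      # shared weights collect every trait's gradient
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        heads = [[a - lr * ga, b - lr * gb] for (a, b), (ga, gb) in zip(heads, grad_heads)]
    return w, heads
```

Because `grad_w` accumulates contributions from all traits, a correlated trait can improve the shared representation used by the others. This is the mechanism the abstract credits for the gains over single-trait models.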
Affiliation(s)
- Dian Chao
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China
- Hao Wang
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Fengqiang Wan
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China
- Shen Yan
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China.
- Wei Fang
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China.
- Yang Yang
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China.
6
Jiang C, Kan J, Gao G, Dockter C, Li C, Wu W, Yang P, Stein N. Barley2035: A decadal vision for barley research and breeding. Mol Plant 2025; 18:195-218. [PMID: 39690737 DOI: 10.1016/j.molp.2024.12.009] [Received: 10/23/2024] [Revised: 12/04/2024] [Accepted: 12/12/2024] [Indexed: 12/19/2024]
Abstract
Barley (Hordeum vulgare ssp. vulgare) is one of the oldest founder crops of human civilization and has spread across the globe, supporting human society as livestock feed and as a raw material for the brewing industry. Since the first half of the 20th century, it has been used for innovative research in cytogenetics, biochemistry, and genetics, facilitated by its self-pollinating mode of reproduction and its true diploid status, which have contributed to the accumulation of numerous germplasm and mutant resources. In the era of molecular genomics and biology, many barley genes and their regulatory mechanisms have been identified and functionally validated, providing a paradigm for equivalent studies in other Triticeae crops. This review highlights important advances in barley research over the past decade, focusing mainly on genomics and genomics-assisted germplasm exploration, genetic dissection of developmental and adaptation-related traits, and the complex dynamics of yield and quality formation. In the coming decade, integrating these innovations into barley research and breeding shows great promise. Barley is proposed as a reference Triticeae crop for the discovery and functional validation of new genes and the dissection of their molecular mechanisms. Precise genome editing, together with genomic prediction and selection further enhanced by artificial intelligence-based tools and applications, is expected to promote barley improvement so as to efficiently meet the evolving global demand for this important crop.
Affiliation(s)
- Congcong Jiang
- State Key Laboratory of Crop Gene Resources and Breeding/Key Laboratory of Grain Crop Genetic Resources Evaluation and Utilization (MARA)/Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Jinhong Kan
- State Key Laboratory of Crop Gene Resources and Breeding/Key Laboratory of Grain Crop Genetic Resources Evaluation and Utilization (MARA)/Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Guangqi Gao
- State Key Laboratory of Crop Gene Resources and Breeding/Key Laboratory of Grain Crop Genetic Resources Evaluation and Utilization (MARA)/Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Christoph Dockter
- Carlsberg Research Laboratory, J.C. Jacobsens Gade 4, 1799 Copenhagen, Denmark
- Chengdao Li
- Western Crop Genetic Alliance, Murdoch University, Perth, WA 6150, Australia
- Wenxue Wu
- State Key Laboratory of Crop Gene Resources and Breeding/Key Laboratory of Grain Crop Genetic Resources Evaluation and Utilization (MARA)/Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Ping Yang
- State Key Laboratory of Crop Gene Resources and Breeding/Key Laboratory of Grain Crop Genetic Resources Evaluation and Utilization (MARA)/Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China.
- Nils Stein
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), 06466 Seeland, Germany; Crop Plant Genetics, Institute of Agricultural and Nutritional Sciences, Martin-Luther-University of Halle-Wittenberg, Halle (Saale), Germany.
7
Jianyao Y, Yuan H, Su G, Wang J, Weng W, Zhang X. Machine learning-enhanced high-resolution exposure assessment of ultrafine particles. Nat Commun 2025; 16:1209. [PMID: 39885206 PMCID: PMC11782512 DOI: 10.1038/s41467-025-56581-8] [Received: 07/14/2024] [Accepted: 01/20/2025] [Indexed: 02/01/2025]
Abstract
Ultrafine particles (UFPs), smaller than 100 nm, pose significant health risks that traditional mass-based metrics address inadequately. The WHO emphasizes particle number concentration (PNC) for assessing UFP exposure, but large-scale evaluations remain scarce. In this study, we developed a stacking-based machine learning framework that integrates data-driven and physical-chemical models for a national-scale UFP exposure assessment at 1 km spatial and 1-hour temporal resolution, leveraging long-term standardized PNC measurements in Switzerland. Approximately 20% (1.7 million) of the Swiss population experiences high UFP exposure, exceeding an annual mean of 10⁴ particles·cm⁻³, with a national average of (9.3 ± 4.7) × 10³ particles·cm⁻³, ranging from (5.5 ± 2.3) × 10³ (rural) to (1.4 ± 0.5) × 10⁴ particles·cm⁻³ (urban). A nonlinear relationship was identified between the WHO-recommended 1-hour and 24-hour exposure reference levels, suggesting that the two are not interchangeable. UFP spatial heterogeneity, quantified by the coefficient of variation, is 4.7 ± 4.2 (urban) to 13.8 ± 15.1 (rural) times greater than that of PM2.5. These findings provide crucial insights for the development of future UFP standards.
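The heterogeneity comparison in the abstract is a ratio of coefficients of variation (CV = standard deviation / mean). The short sketch below shows that arithmetic. It assumes the CV is taken over concentration values across spatial grid cells within an area and uses the population standard deviation; the abstract specifies neither, and the function names and input values are hypothetical.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """CV = population standard deviation / mean (dimensionless)."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return statistics.pstdev(values) / mean


def heterogeneity_ratio(ufp: Sequence[float], pm25: Sequence[float]) -> float:
    """How many times more spatially variable UFP is than PM2.5 over the same cells."""
    return coefficient_of_variation(ufp) / coefficient_of_variation(pm25)
```

Because the CV is dimensionless, it allows a direct comparison between PNC (particles·cm⁻³) and PM2.5 mass concentrations. For example, grid values with CV 0.4 for UFP and 0.2 for PM2.5 give a ratio of 2.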
Affiliation(s)
- Yudie Jianyao
- School of Safety Science, Tsinghua University, Beijing, China
- Institute of Public Safety Research, Tsinghua University, Beijing, China
- Hongyong Yuan
- School of Safety Science, Tsinghua University, Beijing, China
- Institute of Public Safety Research, Tsinghua University, Beijing, China
- Guofeng Su
- School of Safety Science, Tsinghua University, Beijing, China
- Institute of Public Safety Research, Tsinghua University, Beijing, China
- Jing Wang
- Institute of Environmental Engineering (IfU), ETH Zürich, Zürich, Switzerland
- Laboratory for Advanced Analytical Technologies, Empa, Dübendorf, Switzerland
- Wenguo Weng
- School of Safety Science, Tsinghua University, Beijing, China
- Institute of Public Safety Research, Tsinghua University, Beijing, China
- Xiaole Zhang
- School of Safety Science, Tsinghua University, Beijing, China.
- Institute of Public Safety Research, Tsinghua University, Beijing, China.
8
Li W, Zhang M, Fan J, Yang Z, Peng J, Zhang J, Lan Y, Chai M. Analysis of the genetic basis of fiber-related traits and flowering time in upland cotton using machine learning. Theor Appl Genet 2025; 138:36. [PMID: 39853381 DOI: 10.1007/s00122-025-04821-2] [Received: 10/09/2024] [Accepted: 01/11/2025] [Indexed: 01/26/2025]
Abstract
Cotton is an important fiber crop, but the genetic basis of key agronomic traits, such as fiber quality and flowering days, remains complex. Although machine learning (ML) has shown great potential for uncovering the genetic architecture of complex traits in other crops, its application in cotton has been limited. Here, we applied five machine learning models (AdaBoost, Gradient Boosting Regressor, LightGBM, Random Forest, and XGBoost) to identify loci associated with fiber quality and flowering days in cotton. We compared two SNP down-sampling methods for model training and found that selecting SNPs with an Fscale value greater than 0 yielded higher model accuracy than random SNP selection. We further performed machine learning quantitative trait locus (mlQTL) analysis for 13 traits related to fiber quality and flowering days. Comparing these mlQTLs with those identified by genome-wide association studies (GWAS) showed that the machine learning approach not only confirmed known loci but also identified novel QTLs. We also evaluated the effect of population size on model accuracy and found that larger populations gave better predictive performance. Finally, we proposed candidate genes for the identified mlQTLs, including two argonaute 5 proteins, Gh_A09G104100 and Gh_A09G104400, for the FL3/FS2 locus, and GhFLA17 and Syntaxin-121 (Gh_D09G143700) for the FSD09_2/FED09_2 locus. Our findings demonstrate the efficacy of machine learning for identifying genetic loci in cotton and provide valuable insights for improving cotton breeding strategies.
Affiliation(s)
- Weinan Li
- Nanfan Research Institute, Chinese Academy of Agricultural Sciences, Sanya, 572024, Hainan, China
- College of Electronic Engineering (College of Artificial Intelligence), South China Agricultural University, Guangzhou, 510642, Guangdong, China
- Mingjun Zhang
- State Key Laboratory of Cotton Bio-Breeding and Integrated Utilization, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Anyang, 455000, Henan, China
- Jingchao Fan
- Agricultural Information Institute of Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Zhaoen Yang
- State Key Laboratory of Cotton Bio-Breeding and Integrated Utilization, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Anyang, 455000, Henan, China
- Jun Peng
- Nanfan Research Institute, Chinese Academy of Agricultural Sciences, Sanya, 572024, Hainan, China
- State Key Laboratory of Cotton Bio-Breeding and Integrated Utilization, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Anyang, 455000, Henan, China
- Jianhua Zhang
- Nanfan Research Institute, Chinese Academy of Agricultural Sciences, Sanya, 572024, Hainan, China.
- Agricultural Information Institute of Chinese Academy of Agricultural Sciences, Beijing, 100081, China.
- Yubin Lan
- College of Electronic Engineering (College of Artificial Intelligence), South China Agricultural University, Guangzhou, 510642, Guangdong, China.
- Mao Chai
- State Key Laboratory of Cotton Bio-Breeding and Integrated Utilization, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Anyang, 455000, Henan, China.
9
Pan S, Shi T, Ji J, Wang K, Jiang K, Yu Y, Li C. Developing and validating a machine learning model to predict multidrug-resistant Klebsiella pneumoniae-related septic shock. Front Immunol 2025; 15:1539465. [PMID: 39867898 PMCID: PMC11757138 DOI: 10.3389/fimmu.2024.1539465] [Received: 12/04/2024] [Accepted: 12/23/2024] [Indexed: 01/28/2025]
Abstract
Background Multidrug-resistant Klebsiella pneumoniae (MDR-KP) infections pose a significant global healthcare challenge, particularly due to the high mortality risk associated with septic shock. This study aimed to develop and validate a machine learning-based model to predict the risk of MDR-KP-associated septic shock, enabling early risk stratification and targeted interventions. Methods A retrospective analysis was conducted on 1,385 patients with MDR-KP infections admitted between January 2019 and June 2024. The cohort was randomly divided into a training set (n = 969) and a validation set (n = 416). Feature selection was performed using LASSO regression and the Boruta algorithm. Seven machine learning algorithms were evaluated, with logistic regression chosen for its optimal balance between performance and robustness against overfitting. Results The overall incidence of MDR-KP-associated septic shock was 16.32% (226/1,385). The predictive model identified seven key risk factors: procalcitonin (PCT), sepsis, acute kidney injury, intra-abdominal infection, use of vasoactive medications, ventilator weaning failure, and mechanical ventilation. The logistic regression model demonstrated excellent predictive performance, with an area under the receiver operating characteristic curve (AUC) of 0.906 in the training set and 0.865 in the validation set. Calibration was robust, with Hosmer-Lemeshow test results of P = 0.065 (training) and P = 0.069 (validation). Decision curve analysis indicated substantial clinical net benefit. Conclusion This study presents a validated, high-performing predictive model for MDR-KP-associated septic shock, offering a valuable tool for early clinical decision-making. Prospective, multi-center studies are recommended to further evaluate its clinical applicability and effectiveness in diverse settings.
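The calibration result above (Hosmer-Lemeshow P = 0.065 and 0.069) tests whether predicted risks match observed event rates. The sketch below computes the statistic in the common "deciles of risk" form: observations are sorted by predicted probability and split into 10 near-equal groups, and the statistic is compared against a chi-square distribution with groups − 2 = 8 degrees of freedom. The abstract does not state the grouping the authors used, so 10 groups is an assumption, and the function names are hypothetical. To stay within the standard library, the p-value uses the closed-form chi-square tail, which is valid only for even degrees of freedom.

```python
import math
from typing import Sequence, Tuple


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!, exact only for even df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y: Sequence[int], p: Sequence[float], groups: int = 10) -> Tuple[float, int, float]:
    """Return (statistic, degrees of freedom, p-value); a large p means no evidence of miscalibration."""
    if len(y) != len(p):
        raise ValueError("y and p must have the same length")
    if len(y) < groups:
        raise ValueError("need at least one observation per group")
    # Stable sort on predicted risk only, so tied predictions keep their input order.
    pairs = sorted(zip(p, y), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        m = len(chunk)
        expected = sum(pi for pi, _ in chunk)     # expected events in this risk group
        observed = sum(yi for _, yi in chunk)     # observed events (e.g. septic shock)
        mean_p = expected / m
        variance = m * mean_p * (1.0 - mean_p)
        if variance > 0:  # groups whose mean risk is exactly 0 or 1 are skipped
            stat += (observed - expected) ** 2 / variance
    df = groups - 2
    return stat, df, chi2_sf_even_df(stat, df)
```

Applied to the 416-patient validation set with the model's predicted probabilities, a call such as `hosmer_lemeshow(outcomes, risks)` returns a p-value in the same sense as the P = 0.069 reported above. Values above 0.05 are conventionally read as no significant lack of fit, not as proof of good calibration.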
Affiliation(s)
- Shengnan Pan
- Department of Medical Laboratory, The Affiliated Huai’an No. 1 People’s Hospital of Nanjing Medical University, Huai’an, Jiangsu, China
- Ting Shi
- Department of Hepatobiliary and Pancreatic Surgery, The Affiliated Huai’an No. 1 People’s Hospital of Nanjing Medical University, Huai’an, Jiangsu, China
- Jinling Ji
- Department of Medical Laboratory, The Affiliated Huai’an No. 1 People’s Hospital of Nanjing Medical University, Huai’an, Jiangsu, China
- Kai Wang
- Department of Rheumatology, The Affiliated Huai’an No. 1 People’s Hospital of Nanjing Medical University, Huai’an, Jiangsu, China
- Kun Jiang
- Department of Medical Laboratory, The Affiliated Huai’an No. 1 People’s Hospital of Nanjing Medical University, Huai’an, Jiangsu, China
| | - Yabin Yu
- Department of Hepatobiliary and Pancreatic Surgery, The Affiliated Huai’an No. 1 People’s Hospital of Nanjing Medical University, Huai’an, Jiangsu, China
| | - Chang Li
- Department of Medical Laboratory, The Affiliated Huai’an No. 1 People’s Hospital of Nanjing Medical University, Huai’an, Jiangsu, China
10
Wu H, Han R, Zhao L, Liu M, Chen H, Li W, Li L. AutoGP: An intelligent breeding platform for enhancing maize genomic selection. PLANT COMMUNICATIONS 2025:101240. [PMID: 39789848 DOI: 10.1016/j.xplc.2025.101240] [Received: 09/02/2024] [Revised: 10/06/2024] [Accepted: 01/08/2025] [Indexed: 01/12/2025]
Abstract
In the face of climate change and the growing global population, there is an urgent need to accelerate the development of high-yielding crop varieties. To this end, vast amounts of genotype-to-phenotype data have been collected, and many machine learning (ML) models have been developed to predict phenotype from a given genotype. However, the requirement for high densities of single-nucleotide polymorphisms (SNPs) and the labor-intensive collection of phenotypic data are hampering the use of these models to advance breeding. Furthermore, recently developed genomic selection (GS) models, such as deep learning (DL) models, are complicated and inconvenient for breeders to navigate and optimize within their breeding programs. Here, we present the development of an intelligent breeding platform named AutoGP (http://autogp.hzau.edu.cn), which integrates genotype extraction, phenotypic extraction, and GS models of genotype-to-phenotype data within a user-friendly web interface. AutoGP has three main advantages over previously developed platforms: 1) an efficient sequencing chip to identify high-quality, high-confidence SNPs throughout gene-regulatory networks; 2) a complete workflow for extraction of plant phenotypes (such as plant height and leaf count) from smartphone-captured video; and 3) a broad model pool, enabling users to select from five ML models (support vector machine, extreme gradient boosting, gradient-boosted decision tree, multilayer perceptron, and random forest) and four commonly used DL models (deep learning genomic selection, deep learning genome-wide association study, deep neural network for genomic prediction, and SoyDNGP). For the convenience of breeders, we use datasets from the maize (Zea mays) complete-diallel design plus unbalanced breeding-like inter-cross population as a case study to demonstrate the usefulness of AutoGP. We show that our genotype chips can effectively extract high-quality SNPs associated with days to tasseling and plant height. The models show reliable predictive accuracy on different populations and can provide effective guidance for breeders. Overall, AutoGP offers a practical solution to streamline the breeding process, enabling breeders to achieve more efficient and accurate genomic selection.
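AutoGP lets users choose from a pool of ML and DL models. As an illustration only, the sketch below shows one way such a pool could be compared: each candidate is scored by k-fold cross-validated Pearson correlation between predicted and observed phenotypes, and the best-scoring model is returned. The k-nearest-neighbour regressors stand in for the platform's real models, and the names `knn_model`, `cv_accuracy`, and `select_best` are hypothetical; nothing here reflects AutoGP's actual code or web API.

```python
import math
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def knn_model(k):
    """Hypothetical pool member: k-nearest-neighbour regression on numeric SNP codes."""
    def fit_predict(X_train, y_train, X_test):
        preds = []
        for x in X_test:
            # Manhattan distance between genotype vectors; ties broken by phenotype value.
            dists = sorted(
                (sum(abs(a - b) for a, b in zip(x, xt)), yt)
                for xt, yt in zip(X_train, y_train)
            )
            preds.append(mean(yt for _, yt in dists[:k]))
        return preds
    return fit_predict


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_accuracy(model, X, y, k=5, seed=0):
    """Out-of-fold prediction for every sample, scored by Pearson correlation."""
    preds = [0.0] * len(y)
    for fold in kfold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        out = model([X[i] for i in train], [y[i] for i in train], [X[i] for i in fold])
        for i, p in zip(fold, out):
            preds[i] = p
    return pearson(preds, y)


def select_best(pool, X, y, k=5):
    """Return the name of the highest-scoring model together with all scores."""
    scores = {name: cv_accuracy(model, X, y, k) for name, model in pool.items()}
    return max(scores, key=scores.get), scores


# Toy genotypes (16 lines x 4 SNPs) and an additive phenotype, invented for illustration.
X = [[(i >> j) & 1 for j in range(4)] for i in range(16)]
y = [2.0 * row[0] + row[1] + 0.5 * row[2] for row in X]
best, scores = select_best({"knn1": knn_model(1), "knn3": knn_model(3)}, X, y, k=4)
print(best, scores)
```

Keeping every candidate behind the same `fit_predict` signature is what makes a model pool easy to extend: a new model only has to implement that one function.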
Affiliation(s)
- Hao Wu
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Rui Han
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Liang Zhao
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Mengyao Liu
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Hong Chen
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China; Engineering Research Center of Intelligent Technology for Agriculture, Ministry of Education, Wuhan, China
- Weifu Li
- College of Informatics, Huazhong Agricultural University, Wuhan 430070, China; Engineering Research Center of Intelligent Technology for Agriculture, Ministry of Education, Wuhan, China.
- Lin Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China; Hubei Hongshan Laboratory, Hubei, China.
11
Liang H, Yang T, Liu Z, Jian W, Chen Y, Li B, Yan Z, Xu W, Chen L, Qi Y, Wang Z, Liao Y, Lin P, Li J, Wang W, Li L, Wang M, Zhang Y, Deng L, Jiang T, He J. LungDiag: Empowering artificial intelligence for respiratory diseases diagnosis based on electronic health records, a multicenter study. MedComm (Beijing) 2025; 6:e70043. [PMID: 39802635 PMCID: PMC11725045 DOI: 10.1002/mco2.70043] [Received: 06/08/2024] [Revised: 11/16/2024] [Accepted: 11/20/2024] [Indexed: 01/16/2025]
Abstract
Respiratory diseases pose a significant global health burden, with challenges in early and accurate diagnosis due to overlapping clinical symptoms, which often lead to misdiagnosis or delayed treatment. To address this issue, we developed LungDiag, an artificial intelligence (AI)-based diagnostic system that utilizes natural language processing (NLP) to extract key clinical features from electronic health records (EHRs) for the accurate classification of respiratory diseases. This study employed a large cohort of 31,267 EHRs from multiple centers for model training and internal testing. Additionally, prospective real-world validation was conducted using 1142 EHRs from three external centers. LungDiag demonstrated superior diagnostic performance, achieving an F1 score of 0.711 for top 1 diagnosis and 0.927 for top 3 diagnoses. In real-world testing, LungDiag outperformed both human experts and ChatGPT 4.0, achieving an F1 score of 0.651 for top 1 diagnosis. The study emphasizes the potential of LungDiag as an effective tool to support physicians in diagnosing respiratory diseases more accurately and efficiently. Despite the promising results, larger-scale multicenter validation is still needed to confirm its clinical utility and generalizability.
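LungDiag is evaluated on its top-1 and top-3 diagnoses. The abstract does not define its top-k F1 score, so the sketch below shows only the simpler top-k hit rate, under the assumption that a record counts as correct when the reference diagnosis appears among the first k ranked candidates. The ranked diagnoses are made-up examples, not study output.

```python
def top_k_hit_rate(ranked_predictions, true_labels, k):
    """Fraction of records whose true label is among the first k ranked predictions."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("need exactly one ranked list per record")
    hits = sum(1 for ranked, truth in zip(ranked_predictions, true_labels) if truth in ranked[:k])
    return hits / len(true_labels)


# Hypothetical ranked differential diagnoses for three EHRs.
ranked = [
    ["asthma", "COPD", "pneumonia"],
    ["COPD", "asthma", "bronchiectasis"],
    ["pneumonia", "lung cancer", "tuberculosis"],
]
truth = ["asthma", "asthma", "tuberculosis"]
print(top_k_hit_rate(ranked, truth, 1), top_k_hit_rate(ranked, truth, 3))
```

With this definition the score can never decrease as k grows, because every top-1 hit is also a top-3 hit.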
Affiliation(s)
- Hengrui Liang
- Department of Thoracic Surgery, China State Key Laboratory of Respiratory Disease & National Clinical Research Center for Respiratory Disease, the Key laboratory of Advanced Interdisciplinary Studies Center, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Guangzhou National Laboratory, Guangzhou, China
- Tao Yang
- Guangzhou National Laboratory, Guangzhou, China
- Guangzhou Women and Children's Medical Center, Guangzhou, China
- Zihao Liu
- Department of Thoracic Surgery, China State Key Laboratory of Respiratory Disease & National Clinical Research Center for Respiratory Disease, the Key laboratory of Advanced Interdisciplinary Studies Center, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Wenhua Jian
- Department of Thoracic Surgery, China State Key Laboratory of Respiratory Disease & National Clinical Research Center for Respiratory Disease, the Key laboratory of Advanced Interdisciplinary Studies Center, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Yilong Chen
- Department of Research and Development, Tianpeng Technology Co. Ltd, Guangzhou, China
- Bingliang Li
- Department of Thoracic Surgery, China State Key Laboratory of Respiratory Disease & National Clinical Research Center for Respiratory Disease, the Key laboratory of Advanced Interdisciplinary Studies Center, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Zeping Yan
- Department of Thoracic Surgery, China State Key Laboratory of Respiratory Disease & National Clinical Research Center for Respiratory Disease, the Key laboratory of Advanced Interdisciplinary Studies Center, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Weiqiang Xu
- Department of Thoracic Surgery, China State Key Laboratory of Respiratory Disease & National Clinical Research Center for Respiratory Disease, the Key laboratory of Advanced Interdisciplinary Studies Center, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Yifan Qi
- School of Health Policy and Management, Nanjing Medical University, Nanjing, China
- Laboratory for Digital Intelligence & Health Governance, Nanjing Medical University, Nanjing, China
- Zhiwei Wang
- Department of Thoracic Surgery, China State Key Laboratory of Respiratory Disease & National Clinical Research Center for Respiratory Disease, the Key laboratory of Advanced Interdisciplinary Studies Center, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Yajing Liao
- Department of Thoracic Surgery, China State Key Laboratory of Respiratory Disease & National Clinical Research Center for Respiratory Disease, the Key laboratory of Advanced Interdisciplinary Studies Center, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Peixuan Lin
- Department of Thoracic Surgery, China State Key Laboratory of Respiratory Disease & National Clinical Research Center for Respiratory Disease, the Key laboratory of Advanced Interdisciplinary Studies Center, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Jiameng Li
- Department of Thoracic Surgery, China State Key Laboratory of Respiratory Disease & National Clinical Research Center for Respiratory Disease, the Key laboratory of Advanced Interdisciplinary Studies Center, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Wei Wang
- Department of Thoracic Surgery, China State Key Laboratory of Respiratory Disease & National Clinical Research Center for Respiratory Disease, the Key laboratory of Advanced Interdisciplinary Studies Center, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Li Li
- Department of Respiratory Disease, The First People's Hospital of Kashi Prefecture, Kashi, China
- Meijia Wang
- Department of Respiratory and Critical Care Medicine, National Clinical Research Center of Respiratory Disease, Key Laboratory of Pulmonary Diseases of Health Ministry, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
- YunHui Zhang
- Department of Respiratory Disease, The First People's Hospital of Yunnan Province, Kunming, China
- Lizong Deng
- School of Health Policy and Management, Nanjing Medical University, Nanjing, China
- Laboratory for Digital Intelligence & Health Governance, Nanjing Medical University, Nanjing, China
- Taijiao Jiang
- Guangzhou National Laboratory, Guangzhou, China
- State Key Laboratory of Respiratory Disease, The Key laboratory of Advanced Interdisciplinary Studies Center, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, Guangdong, China
- Jianxing He
- Department of Thoracic Surgery, China State Key Laboratory of Respiratory Disease & National Clinical Research Center for Respiratory Disease, the Key laboratory of Advanced Interdisciplinary Studies Center, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Guangzhou National Laboratory, Guangzhou, China
12
Wang H, Yan S, Wang W, Chen Y, Hong J, He Q, Diao X, Lin Y, Chen Y, Cao Y, Guo W, Fang W. Cropformer: An interpretable deep learning framework for crop genomic prediction. PLANT COMMUNICATIONS 2024:101223. [PMID: 39690739 DOI: 10.1016/j.xplc.2024.101223] [Received: 09/03/2024] [Revised: 10/15/2024] [Accepted: 12/12/2024] [Indexed: 12/19/2024]
Abstract
Machine learning and deep learning are extensively employed in genomic selection (GS) to expedite the identification of superior genotypes and accelerate breeding cycles. However, a significant challenge with current data-driven deep learning models in GS lies in their low robustness and poor interpretability. To address these challenges, we developed Cropformer, a deep learning framework for predicting crop phenotypes and exploring downstream tasks. This framework combines convolutional neural networks with multiple self-attention mechanisms to improve accuracy. The ability of Cropformer to predict complex phenotypic traits was extensively evaluated on more than 20 traits across five major crops: maize, rice, wheat, foxtail millet, and tomato. Evaluation results show that Cropformer outperforms other GS methods in both precision and robustness, achieving up to a 7.5% improvement in prediction accuracy compared to the runner-up model. Additionally, Cropformer enhances the analysis and mining of genes associated with traits. We identified numerous single nucleotide polymorphisms (SNPs) with potential effects on maize phenotypic traits and revealed key genetic variations underlying these differences. Cropformer represents a significant advancement in predictive performance and gene identification, providing a powerful general tool for improving genomic design in crop breeding. Cropformer is freely accessible at https://cgris.net/cropformer.
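Cropformer is described as combining convolutional neural networks with multiple self-attention mechanisms. The abstract gives no layer details, so the sketch below illustrates only the generic single-head scaled dot-product self-attention, softmax(QK^T / sqrt(d_k)) V, applied to a short sequence of feature vectors. Treating those vectors as convolved SNP windows is an assumption, and the identity projection matrices are placeholders, not learned Cropformer weights.

```python
import math


def softmax(row):
    """Numerically stable softmax of one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(A, B):
    """Plain-Python matrix product of A (n x m) and B (m x p)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention: softmax(Q K^T / sqrt(d_k)) V."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d_k = len(Wk[0])
    scores = matmul(Q, transpose(K))  # scores[i][j] = Q_i . K_j
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V), weights


# Hypothetical input: 3 positions (e.g. convolved SNP windows) with 2 features each.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]  # identity projections keep the example readable
output, attn = self_attention(X, I2, I2, I2)
print(attn)
```

Each row of `attn` sums to 1 and shows how strongly one position draws on the others. Inspecting such weights is one common route to model interpretability, although the abstract does not say which route Cropformer takes.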
Affiliation(s)
- Hao Wang
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Shen Yan
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Wenxi Wang
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization (MOE), and Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Yongming Chen
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization (MOE), and Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China; State Key Laboratory of Wheat Improvement, Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agricultural Sciences in Weifang, Shandong 261325, China
- Jingpeng Hong
- College of Information and Management Science, Henan Agricultural University, Zhengzhou 450002, China
- Qiang He
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Xianmin Diao
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Yunan Lin
- School of Engineering and Design, Technical University Munich, 85521 Munich, Germany
- Yanqing Chen
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Yongsheng Cao
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China.
- Weilong Guo
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization (MOE), and Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China.
- Wei Fang
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China.
13
Pan S, Liu Z, Han Y, Zhang D, Zhao X, Li J, Wang K. Using the Pearson's correlation coefficient as the sole metric to measure the accuracy of quantitative trait prediction: is it sufficient? FRONTIERS IN PLANT SCIENCE 2024; 15:1480463. [PMID: 39719937 PMCID: PMC11667204 DOI: 10.3389/fpls.2024.1480463] [Received: 08/14/2024] [Accepted: 11/19/2024] [Indexed: 12/26/2024]
Abstract
Evaluating the accuracy of quantitative trait prediction is crucial for choosing the best model among several candidates in plant breeding. Pearson's correlation coefficient (PCC), a metric that quantifies the strength of the linear association between two variables, is widely used to evaluate the accuracy of quantitative trait prediction models and performs well in most circumstances. However, PCC may not always offer a comprehensive view of predictive accuracy, especially in cases involving nonlinear relationships or complex dependencies in machine learning-based methods. Many papers on quantitative trait prediction use PCC as the sole metric to evaluate the accuracy of their models, which is insufficient and limited from a formal perspective. This study addresses this crucial issue by presenting a typical example and conducting a comparative analysis of PCC and nine other evaluation metrics using four traditional methods and four machine learning-based methods, thereby contributing to the practical applicability and reliability of plant quantitative trait prediction models. It is recommended to employ PCC in conjunction with other evaluation metrics, chosen in a targeted manner for the specific application scenario, to reduce the likelihood of drawing misleading conclusions.
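To make the concern concrete, the sketch below compares PCC with a few error-based metrics: mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and the coefficient of determination (R2). The abstract does not list the nine metrics the study used, so this selection is an assumption. Predictions that are an exact linear transform of the observations (here 2y + 5) give PCC = 1 even though every prediction is far off, which is the kind of misleading conclusion the authors warn about. The trait values are invented.

```python
import math
from statistics import mean


def pcc(obs, pred):
    """Pearson correlation: measures the strength of linear association only."""
    mo, mp = mean(obs), mean(pred)
    cov = sum((a - mo) * (b - mp) for a, b in zip(obs, pred))
    var_o = sum((a - mo) ** 2 for a in obs)
    var_p = sum((b - mp) ** 2 for b in pred)
    return cov / math.sqrt(var_o * var_p)


def mae(obs, pred):
    return mean(abs(a - b) for a, b in zip(obs, pred))


def mse(obs, pred):
    return mean((a - b) ** 2 for a, b in zip(obs, pred))


def rmse(obs, pred):
    return math.sqrt(mse(obs, pred))


def r2(obs, pred):
    """Coefficient of determination; negative when worse than predicting the mean."""
    mo = mean(obs)
    ss_res = sum((a - b) ** 2 for a, b in zip(obs, pred))
    ss_tot = sum((a - mo) ** 2 for a in obs)
    return 1.0 - ss_res / ss_tot


observed = [10.0, 12.0, 15.0, 11.0, 14.0]  # invented trait values
biased = [2.0 * v + 5.0 for v in observed]  # perfectly correlated but badly scaled
for name, fn in [("PCC", pcc), ("MAE", mae), ("MSE", mse), ("RMSE", rmse), ("R2", r2)]:
    print(name, round(fn(observed, biased), 3))
```

PCC is unchanged by any positive rescaling or shift of the predictions, while MAE, RMSE, and R2 all expose the bias, which is why reporting them alongside PCC guards against over-optimistic rankings.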
Affiliation(s)
- Shouhui Pan
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China
- National Engineering Research Center for Information Technology in Agriculture, Beijing, China
- Zhongqiang Liu
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China
- National Engineering Research Center for Information Technology in Agriculture, Beijing, China
- Yanyun Han
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China
- National Engineering Research Center for Information Technology in Agriculture, Beijing, China
- Dongfeng Zhang
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China
- National Engineering Research Center for Information Technology in Agriculture, Beijing, China
- Xiangyu Zhao
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China
- National Engineering Research Center for Information Technology in Agriculture, Beijing, China
- Jinlong Li
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China
- National Engineering Research Center for Information Technology in Agriculture, Beijing, China
- Kaiyi Wang
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China
- National Engineering Research Center for Information Technology in Agriculture, Beijing, China
14
Wu B, Xiong H, Zhuo L, Xiao Y, Yan J, Yang W. Multi-view BLUP: a promising solution for post-omics data integrative prediction. J Genet Genomics 2024:S1673-8527(24)00332-1. [PMID: 39645028 DOI: 10.1016/j.jgg.2024.11.017] [Received: 07/23/2024] [Revised: 11/27/2024] [Accepted: 11/27/2024] [Indexed: 12/09/2024]
Abstract
Phenotypic prediction is a promising strategy for accelerating plant breeding. Data from multiple sources (called multi-view data) can provide complementary information that characterizes a biological object from various aspects. To integrate multi-view information into phenotypic prediction, this paper proposes a multi-view best linear unbiased prediction (MVBLUP) method. To measure the importance of each data view, a differential evolution algorithm with an early stopping mechanism was used to obtain a multi-view kinship matrix, which was then incorporated into the BLUP model for phenotypic prediction. To further illustrate the characteristics of MVBLUP, we performed empirical experiments on four multi-view datasets from different crops. Compared with the single-view method, MVBLUP improved prediction accuracy by 0.038 to 0.201 on average. The results demonstrate that MVBLUP is an effective integrative prediction method for multi-view data.
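The MVBLUP abstract describes two steps: weighting per-view kinship matrices into one multi-view kinship matrix, and plugging that matrix into BLUP. The sketch below assumes the views are combined as a normalised weighted sum and that the variance ratio lambda = sigma_e^2 / sigma_g^2 is known. It takes the view weights as given rather than searching for them with differential evolution, and it estimates the mean by the sample mean instead of by generalised least squares; both are simplifications. Under the model y = mu + u + e with u ~ N(0, sigma_g^2 K), the BLUP of the genetic values is u_hat = K (K + lambda I)^-1 (y - mu). The kinship matrices and phenotypes are invented.

```python
def combine_kinship(kinships, weights):
    """Weighted sum of per-view kinship matrices, with weights normalised to sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    w = [x / total for x in weights]
    n = len(kinships[0])
    return [[sum(wv * K[i][j] for wv, K in zip(w, kinships)) for j in range(n)] for i in range(n)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= factor * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def blup_genetic_values(K, y, lam):
    """BLUP of genetic values: u = K (K + lam*I)^-1 (y - mean(y)), lam = sigma_e^2 / sigma_g^2."""
    n = len(y)
    mu = sum(y) / n  # simplification: sample mean instead of the GLS estimate
    A = [[K[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(A, [v - mu for v in y])
    return [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]


# Two hypothetical 3 x 3 kinship views (e.g. genomic and transcriptomic) and phenotypes.
K_genomic = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.0]]
K_omics = [[1.0, 0.1, 0.3], [0.1, 1.0, 0.4], [0.3, 0.4, 1.0]]
K_multi = combine_kinship([K_genomic, K_omics], [0.7, 0.3])
print(blup_genetic_values(K_multi, [5.0, 6.5, 4.0], lam=1.0))
```

A larger lambda (more residual noise relative to genetic variance) shrinks the predicted genetic values toward zero; in MVBLUP, the view weights would be tuned so that the combined matrix maximises prediction accuracy.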
Affiliation(s)
- Bingjie Wu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Huijuan Xiong
- College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Lin Zhuo
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Yingjie Xiao
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, Hubei 430070, China; Hubei Hongshan Laboratory, Wuhan, Hubei 430070, China
- Jianbing Yan
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, Hubei 430070, China; Hubei Hongshan Laboratory, Wuhan, Hubei 430070, China
- Wenyu Yang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, Hubei 430070, China; College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, China.
15
Shuang Z, Xingyu X, Yue C, Mingjing Y. Explainable Machine Learning Predictions for the Benefit From Chemotherapy in Advanced Non-Small Cell Lung Cancer Without Available Targeted Mutations. THE CLINICAL RESPIRATORY JOURNAL 2024; 18:e70044. [PMID: 39696772 DOI: 10.1111/crj.70044] [Received: 06/18/2024] [Revised: 10/16/2024] [Accepted: 12/08/2024] [Indexed: 12/20/2024]
Abstract
BACKGROUND Non-small cell lung cancer (NSCLC) is a global health challenge. Chemotherapy remains the standard therapy for advanced NSCLC without targetable mutations, but drug resistance often reduces its effectiveness. Developing more effective methods to predict and monitor the benefit of chemotherapy early is crucial. METHODS We carried out a retrospective cohort study of NSCLC patients without targeted mutations who received chemotherapy at West China Hospital from 2009 to 2013. We identified variables associated with chemotherapy outcomes and built four predictive models using machine learning. Shapley additive explanations (SHAP) were used to interpret the best model's predictions, and the Kaplan-Meier method was used to assess the impact of key variables on 5-year overall survival. RESULTS The study enrolled 461 NSCLC patients. Eight variables were selected for the model: differentiation, surgery history, neutrophil-to-lymphocyte ratio (NLR), platelet-to-lymphocyte ratio (PLR), total bilirubin (TBIL), total protein (TP), alanine aminotransferase (ALT), and lactate dehydrogenase (LDH). The extreme gradient boosting (XGBoost) model showed the best discriminatory ability among the four models in predicting the probability of complete response (CR) to chemotherapy, with an AUC of 0.78. SHAP plots showed that surgery history and high differentiation were associated with CR benefit from chemotherapy. Absence of surgery, higher NLR, higher PLR, and higher LDH were all independent prognostic factors for poor survival in NSCLC patients without targetable mutations receiving chemotherapy. CONCLUSIONS Using machine learning, we developed a predictive model to assess chemotherapy benefit in NSCLC patients without targeted mutations, utilizing eight readily available and non-invasive clinical indicators. Demonstrating satisfactory predictive performance and clinical practicability, this model may help clinicians identify patients likely to benefit from chemotherapy, potentially improving their prognosis.
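The survival analysis above relies on the Kaplan-Meier estimator, S(t) = product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of deaths at t_i and n_i is the number of patients still at risk just before t_i. The minimal sketch below computes that product from invented follow-up times. In a study like this one, separate curves would be estimated for each group (for example, high versus low NLR) and then compared, typically with a log-rank test, which is not shown here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times: follow-up time per patient; events: 1 = death observed, 0 = censored.
    Returns [(t, S(t))] at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this time; those censored at t still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Invented follow-up in months for six patients.
months = [6, 12, 12, 20, 34, 60]
died = [1, 1, 0, 1, 0, 0]
for t, s in kaplan_meier(months, died):
    print(f"{t:>3} months: S = {s:.3f}")
```

Censored patients never lower the curve directly. They only shrink the at-risk denominator for later event times, which is what lets the estimator use incomplete follow-up.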
Affiliation(s)
- Zhao Shuang
- Department of Respiratory and Critical Care Medicine, State Key Laboratory of Respiratory Health and Multimorbidity, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Xiong Xingyu
- Department of Respiratory and Critical Care Medicine, State Key Laboratory of Respiratory Health and Multimorbidity, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Cheng Yue
- Department of Respiratory and Critical Care Medicine, State Key Laboratory of Respiratory Health and Multimorbidity, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Yu Mingjing
- Department of Respiratory and Critical Care Medicine, State Key Laboratory of Respiratory Health and Multimorbidity, West China Hospital, Sichuan University, Chengdu, Sichuan, China
16
Xu Y, Yang W, Qiu J, Zhou K, Yu G, Zhang Y, Wang X, Jiao Y, Wang X, Hu S, Zhang X, Li P, Lu Y, Chen R, Tao T, Yang Z, Xu Y, Xu C. Metabolic marker-assisted genomic prediction improves hybrid breeding. PLANT COMMUNICATIONS 2024:101199. [PMID: 39614617 DOI: 10.1016/j.xplc.2024.101199] [Received: 06/25/2024] [Revised: 10/31/2024] [Accepted: 11/26/2024] [Indexed: 12/01/2024]
Abstract
Hybrid breeding is widely acknowledged as the most effective method for increasing crop yield, particularly in maize and rice. However, a major challenge in hybrid breeding is the selection of desirable combinations from the vast pool of potential crosses. Genomic selection (GS) has emerged as a powerful tool to tackle this challenge, but its success in practical breeding depends on prediction accuracy. Several strategies have been explored to enhance prediction accuracy for complex traits, such as the incorporation of functional markers and multi-omics data. Metabolome-wide association studies (MWAS) help to identify metabolites that are closely linked to phenotypes, known as metabolic markers. However, the use of preselected metabolic markers from parental lines to predict hybrid performance has not yet been explored. In this study, we developed a novel approach called metabolic marker-assisted genomic prediction (MM_GP), which incorporates significant metabolites identified from MWAS into GS models to improve the accuracy of genomic hybrid prediction. In maize and rice hybrid populations, MM_GP outperformed genomic prediction (GP) for all traits, regardless of the method used (genomic best linear unbiased prediction or eXtreme gradient boosting). On average, MM_GP demonstrated 4.6% and 13.6% higher predictive abilities than GP for maize and rice, respectively. MM_GP could also match or even surpass the predictive ability of M_GP (integrated genomic-metabolomic prediction) for most traits. In maize, the integration of only six metabolic markers significantly associated with multiple traits resulted in 5.0% and 3.1% higher average predictive ability compared with GP and M_GP, respectively. With advances in high-throughput metabolomics technologies and prediction models, this approach holds great promise for revolutionizing genomic hybrid breeding by enhancing its accuracy and efficiency.
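MM_GP adds a small set of preselected metabolic markers to the genomic predictors. The abstract does not give the MWAS model or its significance threshold, so the sketch below substitutes a simple filter: keep metabolites whose absolute Pearson correlation with the phenotype reaches `min_abs_r`, then append the retained metabolite values to each line's SNP vector so that a feature-based model such as gradient boosting can use them. The function names, the threshold, and the toy data are illustrative assumptions. How parental metabolite profiles are mapped onto hybrids is not shown.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant column to avoid dividing by zero."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denom = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return sxy / denom if denom else 0.0


def preselect_metabolites(metabolites, phenotype, min_abs_r):
    """Indices of metabolite columns whose |r| with the phenotype is at least min_abs_r.

    A crude stand-in for MWAS significance testing (assumption, not the paper's method).
    """
    columns = list(zip(*metabolites))
    return [j for j, col in enumerate(columns) if abs(pearson(col, phenotype)) >= min_abs_r]


def augment_with_metabolites(snps, metabolites, selected):
    """Append the selected metabolite values to each line's SNP vector."""
    return [list(s) + [m[j] for j in selected] for s, m in zip(snps, metabolites)]


# Invented toy data: 4 lines, 2 SNPs, 2 metabolites, 1 trait.
snps = [[0, 1], [1, 1], [2, 0], [1, 2]]
metabolites = [[1.0, 5.0], [2.0, 5.1], [3.0, 4.9], [4.0, 5.0]]
trait = [1.1, 2.0, 2.9, 4.2]
selected = preselect_metabolites(metabolites, trait, min_abs_r=0.5)
features = augment_with_metabolites(snps, metabolites, selected)
print(selected, features)
```

In a real pipeline the preselection would have to be repeated inside each training fold; otherwise information from the validation lines leaks into the marker choice and inflates the apparent gain.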
Affiliation(s)
- Yang Xu
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China
| | - Wenyan Yang
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China
| | - Jie Qiu
- Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai 200234, China
| | - Kai Zhou
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China
| | - Guangning Yu
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China
| | - Yuxiang Zhang
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China
| | - Xin Wang
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China
| | - Yuxin Jiao
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China
| | - Xinyi Wang
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China
| | - Shujun Hu
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China
| | - Xuecai Zhang
- International Maize and Wheat Improvement Center (CIMMYT), Mexico D.F. 06600, Mexico
| | - Pengcheng Li
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China
| | - Yue Lu
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China
| | - Rujia Chen
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China
| | - Tianyun Tao
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China
- Zefeng Yang
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China
- Yunbi Xu
- Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China; BGI Bioverse, Shenzhen 518083, China; MolBreeding Biotechnology Co., Ltd., Shijiazhuang 050035, China.
- Chenwu Xu
- Key Laboratory of Plant Functional Genomics of the Ministry of Education/Jiangsu Key Laboratory of Crop Genomics and Molecular Breeding/Zhongshan Biological Breeding Laboratory/Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, College of Agriculture, Yangzhou University, Yangzhou 225009, China.
17
Zhang H, Bao S, Zhao X, Bai Y, Lv Y, Gao P, Li F, Zhang W. Genome-Wide Association Study and Phenotype Prediction of Reproductive Traits in Large White Pigs. Animals (Basel) 2024; 14:3348. [PMID: 39682314 DOI: 10.3390/ani14233348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2024] [Revised: 11/14/2024] [Accepted: 11/19/2024] [Indexed: 12/18/2024]
Abstract
In a study involving 385 Large White pigs, a genome-wide association study (GWAS) was conducted to investigate reproductive traits, specifically the number of healthy litters (NH) and the number of weaned litters (NW). Several SNP loci, including ALGA0098819, ALGA0037969, and H3GA0032302, were significantly associated with these traits. In the combined-parity analysis, candidate genes such as BLVRA, STK17A, PSMA2, and C7orf25 were identified. GO and KEGG pathway enrichment analyses revealed that these genes are involved in key biological processes, including organic synthesis, the regulation of sperm activity, spermatogenesis, and meiosis. In the by-parity analysis, the PLCXD3 gene was significantly associated with the NW trait in the second and fourth parities, while RNASEH1, PYM1, and SEPTIN9 were linked to cell proliferation, DNA repair, and metabolism, suggesting potential roles in regulating reproductive traits. These findings provide new molecular markers for the genetic study of reproductive traits in Large White pigs. For the phenotypic prediction of NH and NW traits, several machine learning models (GBDT, RF, LightGBM, and Adaboost.R2) and traditional models (GBLUP, BRR, and BL) were evaluated using SNP data in varying proportions. After PCA processing, the GBDT model achieved the highest Pearson correlation coefficient (PCC) for NH (0.141), while LightGBM reached the highest PCC for NW (0.146). The MAE, MSE, and RMSE results showed that the traditional models had stable error rates, while the machine learning models performed comparatively better across the different SNP ratios. Overall, PCA processing somewhat improved the predictive performance of all models, though the overall gain in accuracy was limited.
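The PCC, MAE, MSE, and RMSE values reported above are standard measures of agreement between observed and predicted phenotypes. As a minimal sketch (not the authors' code), assuming plain Python lists of observed and predicted values, they can be computed as follows; `regression_metrics` is a hypothetical helper name.

```python
import math


def regression_metrics(observed, predicted):
    """Return PCC, MAE, MSE, and RMSE for two equal-length sequences.

    Illustrative sketch only; `regression_metrics` is a hypothetical helper.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    # Pearson correlation: covariance divided by the product of the spreads
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if var_obs == 0 or var_pred == 0:
        raise ValueError("PCC is undefined when either sequence is constant")
    pcc = cov / math.sqrt(var_obs * var_pred)
    # Error metrics computed on the residuals (predicted minus observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    return {"PCC": pcc, "MAE": mae, "MSE": mse, "RMSE": rmse}


if __name__ == "__main__":
    # Toy values for illustration only, not data from the study
    print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```

PCC measures how closely predictions track the observed values linearly, whereas MAE, MSE, and RMSE measure the size of the errors, so the two kinds of metric can rank models differently.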
Affiliation(s)
- Hao Zhang
- School of Software Technology, Shanxi Agricultural University, Jinzhong 030801, China
- Shiqian Bao
- School of Software Technology, Shanxi Agricultural University, Jinzhong 030801, China
- Xiaona Zhao
- School of Software Technology, Shanxi Agricultural University, Jinzhong 030801, China
- Yangfan Bai
- School of Software Technology, Shanxi Agricultural University, Jinzhong 030801, China
- Yangcheng Lv
- School of Software Technology, Shanxi Agricultural University, Jinzhong 030801, China
- Pengfei Gao
- School of Software Technology, Shanxi Agricultural University, Jinzhong 030801, China
- Fuzhong Li
- School of Software Technology, Shanxi Agricultural University, Jinzhong 030801, China
- Wuping Zhang
- School of Software Technology, Shanxi Agricultural University, Jinzhong 030801, China
18
Yu S, Liu L, Wang H, Yan S, Zheng S, Ning J, Luo R, Fu X, Deng X. AtML: An Arabidopsis thaliana root cell identity recognition tool for medicinal ingredient accumulation. Methods 2024; 231:61-69. [PMID: 39293728 DOI: 10.1016/j.ymeth.2024.09.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2024] [Revised: 08/05/2024] [Accepted: 09/12/2024] [Indexed: 09/20/2024]
Abstract
Arabidopsis thaliana synthesizes various medicinal compounds and serves as a model plant for medicinal plant research. Single-cell transcriptomics technologies are essential for understanding the developmental trajectory of plant roots, facilitating the analysis of synthesis and accumulation patterns of medicinal compounds in different cell subpopulations. Although methods for interpreting single-cell transcriptomics data are rapidly advancing in Arabidopsis, precisely annotating cell identity remains challenging because marker genes are lacking for certain cell types. In this work, we trained a machine learning system, AtML, using sequencing datasets from six cell subpopulations comprising a total of 6000 cells, to predict Arabidopsis root cell stages and identify biomarkers through complete model interpretability. Performance testing on an external dataset showed that AtML achieved 96.50% accuracy and 96.51% recall. Through the interpretability provided by AtML, our model identified 160 important marker genes, contributing to the understanding of cell type annotations. In conclusion, we trained AtML to efficiently identify Arabidopsis root cell stages, providing a new tool for elucidating the mechanisms of medicinal compound accumulation in Arabidopsis roots.
Affiliation(s)
- Shicong Yu
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Lijia Liu
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Hao Wang
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Shen Yan
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Shuqin Zheng
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Jing Ning
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Ruxian Luo
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Xiangzheng Fu
- Research Institute of Hunan University in Chongqing, Chongqing 401120, China.
- Xiaoshu Deng
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China; Chongqing Academy of Chinese Materia Medica, Chongqing 400065, China.
19
Fu Q, Wu Y, Zhu M, Xia Y, Yu Q, Liu Z, Ma X, Yang R. Identifying cardiovascular disease risk in the U.S. population using environmental volatile organic compounds exposure: A machine learning predictive model based on the SHAP methodology. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY 2024; 286:117210. [PMID: 39447292 DOI: 10.1016/j.ecoenv.2024.117210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2024] [Revised: 09/26/2024] [Accepted: 10/14/2024] [Indexed: 10/26/2024]
Abstract
BACKGROUND Cardiovascular disease (CVD) remains a leading cause of mortality globally. Environmental pollutants, specifically volatile organic compounds (VOCs), have been identified as significant risk factors. This study aims to develop a machine learning (ML) model to predict CVD risk based on VOC exposure and demographic data, using SHapley Additive exPlanations (SHAP) for interpretability. METHODS We utilized data from the National Health and Nutrition Examination Survey (NHANES) from 2011 to 2018, comprising 5098 participants. VOC exposure was assessed through 15 urinary metabolite metrics. The dataset was split into a training set (70%) and a test set (30%). Six ML models were developed: Random Forest (RF), Light Gradient Boosting Machine (LightGBM), Decision Tree (DT), Extreme Gradient Boosting (XGBoost), Multi-Layer Perceptron (MLP), and Support Vector Machines (SVM). Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), accuracy, balanced accuracy, F1 score, J-index, kappa, Matthews correlation coefficient (MCC), positive predictive value (PPV), negative predictive value (NPV), sensitivity (sens), and specificity (spec), and SHAP was applied to interpret the best-performing model. RESULTS The RF model exhibited the highest predictive performance, with an AUROC of 0.8143. SHAP analysis identified age and ATCA as the most significant predictors, with ATCA showing a protective effect against CVD, particularly in older adults and those with hypertension. The study found a significant interaction between ATCA levels and age, indicating that the protective effect of ATCA is more pronounced in older individuals due to increased oxidative stress and inflammatory responses associated with aging. E-value analysis suggested robustness to unmeasured confounders. CONCLUSIONS This study is the first to utilize VOC exposure data to construct an ML model for predicting CVD risk. The findings highlight the potential of combining environmental exposure data with demographic information to enhance CVD risk prediction, supporting the development of personalized prevention and intervention strategies.
Affiliation(s)
- Qingan Fu
- Cardiovascular medicine department, The Second Affiliated Hospital of Nanchang University, Nanchang, Jiangxi 330006, China
- Yanze Wu
- Department of Neurosurgery, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, Jiangxi 330006, China
- Min Zhu
- Gastroenterology Department, The First People's Hospital of Xiushui County, Jiujiang, Jiangxi, China
- Yunlei Xia
- Cardiovascular medicine department, The Second Affiliated Hospital of Nanchang University, Nanchang, Jiangxi 330006, China
- Qingyun Yu
- Cardiovascular medicine department, The Second Affiliated Hospital of Nanchang University, Nanchang, Jiangxi 330006, China
- Zhekang Liu
- Rheumatology and immunology department, The Second Affiliated Hospital of Nanchang University, Nanchang, Jiangxi 330006, China
- Xiaowei Ma
- Cardiovascular medicine department, The Second Affiliated Hospital of Nanchang University, Nanchang, Jiangxi 330006, China
- Renqiang Yang
- Cardiovascular medicine department, The Second Affiliated Hospital of Nanchang University, Nanchang, Jiangxi 330006, China.
20
Liu Y, Dou X, Yan X, Ma S, Ye C, Wang X, Lu J. Using machine learning approaches to develop a fast and easy-to-perform diagnostic tool for patients with light chain amyloidosis: a retrospective real-world study. Ann Hematol 2024:10.1007/s00277-024-06015-0. [PMID: 39480584 DOI: 10.1007/s00277-024-06015-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2024] [Accepted: 09/17/2024] [Indexed: 11/02/2024]
Abstract
Immunoglobulin light chain (AL) amyloidosis is a severe disorder caused by the accumulation of amyloid fibrils, leading to organ failure. Early diagnosis is crucial to prevent irreversible damage, yet it remains a challenge due to nonspecific symptoms that often appear later in the disease progression. A retrospective study analyzed data collected from 133 AL amyloidosis patients and 271 non-AL patients with similar symptoms but different diagnoses between January 1st, 2017, and September 30th, 2022. Demographic data and laboratory test results were collected. Subsequently, significant features were identified by both logistic regression and independent clinical expertise. Finally, logistic regression and four machine learning (ML) algorithms were employed to construct a diagnostic model, using fivefold cross-validation and blind set testing to identify the optimal model. The study identified nine independent predictors of AL amyloidosis in patients with kidney or cardiac involvement, respectively. Two models were developed to identify key features that distinguish AL amyloidosis from nephrotic syndrome and hypertrophic cardiomyopathy, respectively. The light gradient boosting machine (LightGBM) model emerged as the most effective, with an area under the curve (AUC) of 0.90 in both models, alongside high sensitivity, specificity, and F1-score. This research highlights the potential of a machine learning-based LightGBM model to facilitate early and accurate diagnosis of AL amyloidosis. The model's effectiveness suggests it could be a valuable tool in clinical settings, aiding in the timely identification of AL amyloidosis among patients with nonspecific symptoms. Further validation in diverse populations is recommended to establish its universal applicability.
Affiliation(s)
- Yang Liu
- Department of Hematology, Peking University People's Hospital, No.11 Xizhimen South St, Xicheng District, Beijing, China
- Beijing Key Laboratory of Hematopoietic Stem Cell Transplantation, Peking University, Beijing, China
- Xuelin Dou
- Department of Hematology, Peking University People's Hospital, No.11 Xizhimen South St, Xicheng District, Beijing, China
- Beijing Key Laboratory of Hematopoietic Stem Cell Transplantation, Peking University, Beijing, China
- Xiaojing Yan
- Department of Hematology, The First Affiliated Hospital of China Medical University, Shenyang, China
- Shiyu Ma
- Department of Hematology, The First Affiliated Hospital of China Medical University, Shenyang, China
- Chong Ye
- Medical Affairs, Johnson & Johnson Innovative Medicine, Beijing, China
- Xiaohong Wang
- Medical Affairs, Johnson & Johnson Innovative Medicine, Shanghai, China
- Jin Lu
- Department of Hematology, Peking University People's Hospital, No.11 Xizhimen South St, Xicheng District, Beijing, China.
- Beijing Key Laboratory of Hematopoietic Stem Cell Transplantation, Peking University, Beijing, China.
21
Zhu W, Li W, Zhang H, Li L. Big data and artificial intelligence-aided crop breeding: Progress and prospects. JOURNAL OF INTEGRATIVE PLANT BIOLOGY 2024. [PMID: 39467106 DOI: 10.1111/jipb.13791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2024] [Revised: 08/25/2024] [Accepted: 09/10/2024] [Indexed: 10/30/2024]
Abstract
The past decade has witnessed rapid developments in gene discovery, biological big data (BBD), artificial intelligence (AI)-aided technologies, and molecular breeding. These advancements are expected to accelerate crop breeding under the pressure of increasing demands for food. Here, we first summarize current breeding methods and discuss the need for new ways to support breeding efforts. Then, we review how BBD and AI technologies can be combined for genetic dissection, functional gene discovery, prediction of regulatory elements and functional domains, and phenotypic prediction. Finally, we propose the concept of intelligent precision design breeding (IPDB) driven by AI technology and offer ideas about how to implement IPDB. We hope that IPDB will improve the predictability and efficiency of crop breeding and reduce its cost compared with current technologies. As an example of IPDB, we explore the possibilities offered by CropGPT, which combines biological techniques, bioinformatics, and the breeding art of breeders, and presents an open, shareable, and cooperative breeding system. IPDB provides integrated services and communication platforms for biologists, bioinformatics experts, germplasm resource specialists, breeders, dealers, and farmers, and should be well suited for future breeding.
Affiliation(s)
- Wanchao Zhu
- Key Laboratory of Biology and Genetic Improvement of Maize in Arid Area of Northwest Region, College of Agronomy, Northwest A&F University, Yangling, 712100, China
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Weifu Li
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Engineering Research Center of Intelligent Technology for Agriculture, Ministry of Education, Wuhan, 430070, China
- Hongwei Zhang
- State Key Laboratory of Crop Gene Resources and Breeding, National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Lin Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
22
Mbarek L, Chen S, Jin A, Pan Y, Meng X, Yang X, Xu Z, Jiang Y, Wang Y. Predicting 3-month poor functional outcomes of acute ischemic stroke in young patients using machine learning. Eur J Med Res 2024; 29:494. [PMID: 39385211 PMCID: PMC11466038 DOI: 10.1186/s40001-024-02056-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Accepted: 09/09/2024] [Indexed: 10/12/2024]
Abstract
BACKGROUND Prediction of short-term outcomes in young patients with acute ischemic stroke (AIS) may assist in making therapy decisions. Machine learning (ML) is increasingly used in healthcare due to its high accuracy. This study aims to develop an ML-based predictive model for poor 3-month functional outcomes in young AIS patients and to compare the predictive performance of ML models with that of a logistic regression model. METHODS We enrolled AIS patients aged between 18 and 50 years from the Third Chinese National Stroke Registry (CNSR-III), collected between 2015 and 2018. A modified Rankin Scale (mRS) score ≥ 3 at 3 months was defined as a poor functional outcome. Four tree-based ML models were developed and compared with logistic regression: extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), random forest (RF), and gradient boosting decision trees (GBDT). Model performance was assessed based on both discrimination and calibration. RESULTS A total of 2268 young patients with a mean age of 44.3 ± 5.5 years were included. Among them, 9% had poor functional outcomes. The mRS at admission, living alone, and a high National Institutes of Health Stroke Scale (NIHSS) score at discharge remained independent predictors of poor 3-month outcomes. In the test set, XGBoost achieved the best AUC (0.801), followed by GBDT, RF, and LightGBM (AUCs of 0.795, 0.794, and 0.792, respectively). The XGBoost, RF, and LightGBM models were significantly better than logistic regression (P < 0.05). CONCLUSIONS ML outperformed logistic regression, with XGBoost being the best model for predicting poor functional outcomes in young AIS patients. It is important to consider living alone together with high severity scores to improve stroke prognosis.
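The AUC values compared above (0.801 for XGBoost versus 0.792 to 0.795 for the other tree models) can be read as the probability that a randomly chosen patient with a poor outcome receives a higher predicted risk than a randomly chosen patient without one. A minimal sketch of that pairwise definition follows, assuming binary labels (1 = mRS ≥ 3) and predicted scores as Python lists; `pairwise_auc` is a hypothetical helper, not the study's code.

```python
from itertools import product


def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. O(P * N), fine for illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p, n in product(pos, neg):
        if p > n:
            correct += 1.0
        elif p == n:
            correct += 0.5
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example: 1 = poor 3-month outcome (mRS >= 3), 0 = good outcome
    print(pairwise_auc([1, 0, 1, 0, 0], [0.9, 0.3, 0.4, 0.5, 0.1]))
```

This pairwise value equals the area under the ROC curve computed from the same scores; in practice, scikit-learn's `roc_auc_score` returns the same number more efficiently.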
Affiliation(s)
- Lamia Mbarek
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, No.119 South 4th Ring West Road, Fengtai District, Beijing, 100070, China
- Siding Chen
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, No.119 South 4th Ring West Road, Fengtai District, Beijing, 100070, China
- Changping Laboratory, Beijing, China
- Aoming Jin
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, No.119 South 4th Ring West Road, Fengtai District, Beijing, 100070, China
- Yuesong Pan
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, No.119 South 4th Ring West Road, Fengtai District, Beijing, 100070, China
- Xia Meng
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, No.119 South 4th Ring West Road, Fengtai District, Beijing, 100070, China
- Xiaomeng Yang
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Zhe Xu
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, No.119 South 4th Ring West Road, Fengtai District, Beijing, 100070, China
- Yong Jiang
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China.
- China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, No.119 South 4th Ring West Road, Fengtai District, Beijing, 100070, China.
- Changping Laboratory, Beijing, China.
- Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Beihang University and Capital Medical University, Beijing, 100091, China.
- Yongjun Wang
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China.
- China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, No.119 South 4th Ring West Road, Fengtai District, Beijing, 100070, China.
- Changping Laboratory, Beijing, China.
- Research Unit of Artificial Intelligence in Cerebrovascular Disease, Chinese Academy of Medical Sciences, Beijing, 2019RU018, China.
- Beijing Advanced Innovation Centre for Big Data-Based Precision Medicine, Beihang University, Capital Medical University, Beijing, China.
- Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Beijing, China.
- Advanced Innovation Center for Human Brain Protection, Capital Medical University, Beijing, China.
23
Yang B, Lu H, Ran Y. Advancing non-alcoholic fatty liver disease prediction: a comprehensive machine learning approach integrating SHAP interpretability and multi-cohort validation. Front Endocrinol (Lausanne) 2024; 15:1450317. [PMID: 39439566 PMCID: PMC11493712 DOI: 10.3389/fendo.2024.1450317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2024] [Accepted: 09/18/2024] [Indexed: 10/25/2024]
Abstract
Introduction Non-alcoholic fatty liver disease (NAFLD) represents a major global health challenge and often goes undiagnosed because of suboptimal screening tools. Advances in machine learning (ML) offer potential improvements in predictive diagnostics by leveraging complex clinical datasets. Methods We used a comprehensive dataset from the Dryad database for model development and training and performed external validation using data from the National Health and Nutrition Examination Survey (NHANES) 2017-2020 cycles. Seven distinct ML models were developed and rigorously evaluated. Additionally, we employed the SHapley Additive exPlanations (SHAP) method to enhance the interpretability of the models, allowing for a detailed understanding of how each variable contributes to predictive outcomes. Results A total of 14,913 participants were eligible for this study. Among the seven constructed models, the light gradient boosting machine achieved the highest performance, with an area under the receiver operating characteristic curve of 0.90 in the internal validation set and 0.81 in the external NHANES validation cohort. It also achieved an accuracy of 87%, a sensitivity of 92.9%, and an F1 score of 0.92. Key predictive variables identified included alanine aminotransferase, gamma-glutamyl transpeptidase, triglyceride glucose-waist circumference, metabolic score for insulin resistance, and HbA1c, which are strongly associated with metabolic dysfunctions integral to NAFLD progression. Conclusions The integration of ML with SHAP interpretability provides a robust predictive tool for NAFLD, enhancing the early identification and potential management of the disease. The model's high accuracy and generalizability across diverse populations highlight its clinical utility, though future enhancements should include longitudinal data and lifestyle factors to refine risk assessments further.
Affiliation(s)
- Bo Yang
- Department of Gastroenterology and Hepatology, Guizhou Aerospace Hospital, Zunyi, China
- Huaguan Lu
- Technology Innovation Center, Hunan University of Chinese Medicine, Changsha, China
- Yinghui Ran
- Department of Gastroenterology, Affiliated Hospital of Zunyi Medical University, Zunyi, China
24
Shinohara I, Inui A, Mifune Y, Yamaura K, Kuroda R. Posture Estimation Model Combined With Machine Learning Estimates the Radial Abduction Angle of the Thumb With High Accuracy. Cureus 2024; 16:e71034. [PMID: 39512988 PMCID: PMC11540810 DOI: 10.7759/cureus.71034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/07/2024] [Indexed: 11/15/2024]
Abstract
Thumb function is complex, and accurate evaluation from images or videos is difficult. Pose estimation, a technology that uses artificial intelligence (AI) to detect skeletal landmarks of the body, is gaining popularity. In this study, we combined the pose estimation library MediaPipe-Hands with five machine learning (ML) models to predict the radial abduction angle of the thumb. Radial abduction movements of 20 hands from 10 healthy volunteers were captured on video and processed into 5,000 images. Angles measured with a goniometer were used as true values to evaluate the angle reliability of MediaPipe-Hands alone and of MediaPipe-Hands combined with ML. The correlation coefficient (CC) between the angle measured by goniometry and the angle calculated by MediaPipe-Hands was 0.84. In contrast, applying ML to MediaPipe-Hands resulted in models with improved accuracy, and all models showed high CCs (0.94-0.99) with goniometric angle measurements. The ML model also predicted the abduction angles when images were taken from three different camera angles. When the features that the AI deemed important were visualized, the ML model was found to predict the abduction angle by focusing on the tip distance between the thumb and index finger along with the angle of the metacarpophalangeal joint between the thumb and middle finger. These results enable angle estimation even without frontal imaging with a camera, and expansion of this system may lead to real-time functional assessment in telemedicine and rehabilitation without the need for physical contact.
Affiliation(s)
- Issei Shinohara
- Department of Orthopedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
- Atsuyuki Inui
- Department of Orthopedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
- Yutaka Mifune
- Department of Orthopedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
- Kohei Yamaura
- Department of Orthopedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
- Ryosuke Kuroda
- Department of Orthopedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
25
Do VH, Nguyen VS, Nguyen SH, Le DQ, Nguyen TT, Nguyen CH, Ho TH, Vo NS, Nguyen T, Nguyen HA, Cao MD. PanKA: Leveraging population pangenome to predict antibiotic resistance. iScience 2024; 27:110623. [PMID: 39228791 PMCID: PMC11369404 DOI: 10.1016/j.isci.2024.110623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 04/14/2024] [Accepted: 07/29/2024] [Indexed: 09/05/2024]
Abstract
Machine learning has the potential to be a powerful tool in the fight against antimicrobial resistance (AMR), a critical global health issue. Machine learning can identify resistance mechanisms from DNA sequence data without prior knowledge. The first step in building a machine learning model is extracting features from the sequencing data. Traditional methods such as single nucleotide polymorphism (SNP) calling and k-mer counting yield numerous, often redundant features, complicating prediction and analysis. In this paper, we propose PanKA, a method that uses the pangenome to extract a concise set of relevant features for predicting AMR. PanKA not only enables fast model training and prediction but also improves accuracy. Applied to Escherichia coli and Klebsiella pneumoniae, our model predicts AMR more accurately than conventional and state-of-the-art methods.
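To illustrate why k-mer counting yields numerous, often redundant features, the sketch below counts overlapping k-mers in a DNA string. Each distinct k-mer becomes a candidate feature, so the feature space can grow toward 4^k, and neighbouring k-mers overlap by k-1 bases, which makes many of them correlated. This is a generic illustration assuming uppercase A/C/G/T input, not PanKA's pangenome-based method; `count_kmers` is a hypothetical name.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count overlapping k-mers in a DNA sequence (generic illustration)."""
    if k <= 0:
        raise ValueError("k must be positive")
    sequence = sequence.upper()
    # Slide a window of width k along the sequence one base at a time
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


if __name__ == "__main__":
    counts = count_kmers("ACGTACGTTA", 3)
    # Number of distinct 3-mers (features) and the most frequent ones
    print(len(counts), counts.most_common(2))
```

A sequence of length L contributes L-k+1 overlapping k-mers, so even a single bacterial genome produces a very large, highly correlated feature matrix, which is the motivation the abstract gives for a more concise feature set.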
Affiliation(s)
- Van Hoan Do
- Center for Applied Mathematics and Informatics, Le Quy Don Technical University, Hanoi, Vietnam
- Van Sang Nguyen
- Center for Biomedical Informatics, Vingroup Big Data Institute, Hanoi, Vietnam
- Duc Quang Le
- Faculty of IT, Hanoi University of Civil Engineering, Hanoi, Vietnam
- Tam Thi Nguyen
- Oxford University Clinical Research Unit, Hanoi, Vietnam
- Canh Hao Nguyen
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
- Tho Huu Ho
- Department of Medical Microbiology, The 103 Military Hospital, Vietnam Military Medical University, Hanoi, Vietnam
- Department of Genomics & Cytogenetics, Institute of Biomedicine & Pharmacy, Vietnam Military Medical University, Hanoi, Vietnam
- Nam S. Vo
- Center for Biomedical Informatics, Vingroup Big Data Institute, Hanoi, Vietnam
26
Xu L, Li C, Gao S, Zhao L, Guan C, Shen X, Zhu Z, Guo C, Zhang L, Yang C, Bu Q, Zhou B, Xu Y. Personalized Prediction of Long-Term Renal Function Prognosis Following Nephrectomy Using Interpretable Machine Learning Algorithms: Case-Control Study. JMIR Med Inform 2024; 12:e52837. [PMID: 39303280 PMCID: PMC11452755 DOI: 10.2196/52837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Revised: 04/08/2024] [Accepted: 07/21/2024] [Indexed: 09/22/2024]
Abstract
BACKGROUND Acute kidney injury (AKI) is a common adverse outcome following nephrectomy. The progression from AKI to acute kidney disease (AKD) and subsequently to chronic kidney disease (CKD) remains a concern; yet, the predictive mechanisms for these transitions are not fully understood. Interpretable machine learning (ML) models offer insights into how clinical features influence long-term renal function outcomes after nephrectomy, providing a more precise framework for identifying patients at risk and supporting improved clinical decision-making processes. OBJECTIVE This study aimed to (1) evaluate postnephrectomy rates of AKI, AKD, and CKD, analyzing long-term renal outcomes along different trajectories; (2) interpret AKD and CKD models using Shapley Additive Explanations values and Local Interpretable Model-Agnostic Explanations algorithm; and (3) develop a web-based tool for estimating AKD or CKD risk after nephrectomy. METHODS We conducted a retrospective cohort study involving patients who underwent nephrectomy between July 2012 and June 2019. Patient data were randomly split into training, validation, and test sets, maintaining a ratio of 76.5:8.5:15. Eight ML algorithms were used to construct predictive models for postoperative AKD and CKD. The performance of the best-performing models was assessed using various metrics. We used various Shapley Additive Explanations plots and Local Interpretable Model-Agnostic Explanations bar plots to interpret the model and generated directed acyclic graphs to explore the potential causal relationships between features. Additionally, we developed a web-based prediction tool using the top 10 features for AKD prediction and the top 5 features for CKD prediction. RESULTS The study cohort comprised 1559 patients. Incidence rates for AKI, AKD, and CKD were 21.7% (n=330), 15.3% (n=238), and 10.6% (n=165), respectively. 
Among the evaluated ML models, the Light Gradient-Boosting Machine (LightGBM) model demonstrated superior performance, with an area under the receiver operating characteristic curve of 0.97 for AKD prediction and 0.96 for CKD prediction. Performance metrics and plots highlighted the model's competence in discrimination, calibration, and clinical applicability. Operative duration, hemoglobin, blood loss, urine protein, and hematocrit were identified as the top 5 features associated with predicted AKD. Baseline estimated glomerular filtration rate, pathology, trajectories of renal function, age, and total bilirubin were the top 5 features associated with predicted CKD. Additionally, we developed a web application using the LightGBM model to estimate AKD and CKD risks. CONCLUSIONS An interpretable ML model effectively elucidated its decision-making process in identifying patients at risk of AKD and CKD following nephrectomy by enumerating critical features. The web-based calculator, built on the LightGBM model, can assist in formulating more personalized and evidence-based clinical strategies.
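The abstract reports a random split into training, validation, and test sets in a 76.5:8.5:15 ratio but does not say how the split was produced. One way to reach exactly those proportions, offered only as an illustrative assumption, is a two-stage split: hold out 15% as the test set, then reserve 10% of the remaining 85% for validation (0.85 × 0.10 = 0.085). The helper name `three_way_split` and the seed are hypothetical, not the authors' code.

```python
import random
from typing import Iterable, List, Tuple


def three_way_split(items: Iterable, test_frac: float = 0.15,
                    val_frac_of_rest: float = 0.10, seed: int = 42
                    ) -> Tuple[List, List, List]:
    """Shuffle, hold out a test set, then carve a validation set out of
    the remainder (hypothetical helper, not the study's implementation)."""
    rng = random.Random(seed)          # seeded for reproducibility
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    test, rest = shuffled[:n_test], shuffled[n_test:]
    n_val = round(len(rest) * val_frac_of_rest)
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test


# 1559 patients in the reported cohort; the integer IDs are placeholders.
train, val, test = three_way_split(range(1559))
print(len(train), len(val), len(test))
```

Because the split is applied to individual patients, each ID lands in exactly one subset; a stratified split on the AKD/CKD outcome would be a common refinement, but the abstract does not say whether one was used.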
Affiliation(s)
- Lingyu Xu
- Department of Nephrology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Chenyu Li
- Department of Nephrology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Medizinische Klinik und Poliklinik IV, Klinikum der Universität, Munich, Germany
- Shuang Gao
- Ocean University of China, Qingdao, China
- Long Zhao
- Department of Nephrology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Chen Guan
- Department of Nephrology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Xuefei Shen
- Department of Nephrology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Zhihui Zhu
- Center of Structural Heart Disease, Beijing Anzhen Hospital, Capital Medical University, Beijing, China
- Cheng Guo
- Allianz Technology, Allianz, Munich, Germany
- Liwei Zhang
- Institute of Diabetes and Regeneration Research, Helmholtz Diabetes Center, Helmholtz Center Munich, Neuherberg, Germany
- Chengyu Yang
- Department of Nephrology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Quandong Bu
- Department of Nephrology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Bin Zhou
- Department of Nephrology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Yan Xu
- Department of Nephrology, The Affiliated Hospital of Qingdao University, Qingdao, China

27
Cheng Q, Wang X. Machine Learning for AI Breeding in Plants. GENOMICS, PROTEOMICS & BIOINFORMATICS 2024; 22:qzae051. [PMID: 38954837 PMCID: PMC11479635 DOI: 10.1093/gpbjnl/qzae051] [Received: 05/11/2024] [Revised: 06/21/2024] [Accepted: 06/25/2024] [Indexed: 07/04/2024]
Affiliation(s)
- Qian Cheng
- State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing 100094, China
- Xiangfeng Wang
- State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing 100094, China

28
Xiang Y, Xia C, Li L, Wei R, Rong T, Liu H, Lan H. Genomic prediction of yield-related traits and genome-based establishment of heterotic pattern in maize hybrid breeding of Southwest China. FRONTIERS IN PLANT SCIENCE 2024; 15:1441555. [PMID: 39315371 PMCID: PMC11416964 DOI: 10.3389/fpls.2024.1441555] [Received: 05/31/2024] [Accepted: 08/21/2024] [Indexed: 09/25/2024]
Abstract
When genomic prediction is implemented in maize (Zea mays L.) breeding, it can greatly accelerate the breeding process and reduce costs. In this study, 11 yield-related traits of maize were used to evaluate four genomic prediction methods: rrBLUP, HEBLP|A, RF, and LightGBM. Across all 11 traits, rrBLUP had predictive accuracy similar to that of HEBLP|A, as did RF to LightGBM, but rrBLUP and HEBLP|A outperformed RF and LightGBM in 8 traits. Furthermore, a genomic prediction-based heterotic pattern for yield was established from 64620 maize crosses in Southwest China. The results showed that in the top 5% of crosses, one of the parent lines came from temp-tropic or tropic germplasm, which is highly consistent with actual breeding practice, and that the heterotic pattern (Reid+ × Suwan+) will become a major heterotic pattern in Southwest China in the future.
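rrBLUP, one of the four methods compared above, treats all marker effects as random with a common variance, so its marker-effect estimates coincide with ridge regression: with centered markers Z and phenotypes y, û = (ZᵀZ + λI)⁻¹ Zᵀ(y − ȳ), where λ = σ²e / σ²u. The minimal sketch below solves that system in plain Python on a hypothetical toy genotype matrix. Assumptions: λ is fixed by hand (real rrBLUP software estimates the variance components, e.g. by REML), and the helper names `solve`, `ridge_marker_effects`, and `predict` are illustrative, not part of any package.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_marker_effects(Z: Matrix, y: Sequence[float], lam: float
                         ) -> Tuple[List[float], List[float], float]:
    """Ridge (rrBLUP-style) marker effects with an unpenalized intercept.

    Z: n x p genotypes coded as allele counts; y: phenotypes;
    lam: shrinkage parameter, fixed by hand here (assumption).
    """
    n, p = len(Z), len(Z[0])
    zbar = [sum(row[j] for row in Z) / n for j in range(p)]
    ybar = sum(y) / n
    Zc = [[row[j] - zbar[j] for j in range(p)] for row in Z]  # center markers
    yc = [v - ybar for v in y]                                  # center phenotypes
    # Normal equations: (Zc^T Zc + lam * I) u = Zc^T yc
    A = [[sum(Zc[i][j] * Zc[i][k] for i in range(n)) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    rhs = [sum(Zc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    return solve(A, rhs), zbar, ybar


def predict(z: Sequence[float], u: Sequence[float],
            zbar: Sequence[float], ybar: float) -> float:
    """Predicted phenotype (intercept plus marker effects) for one genotype."""
    return ybar + sum((zj - mj) * uj for zj, mj, uj in zip(z, zbar, u))


# Hypothetical toy data: 6 lines x 3 markers (0/1/2 allele counts).
Z = [[0, 1, 2], [2, 1, 0], [1, 0, 1], [2, 2, 1], [0, 0, 0], [1, 2, 2]]
y = [9.0, 13.0, 11.0, 12.5, 10.0, 10.5]
u, zbar, ybar = ridge_marker_effects(Z, y, lam=1.0)
print("marker effects:", [round(v, 3) for v in u])
print("prediction for [1, 1, 1]:", round(predict([1, 1, 1], u, zbar, ybar), 2))
```

Larger λ shrinks all marker effects toward zero, which is why rrBLUP suits traits controlled by many small-effect loci; tree ensembles such as RF and LightGBM instead learn nonlinear splits on individual markers.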
Affiliation(s)
- Yong Xiang
- Maize Research Institute/State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, Sichuan, China
- Chao Xia
- Maize Research Institute/State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, Sichuan, China
- Lujiang Li
- Maize Research Institute/State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, Sichuan, China
- Rujun Wei
- Maize Research Institute/State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, Sichuan, China
- Tingzhao Rong
- Maize Research Institute/State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, Sichuan, China
- Hailan Liu
- Maize Research Institute/State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, Sichuan, China
- Hai Lan
- Maize Research Institute/State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, Sichuan, China

29
Ren Y, Wu C, Zhou H, Hu X, Miao Z. Dual-extraction modeling: A multi-modal deep-learning architecture for phenotypic prediction and functional gene mining of complex traits. PLANT COMMUNICATIONS 2024; 5:101002. [PMID: 38872306 DOI: 10.1016/j.xplc.2024.101002] [Received: 01/25/2024] [Revised: 05/27/2024] [Accepted: 06/11/2024] [Indexed: 06/15/2024]
Abstract
Despite considerable advances in extracting crucial insights from bio-omics data to unravel the intricate mechanisms underlying complex traits, the absence of a universal multi-modal computational tool with robust interpretability for accurate phenotype prediction and identification of trait-associated genes remains a challenge. This study introduces the dual-extraction modeling (DEM) approach, a multi-modal deep-learning architecture designed to extract representative features from heterogeneous omics datasets, enabling the prediction of complex trait phenotypes. Through comprehensive benchmarking experiments, we demonstrate the efficacy of DEM in classification and regression prediction of complex traits. DEM consistently exhibits superior accuracy, robustness, generalizability, and flexibility. Notably, we establish its effectiveness in predicting pleiotropic genes that influence both flowering time and rosette leaf number, underscoring its commendable interpretability. In addition, we have developed user-friendly software to facilitate seamless utilization of DEM's functions. In summary, this study presents a state-of-the-art approach with the ability to effectively predict qualitative and quantitative traits and identify functional genes, confirming its potential as a valuable tool for exploring the genetic basis of complex traits.
Affiliation(s)
- Yanlin Ren
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Chenhua Wu
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- He Zhou
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Xiaona Hu
- College of Chemistry & Pharmacy, Northwest A&F University, Yangling, Shaanxi 712100, China
- Zhenyan Miao
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China; Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China

30
Ge R, Xia Y, Jiang M, Jia G, Jing X, Li Y, Cai Y. HybAVPnet: A Novel Hybrid Network Architecture for Antiviral Peptides Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024; 21:1358-1365. [PMID: 38587961 DOI: 10.1109/tcbb.2024.3385635] [Indexed: 04/10/2024]
Abstract
Viruses pose a great threat to human production and life; thus, the research and development of antiviral drugs is urgently needed. Antiviral peptides play an important role in drug design and development. Compared with time-consuming and laborious wet-lab experimental methods, computational methods that predict antiviral peptides accurately and rapidly are critical. However, because data are limited, accurate prediction of antiviral peptides remains challenging, and extracting effective feature representations from sequences is crucial for building accurate models. This study introduces a novel two-step approach, named HybAVPnet, that predicts antiviral peptides with a hybrid network architecture based on neural networks and traditional machine learning methods. We adopted a stacking-like structure to capture both long-term dependencies and local evolutionary information, producing a comprehensive and diverse prediction from the predicted labels and probabilities. Using an ensemble technique with different kinds of features can reduce the variance without increasing the bias. The experimental results show that HybAVPnet achieves better and more robust performance than state-of-the-art methods, which makes it useful for the research and development of antiviral drugs. Owing to its generalization ability, it can also be extended to other peptide recognition problems.

31
Ba Q, Yuan X, Wang Y, Shen N, Xie H, Lu Y. Development and Validation of Machine Learning Algorithms for Prediction of Colorectal Polyps Based on Electronic Health Records. Biomedicines 2024; 12:1955. [PMID: 39335469 PMCID: PMC11429196 DOI: 10.3390/biomedicines12091955] [Received: 07/04/2024] [Revised: 08/02/2024] [Accepted: 08/22/2024] [Indexed: 09/30/2024]
Abstract
BACKGROUND Colorectal polyps are the main source of precancerous lesions in colorectal cancer. To improve early tumor diagnosis and screening, we aimed to develop a simple, non-invasive diagnostic prediction model for colorectal polyps based on machine learning (ML) and accessible health examination records. METHODS We conducted a single-center observational retrospective study in China. The derivation cohort, consisting of 5426 individuals who underwent colonoscopy screening from January 2021 to January 2024, was split into a training set (cohort 1) and a validation set (cohort 2). The variables considered in this study included demographic data, vital signs, and laboratory results from health examination records. With features selected by univariate analysis and Lasso regression analysis, nine machine learning methods were used to develop a colorectal polyp diagnostic model. Several evaluation indexes, including the area under the receiver-operating-characteristic curve (AUC), were used to compare predictive performance. The SHapley additive explanation method (SHAP) was used to rank feature importance and explain the final model. RESULTS Fourteen independent predictors were identified as the most valuable features for establishing the models. The adaptive boosting machine (AdaBoost) model exhibited the best performance among the 9 ML models in cohort 1, with accuracy, sensitivity, specificity, positive predictive value, negative predictive value, F1 score, and AUC (95% CI) of 0.632 (0.618-0.646), 0.635 (0.550-0.721), 0.674 (0.591-0.758), 0.593 (0.576-0.611), 0.673 (0.654-0.691), 0.608 (0.560-0.655), and 0.687 (0.626-0.749), respectively. The final model gave an AUC of 0.675 in cohort 2. Additionally, the precision-recall (PR) curve of the AdaBoost model reached the highest area under the PR curve (AUPR) of 0.648, positioning it nearest to the upper right corner.
SHAP analysis provided visualized explanations, reaffirming the critical factors associated with the risk of colorectal polyps in the asymptomatic population. CONCLUSIONS This study integrated the clinical and laboratory indicators with machine learning techniques to establish the predictive model for colorectal polyps, providing non-invasive, cost-effective screening strategies for asymptomatic individuals and guiding decisions for further examination and treatment.
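The threshold-based metrics reported for AdaBoost (accuracy, sensitivity, specificity, PPV, NPV, F1) all derive from a 2×2 confusion matrix. The sketch below shows the standard definitions on hypothetical counts that are not taken from the study; it does not reproduce the study's confidence intervals, and the names `Confusion` and `binary_metrics` are illustrative.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Confusion:
    """2x2 confusion matrix; 'positive' means polyp present."""
    tp: int  # polyp present, predicted present
    fp: int  # polyp absent, predicted present
    tn: int  # polyp absent, predicted absent
    fn: int  # polyp present, predicted absent


def binary_metrics(c: Confusion) -> Dict[str, float]:
    sensitivity = c.tp / (c.tp + c.fn)   # true positive rate (recall)
    specificity = c.tn / (c.tn + c.fp)   # true negative rate
    ppv = c.tp / (c.tp + c.fp)           # positive predictive value (precision)
    npv = c.tn / (c.tn + c.fn)           # negative predictive value
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "ppv": ppv, "npv": npv, "f1": f1}


# Hypothetical counts for illustration only.
result = binary_metrics(Confusion(tp=60, fp=40, tn=80, fn=20))
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

These values depend on the decision threshold used to turn predicted probabilities into labels, whereas AUC and AUPR summarize performance across all thresholds.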
Affiliation(s)
- Qinwen Ba
- Department of Laboratory Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Xu Yuan
- Department of Laboratory Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Yun Wang
- Department of Laboratory Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Na Shen
- Department of Laboratory Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Huaping Xie
- Department of Gastroenterology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Yanjun Lu
- Department of Laboratory Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China

32
Deng L, Zhao J, Wang T, Liu B, Jiang J, Jia P, Liu D, Li G. Construction and validation of predictive models for intravenous immunoglobulin-resistant Kawasaki disease using an interpretable machine learning approach. Clin Exp Pediatr 2024; 67:405-414. [PMID: 39048087 PMCID: PMC11298769 DOI: 10.3345/cep.2024.00549] [Received: 03/26/2024] [Revised: 04/27/2024] [Accepted: 05/10/2024] [Indexed: 07/27/2024]
Abstract
BACKGROUND Intravenous immunoglobulin (IVIG)-resistant Kawasaki disease is associated with coronary artery lesion development. PURPOSE This study aimed to explore the factors associated with IVIG resistance and to construct and validate an interpretable machine learning (ML) prediction model for clinical practice. METHODS Between December 2014 and November 2022, 602 patients were screened, and risk factors for IVIG resistance were investigated. Five ML models were used to establish an optimal prediction model. The SHapley Additive exPlanations (SHAP) method was used to interpret the ML model. RESULTS Na+, hemoglobin (Hb), C-reactive protein (CRP), and globulin were independent risk factors for IVIG resistance. A nonlinear relationship was identified between globulin level and IVIG resistance. The XGBoost model exhibited excellent performance, with an area under the receiver operating characteristic curve of 0.821, accuracy of 0.748, sensitivity of 0.889, and specificity of 0.683 in the testing set. The XGBoost model was interpreted globally and locally using the SHAP method. CONCLUSION Na+, Hb, CRP, and globulin levels were independently associated with IVIG resistance. Our findings demonstrate that ML models can reliably predict IVIG resistance. Moreover, using the SHAP method to interpret the established XGBoost model provides evidence on IVIG resistance and can guide the individualized treatment of Kawasaki disease.
Affiliation(s)
- Linfan Deng
- Department of Pediatrics, The Affiliated Hospital of Southwest Medical University, Luzhou, China
- Sichuan Clinical Research Center for Birth Defects, Luzhou, China
- Mianyang Central Hospital, School of Medicine, University of Electronic Science and Technology of China, Mianyang, China
- Jian Zhao
- Department of Pediatrics, The Affiliated Hospital of Southwest Medical University, Luzhou, China
- Sichuan Clinical Research Center for Birth Defects, Luzhou, China
- Ting Wang
- Department of Pediatrics, The Affiliated Hospital of Southwest Medical University, Luzhou, China
- Sichuan Clinical Research Center for Birth Defects, Luzhou, China
- Bin Liu
- Department of Pediatrics, The Affiliated Hospital of Southwest Medical University, Luzhou, China
- Sichuan Clinical Research Center for Birth Defects, Luzhou, China
- Jun Jiang
- Department of General Surgery (Thyroid Surgery), The Affiliated Hospital of Southwest Medical University, Luzhou, China
- Metabolic Vascular Diseases Key Laboratory of Sichuan Province, Luzhou, China
- Peng Jia
- Department of Pediatrics, Zigong First People’s Hospital, Zigong, China
- Dong Liu
- Department of Pediatrics, The Affiliated Hospital of Southwest Medical University, Luzhou, China
- Sichuan Clinical Research Center for Birth Defects, Luzhou, China
- Gang Li
- Department of Pediatrics, The Affiliated Hospital of Southwest Medical University, Luzhou, China
- Sichuan Clinical Research Center for Birth Defects, Luzhou, China

33
Abou Hajal A, Bryce RA, Amor BB, Atatreh N, Ghattas MA. Boosting the Accuracy and Chemical Space Coverage of the Detection of Small Colloidal Aggregating Molecules Using the BAD Molecule Filter. J Chem Inf Model 2024; 64:4991-5005. [PMID: 38920403 DOI: 10.1021/acs.jcim.4c00363] [Indexed: 06/27/2024]
Abstract
The ability to conduct effective high-throughput screening (HTS) campaigns in drug discovery is often hampered by false positives in these assays caused by small colloidally aggregating molecules (SCAMs). SCAMs can produce artifactual hits in HTS through nonspecific inhibition of the protein target. In this work, we present a new computational prediction tool for detecting SCAMs based on their 2D chemical structure. The tool, called the boosted aggregation detection (BAD) molecule filter, employs decision tree ensemble methods, namely the CatBoost classifier and the light gradient-boosting machine, to significantly improve the detection of SCAMs. In developing the filter, we explore three approaches, each tailored to specific drug discovery needs: models trained on individual data sets, a consensus approach using these models, and a merged data set approach. The individual data set method emerged as the most effective, achieving 93% sensitivity and 90% specificity and outperforming existing state-of-the-art models by 20 and 5%, respectively. The consensus models offer broader chemical space coverage, exceeding 90% for all testing sets. This is particularly important for early-stage medicinal chemistry projects and provides information on the applicability domain. Meanwhile, the merged data set models demonstrated robust performance, with a notable sensitivity of 79% in the comprehensive 10-fold cross-validation test set. A SHAP analysis of model features indicates that hydrophobicity and molecular complexity are the primary factors influencing aggregation propensity. The BAD molecule filter is publicly accessible at https://molmodlab-aau.com/Tools.html. This filter provides a new, more robust tool for aggregate prediction in the early stages of drug discovery to optimize hit rates and reduce associated testing and validation overheads.
Affiliation(s)
- Abdallah Abou Hajal
- College of Pharmacy, Al Ain University, Abu Dhabi 112612, United Arab Emirates
- AAU Health and Biomedical Research Center, Al Ain University, Abu Dhabi 112612, United Arab Emirates
- Richard A Bryce
- Division of Pharmacy and Optometry, School of Health Sciences, University of Manchester, Oxford Road, Manchester M13 9PL, U.K
- Boulbaba Ben Amor
- Core42, Inception/G42, Abu Dhabi 2282, United Arab Emirates
- IMT Nord Europe, Villeneuve D'Ascq 59650 France
- Noor Atatreh
- College of Pharmacy, Al Ain University, Abu Dhabi 112612, United Arab Emirates
- AAU Health and Biomedical Research Center, Al Ain University, Abu Dhabi 112612, United Arab Emirates
- Mohammad A Ghattas
- College of Pharmacy, Al Ain University, Abu Dhabi 112612, United Arab Emirates
- AAU Health and Biomedical Research Center, Al Ain University, Abu Dhabi 112612, United Arab Emirates

34
Li J, Zhang D, Yang F, Zhang Q, Pan S, Zhao X, Zhang Q, Han Y, Yang J, Wang K, Zhao C. TrG2P: A transfer-learning-based tool integrating multi-trait data for accurate prediction of crop yield. PLANT COMMUNICATIONS 2024; 5:100975. [PMID: 38751121 PMCID: PMC11287160 DOI: 10.1016/j.xplc.2024.100975] [Received: 12/04/2023] [Revised: 04/14/2024] [Accepted: 05/11/2024] [Indexed: 06/24/2024]
Abstract
Yield prediction is the primary goal of genomic selection (GS)-assisted crop breeding. Because yield is a complex quantitative trait, making predictions from genotypic data is challenging. Transfer learning can produce an effective model for a target task by leveraging knowledge from a different, but related, source domain and is considered to have great potential for improving yield prediction by integrating multi-trait data. However, it has not previously been applied to genotype-to-phenotype prediction owing to the lack of an efficient implementation framework. We therefore developed TrG2P, a transfer-learning-based framework. TrG2P first employs convolutional neural networks (CNN) to train models using non-yield-trait phenotypic and genotypic data, thus obtaining pre-trained models. Subsequently, the convolutional layer parameters from these pre-trained models are transferred to the yield prediction task, and the fully connected layers are retrained, thus obtaining fine-tuned models. Finally, the convolutional layer and the first fully connected layer of the fine-tuned models are fused, and the last fully connected layer is trained to enhance prediction performance. We applied TrG2P to five sets of genotypic and phenotypic data from maize (Zea mays), rice (Oryza sativa), and wheat (Triticum aestivum) and compared its model precision to that of seven other popular GS tools: ridge regression best linear unbiased prediction (rrBLUP), random forest, support vector regression, light gradient boosting machine (LightGBM), CNN, DeepGS, and deep neural network for genomic prediction (DNNGP). TrG2P improved the accuracy of yield prediction by 39.9%, 6.8%, and 1.8% in rice, maize, and wheat, respectively, compared with predictions generated by the best-performing comparison model. Our work therefore demonstrates that transfer learning is an effective strategy for improving yield prediction by integrating information from non-yield-trait data.
We attribute its enhanced prediction accuracy to the valuable information available from traits associated with yield and to training dataset augmentation. The Python implementation of TrG2P is available at https://github.com/lijinlong1991/TrG2P. The web-based tool is available at http://trg2p.ebreed.cn:81.
Affiliation(s)
- Jinlong Li
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
- Dongfeng Zhang
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
- Feng Yang
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
- Qiusi Zhang
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
- Shouhui Pan
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
- Xiangyu Zhao
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
- Qi Zhang
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
- Yanyun Han
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
- Jinliang Yang
- Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE 68583, USA; Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE 68583, USA
- Kaiyi Wang
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
- Chunjiang Zhao
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China

35
Duan H, Dai X, Shi Q, Cheng Y, Ge Y, Chang S, Liu W, Wang F, Shi H, Hu J. Enhancing genome-wide populus trait prediction through deep convolutional neural networks. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2024; 119:735-745. [PMID: 38741374 DOI: 10.1111/tpj.16790] [Received: 03/12/2024] [Revised: 04/02/2024] [Accepted: 04/18/2024] [Indexed: 05/16/2024]
Abstract
Genome-based plant breeding is a promising approach that has greatly promoted the improvement of agronomic traits. Traditional methods typically adopt linear regression models with explicit assumptions, which neither capture the link between phenotype and genotype nor offer clear directions for improvement. Nonlinear models are well suited to capturing complex nonadditive effects, filling the gap left by traditional methods. Taking Populus as the research object, this paper constructs a deep learning method, DCNGP, that can effectively predict traits covering 65 phenotypes. The method was trained on three datasets and compared with four other classic models: Bayesian ridge regression (BRR), Elastic Net, support vector regression, and dualCNN. The results show that DCNGP has five typical performance advantages: strong prediction ability on multiple experimental datasets; batch normalization layers and early stopping, which enhance generalization and prediction stability on test data; learning potent features directly from the data, thus circumventing tedious manual feature engineering; a Gaussian noise layer, which enhances predictive capability under inherent uncertainties or perturbations; and fewer hyperparameters, which reduce tuning time across datasets and improve automated search efficiency. DCNGP thus shows powerful genotype-to-phenotype predictive ability, providing an important theoretical reference for building more robust Populus breeding programs.
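The abstract credits early stopping, together with batch normalization and a Gaussian noise layer, for DCNGP's generalization but gives no settings. The sketch below shows the usual patience-based rule: training stops once the validation loss has failed to improve by more than `min_delta` for `patience` consecutive epochs, and the best epoch is remembered. The loss values, `patience=3`, and `min_delta=0.0` are illustrative assumptions, not DCNGP's configuration, and `early_stopping` is a hypothetical helper.

```python
from typing import Sequence, Tuple


def early_stopping(val_losses: Sequence[float], patience: int = 3,
                   min_delta: float = 0.0) -> Tuple[int, int]:
    """Replay a validation-loss curve; return (stop_epoch, best_epoch), 0-based."""
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last meaningful improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # stop; restore weights from best_epoch
    return len(val_losses) - 1, best_epoch  # ran to completion


# Hypothetical validation losses per epoch.
losses = [0.90, 0.72, 0.65, 0.63, 0.64, 0.66, 0.67, 0.61]
stop, best = early_stopping(losses, patience=3)
print(f"stopped at epoch {stop}, best epoch {best}")
```

On this curve the run halts at epoch 6 and keeps epoch 3, never seeing the lower loss at epoch 7; that is the usual trade-off of a small patience, which saves computation at the risk of stopping before a late improvement.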
Affiliation(s)
- Huaichuan Duan
- Laboratory of Tumor Targeted and Immune Therapy, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center for Biotherapy, Chengdu, China
- Key Laboratory of Medicinal and Edible Plants Resources Development of Sichuan Education Department, School of Pharmacy, Chengdu University, Chengdu, China
- Xiangwei Dai
- School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, China
- Quanshan Shi
- Key Laboratory of Medicinal and Edible Plants Resources Development of Sichuan Education Department, School of Pharmacy, Chengdu University, Chengdu, China
- Yan Cheng
- Laboratory of Tumor Targeted and Immune Therapy, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center for Biotherapy, Chengdu, China
- Yutong Ge
- Key Laboratory of Medicinal and Edible Plants Resources Development of Sichuan Education Department, School of Pharmacy, Chengdu University, Chengdu, China
- Shan Chang
- School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, China
- Wei Liu
- School of Life Science, Leshan Normal University, Leshan, China
- Feng Wang
- School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, China
- School of Computer Engineering, Suzhou Vocational University, Suzhou, China
- Hubing Shi
- Laboratory of Tumor Targeted and Immune Therapy, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center for Biotherapy, Chengdu, China
- Jianping Hu
- Key Laboratory of Medicinal and Edible Plants Resources Development of Sichuan Education Department, School of Pharmacy, Chengdu University, Chengdu, China

36
Yang S, Xu P. HemoDL: Hemolytic peptides prediction by double ensemble engines from Rich sequence-derived and transformer-enhanced information. Anal Biochem 2024; 690:115523. [PMID: 38552762 DOI: 10.1016/j.ab.2024.115523] [Received: 11/02/2023] [Revised: 03/20/2024] [Accepted: 03/22/2024] [Indexed: 04/02/2024]
Abstract
Hemolytic peptides can trigger hemolysis by rupturing the membranes of red blood cells, leading to cell disruption. Because laboratory identification is labor-intensive and time-consuming, accurate, high-throughput prediction of hemolytic peptides is crucial given the growth of peptide sequence data in proteomics and peptidomics. In this study, we present HemoDL, an ensemble learning model that learns the distinct distribution of sequence characteristics to predict the hemolytic activity of peptides using a double LightGBM framework. To determine the most informative encoding features, we compare 17 widely used features across four benchmark datasets. Our investigation reveals that CTD, BPF, Charge, AAC, GDPC, ATC, QSO, and transformer-based features contribute more positively to detecting the hemolytic activity of peptides. Comparison with eight state-of-the-art methods demonstrates that HemoDL outperforms the other models, attaining higher Matthews Correlation Coefficient (MCC) values on the four test datasets, with margins ranging from 6.30% to 16.04%, 6.63% to 11.26%, 4.76% to 9.92%, and 7.41% to 15.03%, respectively. Additionally, we provide HemoDL with a user-friendly graphical interface, available at https://github.com/abcair/HemoDL. In summary, the HemoDL model, leveraging CTD, BPF, Charge, AAC, GDPC, ATC, QSO, and transformer-based encoding features within a double LightGBM learning framework, achieves high accuracy in predicting the hemolytic activity of peptides.
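The HemoDL results are reported as Matthews Correlation Coefficient (MCC) values. As a minimal sketch of how MCC is computed from the four cells of a binary confusion matrix, the Python below uses a hypothetical function name and made-up counts; it is not taken from the HemoDL repository.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, the usual convention,
    because the coefficient is otherwise undefined.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for illustration only (not from the HemoDL paper)
print(round(matthews_corrcoef(tp=90, tn=80, fp=20, fn=10), 3))
```

MCC ranges from -1 to 1 and uses all four confusion-matrix cells, which is one reason it is often preferred over plain accuracy when hemolytic and non-hemolytic peptides are unevenly represented.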
Affiliation(s)
- Sen Yang
- School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou, 213164, China
- The Affiliated Changzhou No.2 People's Hospital of Nanjing Medical University, Changzhou, 213164, China
- Piao Xu
- College of Economics and Management, Nanjing Forestry University, China
37
Li H, Jiang L, Yang K, Shang S, Li M, Lv Z. iNP_ESM: Neuropeptide Identification Based on Evolutionary Scale Modeling and Unified Representation Embedding Features. Int J Mol Sci 2024; 25:7049. [PMID: 39000158 PMCID: PMC11240975 DOI: 10.3390/ijms25137049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2024] [Revised: 06/17/2024] [Accepted: 06/25/2024] [Indexed: 07/16/2024]
Abstract
Neuropeptides are biomolecules with crucial physiological functions. Accurate identification of neuropeptides is essential for understanding the regulatory mechanisms of the nervous system. However, traditional analysis methods are expensive and laborious, and the development of effective machine learning models remains an active area of research. In this research, we constructed iNP_ESM, an SVM-based neuropeptide predictor, by integrating, for the first time, the protein language models Evolutionary Scale Modeling (ESM) and Unified Representation (UniRep). Our model used feature fusion and feature selection strategies to improve prediction accuracy during optimization. In addition, we validated the effectiveness of the optimization strategy with UMAP (Uniform Manifold Approximation and Projection) visualization. iNP_ESM outperforms existing models on a variety of machine learning evaluation metrics, with an accuracy of up to 0.937 in cross-validation and 0.928 in independent testing, demonstrating strong neuropeptide recognition capabilities. As neuropeptide data improve in the future, we believe that the iNP_ESM model will find broader applications in the research and clinical treatment of neurological diseases.
Affiliation(s)
- Honghao Li
- College of Biomedical Engineering, Sichuan University, Chengdu 610041, China
- Liangzhen Jiang
- College of Food and Biological Engineering, Chengdu University, Chengdu 610106, China
- Country Key Laboratory of Coarse Cereal Processing, Ministry of Agriculture and Rural Affairs, Chengdu 610106, China
- Kaixiang Yang
- College of Software Engineering, Sichuan University, Chengdu 610041, China
- Shulin Shang
- College of Biomedical Engineering, Sichuan University, Chengdu 610041, China
- Mingxin Li
- College of Biomedical Engineering, Sichuan University, Chengdu 610041, China
- Zhibin Lv
- College of Biomedical Engineering, Sichuan University, Chengdu 610041, China
38
Wang XY, Ren CX, Fan QW, Xu YP, Wang LW, Mao ZL, Cai XZ. Integrated Assays of Genome-Wide Association Study, Multi-Omics Co-Localization, and Machine Learning Associated Calcium Signaling Genes with Oilseed Rape Resistance to Sclerotinia sclerotiorum. Int J Mol Sci 2024; 25:6932. [PMID: 39000053 PMCID: PMC11240920 DOI: 10.3390/ijms25136932] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2024] [Revised: 06/20/2024] [Accepted: 06/20/2024] [Indexed: 07/14/2024]
Abstract
Sclerotinia sclerotiorum (Ss) is one of the most devastating fungal pathogens, causing heavy yield losses in multiple economically important crops, including oilseed rape. Plant resistance to Ss is a form of quantitative disease resistance (QDR) controlled by multiple minor genes. Genome-wide identification of genes involved in QDR to Ss has yet to be conducted. In this study, we integrated several assays, including a genome-wide association study (GWAS), multi-omics co-localization, and machine learning prediction, to identify, on a genome-wide scale, genes involved in oilseed rape QDR to Ss. Using GWAS and multi-omics co-localization, we identified seven resistance-associated loci (RALs) associated with oilseed rape resistance to Ss. Furthermore, we developed a machine learning algorithm, named Integrative Multi-Omics Analysis and Machine Learning for Target Gene Prediction (iMAP), which integrates multi-omics data to rapidly predict disease resistance-related genes within a broad chromosomal region. By applying iMAP to the identified RALs, we revealed multiple calcium signaling genes related to QDR to Ss. Population-level analysis of selective sweeps and variant haplotypes confirmed the positive selection of the predicted calcium signaling genes during evolution. Overall, this study developed an algorithm that integrates multi-omics data and machine learning methods, providing a powerful tool for predicting target genes associated with specific traits. Furthermore, it provides a basis for further understanding the roles and mechanisms of calcium signaling genes in QDR to Ss.
Affiliation(s)
- Xin-Yao Wang
- Key Laboratory of Biology and Ecological Control of Crop Pathogens and Insects of Zhejiang Province, Institute of Biotechnology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China
- Chun-Xiu Ren
- Key Laboratory of Biology and Ecological Control of Crop Pathogens and Insects of Zhejiang Province, Institute of Biotechnology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China
- Qing-Wen Fan
- Key Laboratory of Biology and Ecological Control of Crop Pathogens and Insects of Zhejiang Province, Institute of Biotechnology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China
- You-Ping Xu
- Centre of Analysis and Measurement, Zhejiang University, 866 Yu Hang Tang Road, Hangzhou 310058, China
- Lu-Wen Wang
- Key Laboratory of Biology and Ecological Control of Crop Pathogens and Insects of Zhejiang Province, Institute of Biotechnology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China
- Zhou-Lu Mao
- Key Laboratory of Biology and Ecological Control of Crop Pathogens and Insects of Zhejiang Province, Institute of Biotechnology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China
- Xin-Zhong Cai
- Key Laboratory of Biology and Ecological Control of Crop Pathogens and Insects of Zhejiang Province, Institute of Biotechnology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China
- Hainan Institute, Zhejiang University, Sanya 572025, China
39
Lin N, Shao X, Wu H, Jiang R, Wu M. Heavy Metal Concentration Estimation for Different Farmland Soils Based on Projection Pursuit and LightGBM with Hyperspectral Images. SENSORS (BASEL, SWITZERLAND) 2024; 24:3251. [PMID: 38794105 PMCID: PMC11125194 DOI: 10.3390/s24103251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Revised: 05/12/2024] [Accepted: 05/19/2024] [Indexed: 05/26/2024]
Abstract
Heavy metal pollution in farmland soil threatens soil environmental quality, so rapidly assessing the status of heavy metal pollution in a region's farmland soil is an important task. Hyperspectral remote sensing has been widely used to monitor soil heavy metal concentrations, and improving the accuracy and reliability of its estimation models is an active research topic. This study analyzed 440 soil samples from Sihe Town and the surrounding agricultural areas in Yushu City, Jilin Province. To account for differences between soil types, local regression models of heavy metal concentrations (As and Cu) were established based on the projection pursuit (PP) and light gradient boosting machine (LightGBM) algorithms. Based on the estimates, a spatial distribution map of soil heavy metals in the region was drawn. The findings showed that constructing local regression estimation models that account for differences between soils improved estimation accuracy. Specifically, the relative percent difference (RPD) of the As and Cu estimates in black soil increased the most, by 0.30 and 0.26, respectively. The regional spatial distribution map of heavy metal concentrations derived from local regression showed high spatial variability. The characteristic bands screened by the PP method accounted for 10-13% of all spectral bands, effectively reducing model complexity. Compared with traditional machine learning models, the LightGBM model showed better estimation ability, and the highest coefficients of determination (R²) on the validation sets of the different soils reached 0.73 (As) and 0.75 (Cu).
In this study, the PP-LightGBM estimation model takes differences in soil type into account, which effectively improves the accuracy and reliability of hyperspectral image estimation of soil heavy metal concentrations and provides a reference for mapping large-scale spatial distributions of heavy metals from hyperspectral images and monitoring soil environmental quality.
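The abstract reports accuracy as RPD and R² but gives no formulas. In soil spectroscopy, RPD is commonly computed as a ratio: the standard deviation of the observed values divided by the root mean square error (RMSE) of prediction. The sketch below assumes that ratio definition and the sample standard deviation; the function names and concentrations are hypothetical and not from the study.

```python
import math
import statistics


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rpd(observed, predicted):
    """Standard deviation of observed values divided by the RMSE of prediction.

    Assumes the sample SD; some studies use the population SD or the
    standard error of prediction (SEP) instead.
    """
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    rmse = math.sqrt(statistics.fmean(squared_errors))
    return statistics.stdev(observed) / rmse


# Hypothetical concentrations (mg/kg) for illustration only
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.1]
print(f"R2 = {r_squared(observed, predicted):.3f}, RPD = {rpd(observed, predicted):.2f}")
```

Under this definition RPD grows as prediction error shrinks relative to the natural spread of the samples, so an increase such as the 0.30 reported for As in black soil means the error fell relative to that spread.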
Affiliation(s)
- Nan Lin
- College of Surveying and Exploration Engineering, Jilin Jianzhu University, Changchun 130118, China
- Jilin Province Natural Resources Remote Sensing Information Technology Innovation Laboratory, Changchun 130118, China
- Xiaofan Shao
- College of Surveying and Exploration Engineering, Jilin Jianzhu University, Changchun 130118, China
- Huizhi Wu
- Henan Academy of Geology, Zhengzhou 450016, China
- Ranzhe Jiang
- College of Biological and Agricultural Engineering, Jilin University, Changchun 130012, China
- Menghong Wu
- College of Surveying and Exploration Engineering, Jilin Jianzhu University, Changchun 130118, China
- College of Resource and Environmental Science, Jilin Agricultural University, Changchun 130118, China
40
Tong K, Chen X, Yan S, Dai L, Liao Y, Li Z, Wang T. PlantMine: A Machine-Learning Framework to Detect Core SNPs in Rice Genomics. Genes (Basel) 2024; 15:603. [PMID: 38790232 PMCID: PMC11120712 DOI: 10.3390/genes15050603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Revised: 05/05/2024] [Accepted: 05/07/2024] [Indexed: 05/26/2024]
Abstract
As a fundamental global staple crop, rice plays a pivotal role in human nutrition and agricultural production systems. However, its complex genetic architecture and extensive trait variability pose challenges for breeders and researchers seeking to optimize yield and quality. In particular, for expediting breeding methods such as genomic selection, isolating core SNPs related to target traits from genome-wide data reduces noise from irrelevant mutations and enhances computational precision and efficiency. Thus, exploring efficient computational approaches to mine core SNPs is of great importance. This study introduces PlantMine, an innovative computational framework that integrates feature selection and machine learning techniques to effectively identify core SNPs critical for the improvement of rice traits. Using the dataset from the 3000 Rice Genomes Project, we applied several algorithms for analysis. The findings underscore the effectiveness of combining feature selection with machine learning to accurately identify core SNPs, offering a promising avenue to expedite rice breeding and improve crop productivity and resilience to stress.
Affiliation(s)
- Kai Tong
- School of Biological Engineering, Sichuan University of Science & Engineering, Yibin 644000, China
- Xiaojing Chen
- National Agriculture Science Data Center, Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- National Nanfan Research Institute, Chinese Academy of Agricultural Sciences, Sanya 572024, China
- Shen Yan
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Liangli Dai
- School of Biological Engineering, Sichuan University of Science & Engineering, Yibin 644000, China
- Yuxue Liao
- School of Biological Engineering, Sichuan University of Science & Engineering, Yibin 644000, China
- Zhaoling Li
- School of Biological Engineering, Sichuan University of Science & Engineering, Yibin 644000, China
- Ting Wang
- Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Key Laboratory of Big Agri-Data, Ministry of Agriculture and Rural Areas, Beijing 100081, China
41
Ehara Y, Inui A, Mifune Y, Nishimoto H, Yamaura K, Kato T, Furukawa T, Tanaka S, Kusunose M, Takigami S, Kuroda R. Estimating the Thumb Rotation Angle by Using a Tablet Device With a Posture Estimation Artificial Intelligence Model. Cureus 2024; 16:e59657. [PMID: 38707751 PMCID: PMC11069636 DOI: 10.7759/cureus.59657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/03/2024] [Indexed: 05/07/2024]
Abstract
MediaPipe Hand (MediaPipe) is an artificial intelligence (AI)-based pose estimation library. In this study, MediaPipe was combined with four machine learning (ML) models to estimate the rotation angle of the thumb. Videos of the right hands of 15 healthy volunteers were recorded and processed into 9000 images. The rotation angle of the thumb (defined as angle θ from the palmar plane, which is defined as 0°) was measured using an angle-measuring device and expressed in radians. Angle θ was then estimated by each ML model using parameters calculated from the hand coordinates detected by MediaPipe. The linear regression model showed a root mean square error (RMSE) of 12.23, a mean absolute error (MAE) of 9.9, and a correlation coefficient of 0.91. The ElasticNet model showed an RMSE of 12.23, an MAE of 9.95, and a correlation coefficient of 0.91; the support vector machine (SVM) model showed an RMSE of 4.7, an MAE of 2.5, and a correlation coefficient of 0.99. The LightGBM model showed an RMSE of 4.58, an MAE of 2.62, and a correlation coefficient of 0.99. Based on these findings, we concluded that the thumb rotation angle can be estimated with high accuracy by combining MediaPipe and ML.
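The four thumb-angle models are compared by RMSE, MAE, and correlation coefficient. A minimal sketch of these three metrics follows, assuming the correlation coefficient is Pearson's r (the abstract does not say which coefficient was used); the function name and angle values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, Pearson r) for paired observed and predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    # Pearson correlation: covariance divided by the product of the spreads
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    spread_t = math.sqrt(sum((t - mean_t) ** 2 for t in y_true))
    spread_p = math.sqrt(sum((p - mean_p) ** 2 for p in y_pred))
    r = cov / (spread_t * spread_p)
    return rmse, mae, r


# Hypothetical measured vs. estimated thumb angles (degrees)
measured = [0.0, 30.0, 60.0, 90.0]
estimated = [5.0, 25.0, 65.0, 85.0]
rmse, mae, r = regression_metrics(measured, estimated)
print(f"RMSE={rmse:.2f} MAE={mae:.2f} r={r:.3f}")
```

Because RMSE squares each error before averaging, it is always at least as large as MAE and penalizes a few large misses more heavily.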
Affiliation(s)
- Yutaka Ehara
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
- Atsuyuki Inui
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
- Yutaka Mifune
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
- Hanako Nishimoto
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
- Kohei Yamaura
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
- Tatsuo Kato
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
- Takahiro Furukawa
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
- Shuya Tanaka
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
- Masaya Kusunose
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
- Shunsaku Takigami
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
- Ryosuke Kuroda
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
42
Yang X, Yu S, Yan S, Wang H, Fang W, Chen Y, Ma X, Han L. Progress in Rice Breeding Based on Genomic Research. Genes (Basel) 2024; 15:564. [PMID: 38790193 PMCID: PMC11121554 DOI: 10.3390/genes15050564] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Revised: 04/18/2024] [Accepted: 04/25/2024] [Indexed: 05/26/2024]
Abstract
The role of rice genomics in breeding progress is becoming increasingly important. Deeper research into the rice genome will contribute to the identification and utilization of outstanding functional genes, enriching the diversity and genetic basis of breeding materials and meeting the diverse demands for various improvements. Here, we review the significant contributions of rice genomics research to breeding progress over the last 25 years, discussing the profound impact of genomics on rice genome sequencing, functional gene exploration, and novel breeding methods, and we provide valuable insights for future research and breeding practices.
Affiliation(s)
- Xingye Yang
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Shicong Yu
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Shen Yan
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Hao Wang
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Wei Fang
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Yanqing Chen
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Xiaoding Ma
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Longzhi Han
- National Crop Genebank, Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, China
43
Shinohara I, Mifune Y, Inui A, Nishimoto H, Yoshikawa T, Kato T, Furukawa T, Tanaka S, Kusunose M, Hoshino Y, Matsushita T, Mitani M, Kuroda R. Re-tear after arthroscopic rotator cuff tear surgery: risk analysis using machine learning. J Shoulder Elbow Surg 2024; 33:815-822. [PMID: 37625694 DOI: 10.1016/j.jse.2023.07.017] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 04/12/2023] [Revised: 07/06/2023] [Accepted: 07/16/2023] [Indexed: 08/27/2023]
Abstract
BACKGROUND Postoperative rotator cuff retear after arthroscopic rotator cuff repair (ARCR) remains a major problem. Various risk factors, such as age, gender, and tear size, have been reported. Recently, a magnetic resonance imaging-based stump classification was reported as an index of rotator cuff fragility. Although stump type 3 is reported to have a high retear rate, there are few reports on the risk of postoperative retear based on this classification. Machine learning (ML), an artificial intelligence technique, allows more flexible predictive models than conventional statistical methods and has been applied to predict clinical outcomes. In this study, we used ML to predict the risk of retear after ARCR. METHODS This retrospective case-control study included 353 patients who underwent surgical treatment for complete rotator cuff tears using the suture-bridge technique. Patients who initially presented with retears or traumatic tears were excluded. After the initial repair, rotator cuff retears were diagnosed by magnetic resonance imaging; Sugaya classification types IV and V were defined as retears. Age, gender, stump classification, tear size, Goutallier classification, presence of diabetes, and hyperlipidemia were used as ML input parameters to predict the risk of retear. Using Python's scikit-learn as the ML library, five different AI models (logistic regression, random forest, AdaBoost, CatBoost, and LightGBM) were trained on the existing data, and the prediction models were applied to the test dataset. The performance of these ML models was measured by the area under the receiver operating characteristic curve. Additionally, key features affecting retear were evaluated. RESULTS The area under the receiver operating characteristic curve was 0.78 for logistic regression, 0.82 for random forest, 0.78 for AdaBoost, 0.83 for CatBoost, and 0.87 for LightGBM. LightGBM showed the highest score.
The key factors for model prediction were age, stump classification, and tear size. CONCLUSIONS The ML classifier predicted retears after ARCR with high accuracy, and the AI model showed that the most important characteristics affecting retears were age and imaging findings, including stump classification. This model may be able to predict postoperative rotator cuff retears based on clinical features.
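The five retear models are ranked by the area under the receiver operating characteristic curve (AUC). As a sketch of what that number measures: the AUC equals the probability that a randomly chosen positive case (here, a retear) receives a higher predicted score than a randomly chosen negative case, with ties counted as half. The pairwise function below is a hypothetical illustration with made-up labels and scores, not a library implementation such as scikit-learn's `roc_auc_score`, which computes the same quantity from the ROC curve.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. Runs in O(n_pos * n_neg), fine for small cohorts.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical retear labels (1 = retear) and model risk scores
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
print(round(roc_auc(labels, scores), 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which frames the reported range of 0.78 to 0.87.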
Affiliation(s)
- Issei Shinohara
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, Hyogo, Japan
- Yutaka Mifune
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, Hyogo, Japan
- Atsuyuki Inui
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, Hyogo, Japan
- Hanako Nishimoto
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, Hyogo, Japan
- Tomoya Yoshikawa
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, Hyogo, Japan
- Tatsuo Kato
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, Hyogo, Japan
- Takahiro Furukawa
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, Hyogo, Japan
- Shuya Tanaka
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, Hyogo, Japan
- Masaya Kusunose
- Department of Orthopaedic Surgery, Himeji St Mary's Hospital, Himeji, Hyogo, Japan
- Yuichi Hoshino
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, Hyogo, Japan
- Takehiko Matsushita
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, Hyogo, Japan
- Makoto Mitani
- Department of Orthopaedic Surgery, Himeji St Mary's Hospital, Himeji, Hyogo, Japan
- Ryosuke Kuroda
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, Hyogo, Japan
44
Guo Q, Xie F, Zhong F, Wen W, Zhang X, Yu X, Wang X, Huang B, Li L, Wang X. Application of interpretable machine learning algorithms to predict distant metastasis in ovarian clear cell carcinoma. Cancer Med 2024; 13:e7161. [PMID: 38613173 PMCID: PMC11015070 DOI: 10.1002/cam4.7161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2023] [Revised: 03/16/2024] [Accepted: 03/26/2024] [Indexed: 04/14/2024]
Abstract
BACKGROUND Ovarian clear cell carcinoma (OCCC) is a subtype of ovarian epithelial carcinoma (OEC) known for its limited responsiveness to chemotherapy, and the onset of distant metastasis significantly affects patient prognosis. This study aimed to identify potential risk factors contributing to distant metastasis in OCCC. METHODS Using the Surveillance, Epidemiology, and End Results (SEER) database, we identified patients diagnosed with OCCC between 2004 and 2015. The most influential factors were selected by applying the Gaussian Naive Bayes (GNB) and AdaBoost machine learning algorithms, with a Venn diagram used for further refinement. Subsequently, six machine learning (ML) techniques, namely XGBoost, LightGBM, Random Forest (RF), Adaptive Boosting (AdaBoost), Support Vector Machine (SVM), and Multilayer Perceptron (MLP), were employed to construct predictive models for distant metastasis. SHapley Additive exPlanations (SHAP) analysis provided a visual interpretation for individual patients. Model validity was assessed using accuracy, sensitivity, specificity, positive predictive value, negative predictive value, F1 score, and the area under the receiver operating characteristic curve (AUC). RESULTS For predicting distant metastasis, the RF model outperformed the other five machine learning algorithms. The RF model achieved accuracy, sensitivity, specificity, positive predictive value, negative predictive value, F1 score, and AUC (95% CI) values of 0.792 (0.762-0.823), 0.904 (0.835-0.973), 0.759 (0.731-0.787), 0.221 (0.186-0.256), 0.974 (0.967-0.982), 0.353 (0.306-0.399), and 0.834 (0.696-0.967), respectively, surpassing the other models. Additionally, the RF model reached the lowest calibration-curve Brier score (95% CI), 0.06256 (0.05753-0.06759).
SHAP analysis provided independent explanations, reaffirming the critical clinical factors associated with the risk of metastasis in OCCC patients. CONCLUSIONS This study established an accurate predictive model for metastasis in OCCC patients using machine learning techniques, offering valuable support to clinicians in making informed clinical decisions.
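The RF results pair a high sensitivity (0.904) and negative predictive value (0.974) with a low positive predictive value (0.221), a pattern typical when the positive class (here, distant metastasis) is rare. The sketch below computes these threshold metrics from confusion-matrix counts and the Brier score from predicted probabilities. The function names, counts, and probabilities are hypothetical, chosen only to show how a low prevalence pulls PPV down.

```python
def threshold_metrics(tp, fp, tn, fn):
    """Binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)  # precision
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


def brier_score(labels, probabilities):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probabilities)) / len(labels)


# Hypothetical cohort: 50 metastatic and 1000 non-metastatic patients
metrics = threshold_metrics(tp=45, fp=240, tn=760, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

print(f"Brier score: {brier_score([1, 0, 0, 1], [0.8, 0.3, 0.1, 0.6]):.4f}")
```

Lower Brier scores are better, and 0 means every predicted probability matched the observed outcome exactly.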
Affiliation(s)
- Qin‐Hua Guo
- Jiangxi Province Key Laboratory of Laboratory Medicine, Jiangxi Provincial Clinical Research Center for Laboratory Medicine, Department of Clinical Laboratory, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, Jiangxi, China
- Department of Clinical Laboratory, The First Hospital of Nanchang (The Third Affiliated Hospital of Nanchang University), Nanchang, Jiangxi, China
- School of Public Health, Nanchang University, Nanchang, Jiangxi, China
- Feng‐Chun Xie
- Department of Clinical Laboratory, Nanchang Renai Obstetrics and Gynecology Hospital, Nanchang, Jiangxi, China
- Fang‐Min Zhong
- Jiangxi Province Key Laboratory of Laboratory Medicine, Jiangxi Provincial Clinical Research Center for Laboratory Medicine, Department of Clinical Laboratory, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, Jiangxi, China
- Wen Wen
- Jiangxi Province Key Laboratory of Laboratory Medicine, Jiangxi Provincial Clinical Research Center for Laboratory Medicine, Department of Clinical Laboratory, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, Jiangxi, China
- School of Public Health, Nanchang University, Nanchang, Jiangxi, China
- Xue‐Ru Zhang
- Jiangxi Province Key Laboratory of Laboratory Medicine, Jiangxi Provincial Clinical Research Center for Laboratory Medicine, Department of Clinical Laboratory, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, Jiangxi, China
- School of Public Health, Nanchang University, Nanchang, Jiangxi, China
- Xia‐Jing Yu
- Jiangxi Province Key Laboratory of Laboratory Medicine, Jiangxi Provincial Clinical Research Center for Laboratory Medicine, Department of Clinical Laboratory, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, Jiangxi, China
- School of Public Health, Nanchang University, Nanchang, Jiangxi, China
- Xin‐Lu Wang
- Jiangxi Province Key Laboratory of Laboratory Medicine, Jiangxi Provincial Clinical Research Center for Laboratory Medicine, Department of Clinical Laboratory, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, Jiangxi, China
- School of Public Health, Nanchang University, Nanchang, Jiangxi, China
- Bo Huang
- Jiangxi Province Key Laboratory of Laboratory Medicine, Jiangxi Provincial Clinical Research Center for Laboratory Medicine, Department of Clinical Laboratory, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, Jiangxi, China
- Li‐Ping Li
- Department of Clinical Laboratory, The First Hospital of Nanchang (The Third Affiliated Hospital of Nanchang University), Nanchang, Jiangxi, China
- Xiao‐Zhong Wang
- Jiangxi Province Key Laboratory of Laboratory Medicine, Jiangxi Provincial Clinical Research Center for Laboratory Medicine, Department of Clinical Laboratory, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, Jiangxi, China
- Department of Clinical Laboratory, The First Hospital of Nanchang (The Third Affiliated Hospital of Nanchang University), Nanchang, Jiangxi, China
- School of Public Health, Nanchang University, Nanchang, Jiangxi, China
45
Jiang Z, Liu L, Du L, Lv S, Liang F, Luo Y, Wang C, Shen Q. Machine learning for the early prediction of acute respiratory distress syndrome (ARDS) in patients with sepsis in the ICU based on clinical data. Heliyon 2024; 10:e28143. [PMID: 38533071 PMCID: PMC10963609 DOI: 10.1016/j.heliyon.2024.e28143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 02/28/2024] [Accepted: 03/12/2024] [Indexed: 03/28/2024]
Abstract
Background Acute respiratory distress syndrome (ARDS) is a fatal outcome of severe sepsis. Machine learning models can help predict ARDS accurately and early in patients with sepsis. Objective We aimed to develop a machine learning model for predicting ARDS in patients with sepsis in the intensive care unit (ICU). Methods The initial clinical data of patients with sepsis admitted to the hospital (including population characteristics, clinical diagnoses, complications, and laboratory tests) were used to predict ARDS and to screen for crucial variables. Eight algorithms, namely XGBoost, logistic regression, LightGBM, random forest, Gaussian naive Bayes (GaussianNB), complement naive Bayes (ComplementNB), support vector machine (SVM), and K-nearest neighbors (KNN), were compared, and the prediction model was rebuilt with the best-performing one. For this remodeling, 10% of the data were randomly held out as a test set, and the remainder was used for training with cross-validation. Model performance was evaluated using the area under the curve (AUC), sensitivity, accuracy, specificity, positive and negative predictive values, F1 score, kappa value, and clinical decision curve analysis. Finally, the model's predictions were interpreted with the SHAP package. Results Ten critical features were selected using the lasso method: PaO2/PAO2, A-aDO2, PO2(T), CRP, gender, PO2, RDW, MCH, SG, and chlorine. In the variable importance ranking, PaO2/PAO2 was the most significant variable. Among the eight algorithms, Gaussian NB performed significantly better than the others. After remodeling with this algorithm, the AUCs in the training and validation sets were 0.777 and 0.770, respectively, and the model performed well in the test set (AUC = 0.781, accuracy = 78.6%, sensitivity = 82.4%, F1 score = 0.824). A comparison with previous models on their overlapping factors showed that the model developed here performs better.
Conclusion Sepsis-associated ARDS can be predicted accurately and early with a machine learning model based on existing clinical data. These findings may help identify patients with sepsis-associated ARDS accurately and improve their prognosis.
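The evaluation metrics listed in this abstract (sensitivity, specificity, accuracy, positive and negative predictive values, F1 score, and kappa) are all standard functions of a binary confusion matrix. The following is a minimal sketch of how they relate, not the authors' code; the function name `binary_metrics` and the counts passed to it are hypothetical and are not data from the study.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    accuracy = (tp + tn) / total
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    # F1 is the harmonic mean of PPV (precision) and sensitivity (recall).
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    # Cohen's kappa: observed agreement (accuracy) corrected for the agreement
    # expected by chance from the predicted and actual class totals.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total**2
    kappa = (accuracy - p_expected) / (1 - p_expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
        "kappa": kappa,
    }


# Hypothetical counts for illustration only (not data from the study).
print(binary_metrics(tp=40, fp=10, fn=10, tn=40))
```

Kappa is included because accuracy alone can look high on imbalanced outcomes; kappa discounts the agreement a classifier would reach by chance given the class proportions.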
Affiliation(s)
- Zhenzhen Jiang
- Department of Blood Transfusion, The Third Xiangya Hospital, Central South University, Changsha, China
- Leping Liu
- Department of Pediatrics, The Third Xiangya Hospital, Central South University, Changsha, China
- Lin Du
- Department of Blood Transfusion, The Third Xiangya Hospital, Central South University, Changsha, China
- Shanshan Lv
- Department of Blood Transfusion, The Third Xiangya Hospital, Central South University, Changsha, China
- Fang Liang
- Department of Hematology and Critical Care Medicine, The Third Xiangya Hospital, Central South University, Changsha, China
- Yanwei Luo
- Department of Blood Transfusion, The Third Xiangya Hospital, Central South University, Changsha, China
- Chunjiang Wang
- Department of Pharmacy, The Third Xiangya Hospital, Central South University, Changsha, China
- Qin Shen
- Department of Radiology, The Second Xiangya Hospital, Central South University, Changsha, China
46
Jiang C, Zhang B, Jiang W, Liu P, Kong Y, Zhang J, Teng W. Metal ion stimulation-related gene signatures correlate with clinical and immunologic characteristics of glioma. Heliyon 2024; 10:e27189. [PMID: 38533032 PMCID: PMC10963200 DOI: 10.1016/j.heliyon.2024.e27189] [Received: 11/03/2023] [Revised: 02/23/2024] [Accepted: 02/26/2024] [Indexed: 03/28/2024]
Abstract
Background Environmental factors are important pathogenic factors for glioma (GM). However, research has focused mainly on the pathogenic effect of electromagnetic radiation, while metals in the environment have been neglected. This study aimed to investigate the relationship between metal ion stimulation and the clinical characteristics and immune status of GM patients. Methods First, mRNA expression profiles of GM patients and normal subjects were obtained from the Chinese Glioma Genome Atlas (CGGA) and the Gene Expression Omnibus (GEO) to identify differentially expressed metal ion stimulation-related genes (DEMISGs). Second, two molecular subtypes were identified and validated from these DEMISGs using consensus clustering. Diagnostic and prognostic models for GM were then constructed after machine learning-based feature screening. Finally, supervised classification and unsupervised clustering were combined, based on SHAP values, to classify and predict GM grade. Results GM patients fall into two distinct response states to metal ion stimulation, M1 and M2, which are related to GM grade and IDH status. Six genes with diagnostic value were identified: SLC30A3, CRHBP, SYT13, DLG2, CDK1, and WNT5A. The AUC in the external validation set exceeded 0.90. Using SHAP values improved classification performance. Conclusion Gene features associated with metal ion stimulation are related to the clinical and immune characteristics of GM patients. Combining XGBoost/LightGBM with K-means clustering predicts glioma grade with higher accuracy than purely supervised classification.
Affiliation(s)
- Chengzhi Jiang
- Shandong Second Medical University, Weifang, Shandong, 261053, People's Republic of China
- Binbin Zhang
- Qingdao Municipal Hospital (Group), Qingdao, Shandong, 266000, People's Republic of China
- Wenjuan Jiang
- Qingdao Municipal Hospital (Group), Qingdao, Shandong, 266000, People's Republic of China
- Pengtao Liu
- Shandong Second Medical University, Weifang, Shandong, 261053, People's Republic of China
- Yujia Kong
- Shandong Second Medical University, Weifang, Shandong, 261053, People's Republic of China
- Jianhua Zhang
- Jining Medical University, Jining, Shandong, 272067, People's Republic of China
- Wenjie Teng
- Shandong Second Medical University, Weifang, Shandong, 261053, People's Republic of China
47
Murmu S, Sinha D, Chaurasia H, Sharma S, Das R, Jha GK, Archak S. A review of artificial intelligence-assisted omics techniques in plant defense: current trends and future directions. FRONTIERS IN PLANT SCIENCE 2024; 15:1292054. [PMID: 38504888 PMCID: PMC10948452 DOI: 10.3389/fpls.2024.1292054] [Received: 09/10/2023] [Accepted: 01/24/2024] [Indexed: 03/21/2024]
Abstract
Plants intricately deploy defense systems to counter diverse biotic and abiotic stresses. Omics technologies, spanning genomics, transcriptomics, proteomics, and metabolomics, have revolutionized the exploration of plant defense mechanisms, unraveling molecular intricacies in response to various stressors. However, the complexity and scale of omics data necessitate sophisticated analytical tools for meaningful insights. This review delves into the application of artificial intelligence algorithms, particularly machine learning and deep learning, as promising approaches for deciphering complex omics data in plant defense research. The overview encompasses key omics techniques and addresses the challenges and limitations inherent in current AI-assisted omics approaches. Moreover, it contemplates potential future directions in this dynamic field. In summary, AI-assisted omics techniques present a robust toolkit, enabling a profound understanding of the molecular foundations of plant defense and paving the way for more effective crop protection strategies amidst climate change and emerging diseases.
Affiliation(s)
- Sneha Murmu
- Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
- Dipro Sinha
- Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
- Himanshushekhar Chaurasia
- Central Institute for Research on Cotton Technology, Indian Council of Agricultural Research (ICAR), Mumbai, India
- Soumya Sharma
- Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
- Ritwika Das
- Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
- Girish Kumar Jha
- Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
- Sunil Archak
- National Bureau of Plant Genetic Resources, Indian Council of Agricultural Research (ICAR), New Delhi, India
48
Li W, Zhang Y, Zhou X, Quan X, Chen B, Hou X, Xu Q, He W, Chen L, Liu X, Zhang Y, Xiang T, Li R, Liu Q, Wu SN, Wang K, Liu W, Zheng J, Luan H, Yu X, Chen A, Xu C, Luo T, Hu Z. Ensemble learning-assisted prediction of prolonged hospital length of stay after spine correction surgery: a multi-center cohort study. J Orthop Surg Res 2024; 19:112. [PMID: 38308336 PMCID: PMC10838003 DOI: 10.1186/s13018-024-04576-4] [Received: 12/04/2023] [Accepted: 01/23/2024] [Indexed: 02/04/2024]
Abstract
PURPOSE This research aimed to develop a machine learning model that predicts, before surgery, the risk of a prolonged hospital length of stay, which can be used to strengthen patient management. METHODS Patients who underwent posterior spinal deformity surgery (PSDS) at eleven medical institutions in China between 2015 and 2022 were included. Detailed preoperative data, including demographics, medical history, comorbidities, preoperative laboratory results, and surgery details, were collected from electronic medical records. The cohort was randomly divided into training and validation datasets at a 70:30 ratio. Using features selected by the Boruta algorithm, nine machine learning algorithms and a stacking ensemble model were trained after visualized hyperparameter tuning and were evaluated by the area under the receiver operating characteristic curve (AUROC), the precision-recall curve, calibration, and decision curve analysis. Finally, Shapley Additive exPlanations (SHAP) visualizations were used to explain the model predictions. RESULTS A total of 162 patients were included. In the validation group, the K-nearest neighbors algorithm outperformed the other machine learning models, with an AUROC of 0.8191 and a PRAUC of 0.6175. The top five contributing variables were preoperative hemoglobin, height, body mass index, age, and preoperative white blood cell count. A web-based calculator was further developed to improve the model's clinical usability. CONCLUSIONS Our study established and validated a clinical predictive model for prolonged postoperative hospitalization in patients who underwent PSDS, which offers clinicians valuable prognostic information for preoperative planning and postoperative care. Trial registration ClinicalTrials.gov identifier NCT05867732, retrospectively registered May 22, 2023, https://classic.clinicaltrials.gov/ct2/show/NCT05867732.
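The random 70:30 split and the AUROC used for evaluation in this abstract can be illustrated in plain Python. This is a hedged sketch only and does not reproduce the study's pipeline (Boruta feature selection, nine learners, and a stacking ensemble); the function names `train_validation_split` and `auroc`, the seed, and the toy scores are hypothetical. AUROC is computed here through its rank-statistic (Mann-Whitney) interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
import random


def train_validation_split(items, train_fraction=0.7, seed=0):
    """Randomly shuffle items and split them into training and validation lists."""
    rng = random.Random(seed)  # seeded RNG so the split is reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def auroc(labels, scores):
    """AUROC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# 162 hypothetical patient IDs split 70:30, mirroring only the ratio in the abstract.
train_ids, valid_ids = train_validation_split(range(162))
print(len(train_ids), len(valid_ids))

# Made-up outcome labels and risk scores for the AUROC computation.
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

The pairwise loop is quadratic in the number of cases, which is fine for a cohort of this size; larger datasets would normally use a sort-based rank computation or a library routine instead.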
Affiliation(s)
- Wenle Li
- State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics and Center for Molecular Imaging and Translational Medicine, School of Public Health, Xiamen University, Xiamen, China
- Key Laboratory of Neurological Diseases, The Second Affiliated Hospital of Xuzhou Medical University, Xuzhou, Jiangsu, China
- Department of Spinal Surgery, Guangxi Medical University Affiliated Liuzhou People's Hospital, Liuzhou, China
- Yusi Zhang
- Cancer Center, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- Precision Medicine Center, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- Department of Medical Oncology, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- Xin Zhou
- Third Hospital of Shanxi Medical University, Shanxi Bethune Hospital, Shanxi Academy of Medical Sciences, Tongji Shanxi Hospital, Taiyuan, 030032, China
- Xubin Quan
- Department of Spinal Surgery, Guangxi Medical University Affiliated Liuzhou People's Hospital, Liuzhou, China
- Binghao Chen
- Department of Spinal Surgery, Guangxi Medical University Affiliated Liuzhou People's Hospital, Liuzhou, China
- Xuewen Hou
- Department of Radiology, The First Dongguan Affiliated Hospital, Guangdong Medical University, Dongguan, China
- Qizhong Xu
- Department of Radiology, The First Affiliated Hospital of Shenzhen University, Shenzhen Second People's Hospital, Shenzhen, China
- Weiheng He
- Department of Radiology, People's Hospital of Ningxia Hui Autonomous Region, Yinchuan, China
- Liang Chen
- Department of Radiology, Hubei Provincial Hospital of Traditional Chinese Medicine, Wuhan, China
- Xiaozhu Liu
- Department of Critical Care Medicine, Beijing Shijitan Hospital, Capital Medical University, Beijing, China
- Department of Cardiology, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Yang Zhang
- College of Medical Informatics, Chongqing Medical University, Chongqing, China
- Medical Data Science Academy, Chongqing Medical University, Chongqing, China
- Tianyu Xiang
- Information Center, The University-Town Hospital of Chongqing Medical University, Chongqing, China
- Runmin Li
- Department of Foot and Ankle Surgery, Honghui Hospital, Xi'an Jiaotong University, Xi'an, Shaanxi Province, China
- Qiang Liu
- Department of Orthopedics, Xianyang Central Hospital, Xianyang, Shaanxi, China
- Shi-Nan Wu
- Eye Institute of Xiamen University, School of Medicine, Xiamen University, Xiamen, Fujian, China
- Kai Wang
- Key Laboratory of Neurological Diseases, The Second Affiliated Hospital of Xuzhou Medical University, Xuzhou, Jiangsu, China
- Wencai Liu
- Department of Orthopedics, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, 200233, China
- Jialiang Zheng
- Cancer Research Center, School of Medicine, Xiamen University, Xiamen, China
- Haopeng Luan
- Department of Spine Surgery, The Sixth Affiliated Hospital of Xinjiang Medical University, Urumqi, Xinjiang, China
- Xiaolin Yu
- Department of Orthopedics, Affiliated Hospital of Guizhou Medical University, Guiyang, Guizhou, China
- Anfa Chen
- Department of Orthopedics, Jiangxi Province Hospital of Integrated Chinese and Western Medicine, Nanchang, China
- Chan Xu
- State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics and Center for Molecular Imaging and Translational Medicine, School of Public Health, Xiamen University, Xiamen, China
- Tongqing Luo
- Department of Spinal Surgery, Guangxi Medical University Affiliated Liuzhou People's Hospital, Liuzhou, China
- Zhaohui Hu
- Department of Spinal Surgery, Guangxi Medical University Affiliated Liuzhou People's Hospital, Liuzhou, China
49
Chen J, Tan C, Zhu M, Zhang C, Wang Z, Ni X, Liu Y, Wei T, Wei X, Fang X, Xu Y, Huang X, Qiu J, Liu H. CropGS-Hub: a comprehensive database of genotype and phenotype resources for genomic prediction in major crops. Nucleic Acids Res 2024; 52:D1519-D1529. [PMID: 38000385 PMCID: PMC10767954 DOI: 10.1093/nar/gkad1062] [Received: 08/07/2023] [Revised: 10/15/2023] [Accepted: 10/25/2023] [Indexed: 11/26/2023]
Abstract
The explosive growth of multi-omics data has brought a paradigm shift in both academic research and applications in the life sciences. However, managing and reusing the growing resources of genomic and phenotypic data presents considerable challenges for the research community. There is an urgent need for an integrated database that combines genome-wide association studies (GWAS) with genomic selection (GS). Here, we present CropGS-Hub, a comprehensive database of genotypes, phenotypes, and GWAS signals, as well as a one-stop platform with built-in algorithms for genomic prediction and crossing design. The database contains over 224 billion genotype data points and 434 thousand phenotype data points generated from >30 000 individuals in 14 representative populations belonging to 7 major crop species. Moreover, the platform implements three complete functional modules related to genomic selection (phenotype prediction, user model training, and crossing design), as well as SNPGT, a fast SNP genotyping plug-in built specifically for CropGS-Hub, to assist crop scientists and breeders without requiring coding skills. CropGS-Hub can be accessed at https://iagr.genomics.cn/CropGS/.
Affiliation(s)
- Jiaxin Chen
- Shanghai Key Laboratory of Plant Molecular Sciences, Shanghai Collaborative Innovation Center of Plant Germplasm Resources, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Cong Tan
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen 518083, China
- BGI Research, Wuhan 430074, China
- Min Zhu
- Shanghai Key Laboratory of Plant Molecular Sciences, Shanghai Collaborative Innovation Center of Plant Germplasm Resources, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Chenyang Zhang
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen 518083, China
- BGI Bioverse, Shenzhen 518083, China
- Zhihan Wang
- Shanghai Key Laboratory of Plant Molecular Sciences, Shanghai Collaborative Innovation Center of Plant Germplasm Resources, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Xuemei Ni
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen 518083, China
- BGI Bioverse, Shenzhen 518083, China
- Yanlin Liu
- Shanghai Key Laboratory of Plant Molecular Sciences, Shanghai Collaborative Innovation Center of Plant Germplasm Resources, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Tong Wei
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen 518083, China
- BGI Research, Wuhan 430074, China
- XiaoFeng Wei
- China National GeneBank, BGI, Shenzhen 518120, China
- Xiaodong Fang
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen 518083, China
- BGI Research, Sanya 572025, China
- Yang Xu
- Agricultural College, Yangzhou University, Yangzhou 225009, China
- Xuehui Huang
- Shanghai Key Laboratory of Plant Molecular Sciences, Shanghai Collaborative Innovation Center of Plant Germplasm Resources, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Jie Qiu
- Shanghai Key Laboratory of Plant Molecular Sciences, Shanghai Collaborative Innovation Center of Plant Germplasm Resources, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Huan Liu
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen 518083, China
- BGI Bioverse, Shenzhen 518083, China
50
An F, Ge Y, Ye W, Ji L, Chen K, Wang Y, Zhang X, Dong S, Shen Y, Zhao J, Gao X, Junankar S, Chan RB, Christodoulou D, Wen W, Lu P, Zhan Q. Machine learning identifies a 5-serum cytokine panel for the early detection of chronic atrophy gastritis patients. Cancer Biomark 2024; 41:25-40. [PMID: 39269824 PMCID: PMC11495322 DOI: 10.3233/cbm-240023] [Received: 03/03/2024] [Accepted: 07/19/2024] [Indexed: 09/15/2024]
Abstract
BACKGROUND Chronic atrophy gastritis (CAG) is a high-risk precancerous lesion for gastric cancer (GC). Early and accurate detection of CAG, and its discrimination from benign forms of gastritis (e.g., chronic superficial gastritis, CSG), are critical for optimal management of GC. However, accurate non-invasive methods for diagnosing CAG are currently lacking. Cytokines cause inflammation and drive cancer transformation in GC, but their diagnostic utility for CAG is poorly characterized. METHODS Blood samples were collected from 247 patients undergoing endoscopic screening, and 40 cytokines were quantified using a multiplexed immunoassay. Patients were divided into discovery and validation sets. The importance of each cytokine was ranked using the Boruta feature selection algorithm, and the cytokines with the highest feature importance were selected for machine learning (ML) with the LightGBM algorithm. RESULTS Five serum cytokines (IL-10, TNF-α, Eotaxin, IP-10, and SDF-1a) that could discriminate between CAG and CSG patients were identified and used to construct the predictive model. The model was robust and identified CAG patients with high performance (AUC = 0.88, accuracy = 0.78), comparing favorably with the conventional approach based on the PGI/PGII ratio (AUC = 0.59). CONCLUSION Using state-of-the-art ML and a blood-based immunoassay, we developed an improved non-invasive screening method for detecting precancerous GC lesions. FUNDING Supported in part by grants from: Jiangsu Science and Technology Project (no. BK20211039); Top Talent Support Program for young and middle-aged people of Wuxi Health Committee (BJ2023008); Medical Key Discipline Program of Wuxi Health Commission (ZDXK2021010); Wuxi Science and Technology Bureau Project (no. N20201004); Scientific Research Program of Wuxi Health Commission (Z202208, J202104).
Affiliation(s)
- Fangmei An
- Department of Gastroenterology, Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi People’s Hospital, Wuxi Medical Center, Nanjing Medical University, National Clinical Research Center for Digestive Diseases (Xi'an) Jiangsu Branch, Wuxi, Jiangsu, China
- Yan Ge
- Department of Gastroenterology, Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi People’s Hospital, Wuxi Medical Center, Nanjing Medical University, National Clinical Research Center for Digestive Diseases (Xi'an) Jiangsu Branch, Wuxi, Jiangsu, China
- AliveX Biotech, Shanghai, China
- Wei Ye
- AliveX Biotech, Shanghai, China
- Lin Ji
- Department of Gastroenterology, Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi People’s Hospital, Wuxi Medical Center, Nanjing Medical University, National Clinical Research Center for Digestive Diseases (Xi'an) Jiangsu Branch, Wuxi, Jiangsu, China
- Ke Chen
- Department of Gastroenterology, Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi People’s Hospital, Wuxi Medical Center, Nanjing Medical University, National Clinical Research Center for Digestive Diseases (Xi'an) Jiangsu Branch, Wuxi, Jiangsu, China
- Yunfei Wang
- Department of Gastroenterology, Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi People’s Hospital, Wuxi Medical Center, Nanjing Medical University, National Clinical Research Center for Digestive Diseases (Xi'an) Jiangsu Branch, Wuxi, Jiangsu, China
- Xiaoxue Zhang
- Department of Gastroenterology, Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi People’s Hospital, Wuxi Medical Center, Nanjing Medical University, National Clinical Research Center for Digestive Diseases (Xi'an) Jiangsu Branch, Wuxi, Jiangsu, China
- Shengrong Dong
- Department of Gastroenterology, Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi People’s Hospital, Wuxi Medical Center, Nanjing Medical University, National Clinical Research Center for Digestive Diseases (Xi'an) Jiangsu Branch, Wuxi, Jiangsu, China
- Yao Shen
- Department of Gastroenterology, Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi People’s Hospital, Wuxi Medical Center, Nanjing Medical University, National Clinical Research Center for Digestive Diseases (Xi'an) Jiangsu Branch, Wuxi, Jiangsu, China
- Jiamin Zhao
- Department of Gastroenterology, Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi People’s Hospital, Wuxi Medical Center, Nanjing Medical University, National Clinical Research Center for Digestive Diseases (Xi'an) Jiangsu Branch, Wuxi, Jiangsu, China
- Xiaojuan Gao
- Department of Gastroenterology, Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi People’s Hospital, Wuxi Medical Center, Nanjing Medical University, National Clinical Research Center for Digestive Diseases (Xi'an) Jiangsu Branch, Wuxi, Jiangsu, China
- Wen Wen
- AliveX Biotech, Shanghai, China
- Peihua Lu
- Department of Medical Oncology, Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi People’s Hospital, Wuxi Medical Center, Nanjing Medical University, Wuxi, Jiangsu, China
- Qiang Zhan
- Department of Gastroenterology, Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi People’s Hospital, Wuxi Medical Center, Nanjing Medical University, National Clinical Research Center for Digestive Diseases (Xi'an) Jiangsu Branch, Wuxi, Jiangsu, China