1. Wu B, Xiong H, Zhuo L, Xiao Y, Yan J, Yang W. Multi-view BLUP: a promising solution for post-omics data integrative prediction. J Genet Genomics 2024:S1673-8527(24)00332-1. [PMID: 39645028 DOI: 10.1016/j.jgg.2024.11.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2024] [Revised: 11/27/2024] [Accepted: 11/27/2024] [Indexed: 12/09/2024]
Abstract
Phenotypic prediction is a promising strategy for accelerating plant breeding. Data from multiple sources (called multi-view data) can provide complementary information to characterize a biological object from various aspects. This paper proposes a multi-view best linear unbiased prediction (MVBLUP) method that integrates multi-view information into phenotypic prediction. To measure the importance of each data view, a differential evolution algorithm with an early stopping mechanism was used to obtain a multi-view kinship matrix, which was then incorporated into the BLUP model for phenotypic prediction. To further illustrate the characteristics of MVBLUP, we performed empirical experiments on four multi-view datasets from different crops. Compared with the single-view method, MVBLUP improved prediction accuracy by 0.038 to 0.201 on average. The results demonstrate that MVBLUP is an effective integrative prediction method for multi-view data.
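The abstract does not say how the per-view kinship matrices are merged. One plausible reading, shown here purely as an assumption, is a weighted sum in which each view's weight reflects its importance. In the sketch below the weights are fixed, hypothetical values rather than the output of differential evolution, and `combine_kinships`, `K_genomic`, and `K_transcriptomic` are illustrative names.

```python
def combine_kinships(kinships, weights):
    """Weighted sum of per-view kinship matrices (assumed combination rule)."""
    if len(kinships) != len(weights):
        raise ValueError("one weight per view is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]  # rescale so the weights sum to 1
    n = len(kinships[0])
    combined = [[0.0] * n for _ in range(n)]
    for kinship, w in zip(kinships, norm):
        for i in range(n):
            for j in range(n):
                combined[i][j] += w * kinship[i][j]
    return combined


# Two hypothetical data views describing the same three individuals.
K_genomic = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.25], [0.0, 0.25, 1.0]]
K_transcriptomic = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.75], [0.5, 0.75, 1.0]]

# Hypothetical importance weights (3:1); the paper searches for these instead.
K_multi = combine_kinships([K_genomic, K_transcriptomic], [3.0, 1.0])
```

The combined matrix would then take the place of a single-view kinship matrix in a BLUP model; fitting that model is outside the scope of this sketch.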
Affiliation(s)
- Bingjie Wu
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Huijuan Xiong
  - College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Lin Zhuo
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Yingjie Xiao
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, Hubei 430070, China; Hubei Hongshan Laboratory, Wuhan, Hubei 430070, China
- Jianbing Yan
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, Hubei 430070, China; Hubei Hongshan Laboratory, Wuhan, Hubei 430070, China
- Wenyu Yang
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, Hubei 430070, China; College of Informatics, Huazhong Agricultural University, Wuhan, Hubei 430070, China.
2. Shuang Z, Xingyu X, Yue C, Mingjing Y. Explainable Machine Learning Predictions for the Benefit From Chemotherapy in Advanced Non-Small Cell Lung Cancer Without Available Targeted Mutations. THE CLINICAL RESPIRATORY JOURNAL 2024; 18:e70044. [PMID: 39696772 DOI: 10.1111/crj.70044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2024] [Revised: 10/16/2024] [Accepted: 12/08/2024] [Indexed: 12/20/2024]
Abstract
BACKGROUND: Non-small cell lung cancer (NSCLC) is a global health challenge. Chemotherapy remains the standard therapy for advanced NSCLC without targeted mutations, but drug resistance often reduces its effectiveness. Developing more effective methods to predict and monitor chemotherapy benefits early is crucial.
METHODS: We carried out a retrospective cohort study of NSCLC patients without targeted mutations who received chemotherapy at West China Hospital from 2009 to 2013. We identified variables associated with chemotherapy outcomes and built four predictive models by machine learning. Shapley additive explanations (SHAP) were used to interpret the best model's predictions. The Kaplan-Meier method was used to assess the impact of key variables on 5-year overall survival.
RESULTS: The study enrolled 461 NSCLC patients. Eight variables were selected for the model: differentiation, surgery history, neutrophil-to-lymphocyte ratio (NLR), platelet-to-lymphocyte ratio (PLR), total bilirubin (TBIL), total protein (TP), alanine aminotransferase (ALT), and lactate dehydrogenase (LDH). The extreme gradient boosting (XGBoost) model exhibited superior discriminatory ability in predicting the probability of complete response (CR) to chemotherapy, with an AUC of 0.78. SHAP plots showed that surgery history and high differentiation were related to CR benefits from chemotherapy. Absence of surgery, higher NLR, higher PLR, and higher LDH were all independent prognostic factors for poor survival in NSCLC patients without targeted mutations receiving chemotherapy.
CONCLUSIONS: By machine learning, we developed a predictive model to assess chemotherapy benefits in NSCLC patients without targeted mutations, utilizing eight readily available and non-invasive clinical indicators. Demonstrating satisfactory predictive performance and clinical practicability, this model may help clinicians identify patients' tendency to benefit from chemotherapy, potentially improving their prognosis.
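To make the survival analysis mentioned above concrete, here is a minimal sketch of the Kaplan-Meier product-limit estimator in plain Python. The follow-up times and event flags are invented for illustration and are not study data; `kaplan_meier` is a hypothetical helper name.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with an observed event.

    times  : follow-up time per patient (any consistent unit)
    events : 1 if death was observed at that time, 0 if the patient was censored
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
    return curve


# Invented example: five patients, two of them censored.
example_times = [1, 2, 2, 3, 4]
example_events = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(example_times, example_events)
```

Censored patients leave the risk set without lowering the survival estimate, which is what distinguishes this estimator from a simple proportion of survivors.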
Affiliation(s)
- Zhao Shuang
  - Department of Respiratory and Critical Care Medicine, State Key Laboratory of Respiratory Health and Multimorbidity, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Xiong Xingyu
  - Department of Respiratory and Critical Care Medicine, State Key Laboratory of Respiratory Health and Multimorbidity, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Cheng Yue
  - Department of Respiratory and Critical Care Medicine, State Key Laboratory of Respiratory Health and Multimorbidity, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Yu Mingjing
  - Department of Respiratory and Critical Care Medicine, State Key Laboratory of Respiratory Health and Multimorbidity, West China Hospital, Sichuan University, Chengdu, Sichuan, China
3. Zhang H, Bao S, Zhao X, Bai Y, Lv Y, Gao P, Li F, Zhang W. Genome-Wide Association Study and Phenotype Prediction of Reproductive Traits in Large White Pigs. Animals (Basel) 2024; 14:3348. [PMID: 39682314 DOI: 10.3390/ani14233348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2024] [Revised: 11/14/2024] [Accepted: 11/19/2024] [Indexed: 12/18/2024] Open
Abstract
In a study involving 385 Large White pigs, a genome-wide association study (GWAS) was conducted to investigate reproductive traits, specifically the number of healthy litters (NHs) and the number of weaned litters (NWs). Several SNP loci, including ALGA0098819, ALGA0037969, and H3GA0032302, were significantly associated with these traits. In the combined-parity analysis, candidate genes, such as BLVRA, STK17A, PSMA2, and C7orf25, were identified. GO and KEGG pathway enrichment analyses revealed that these genes are involved in key biological processes, including organic synthesis, the regulation of sperm activity, spermatogenesis, and meiosis. In the by-parity analysis, the PLCXD3 gene was significantly associated with the NW trait in the second and fourth parities, while RNASEH1, PYM1, and SEPTIN9 were linked to cell proliferation, DNA repair, and metabolism, suggesting their potential role in regulating reproductive traits. These findings provide new molecular markers for the genetic study of reproductive traits in Large White pigs. For the phenotypic prediction of NH and NW traits, several machine learning models (GBDT, RF, LightGBM, and Adaboost.R2), as well as traditional models (GBLUP, BRR, and BL), were evaluated using SNP data in varying proportions. After PCA processing, the GBDT model achieved the highest PCC for NH (0.141), while LightGBM reached the highest PCC for NW (0.146). The MAE, MSE, and RMSE results showed that the traditional models exhibited stable error rates, while the machine learning models performed comparatively better across the different SNP ratios. Overall, PCA processing provided some improvement in the predictive performance of all of the models, though the overall increase in accuracy was limited.
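The evaluation metrics named in this abstract (PCC, MAE, MSE, and RMSE) can be computed as in the following sketch. The observed and predicted values are made up for illustration, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (PCC) between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def error_metrics(y_true, y_pred):
    """Mean absolute error, mean squared error, and root mean squared error."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}


# Invented phenotype values (for example, litter counts) and model predictions.
observed = [10.0, 12.0, 11.0, 13.0]
predicted = [11.0, 11.0, 12.0, 12.0]

pcc = pearson(observed, predicted)
metrics = error_metrics(observed, predicted)
```

PCC rewards getting the ranking and trend right, while the error metrics penalize the size of each miss, so a model can look acceptable on one and weak on the other.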
Affiliation(s)
- Hao Zhang
  - School of Software Technology, Shanxi Agricultural University, Jinzhong 030801, China
- Shiqian Bao
  - School of Software Technology, Shanxi Agricultural University, Jinzhong 030801, China
- Xiaona Zhao
  - School of Software Technology, Shanxi Agricultural University, Jinzhong 030801, China
- Yangfan Bai
  - School of Software Technology, Shanxi Agricultural University, Jinzhong 030801, China
- Yangcheng Lv
  - School of Software Technology, Shanxi Agricultural University, Jinzhong 030801, China
- Pengfei Gao
  - School of Software Technology, Shanxi Agricultural University, Jinzhong 030801, China
- Fuzhong Li
  - School of Software Technology, Shanxi Agricultural University, Jinzhong 030801, China
- Wuping Zhang
  - School of Software Technology, Shanxi Agricultural University, Jinzhong 030801, China
4. Yu S, Liu L, Wang H, Yan S, Zheng S, Ning J, Luo R, Fu X, Deng X. AtML: An Arabidopsis thaliana root cell identity recognition tool for medicinal ingredient accumulation. Methods 2024; 231:61-69. [PMID: 39293728 DOI: 10.1016/j.ymeth.2024.09.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2024] [Revised: 08/05/2024] [Accepted: 09/12/2024] [Indexed: 09/20/2024] Open
Abstract
Arabidopsis thaliana synthesizes various medicinal compounds, and serves as a model plant for medicinal plant research. Single-cell transcriptomics technologies are essential for understanding the developmental trajectory of plant roots, facilitating the analysis of synthesis and accumulation patterns of medicinal compounds in different cell subpopulations. Although methods for interpreting single-cell transcriptomics data are rapidly advancing in Arabidopsis, challenges remain in precisely annotating cell identity due to the lack of marker genes for certain cell types. In this work, we trained a machine learning system, AtML, using sequencing datasets from six cell subpopulations, comprising a total of 6000 cells, to predict Arabidopsis root cell stages and identify biomarkers through complete model interpretability. Performance testing using an external dataset revealed that AtML achieved 96.50% accuracy and 96.51% recall. Through the interpretability provided by AtML, our model identified 160 important marker genes, contributing to the understanding of cell type annotations. In conclusion, we trained AtML to efficiently identify Arabidopsis root cell stages, providing a new tool for elucidating the mechanisms of medicinal compound accumulation in Arabidopsis roots.
Affiliation(s)
- Shicong Yu
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Lijia Liu
  - Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Hao Wang
  - Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Shen Yan
  - Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Shuqin Zheng
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Jing Ning
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Ruxian Luo
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Xiangzheng Fu
  - Research Institute of Hunan University in Chongqing, Chongqing 401120, China.
- Xiaoshu Deng
  - State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China; Chongqing Academy of Chinese Materia Medica, Chongqing 400065, China.
5. Fu Q, Wu Y, Zhu M, Xia Y, Yu Q, Liu Z, Ma X, Yang R. Identifying cardiovascular disease risk in the U.S. population using environmental volatile organic compounds exposure: A machine learning predictive model based on the SHAP methodology. ECOTOXICOLOGY AND ENVIRONMENTAL SAFETY 2024; 286:117210. [PMID: 39447292 DOI: 10.1016/j.ecoenv.2024.117210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2024] [Revised: 09/26/2024] [Accepted: 10/14/2024] [Indexed: 10/26/2024]
Abstract
BACKGROUND: Cardiovascular disease (CVD) remains a leading cause of mortality globally. Environmental pollutants, specifically volatile organic compounds (VOCs), have been identified as significant risk factors. This study aims to develop a machine learning (ML) model to predict CVD risk based on VOC exposure and demographic data, using SHapley Additive exPlanations (SHAP) for interpretability.
METHODS: We utilized data from the National Health and Nutrition Examination Survey (NHANES) from 2011 to 2018, comprising 5098 participants. VOC exposure was assessed through 15 urinary metabolite metrics. The dataset was split into a training set (70%) and a test set (30%). Six ML models were developed: Random Forest (RF), Light Gradient Boosting Machine (LightGBM), Decision Tree (DT), Extreme Gradient Boosting (XGBoost), Multi-Layer Perceptron (MLP), and Support Vector Machines (SVM). Model performance was evaluated using the Area Under the Receiver Operating Characteristic Curve (AUROC), accuracy, balanced accuracy, F1 score, J-index, kappa, Matthews correlation coefficient (MCC), positive predictive value (PPV), negative predictive value (NPV), sensitivity (sens), and specificity (spec). SHAP was applied to interpret the best-performing model.
RESULTS: The RF model exhibited the highest predictive performance, with an AUROC of 0.8143. SHAP analysis identified age and ATCA as the most significant predictors, with ATCA showing a protective effect against CVD, particularly in older adults and those with hypertension. The study found a significant interaction between ATCA levels and age, indicating that the protective effect of ATCA is more pronounced in older individuals due to increased oxidative stress and inflammatory responses associated with aging. E-value analysis suggested robustness to unmeasured confounders.
CONCLUSIONS: This study is the first to utilize VOC exposure data to construct an ML model for predicting CVD risk. The findings highlight the potential of combining environmental exposure data with demographic information to enhance CVD risk prediction, supporting the development of personalized prevention and intervention strategies.
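Most of the threshold-based metrics listed in the methods follow directly from a 2x2 confusion matrix. The sketch below uses invented counts, assumes every class and every predicted label occurs at least once (so no denominator is zero), and uses the hypothetical helper name `binary_metrics`.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Common classification metrics from confusion-matrix counts."""
    n = tp + fp + tn + fn
    sens = tp / (tp + fn)            # sensitivity (recall)
    spec = tn / (tn + fp)            # specificity
    ppv = tp / (tp + fp)             # positive predictive value
    npv = tn / (tn + fn)             # negative predictive value
    acc = (tp + tn) / n
    # Agreement expected by chance, needed for Cohen's kappa.
    chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": acc,
        "balanced_accuracy": (sens + spec) / 2,
        "f1": 2 * tp / (2 * tp + fp + fn),
        "j_index": sens + spec - 1,  # Youden's J
        "kappa": (acc - chance) / (1 - chance),
        "mcc": (tp * tn - fp * fn) / mcc_den,
        "ppv": ppv,
        "npv": npv,
        "sens": sens,
        "spec": spec,
    }


# Invented counts for a 100-participant test set.
cvd_metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

AUROC is not included because it is computed from ranked prediction scores rather than from a single thresholded confusion matrix.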
Affiliation(s)
- Qingan Fu
  - Cardiovascular medicine department, The Second Affiliated Hospital of Nanchang University, Nanchang, Jiangxi 330006, China
- Yanze Wu
  - Department of Neurosurgery, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, Jiangxi 330006, China
- Min Zhu
  - Gastroenterology Department, The First People's Hospital of Xiushui County, Jiujiang, Jiangxi, China
- Yunlei Xia
  - Cardiovascular medicine department, The Second Affiliated Hospital of Nanchang University, Nanchang, Jiangxi 330006, China
- Qingyun Yu
  - Cardiovascular medicine department, The Second Affiliated Hospital of Nanchang University, Nanchang, Jiangxi 330006, China
- Zhekang Liu
  - Rheumatology and immunology department, The Second Affiliated Hospital of Nanchang University, Nanchang, Jiangxi 330006, China
- Xiaowei Ma
  - Cardiovascular medicine department, The Second Affiliated Hospital of Nanchang University, Nanchang, Jiangxi 330006, China
- Renqiang Yang
  - Cardiovascular medicine department, The Second Affiliated Hospital of Nanchang University, Nanchang, Jiangxi 330006, China.
6. Liu Y, Dou X, Yan X, Ma S, Ye C, Wang X, Lu J. Using machine learning approaches to develop a fast and easy-to-perform diagnostic tool for patients with light chain amyloidosis: a retrospective real-world study. Ann Hematol 2024:10.1007/s00277-024-06015-0. [PMID: 39480584 DOI: 10.1007/s00277-024-06015-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2024] [Accepted: 09/17/2024] [Indexed: 11/02/2024]
Abstract
Immunoglobulin light chain (AL) amyloidosis is a severe disorder caused by the accumulation of amyloid fibrils, leading to organ failure. Early diagnosis is crucial to prevent irreversible damage, yet it remains a challenge because nonspecific symptoms often appear late in disease progression. This retrospective study analyzed data collected between January 1st, 2017, and September 30th, 2022, from 133 AL amyloidosis patients and 271 non-AL patients with similar symptoms but different diagnoses. Demographic data and laboratory test results were collected. Significant features were then identified by both logistic regression and independent expert clinical judgment. Finally, logistic regression and four machine learning (ML) algorithms were employed to construct a diagnostic model, utilizing fivefold cross-validation and blind set testing to identify the optimal model. The study identified nine independent predictors of AL amyloidosis for patients with kidney involvement and for patients with cardiac involvement, respectively. Two models were developed to identify key features that distinguish AL amyloidosis from nephrotic syndrome and hypertrophic cardiomyopathy, respectively. The light gradient boosting machine (LightGBM) model emerged as the most effective, with an area under the curve (AUC) of 0.90 in both models, alongside high sensitivity, specificity, and F1-score. This research highlights the potential of a LightGBM-based model to facilitate early and accurate diagnosis of AL amyloidosis. The model's effectiveness suggests it could be a valuable tool in clinical settings, aiding the timely identification of AL amyloidosis among patients with non-specific symptoms. Further validation in diverse populations is recommended to establish its universal applicability.
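Fivefold cross-validation, as used in this study, partitions the cohort into five folds and holds each fold out once for evaluation. The sketch below shows only the index bookkeeping. The abstract does not say whether the folds were stratified by diagnosis, so this unstratified version is an assumption, and `k_fold_indices` is a hypothetical helper.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)      # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]  # near-equal fold sizes
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 133 AL patients + 271 non-AL patients = 404 records in total.
splits = list(k_fold_indices(404, k=5))
```

Each record appears in exactly one test fold, so averaging a metric over the five folds uses every patient once for evaluation.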
Affiliation(s)
- Yang Liu
  - Department of Hematology, Peking University People's Hospital, No.11 Xizhimen South St, Xicheng District, Beijing, China
  - Beijing Key Laboratory of Hematopoietic Stem Cell Transplantation, Peking University, Beijing, China
- Xuelin Dou
  - Department of Hematology, Peking University People's Hospital, No.11 Xizhimen South St, Xicheng District, Beijing, China
  - Beijing Key Laboratory of Hematopoietic Stem Cell Transplantation, Peking University, Beijing, China
- Xiaojing Yan
  - Department of Hematology, The First Affiliated Hospital of China Medical University, Shenyang, China
- Shiyu Ma
  - Department of Hematology, The First Affiliated Hospital of China Medical University, Shenyang, China
- Chong Ye
  - Medical Affairs, Johnson & Johnson Innovative Medicine, Beijing, China
- Xiaohong Wang
  - Medical Affairs, Johnson & Johnson Innovative Medicine, Shanghai, China
- Jin Lu
  - Department of Hematology, Peking University People's Hospital, No.11 Xizhimen South St, Xicheng District, Beijing, China.
  - Beijing Key Laboratory of Hematopoietic Stem Cell Transplantation, Peking University, Beijing, China.
7. Zhu W, Li W, Zhang H, Li L. Big data and artificial intelligence-aided crop breeding: Progress and prospects. JOURNAL OF INTEGRATIVE PLANT BIOLOGY 2024. [PMID: 39467106 DOI: 10.1111/jipb.13791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2024] [Revised: 08/25/2024] [Accepted: 09/10/2024] [Indexed: 10/30/2024]
Abstract
The past decade has witnessed rapid developments in gene discovery, biological big data (BBD), artificial intelligence (AI)-aided technologies, and molecular breeding. These advancements are expected to accelerate crop breeding under the pressure of increasing demands for food. Here, we first summarize current breeding methods and discuss the need for new ways to support breeding efforts. Then, we review how to combine BBD and AI technologies for genetic dissection, exploring functional genes, predicting regulatory elements and functional domains, and phenotypic prediction. Finally, we propose the concept of intelligent precision design breeding (IPDB) driven by AI technology and offer ideas about how to implement IPDB. We hope that IPDB will improve the predictability, efficiency, and cost-effectiveness of crop breeding compared with current technologies. As an example of IPDB, we explore the possibilities offered by CropGPT, which combines biological techniques, bioinformatics, and breeding art from breeders, and presents an open, shareable, and cooperative breeding system. IPDB provides integrated services and communication platforms for biologists, bioinformatics experts, germplasm resource specialists, breeders, dealers, and farmers, and should be well suited for future breeding.
Affiliation(s)
- Wanchao Zhu
  - Key Laboratory of Biology and Genetic Improvement of Maize in Arid Area of Northwest Region, College of Agronomy, Northwest A&F University, Yangling, 712100, China
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Weifu Li
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
  - Engineering Research Center of Intelligent Technology for Agriculture, Ministry of Education, Wuhan, 430070, China
- Hongwei Zhang
  - State Key Laboratory of Crop Gene Resources and Breeding, National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
- Lin Li
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
8. Mbarek L, Chen S, Jin A, Pan Y, Meng X, Yang X, Xu Z, Jiang Y, Wang Y. Predicting 3-month poor functional outcomes of acute ischemic stroke in young patients using machine learning. Eur J Med Res 2024; 29:494. [PMID: 39385211 PMCID: PMC11466038 DOI: 10.1186/s40001-024-02056-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Accepted: 09/09/2024] [Indexed: 10/12/2024] Open
Abstract
BACKGROUND: Prediction of short-term outcomes in young patients with acute ischemic stroke (AIS) may assist in making therapy decisions. Machine learning (ML) is increasingly used in healthcare due to its high accuracy. This study aims to use an ML-based predictive model for poor 3-month functional outcomes in young AIS patients and to compare the predictive performance of ML models with that of a logistic regression model.
METHODS: We enrolled AIS patients aged between 18 and 50 years from the Third Chinese National Stroke Registry (CNSR-III), collected between 2015 and 2018. A modified Rankin Scale (mRS) score ≥ 3 at 3 months was defined as a poor functional outcome. Four tree-based ML models were developed, namely extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), random forest (RF), and gradient boosting decision trees (GBDT), and compared with logistic regression. Model performance was assessed on both discrimination and calibration.
RESULTS: A total of 2268 young patients with a mean age of 44.3 ± 5.5 years were included. Among them, 9% had poor functional outcomes. The mRS at admission, living alone, and a high National Institutes of Health Stroke Scale (NIHSS) score at discharge remained independent predictors of poor 3-month outcomes. The best AUC in the test group was achieved by XGBoost (AUC = 0.801), followed by GBDT, RF, and LightGBM (AUCs of 0.795, 0.794, and 0.792, respectively). The XGBoost, RF, and LightGBM models were significantly better than logistic regression (P < 0.05).
CONCLUSIONS: ML outperformed logistic regression, and XGBoost was the best model for predicting poor functional outcomes in young AIS patients. It is important to consider living alone together with high severity scores to improve stroke prognosis.
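Two steps from the methods lend themselves to a short illustration: dichotomizing the 3-month mRS at ≥ 3, and measuring discrimination with the AUC. The sketch below computes the AUC as the fraction of (poor, good) outcome pairs that the model ranks correctly, counting ties as half. The mRS values and predicted risks are invented, and `roc_auc` is a hypothetical helper.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive case outranks a negative case."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Invented 3-month mRS scores; mRS >= 3 is labelled a poor outcome (1).
mrs_at_3_months = [0, 1, 4, 2, 5, 3]
poor_outcome = [1 if mrs >= 3 else 0 for mrs in mrs_at_3_months]

# Invented model-predicted risks for the same six patients.
predicted_risk = [0.10, 0.40, 0.35, 0.20, 0.90, 0.80]
auc = roc_auc(poor_outcome, predicted_risk)
```

The pairwise definition is quadratic in sample size, which is fine for illustration; library implementations typically sort the scores instead.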
Affiliation(s)
- Lamia Mbarek
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
  - China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, No.119 South 4th Ring West Road, Fengtai District, Beijing, 100070, China
- Siding Chen
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
  - China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, No.119 South 4th Ring West Road, Fengtai District, Beijing, 100070, China
  - Changping Laboratory, Beijing, China
- Aoming Jin
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
  - China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, No.119 South 4th Ring West Road, Fengtai District, Beijing, 100070, China
- Yuesong Pan
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
  - China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, No.119 South 4th Ring West Road, Fengtai District, Beijing, 100070, China
- Xia Meng
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
  - China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, No.119 South 4th Ring West Road, Fengtai District, Beijing, 100070, China
- Xiaomeng Yang
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Zhe Xu
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
  - China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, No.119 South 4th Ring West Road, Fengtai District, Beijing, 100070, China
- Yong Jiang
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China.
  - China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, No.119 South 4th Ring West Road, Fengtai District, Beijing, 100070, China.
  - Changping Laboratory, Beijing, China.
  - Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Beihang University and Capital Medical University, Beijing, 100091, China.
- Yongjun Wang
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China.
  - China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, No.119 South 4th Ring West Road, Fengtai District, Beijing, 100070, China.
  - Changping Laboratory, Beijing, China.
  - Research Unit of Artificial Intelligence in Cerebrovascular Disease, Chinese Academy of Medical Sciences, Beijing, 2019RU018, China.
  - Beijing Advanced Innovation Centre for Big Data-Based Precision Medicine, Beihang University, Capital Medical University, Beijing, China.
  - Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Beijing, China.
  - Advanced Innovation Center for Human Brain Protection, Capital Medical University, Beijing, China.
9. Yang B, Lu H, Ran Y. Advancing non-alcoholic fatty liver disease prediction: a comprehensive machine learning approach integrating SHAP interpretability and multi-cohort validation. Front Endocrinol (Lausanne) 2024; 15:1450317. [PMID: 39439566 PMCID: PMC11493712 DOI: 10.3389/fendo.2024.1450317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2024] [Accepted: 09/18/2024] [Indexed: 10/25/2024] Open
Abstract
Introduction: Non-alcoholic fatty liver disease (NAFLD) represents a major global health challenge, often undiagnosed because of suboptimal screening tools. Advances in machine learning (ML) offer potential improvements in predictive diagnostics, leveraging complex clinical datasets.
Methods: We utilized a comprehensive dataset from the Dryad database for model development and training and performed external validation using data from the National Health and Nutrition Examination Survey (NHANES) 2017-2020 cycles. Seven distinct ML models were developed and rigorously evaluated. Additionally, we employed the SHapley Additive exPlanations (SHAP) method to enhance the interpretability of the models, allowing for a detailed understanding of how each variable contributes to predictive outcomes.
Results: A total of 14,913 participants were eligible for this study. Among the seven constructed models, the light gradient boosting machine achieved the highest performance, with an area under the receiver operating characteristic curve of 0.90 in the internal validation set and 0.81 in the external NHANES validation cohort. In detailed performance metrics, it maintained an accuracy of 87%, a sensitivity of 92.9%, and an F1 score of 0.92. Key predictive variables identified included alanine aminotransferase, gamma-glutamyl transpeptidase, triglyceride glucose-waist circumference, metabolic score for insulin resistance, and HbA1c, which are strongly associated with metabolic dysfunctions integral to NAFLD progression.
Conclusions: The integration of ML with SHAP interpretability provides a robust predictive tool for NAFLD, enhancing the early identification and potential management of the disease. The model's high accuracy and generalizability across diverse populations highlight its clinical utility, though future enhancements should include longitudinal data and lifestyle factors to refine risk assessments further.
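SHAP attributions are based on Shapley values, which average a feature's marginal contribution over every order in which features can be added to a baseline. The sketch below computes exact Shapley values for a toy two-feature risk function. The function, the feature values, and the all-zero baseline are hypothetical, and practical SHAP tooling relies on faster model-specific or approximate algorithms rather than this brute-force enumeration.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating every feature ordering."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)          # start from the reference input
        previous = model(current)
        for name in order:
            current[name] = instance[name]  # reveal one feature at a time
            updated = model(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_risk(x):
    """Hypothetical risk score with an interaction term (not the paper's model)."""
    return 0.5 * x["alt"] + 0.25 * x["hba1c"] + 0.25 * x["alt"] * x["hba1c"]


patient = {"alt": 2.0, "hba1c": 4.0}
reference = {"alt": 0.0, "hba1c": 0.0}
phi = shapley_values(toy_risk, patient, reference)
```

The attributions sum to the difference between the prediction for the patient and the prediction for the reference input, which is the additivity property that makes SHAP plots interpretable.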
Affiliation(s)
- Bo Yang
  - Department of Gastroenterology and Hepatology, Guizhou Aerospace Hospital, Zunyi, China
- Huaguan Lu
  - Technology Innovation Center, Hunan University of Chinese Medicine, Changsha, China
- Yinghui Ran
  - Department of Gastroenterology, Affiliated Hospital of Zunyi Medical University, Zunyi, China
10. Shinohara I, Inui A, Mifune Y, Yamaura K, Kuroda R. Posture Estimation Model Combined With Machine Learning Estimates the Radial Abduction Angle of the Thumb With High Accuracy. Cureus 2024; 16:e71034. [PMID: 39512988 PMCID: PMC11540810 DOI: 10.7759/cureus.71034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/07/2024] [Indexed: 11/15/2024] Open
Abstract
Thumb function is complex, and accurate evaluation through images or videos is difficult. Pose estimation, a technology that uses artificial intelligence (AI) to detect the body's skeletal landmarks, is gaining popularity. In this study, we combined the pose estimation library MediaPipe-Hands and five machine learning (ML) models to predict the radial abduction angle of the thumb. Radial abduction movements of 20 hands from 10 healthy volunteers were captured on video and processed into 5,000 images. Angle measurements by goniometer were used as true values to evaluate the angle reliability of MediaPipe-Hands alone and of MediaPipe-Hands combined with ML. The correlation coefficient (CC) between the angle measured by goniometry and the angle calculated by MediaPipe-Hands was 0.84. In contrast, applying ML to MediaPipe-Hands resulted in models with improved accuracy, and all models showed high CCs (0.94-0.99) with angle measurements taken by goniometry. The ML model also predicted the abduction angles when video was captured from three different camera angles. In visualizing the features that the AI deemed important, the ML model predicted the abduction angle by focusing on the tip distance between the thumb and index finger along with the angle of the metacarpophalangeal joint between the thumb and middle finger. These results enable angle estimation even without frontal imaging with a camera, and expansion of this system may lead to real-time functional assessment in telemedicine and rehabilitation without the need for physical contact.
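An angle computed directly from pose-estimation landmarks is, at its core, the angle between two vectors that share a vertex. The sketch below shows that geometric step only. The landmark coordinates are invented, no MediaPipe API is called, and the choice of which landmarks define radial abduction is an assumption that the abstract does not specify.

```python
import math


def angle_at_vertex(a, vertex, b):
    """Angle in degrees between the rays vertex->a and vertex->b (2D or 3D)."""
    v1 = [p - q for p, q in zip(a, vertex)]
    v2 = [p - q for p, q in zip(b, vertex)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm1 = math.sqrt(sum(x * x for x in v1))
    norm2 = math.sqrt(sum(x * x for x in v2))
    if norm1 == 0 or norm2 == 0:
        raise ValueError("landmarks must not coincide with the vertex")
    cosine = max(-1.0, min(1.0, dot / (norm1 * norm2)))  # guard rounding error
    return math.degrees(math.acos(cosine))


# Invented normalised image coordinates for three hand landmarks.
thumb_tip = (1.0, 1.0)
thumb_base = (0.0, 0.0)
index_tip = (1.0, 0.0)
abduction_estimate = angle_at_vertex(thumb_tip, thumb_base, index_tip)
```

A purely geometric estimate like this depends on camera viewpoint, which is consistent with the abstract's finding that adding ML on top of the landmarks improved agreement with goniometry.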
Affiliation(s)
- Issei Shinohara
- Department of Orthopedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
| | - Atsuyuki Inui
- Department of Orthopedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
| | - Yutaka Mifune
- Department of Orthopedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
| | - Kohei Yamaura
- Department of Orthopedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
| | - Ryosuke Kuroda
- Department of Orthopedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
| |
|
11
|
Do VH, Nguyen VS, Nguyen SH, Le DQ, Nguyen TT, Nguyen CH, Ho TH, Vo NS, Nguyen T, Nguyen HA, Cao MD. PanKA: Leveraging population pangenome to predict antibiotic resistance. iScience 2024; 27:110623. [PMID: 39228791 PMCID: PMC11369404 DOI: 10.1016/j.isci.2024.110623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2023] [Revised: 04/14/2024] [Accepted: 07/29/2024] [Indexed: 09/05/2024] Open
Abstract
Machine learning has the potential to be a powerful tool in the fight against antimicrobial resistance (AMR), a critical global health issue. Machine learning can identify resistance mechanisms from DNA sequence data without prior knowledge. The first step in building a machine learning model is feature extraction from sequencing data. Traditional methods like single nucleotide polymorphism (SNP) calling and k-mer counting yield numerous, often redundant features, complicating prediction and analysis. In this paper, we propose PanKA, a method using the pangenome to extract a concise set of relevant features for predicting AMR. PanKA not only enables fast model training and prediction but also improves accuracy. Applied to the Escherichia coli and Klebsiella pneumoniae bacterial species, our model is more accurate than conventional and state-of-the-art methods in predicting AMR.
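To make concrete why k-mer counting yields so many features, here is a minimal sketch of the traditional approach the abstract contrasts PanKA with. It does not represent PanKA itself; the function names and toy sequences are assumptions for illustration only.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping substring of length k in a DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def kmer_feature_matrix(seqs, k):
    """Build a samples-by-k-mers count matrix over the k-mers seen in any sample."""
    counts = [kmer_counts(s, k) for s in seqs]
    vocab = sorted(set().union(*counts))  # union of observed k-mers
    rows = [[c[kmer] for kmer in vocab] for c in counts]
    return vocab, rows

# Toy sequences standing in for genome assemblies (hypothetical data).
vocab, rows = kmer_feature_matrix(["ACGT", "ACGA"], k=3)
```

With four nucleotides there are up to 4^k distinct k-mers, so even moderate values of k produce a feature space far larger than the number of samples, which is the redundancy problem the abstract refers to.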
Affiliation(s)
- Van Hoan Do
- Center for Applied Mathematics and Informatics, Le Quy Don Technical University, Hanoi, Vietnam
| | - Van Sang Nguyen
- Center for Biomedical Informatics, Vingroup Big Data Institute, Hanoi, Vietnam
| | | | - Duc Quang Le
- Faculty of IT, Hanoi University of Civil Engineering, Hanoi, Vietnam
| | - Tam Thi Nguyen
- Oxford University Clinical Research Unit, Hanoi, Vietnam
| | - Canh Hao Nguyen
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
| | - Tho Huu Ho
- Department of Medical Microbiology, The 103 Military Hospital, Vietnam Military Medical University, Hanoi, Vietnam
- Department of Genomics & Cytogenetics, Institute of Biomedicine & Pharmacy, Vietnam Military Medical University, Hanoi, Vietnam
| | - Nam S. Vo
- Center for Biomedical Informatics, Vingroup Big Data Institute, Hanoi, Vietnam
| | | | | | | |
|
12
|
Xu L, Li C, Gao S, Zhao L, Guan C, Shen X, Zhu Z, Guo C, Zhang L, Yang C, Bu Q, Zhou B, Xu Y. Personalized Prediction of Long-Term Renal Function Prognosis Following Nephrectomy Using Interpretable Machine Learning Algorithms: Case-Control Study. JMIR Med Inform 2024; 12:e52837. [PMID: 39303280 PMCID: PMC11452755 DOI: 10.2196/52837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2023] [Revised: 04/08/2024] [Accepted: 07/21/2024] [Indexed: 09/22/2024] Open
Abstract
BACKGROUND Acute kidney injury (AKI) is a common adverse outcome following nephrectomy. The progression from AKI to acute kidney disease (AKD) and subsequently to chronic kidney disease (CKD) remains a concern; yet, the predictive mechanisms for these transitions are not fully understood. Interpretable machine learning (ML) models offer insights into how clinical features influence long-term renal function outcomes after nephrectomy, providing a more precise framework for identifying patients at risk and supporting improved clinical decision-making processes. OBJECTIVE This study aimed to (1) evaluate postnephrectomy rates of AKI, AKD, and CKD, analyzing long-term renal outcomes along different trajectories; (2) interpret AKD and CKD models using Shapley Additive Explanations values and Local Interpretable Model-Agnostic Explanations algorithm; and (3) develop a web-based tool for estimating AKD or CKD risk after nephrectomy. METHODS We conducted a retrospective cohort study involving patients who underwent nephrectomy between July 2012 and June 2019. Patient data were randomly split into training, validation, and test sets, maintaining a ratio of 76.5:8.5:15. Eight ML algorithms were used to construct predictive models for postoperative AKD and CKD. The performance of the best-performing models was assessed using various metrics. We used various Shapley Additive Explanations plots and Local Interpretable Model-Agnostic Explanations bar plots to interpret the model and generated directed acyclic graphs to explore the potential causal relationships between features. Additionally, we developed a web-based prediction tool using the top 10 features for AKD prediction and the top 5 features for CKD prediction. RESULTS The study cohort comprised 1559 patients. Incidence rates for AKI, AKD, and CKD were 21.7% (n=330), 15.3% (n=238), and 10.6% (n=165), respectively. Among the evaluated ML models, the Light Gradient-Boosting Machine (LightGBM) model demonstrated superior performance, with an area under the receiver operating characteristic curve of 0.97 for AKD prediction and 0.96 for CKD prediction. Performance metrics and plots highlighted the model's competence in discrimination, calibration, and clinical applicability. Operative duration, hemoglobin, blood loss, urine protein, and hematocrit were identified as the top 5 features associated with predicted AKD. Baseline estimated glomerular filtration rate, pathology, trajectories of renal function, age, and total bilirubin were the top 5 features associated with predicted CKD. Additionally, we developed a web application using the LightGBM model to estimate AKD and CKD risks. CONCLUSIONS An interpretable ML model effectively elucidated its decision-making process in identifying patients at risk of AKD and CKD following nephrectomy by enumerating critical features. The web-based calculator, built on the LightGBM model, can assist in formulating more personalized and evidence-based clinical strategies.
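The 76.5:8.5:15 ratio in the methods is what results from holding out 15% of the data for testing and then taking 10% of the remaining 85% for validation (0.85 × 0.10 = 0.085 and 0.85 × 0.90 = 0.765). The abstract states only the final ratio, so the two-stage procedure below is an assumed reading, and the function name, seed, and sample size are hypothetical.

```python
import random

def three_way_split(n, test_frac=0.15, val_frac_of_rest=0.10, seed=0):
    """Shuffle indices 0..n-1, hold out a test set, then carve validation out of the rest."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded for reproducibility
    n_test = round(n * test_frac)
    n_val = round((n - n_test) * val_frac_of_rest)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test

# With 1000 hypothetical records the sizes are 765, 85, and 150.
train, val, test = three_way_split(1000)
```

Because the three index lists are slices of one shuffled permutation, no record can appear in more than one set.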
Affiliation(s)
- Lingyu Xu
- Department of Nephrology, The Affiliated Hospital of Qingdao University, Qingdao, China
| | - Chenyu Li
- Department of Nephrology, The Affiliated Hospital of Qingdao University, Qingdao, China
- Medizinische Klinik und Poliklinik IV, Klinikum der Universität, Munich, Germany
| | - Shuang Gao
- Ocean University of China, Qingdao, China
| | - Long Zhao
- Department of Nephrology, The Affiliated Hospital of Qingdao University, Qingdao, China
| | - Chen Guan
- Department of Nephrology, The Affiliated Hospital of Qingdao University, Qingdao, China
| | - Xuefei Shen
- Department of Nephrology, The Affiliated Hospital of Qingdao University, Qingdao, China
| | - Zhihui Zhu
- Center of Structural Heart Disease, Beijing Anzhen Hospital, Capital Medical University, Beijing, China
| | - Cheng Guo
- Allianz Technology, Allianz, Munich, Germany
| | - Liwei Zhang
- Institute of Diabetes and Regeneration Research, Helmholtz Diabetes Center, Helmholtz Center Munich, Neuherberg, Germany
| | - Chengyu Yang
- Department of Nephrology, The Affiliated Hospital of Qingdao University, Qingdao, China
| | - Quandong Bu
- Department of Nephrology, The Affiliated Hospital of Qingdao University, Qingdao, China
| | - Bin Zhou
- Department of Nephrology, The Affiliated Hospital of Qingdao University, Qingdao, China
| | - Yan Xu
- Department of Nephrology, The Affiliated Hospital of Qingdao University, Qingdao, China
| |
|
13
|
Cheng Q, Wang X. Machine Learning for AI Breeding in Plants. GENOMICS, PROTEOMICS & BIOINFORMATICS 2024; 22:qzae051. [PMID: 38954837 PMCID: PMC11479635 DOI: 10.1093/gpbjnl/qzae051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/11/2024] [Revised: 06/21/2024] [Accepted: 06/25/2024] [Indexed: 07/04/2024]
Affiliation(s)
- Qian Cheng
- State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing 100094, China
| | - Xiangfeng Wang
- State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing 100094, China
| |
|
14
|
Xiang Y, Xia C, Li L, Wei R, Rong T, Liu H, Lan H. Genomic prediction of yield-related traits and genome-based establishment of heterotic pattern in maize hybrid breeding of Southwest China. FRONTIERS IN PLANT SCIENCE 2024; 15:1441555. [PMID: 39315371 PMCID: PMC11416964 DOI: 10.3389/fpls.2024.1441555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/31/2024] [Accepted: 08/21/2024] [Indexed: 09/25/2024]
Abstract
When genomic prediction is implemented in breeding maize (Zea mays L.), it can accelerate the breeding process and reduce cost to a large extent. In this study, 11 yield-related traits of maize were used to evaluate four genomic prediction methods including rrBLUP, HEBLP|A, RF, and LightGBM. In all the 11 traits, rrBLUP had similar predictive accuracy to HEBLP|A, and so did RF to LightGBM, but rrBLUP and HEBLP|A outperformed RF and LightGBM in 8 traits. Furthermore, genomic prediction-based heterotic pattern of yield was established based on 64620 crosses of maize in Southwest China, and the result showed that one of the parent lines of the top 5% crosses came from temp-tropic or tropic germplasm, which is highly consistent with the actual situation in breeding, and that heterotic pattern (Reid+ × Suwan+) will be a major heterotic pattern of Southwest China in the future.
Affiliation(s)
- Yong Xiang
- Maize Research Institute/State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, Sichuan, China
| | - Chao Xia
- Maize Research Institute/State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, Sichuan, China
| | - Lujiang Li
- Maize Research Institute/State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, Sichuan, China
| | - Rujun Wei
- Maize Research Institute/State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, Sichuan, China
| | - Tingzhao Rong
- Maize Research Institute/State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, Sichuan, China
| | - Hailan Liu
- Maize Research Institute/State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, Sichuan, China
| | - Hai Lan
- Maize Research Institute/State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu, Sichuan, China
| |
|
15
|
Ren Y, Wu C, Zhou H, Hu X, Miao Z. Dual-extraction modeling: A multi-modal deep-learning architecture for phenotypic prediction and functional gene mining of complex traits. PLANT COMMUNICATIONS 2024; 5:101002. [PMID: 38872306 DOI: 10.1016/j.xplc.2024.101002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/25/2024] [Revised: 05/27/2024] [Accepted: 06/11/2024] [Indexed: 06/15/2024]
Abstract
Despite considerable advances in extracting crucial insights from bio-omics data to unravel the intricate mechanisms underlying complex traits, the absence of a universal multi-modal computational tool with robust interpretability for accurate phenotype prediction and identification of trait-associated genes remains a challenge. This study introduces the dual-extraction modeling (DEM) approach, a multi-modal deep-learning architecture designed to extract representative features from heterogeneous omics datasets, enabling the prediction of complex trait phenotypes. Through comprehensive benchmarking experiments, we demonstrate the efficacy of DEM in classification and regression prediction of complex traits. DEM consistently exhibits superior accuracy, robustness, generalizability, and flexibility. Notably, we establish its effectiveness in predicting pleiotropic genes that influence both flowering time and rosette leaf number, underscoring its commendable interpretability. In addition, we have developed user-friendly software to facilitate seamless utilization of DEM's functions. In summary, this study presents a state-of-the-art approach with the ability to effectively predict qualitative and quantitative traits and identify functional genes, confirming its potential as a valuable tool for exploring the genetic basis of complex traits.
Affiliation(s)
- Yanlin Ren
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Chenhua Wu
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - He Zhou
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
| | - Xiaona Hu
- College of Chemistry & Pharmacy, Northwest A&F University, Yangling, Shaanxi 712100, China.
| | - Zhenyan Miao
- State Key Laboratory for Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China; Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China.
| |
|
16
|
Ge R, Xia Y, Jiang M, Jia G, Jing X, Li Y, Cai Y. HybAVPnet: A Novel Hybrid Network Architecture for Antiviral Peptides Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024; 21:1358-1365. [PMID: 38587961 DOI: 10.1109/tcbb.2024.3385635] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/10/2024]
Abstract
Viruses pose a great threat to human production and life; thus, the research and development of antiviral drugs is urgently needed. Antiviral peptides play an important role in drug design and development. Compared with the time-consuming and laborious wet chemical experiment methods, it is critical to use computational methods to predict antiviral peptides accurately and rapidly. However, due to limited data, accurate prediction of antiviral peptides is still challenging, and extracting effective feature representations from sequences is crucial for creating accurate models. This study introduces a novel two-step approach, named HybAVPnet, to predict antiviral peptides with a hybrid network architecture based on neural networks and traditional machine learning methods. We adopted a stacking-like structure to capture both the long-term dependencies and local evolution information to achieve a comprehensive and diverse prediction using the predicted labels and probabilities. Using an ensemble technique with different kinds of features can reduce the variance without increasing the bias. The experimental results show that HybAVPnet can achieve better and more robust performance compared with the state-of-the-art methods, which makes it useful for the research and development of antiviral drugs. Meanwhile, it can also be extended to other peptide recognition problems because of its generalization ability.
|
17
|
Ba Q, Yuan X, Wang Y, Shen N, Xie H, Lu Y. Development and Validation of Machine Learning Algorithms for Prediction of Colorectal Polyps Based on Electronic Health Records. Biomedicines 2024; 12:1955. [PMID: 39335469 PMCID: PMC11429196 DOI: 10.3390/biomedicines12091955] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2024] [Revised: 08/02/2024] [Accepted: 08/22/2024] [Indexed: 09/30/2024] Open
Abstract
BACKGROUND Colorectal polyps are the main source of precancerous lesions in colorectal cancer. To increase the early diagnosis of tumors and improve their screening, we aimed to develop a simple and non-invasive diagnostic prediction model for colorectal polyps based on machine learning (ML) and using accessible health examination records. METHODS We conducted a single-center observational retrospective study in China. The derivation cohort, consisting of 5426 individuals who underwent colonoscopy screening from January 2021 to January 2024, was separated for training (cohort 1) and validation (cohort 2). The variables considered in this study included demographic data, vital signs, and laboratory results recorded by health examination records. With features selected by univariate analysis and Lasso regression analysis, nine machine learning methods were utilized to develop a colorectal polyp diagnostic model. Several evaluation indexes, including the area under the receiver-operating-characteristic curve (AUC), were used to compare the predictive performance. The SHapley additive explanation method (SHAP) was used to rank the feature importance and explain the final model. RESULTS 14 independent predictors were identified as the most valuable features to establish the models. The adaptive boosting machine (AdaBoost) model exhibited the best performance among the 9 ML models in cohort 1, with accuracy, sensitivity, specificity, positive predictive value, negative predictive value, F1 score, and AUC (95% CI) of 0.632 (0.618-0.646), 0.635 (0.550-0.721), 0.674 (0.591-0.758), 0.593 (0.576-0.611), 0.673 (0.654-0.691), 0.608 (0.560-0.655) and 0.687 (0.626-0.749), respectively. The final model gave an AUC of 0.675 in cohort 2. Additionally, the precision recall (PR) curve for the AdaBoost model reached the highest AUPR of 0.648, positioning it nearest to the upper right corner. SHAP analysis provided visualized explanations, reaffirming the critical factors associated with the risk of colorectal polyps in the asymptomatic population. CONCLUSIONS This study integrated the clinical and laboratory indicators with machine learning techniques to establish the predictive model for colorectal polyps, providing non-invasive, cost-effective screening strategies for asymptomatic individuals and guiding decisions for further examination and treatment.
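The threshold-based metrics listed in the results (accuracy, sensitivity, specificity, positive and negative predictive value, F1 score) all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name and the example counts are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall: true positives among actual positives
    specificity = tn / (tn + fp)   # true negatives among actual negatives
    ppv = tp / (tp + fp)           # precision: true positives among predicted positives
    npv = tn / (tn + fn)           # true negatives among predicted negatives
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }

# Hypothetical counts for illustration only.
m = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

AUC and AUPR differ from these in that they summarize performance across all decision thresholds rather than at a single cut-off.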
Affiliation(s)
- Qinwen Ba
- Department of Laboratory Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
| | - Xu Yuan
- Department of Laboratory Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
| | - Yun Wang
- Department of Laboratory Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
| | - Na Shen
- Department of Laboratory Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
| | - Huaping Xie
- Department of Gastroenterology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
| | - Yanjun Lu
- Department of Laboratory Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
| |
|
18
|
Deng L, Zhao J, Wang T, Liu B, Jiang J, Jia P, Liu D, Li G. Construction and validation of predictive models for intravenous immunoglobulin-resistant Kawasaki disease using an interpretable machine learning approach. Clin Exp Pediatr 2024; 67:405-414. [PMID: 39048087 PMCID: PMC11298769 DOI: 10.3345/cep.2024.00549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/26/2024] [Revised: 04/27/2024] [Accepted: 05/10/2024] [Indexed: 07/27/2024] Open
Abstract
BACKGROUND Intravenous immunoglobulin (IVIG)-resistant Kawasaki disease is associated with coronary artery lesion development. PURPOSE This study aimed to explore the factors associated with IVIG-resistance and construct and validate an interpretable machine learning (ML) prediction model in clinical practice. METHODS Between December 2014 and November 2022, 602 patients were screened and risk factors for IVIG-resistance investigated. Five ML models were used to establish an optimal prediction model. The SHapley Additive exPlanations (SHAP) method was used to interpret the ML model. RESULTS Na+, hemoglobin (Hb), C-reactive protein (CRP), and globulin were independent risk factors for IVIG-resistance. A nonlinear relationship was identified between globulin level and IVIG-resistance. The XGBoost model exhibited excellent performance, with an area under the receiver operating characteristic curve of 0.821, accuracy of 0.748, sensitivity of 0.889, and specificity of 0.683 in the testing set. The XGBoost model was interpreted globally and locally using the SHAP method. CONCLUSION Na+, Hb, CRP, and globulin levels were independently associated with IVIG-resistance. Our findings demonstrate that ML models can reliably predict IVIG-resistance. Moreover, use of the SHAP method to interpret the established XGBoost model's findings would provide evidence of IVIG-resistance and guide the individualized treatment of Kawasaki disease.
Affiliation(s)
- Linfan Deng
- Department of Pediatrics, The Affiliated Hospital of Southwest Medical University, Luzhou, China
- Sichuan Clinical Research Center for Birth Defects, Luzhou, China
- Mianyang Central Hospital, School of Medicine, University of Electronic Science and Technology of China, Mianyang, China
| | - Jian Zhao
- Department of Pediatrics, The Affiliated Hospital of Southwest Medical University, Luzhou, China
- Sichuan Clinical Research Center for Birth Defects, Luzhou, China
| | - Ting Wang
- Department of Pediatrics, The Affiliated Hospital of Southwest Medical University, Luzhou, China
- Sichuan Clinical Research Center for Birth Defects, Luzhou, China
| | - Bin Liu
- Department of Pediatrics, The Affiliated Hospital of Southwest Medical University, Luzhou, China
- Sichuan Clinical Research Center for Birth Defects, Luzhou, China
| | - Jun Jiang
- Department of General Surgery (Thyroid Surgery), The Affiliated Hospital of Southwest Medical University, Luzhou, China
- Metabolic Vascular Diseases Key Laboratory of Sichuan Province, Luzhou, China
| | - Peng Jia
- Department of Pediatrics, Zigong First People’s Hospital, Zigong, China
| | - Dong Liu
- Department of Pediatrics, The Affiliated Hospital of Southwest Medical University, Luzhou, China
- Sichuan Clinical Research Center for Birth Defects, Luzhou, China
| | - Gang Li
- Department of Pediatrics, The Affiliated Hospital of Southwest Medical University, Luzhou, China
- Sichuan Clinical Research Center for Birth Defects, Luzhou, China
| |
|
19
|
Abou Hajal A, Bryce RA, Amor BB, Atatreh N, Ghattas MA. Boosting the Accuracy and Chemical Space Coverage of the Detection of Small Colloidal Aggregating Molecules Using the BAD Molecule Filter. J Chem Inf Model 2024; 64:4991-5005. [PMID: 38920403 DOI: 10.1021/acs.jcim.4c00363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/27/2024]
Abstract
The ability to conduct effective high throughput screening (HTS) campaigns in drug discovery is often hampered by the detection of false positives in these assays due to small colloidally aggregating molecules (SCAMs). SCAMs can produce artifactual hits in HTS by nonspecific inhibition of the protein target. In this work, we present a new computational prediction tool for detecting SCAMs based on their 2D chemical structure. The tool, called the boosted aggregation detection (BAD) molecule filter, employs decision tree ensemble methods, namely, the CatBoost classifier and the light gradient-boosting machine, to significantly improve the detection of SCAMs. In developing the filter, we explore three approaches: models trained on individual data sets, a consensus approach using these models, and a merged data set approach, each tailored for specific drug discovery needs. The individual data set method emerged as most effective, achieving 93% sensitivity and 90% specificity, outperforming existing state-of-the-art models by 20 and 5%, respectively. The consensus models offer broader chemical space coverage, exceeding 90% for all testing sets. This feature is particularly important for early stage medicinal chemistry projects and provides information on the applicability domain. Meanwhile, the merged data set models demonstrated robust performance, with a notable sensitivity of 79% in the comprehensive 10-fold cross-validation test set. A SHAP analysis of model features indicates the importance of hydrophobicity and molecular complexity as primary factors influencing the aggregation propensity. The BAD molecule filter is readily accessible for public use at https://molmodlab-aau.com/Tools.html. This filter provides a new, more robust tool for aggregate prediction in the early stages of drug discovery to optimize hit rates and reduce associated testing and validation overheads.
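The abstract does not describe how the consensus models combine individual predictions or how coverage is measured. Purely as an illustration of the general idea, the sketch below assumes each model either votes (1 for aggregator, 0 for non-aggregator) or abstains when a molecule lies outside its applicability domain, and takes a majority vote over the models that did vote. The abstention mechanism, tie-breaking rule, and function names are all assumptions, not the BAD filter's actual logic.

```python
def consensus_predict(votes):
    """Majority vote over models that did not abstain.

    votes: list of 1, 0, or None (None = molecule outside that model's
    applicability domain). Returns None when every model abstains.
    Ties resolve to 0 (assumed convention).
    """
    cast = [v for v in votes if v is not None]
    if not cast:
        return None
    return 1 if sum(cast) * 2 > len(cast) else 0

def coverage(all_votes):
    """Fraction of molecules for which at least one model gives a prediction."""
    preds = [consensus_predict(v) for v in all_votes]
    return sum(p is not None for p in preds) / len(preds)

# Hypothetical votes from two models on four molecules.
votes_per_molecule = [[1, None], [None, None], [0, 0], [None, 1]]
cov = coverage(votes_per_molecule)
```

Under this reading, combining models raises coverage because a molecule is left unpredicted only when it falls outside every model's domain.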
Affiliation(s)
- Abdallah Abou Hajal
- College of Pharmacy, Al Ain University, Abu Dhabi 112612, United Arab Emirates
- AAU Health and Biomedical Research Center, Al Ain University, Abu Dhabi 112612, United Arab Emirates
| | - Richard A Bryce
- Division of Pharmacy and Optometry, School of Health Sciences, University of Manchester, Oxford Road, Manchester M13 9PL, U.K.
| | - Boulbaba Ben Amor
- Core42, Inception/G42, Abu Dhabi 2282, United Arab Emirates
- IMT Nord Europe, Villeneuve D'Ascq 59650, France
| | - Noor Atatreh
- College of Pharmacy, Al Ain University, Abu Dhabi 112612, United Arab Emirates
- AAU Health and Biomedical Research Center, Al Ain University, Abu Dhabi 112612, United Arab Emirates
| | - Mohammad A Ghattas
- College of Pharmacy, Al Ain University, Abu Dhabi 112612, United Arab Emirates
- AAU Health and Biomedical Research Center, Al Ain University, Abu Dhabi 112612, United Arab Emirates
| |
|
20
|
Li J, Zhang D, Yang F, Zhang Q, Pan S, Zhao X, Zhang Q, Han Y, Yang J, Wang K, Zhao C. TrG2P: A transfer-learning-based tool integrating multi-trait data for accurate prediction of crop yield. PLANT COMMUNICATIONS 2024; 5:100975. [PMID: 38751121 PMCID: PMC11287160 DOI: 10.1016/j.xplc.2024.100975] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/04/2023] [Revised: 04/14/2024] [Accepted: 05/11/2024] [Indexed: 06/24/2024]
Abstract
Yield prediction is the primary goal of genomic selection (GS)-assisted crop breeding. Because yield is a complex quantitative trait, making predictions from genotypic data is challenging. Transfer learning can produce an effective model for a target task by leveraging knowledge from a different, but related, source domain and is considered a method with great potential for improving yield prediction by integrating multi-trait data. However, it has not previously been applied to genotype-to-phenotype prediction owing to the lack of an efficient implementation framework. We therefore developed TrG2P, a transfer-learning-based framework. TrG2P first employs convolutional neural networks (CNN) to train models using non-yield-trait phenotypic and genotypic data, thus obtaining pre-trained models. Subsequently, the convolutional layer parameters from these pre-trained models are transferred to the yield prediction task, and the fully connected layers are retrained, thus obtaining fine-tuned models. Finally, the convolutional layer and the first fully connected layer of the fine-tuned models are fused, and the last fully connected layer is trained to enhance prediction performance. We applied TrG2P to five sets of genotypic and phenotypic data from maize (Zea mays), rice (Oryza sativa), and wheat (Triticum aestivum) and compared its model precision to that of seven other popular GS tools: ridge regression best linear unbiased prediction (rrBLUP), random forest, support vector regression, light gradient boosting machine (LightGBM), CNN, DeepGS, and deep neural network for genomic prediction (DNNGP). TrG2P improved the accuracy of yield prediction by 39.9%, 6.8%, and 1.8% in rice, maize, and wheat, respectively, compared with predictions generated by the best-performing comparison model. Our work therefore demonstrates that transfer learning is an effective strategy for improving yield prediction by integrating information from non-yield-trait data. We attribute its enhanced prediction accuracy to the valuable information available from traits associated with yield and to training dataset augmentation. The Python implementation of TrG2P is available at https://github.com/lijinlong1991/TrG2P. The web-based tool is available at http://trg2p.ebreed.cn:81.
Affiliation(s)
- Jinlong Li
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
| | - Dongfeng Zhang
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
| | - Feng Yang
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
| | - Qiusi Zhang
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
| | - Shouhui Pan
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
| | - Xiangyu Zhao
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
| | - Qi Zhang
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
| | - Yanyun Han
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
| | - Jinliang Yang
- Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE 68583, USA; Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE 68583, USA
| | - Kaiyi Wang
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China.
| | - Chunjiang Zhao
- Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China; National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China.
| |
|
21
|
Duan H, Dai X, Shi Q, Cheng Y, Ge Y, Chang S, Liu W, Wang F, Shi H, Hu J. Enhancing genome-wide populus trait prediction through deep convolutional neural networks. Plant J 2024; 119:735-745. [PMID: 38741374 DOI: 10.1111/tpj.16790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Revised: 04/02/2024] [Accepted: 04/18/2024] [Indexed: 05/16/2024]
Abstract
As a promising model, genome-based plant breeding has greatly promoted the improvement of agronomic traits. Traditional methods typically adopt linear regression models with clear assumptions, which neither capture the linkage between phenotype and genotype nor suggest how the model might be modified. Nonlinear models are well suited to capturing complex nonadditive effects, filling the gap left by traditional methods. Taking Populus as the research object, this paper constructs a deep learning method, DCNGP, which can effectively predict traits, including 65 phenotypes. The method was trained on three datasets and compared with four other classic models: Bayesian ridge regression (BRR), Elastic Net, support vector regression, and dualCNN. The results show that DCNGP has five typical advantages in performance: strong prediction ability on multiple experimental datasets; batch normalization layers and early stopping, which enhance generalization and prediction stability on test data; learning potent features from the data, which circumvents the tedious steps of manual feature construction; a Gaussian noise layer, which enhances predictive capability in the presence of inherent uncertainties or perturbations; and fewer hyperparameters, which reduces tuning time across datasets and improves auto-search efficiency. In this way, DCNGP shows powerful predictive ability from genotype to phenotype, which provides an important theoretical reference for building more robust Populus breeding programs.
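The abstract credits early stopping with improving prediction stability but does not state the stopping rule. As an illustration only, the sketch below shows a generic patience-based rule; the function name `early_stopping_epoch`, the `patience` and `min_delta` parameters, and the loss values are assumptions, not details taken from DCNGP.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a patience-based early-stopping rule.

    Training stops once the validation loss has failed to improve by more
    than `min_delta` for `patience` consecutive epochs. All names and
    defaults here are hypothetical, not taken from the paper.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best validation loss: remember it and reset the counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Patience never ran out: training used every epoch.
    return len(val_losses) - 1, best_epoch


# Made-up validation losses for illustration.
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.6]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
```

With these made-up losses the rule stops at epoch 5 and keeps the weights from epoch 2, so it never sees the later improvement at epoch 6; a larger `patience` trades extra training time for a lower chance of stopping too soon.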
Affiliation(s)
- Huaichuan Duan
- Laboratory of Tumor Targeted and Immune Therapy, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center for Biotherapy, Chengdu, China
- Key Laboratory of Medicinal and Edible Plants Resources Development of Sichuan Education Department, School of Pharmacy, Chengdu University, Chengdu, China
| | - Xiangwei Dai
- School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, China
| | - Quanshan Shi
- Key Laboratory of Medicinal and Edible Plants Resources Development of Sichuan Education Department, School of Pharmacy, Chengdu University, Chengdu, China
| | - Yan Cheng
- Laboratory of Tumor Targeted and Immune Therapy, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center for Biotherapy, Chengdu, China
| | - Yutong Ge
- Key Laboratory of Medicinal and Edible Plants Resources Development of Sichuan Education Department, School of Pharmacy, Chengdu University, Chengdu, China
| | - Shan Chang
- School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, China
| | - Wei Liu
- School of Life Science, Leshan Normal University, Leshan, China
| | - Feng Wang
- School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, China
- School of Computer Engineering, Suzhou Vocational University, Suzhou, China
| | - Hubing Shi
- Laboratory of Tumor Targeted and Immune Therapy, Clinical Research Center for Breast, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University and Collaborative Innovation Center for Biotherapy, Chengdu, China
| | - Jianping Hu
- Key Laboratory of Medicinal and Edible Plants Resources Development of Sichuan Education Department, School of Pharmacy, Chengdu University, Chengdu, China
| |
|
22
|
Yang S, Xu P. HemoDL: Hemolytic peptides prediction by double ensemble engines from Rich sequence-derived and transformer-enhanced information. Anal Biochem 2024; 690:115523. [PMID: 38552762 DOI: 10.1016/j.ab.2024.115523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 03/20/2024] [Accepted: 03/22/2024] [Indexed: 04/02/2024]
Abstract
Hemolytic peptides can trigger hemolysis by rupturing red blood cell membranes and causing cell disruption. Because in-lab identification is labor-intensive and time-consuming, accurate, high-throughput hemolytic peptide prediction is crucial for keeping pace with the growth of peptide sequence data in proteomics and peptidomics. In this study, we offer the HemoDL ensemble learning model, which learns the distinct distribution of sequence characteristics for predicting the hemolytic activity of peptides using a double LightGBM framework. To determine the most informative encoding features, we compare 17 widely used features across four benchmark datasets. Our investigation reveals that CTD, BPF, Charge, AAC, GDPC, ATC, QSO, and transformer-based features contribute more positively to detecting the hemolytic activity of peptides. Comparison with eight state-of-the-art methods demonstrates that HemoDL outperforms other models, attaining Matthews correlation coefficient values that are higher by 6.30%-16.04%, 6.63%-11.26%, 4.76%-9.92%, and 7.41%-15.03% on the four test datasets, respectively. Additionally, we provide HemoDL with a user-friendly graphical interface, available at https://github.com/abcair/HemoDL. In summary, the HemoDL model, leveraging CTD, BPF, Charge, AAC, GDPC, ATC, QSO, and transformer-based encoding features within a double LightGBM learning framework, achieves high accuracy in predicting the hemolytic activity of peptides.
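The comparison above is reported in terms of the Matthews correlation coefficient (MCC). As a reminder of what that metric measures, here is a minimal standard-library sketch of the usual binary-classification formula; the function name and the confusion-matrix counts are hypothetical and are not HemoDL results.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC from binary confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention for the
    case where a whole row or column of the confusion matrix is empty).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for a hemolytic / non-hemolytic classifier.
mcc = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)
print(round(mcc, 3))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often preferred when the positive and negative classes are imbalanced.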
Affiliation(s)
- Sen Yang
- School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou, 213164, China; The Affiliated Changzhou No.2 People's Hospital of Nanjing Medical University, Changzhou, 213164, China
| | - Piao Xu
- College of Economics and Management, Nanjing Forestry University, China.
| |
|
23
|
Li H, Jiang L, Yang K, Shang S, Li M, Lv Z. iNP_ESM: Neuropeptide Identification Based on Evolutionary Scale Modeling and Unified Representation Embedding Features. Int J Mol Sci 2024; 25:7049. [PMID: 39000158 PMCID: PMC11240975 DOI: 10.3390/ijms25137049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2024] [Revised: 06/17/2024] [Accepted: 06/25/2024] [Indexed: 07/16/2024] Open
Abstract
Neuropeptides are biomolecules with crucial physiological functions. Accurate identification of neuropeptides is essential for understanding nervous system regulatory mechanisms. However, traditional analysis methods are expensive and laborious, and the development of effective machine learning models continues to be a subject of current research. Hence, in this research, we constructed an SVM-based machine learning neuropeptide predictor, iNP_ESM, by integrating protein language models Evolutionary Scale Modeling (ESM) and Unified Representation (UniRep) for the first time. Our model utilized feature fusion and feature selection strategies to improve prediction accuracy during optimization. In addition, we validated the effectiveness of the optimization strategy with UMAP (Uniform Manifold Approximation and Projection) visualization. iNP_ESM outperforms existing models on a variety of machine learning evaluation metrics, with an accuracy of up to 0.937 in cross-validation and 0.928 in independent testing, demonstrating optimal neuropeptide recognition capabilities. We anticipate improved neuropeptide data in the future, and we believe that the iNP_ESM model will have broader applications in the research and clinical treatment of neurological diseases.
Affiliation(s)
- Honghao Li
- College of Biomedical Engineering, Sichuan University, Chengdu 610041, China
| | - Liangzhen Jiang
- College of Food and Biological Engineering, Chengdu University, Chengdu 610106, China
- Country Key Laboratory of Coarse Cereal Processing, Ministry of Agriculture and Rural Affairs, Chengdu 610106, China
| | - Kaixiang Yang
- College of Software Engineering, Sichuan University, Chengdu 610041, China
| | - Shulin Shang
- College of Biomedical Engineering, Sichuan University, Chengdu 610041, China
| | - Mingxin Li
- College of Biomedical Engineering, Sichuan University, Chengdu 610041, China
| | - Zhibin Lv
- College of Biomedical Engineering, Sichuan University, Chengdu 610041, China
| |
|
24
|
Wang XY, Ren CX, Fan QW, Xu YP, Wang LW, Mao ZL, Cai XZ. Integrated Assays of Genome-Wide Association Study, Multi-Omics Co-Localization, and Machine Learning Associated Calcium Signaling Genes with Oilseed Rape Resistance to Sclerotinia sclerotiorum. Int J Mol Sci 2024; 25:6932. [PMID: 39000053 PMCID: PMC11240920 DOI: 10.3390/ijms25136932] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2024] [Revised: 06/20/2024] [Accepted: 06/20/2024] [Indexed: 07/14/2024] Open
Abstract
Sclerotinia sclerotiorum (Ss) is one of the most devastating fungal pathogens, causing huge yield losses in multiple economically important crops, including oilseed rape. Plant resistance to Ss is a form of quantitative disease resistance (QDR) controlled by multiple minor genes. Genome-wide identification of genes involved in QDR to Ss is yet to be conducted. In this study, we integrated several assays, including genome-wide association study (GWAS), multi-omics co-localization, and machine learning prediction, to identify, on a genome-wide scale, genes involved in oilseed rape QDR to Ss. Employing GWAS and multi-omics co-localization, we identified seven resistance-associated loci (RALs) associated with oilseed rape resistance to Ss. Furthermore, we developed a machine learning algorithm named Integrative Multi-Omics Analysis and Machine Learning for Target Gene Prediction (iMAP), which integrates multi-omics data to rapidly predict disease resistance-related genes within a broad chromosomal region. Through iMAP based on the identified RALs, we revealed multiple calcium signaling genes related to the QDR to Ss. Population-level analysis of selective sweeps and haplotypes of variants confirmed the positive selection of the predicted calcium signaling genes during evolution. Overall, this study has developed an algorithm that integrates multi-omics data and machine learning methods, providing a powerful tool for predicting target genes associated with specific traits. Furthermore, it lays a foundation for further understanding the role and mechanisms of calcium signaling genes in the QDR to Ss.
Affiliation(s)
- Xin-Yao Wang
- Key Laboratory of Biology and Ecological Control of Crop Pathogens and Insects of Zhejiang Province, Institute of Biotechnology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China; (X.-Y.W.); (C.-X.R.); (Q.-W.F.); (L.-W.W.); (Z.-L.M.)
| | - Chun-Xiu Ren
- Key Laboratory of Biology and Ecological Control of Crop Pathogens and Insects of Zhejiang Province, Institute of Biotechnology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China; (X.-Y.W.); (C.-X.R.); (Q.-W.F.); (L.-W.W.); (Z.-L.M.)
| | - Qing-Wen Fan
- Key Laboratory of Biology and Ecological Control of Crop Pathogens and Insects of Zhejiang Province, Institute of Biotechnology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China; (X.-Y.W.); (C.-X.R.); (Q.-W.F.); (L.-W.W.); (Z.-L.M.)
| | - You-Ping Xu
- Centre of Analysis and Measurement, Zhejiang University, 866 Yu Hang Tang Road, Hangzhou 310058, China;
| | - Lu-Wen Wang
- Key Laboratory of Biology and Ecological Control of Crop Pathogens and Insects of Zhejiang Province, Institute of Biotechnology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China; (X.-Y.W.); (C.-X.R.); (Q.-W.F.); (L.-W.W.); (Z.-L.M.)
| | - Zhou-Lu Mao
- Key Laboratory of Biology and Ecological Control of Crop Pathogens and Insects of Zhejiang Province, Institute of Biotechnology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China; (X.-Y.W.); (C.-X.R.); (Q.-W.F.); (L.-W.W.); (Z.-L.M.)
| | - Xin-Zhong Cai
- Key Laboratory of Biology and Ecological Control of Crop Pathogens and Insects of Zhejiang Province, Institute of Biotechnology, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, China; (X.-Y.W.); (C.-X.R.); (Q.-W.F.); (L.-W.W.); (Z.-L.M.)
- Hainan Institute, Zhejiang University, Sanya 572025, China
| |
|
25
|
Lin N, Shao X, Wu H, Jiang R, Wu M. Heavy Metal Concentration Estimation for Different Farmland Soils Based on Projection Pursuit and LightGBM with Hyperspectral Images. Sensors (Basel) 2024; 24:3251. [PMID: 38794105 PMCID: PMC11125194 DOI: 10.3390/s24103251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Revised: 05/12/2024] [Accepted: 05/19/2024] [Indexed: 05/26/2024]
Abstract
Heavy metal pollution in farmland soil threatens soil environmental quality, so quickly assessing the status of heavy metal pollution in a region's farmland soil is an important task. Hyperspectral remote sensing technology has been widely used in soil heavy metal concentration monitoring, and improving the accuracy and reliability of its estimation models is a hot topic. This study analyzed 440 soil samples from Sihe Town and the surrounding agricultural areas in Yushu City, Jilin Province. Considering the differences between soil types, a local regression model of heavy metal concentrations (As and Cu) was established based on projection pursuit (PP) and light gradient boosting machine (LightGBM) algorithms. Based on the estimations, a spatial distribution map of soil heavy metals in the region was drawn. The findings showed that accounting for the differences between soils when constructing a local regression estimation model of soil heavy metal concentration improved the estimation accuracy. Specifically, the relative percent difference (RPD) of As and Cu estimations in black soil increased the most, by 0.30 and 0.26, respectively. The regional spatial distribution map of heavy metal concentration derived from local regression showed high spatial variability. The characteristic bands screened by the PP method accounted for 10-13% of the total spectral bands, effectively reducing model complexity. Compared with traditional machine learning models, the LightGBM model showed better estimation ability, and the highest coefficients of determination (R2) on the validation sets of the different soils reached 0.73 (As) and 0.75 (Cu). In this study, the constructed PP-LightGBM estimation model takes into account the differences in soil types, which effectively improves the accuracy and reliability of hyperspectral image estimation of soil heavy metal concentration and provides a reference for mapping large-scale spatial distributions of heavy metals from hyperspectral images and for assessing soil environmental quality.
Affiliation(s)
- Nan Lin
- College of Surveying and Exploration Engineering, Jilin Jianzhu University, Changchun 130118, China; (N.L.); (X.S.); (M.W.)
- Jilin Province Natural Resources Remote Sensing Information Technology Innovation Laboratory, Changchun 130118, China
| | - Xiaofan Shao
- College of Surveying and Exploration Engineering, Jilin Jianzhu University, Changchun 130118, China; (N.L.); (X.S.); (M.W.)
| | - Huizhi Wu
- Henan Academy of Geology, Zhengzhou 450016, China
| | - Ranzhe Jiang
- College of Biological and Agricultural Engineering, Jilin University, Changchun 130012, China;
| | - Menghong Wu
- College of Surveying and Exploration Engineering, Jilin Jianzhu University, Changchun 130118, China; (N.L.); (X.S.); (M.W.)
- College of Resource and Environmental Science, Jilin Agricultural University, Changchun 130118, China
| |
|
26
|
Tong K, Chen X, Yan S, Dai L, Liao Y, Li Z, Wang T. PlantMine: A Machine-Learning Framework to Detect Core SNPs in Rice Genomics. Genes (Basel) 2024; 15:603. [PMID: 38790232 PMCID: PMC11120712 DOI: 10.3390/genes15050603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Revised: 05/05/2024] [Accepted: 05/07/2024] [Indexed: 05/26/2024] Open
Abstract
As a fundamental global staple crop, rice plays a pivotal role in human nutrition and agricultural production systems. However, its complex genetic architecture and extensive trait variability pose challenges for breeders and researchers in optimizing yield and quality. In particular, to expedite breeding methods such as genomic selection, isolating core SNPs related to target traits from genome-wide data reduces noise from irrelevant mutations, enhancing computational precision and efficiency. Thus, exploring efficient computational approaches to mine core SNPs is of great importance. This study introduces PlantMine, an innovative computational framework that integrates feature selection and machine learning techniques to effectively identify core SNPs critical for the improvement of rice traits. Utilizing the dataset from the 3000 Rice Genomes Project, we applied different algorithms for analysis. The findings underscore the effectiveness of combining feature selection with machine learning in accurately identifying core SNPs, offering a promising avenue to expedite rice breeding efforts and improve crop productivity and resilience to stress.
Affiliation(s)
- Kai Tong
- School of Biological Engineering, Sichuan University of Science & Engineering, Yibin 644000, China; (K.T.); (L.D.); (Y.L.)
| | - Xiaojing Chen
- National Agriculture Science Data Center, Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China;
- National Nanfan Research Institute, Chinese Academy of Agricultural Sciences, Sanya 572024, China
| | - Shen Yan
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China;
| | - Liangli Dai
- School of Biological Engineering, Sichuan University of Science & Engineering, Yibin 644000, China; (K.T.); (L.D.); (Y.L.)
| | - Yuxue Liao
- School of Biological Engineering, Sichuan University of Science & Engineering, Yibin 644000, China; (K.T.); (L.D.); (Y.L.)
| | - Zhaoling Li
- School of Biological Engineering, Sichuan University of Science & Engineering, Yibin 644000, China; (K.T.); (L.D.); (Y.L.)
| | - Ting Wang
- Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Key Laboratory of Big Agri-Data, Ministry of Agriculture and Rural Areas, Beijing 100081, China
| |
|
27
|
Ehara Y, Inui A, Mifune Y, Nishimoto H, Yamaura K, Kato T, Furukawa T, Tanaka S, Kusunose M, Takigami S, Kuroda R. Estimating the Thumb Rotation Angle by Using a Tablet Device With a Posture Estimation Artificial Intelligence Model. Cureus 2024; 16:e59657. [PMID: 38707751 PMCID: PMC11069636 DOI: 10.7759/cureus.59657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/03/2024] [Indexed: 05/07/2024] Open
Abstract
MediaPipe Hand (MediaPipe) is an artificial intelligence (AI)-based pose estimation library. In this study, MediaPipe was combined with four machine learning (ML) models to estimate the rotation angle of the thumb. Videos of the right hands of 15 healthy volunteers were recorded and processed into 9000 images. The rotation angle of the thumb (defined as angle θ from the palmar plane, which is defined as 0°) was measured using an angle measuring device, expressed in a radian system. Angle θ was then estimated by the ML model by using parameters calculated from the hand coordinates detected by MediaPipe. The linear regression model showed a root mean square error (RMSE) of 12.23, a mean absolute error (MAE) of 9.9, and a correlation coefficient of 0.91. The ElasticNet model showed an RMSE of 12.23, an MAE of 9.95, and a correlation coefficient of 0.91; the support vector machine (SVM) model showed an RMSE of 4.7, an MAE of 2.5, and a correlation coefficient of 0.99. The LightGBM model achieved high values: an RMSE of 4.58, an MAE of 2.62, and a correlation coefficient of 0.99. Based on these findings, we concluded that the thumb rotation angle can be estimated with high accuracy by combining MediaPipe and ML.
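The models above are compared by RMSE, MAE, and correlation coefficient. For readers who want to reproduce such a comparison, the following standard-library sketch computes the three quantities, assuming the correlation coefficient is Pearson's r (the abstract does not say which one was used); the angle values are invented for illustration and are not the study's data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and estimated values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between measured and estimated values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Invented thumb rotation angles: device measurement vs. model estimate.
measured = [0.0, 30.0, 60.0, 90.0]
estimated = [5.0, 28.0, 63.0, 85.0]

angle_rmse = rmse(measured, estimated)
angle_mae = mae(measured, estimated)
angle_r = pearson_r(measured, estimated)
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both, as the abstract does, gives a sense of how the estimation error is distributed.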
Affiliation(s)
- Yutaka Ehara
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
| | - Atsuyuki Inui
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
| | - Yutaka Mifune
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
| | - Hanako Nishimoto
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
| | - Kohei Yamaura
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
| | - Tatsuo Kato
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
| | - Takahiro Furukawa
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
| | - Shuya Tanaka
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
| | - Masaya Kusunose
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
| | - Shunsaku Takigami
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
| | - Ryosuke Kuroda
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, JPN
| |
|
28
|
Yang X, Yu S, Yan S, Wang H, Fang W, Chen Y, Ma X, Han L. Progress in Rice Breeding Based on Genomic Research. Genes (Basel) 2024; 15:564. [PMID: 38790193 PMCID: PMC11121554 DOI: 10.3390/genes15050564] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Revised: 04/18/2024] [Accepted: 04/25/2024] [Indexed: 05/26/2024] Open
Abstract
The role of rice genomics in breeding progress is becoming increasingly important. Deeper research into the rice genome will contribute to the identification and utilization of outstanding functional genes, enriching the diversity and genetic basis of breeding materials and meeting the diverse demands for various improvements. Here, we review the significant contributions of rice genomics research to breeding progress over the last 25 years, discussing the profound impact of genomics on rice genome sequencing, functional gene exploration, and novel breeding methods, and we provide valuable insights for future research and breeding practices.
Affiliation(s)
- Xingye Yang
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China; (X.Y.); (S.Y.); (H.W.); (W.F.); (Y.C.)
| | - Shicong Yu
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Rice Research Institute, Sichuan Agricultural University, Chengdu 611130, China;
| | - Shen Yan
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China; (X.Y.); (S.Y.); (H.W.); (W.F.); (Y.C.)
| | - Hao Wang
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China; (X.Y.); (S.Y.); (H.W.); (W.F.); (Y.C.)
| | - Wei Fang
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China; (X.Y.); (S.Y.); (H.W.); (W.F.); (Y.C.)
| | - Yanqing Chen
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China; (X.Y.); (S.Y.); (H.W.); (W.F.); (Y.C.)
| | - Xiaoding Ma
- State Key Laboratory of Crop Gene Resources and Breeding, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081, China; (X.Y.); (S.Y.); (H.W.); (W.F.); (Y.C.)
| | - Longzhi Han
- National Crop Genebank, Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, China
| |
|
29
|
Shinohara I, Mifune Y, Inui A, Nishimoto H, Yoshikawa T, Kato T, Furukawa T, Tanaka S, Kusunose M, Hoshino Y, Matsushita T, Mitani M, Kuroda R. Re-tear after arthroscopic rotator cuff tear surgery: risk analysis using machine learning. J Shoulder Elbow Surg 2024; 33:815-822. [PMID: 37625694 DOI: 10.1016/j.jse.2023.07.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Revised: 07/06/2023] [Accepted: 07/16/2023] [Indexed: 08/27/2023]
Abstract
BACKGROUND Postoperative rotator cuff retear after arthroscopic rotator cuff repair (ARCR) is still a major problem. Various risk factors such as age, gender, and tear size have been reported. Recently, magnetic resonance imaging-based stump classification was reported as an index of rotator cuff fragility. Although stump type 3 is reported to have a high retear rate, there are few reports on the risk of postoperative retear based on this classification. Machine learning (ML), an artificial intelligence technique, allows for more flexible predictive models than conventional statistical methods and has been applied to predict clinical outcomes. In this study, we used ML to predict postoperative retear risk after ARCR. METHODS The retrospective case-control study included 353 patients who underwent surgical treatment for complete rotator cuff tear using the suture-bridge technique. Patients who initially presented with retears and traumatic tears were excluded. In study participants, after the initial tear repair, rotator cuff retears were diagnosed by magnetic resonance imaging; Sugaya classification types IV and V were defined as retears. Age, gender, stump classification, tear size, Goutallier classification, presence of diabetes, and hyperlipidemia were used as ML input parameters to predict the risk of retear. Using Python's scikit-learn as an ML library, five different AI models (logistic regression, random forest, AdaBoost, CatBoost, LightGBM) were trained on the existing data, and the prediction models were applied to the test dataset. The performance of these ML models was measured by the area under the receiver operating characteristic curve. Additionally, key features affecting retear were evaluated. RESULTS The area under the receiver operating characteristic curve was 0.78 for logistic regression, 0.82 for random forest, 0.78 for AdaBoost, 0.83 for CatBoost, and 0.87 for LightGBM. LightGBM showed the highest score. The important factors for model prediction were age, stump classification, and tear size. CONCLUSIONS The ML classifier model predicted retears after ARCR with high accuracy, and the AI model showed that the most important characteristics affecting retears were age and imaging findings, including stump classification. This model may be able to predict postoperative rotator cuff retears based on clinical features.
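The five models above are ranked by area under the ROC curve (AUC). The study reports using scikit-learn; the sketch below deliberately avoids it and computes AUC with only the standard library, using the equivalent pairwise-ranking definition (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one). The function name, labels, and risk scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    `labels` holds 1 for retear and 0 for no retear; `scores` holds the
    model's predicted risk. Tied scores count as half a correct pair.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    correct = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                correct += 1.0
            elif pos_score == neg_score:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical retear labels and predicted risks for six patients.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
```

Here 8 of the 9 positive-negative pairs are ordered correctly, so the AUC is about 0.89. The pairwise loop is quadratic in the number of cases, which is acceptable for a cohort of a few hundred patients but not for large datasets.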
Affiliation(s)
- Issei Shinohara
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, Hyogo, Japan
| | - Yutaka Mifune
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, Hyogo, Japan.
| | - Atsuyuki Inui
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, Hyogo, Japan
| | - Hanako Nishimoto
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, Hyogo, Japan
| | - Tomoya Yoshikawa
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, Hyogo, Japan
| | - Tatsuo Kato
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, Hyogo, Japan
| | - Takahiro Furukawa
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, Hyogo, Japan
| | - Shuya Tanaka
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, Hyogo, Japan
| | - Masaya Kusunose
- Department of Orthopaedic Surgery, Himeji St Mary's Hospital, Himeji, Hyogo, Japan
| | - Yuichi Hoshino
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, Hyogo, Japan
| | - Takehiko Matsushita
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, Hyogo, Japan
| | - Makoto Mitani
- Department of Orthopaedic Surgery, Himeji St Mary's Hospital, Himeji, Hyogo, Japan
| | - Ryosuke Kuroda
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe, Hyogo, Japan
| |
|
30
|
Guo Q, Xie F, Zhong F, Wen W, Zhang X, Yu X, Wang X, Huang B, Li L, Wang X. Application of interpretable machine learning algorithms to predict distant metastasis in ovarian clear cell carcinoma. Cancer Med 2024; 13:e7161. [PMID: 38613173 PMCID: PMC11015070 DOI: 10.1002/cam4.7161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2023] [Revised: 03/16/2024] [Accepted: 03/26/2024] [Indexed: 04/14/2024] Open
Abstract
BACKGROUND Ovarian clear cell carcinoma (OCCC) is a subtype of ovarian epithelial carcinoma (OEC) known for its limited responsiveness to chemotherapy, and the onset of distant metastasis significantly worsens patient prognosis. This study aimed to identify potential risk factors contributing to the occurrence of distant metastasis in OCCC. METHODS Using the Surveillance, Epidemiology, and End Results (SEER) database, we identified patients diagnosed with OCCC between 2004 and 2015. The most influential factors were selected with the Gaussian Naive Bayes (GNB) and AdaBoost machine learning algorithms, and a Venn test was used for further refinement. Subsequently, six machine learning (ML) techniques, namely XGBoost, LightGBM, Random Forest (RF), Adaptive Boosting (AdaBoost), Support Vector Machine (SVM), and Multilayer Perceptron (MLP), were used to construct predictive models for distant metastasis. SHapley Additive exPlanations (SHAP) analysis provided a visual interpretation for individual patients. Model validity was assessed using accuracy, sensitivity, specificity, positive predictive value, negative predictive value, F1 score, and the area under the receiver operating characteristic curve (AUC). RESULTS The Random Forest (RF) model outperformed the other five machine learning algorithms in predicting distant metastasis, with accuracy, sensitivity, specificity, positive predictive value, negative predictive value, F1 score, and AUC (95% CI) values of 0.792 (0.762-0.823), 0.904 (0.835-0.973), 0.759 (0.731-0.787), 0.221 (0.186-0.256), 0.974 (0.967-0.982), 0.353 (0.306-0.399), and 0.834 (0.696-0.967), respectively. Additionally, the RF model had the lowest Brier score (95% CI) on the calibration curve, 0.06256 (0.05753-0.06759). SHAP analysis provided independent explanations, reaffirming the critical clinical factors associated with the risk of metastasis in OCCC patients. CONCLUSIONS This study established a precise predictive model for metastasis in OCCC patients using machine learning techniques, offering valuable support to clinicians in making informed clinical decisions.
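The metrics listed in this abstract all derive from the four cells of a confusion matrix. The sketch below is a minimal illustration, not the authors' code: the function name `classification_metrics` and the patient counts are hypothetical and are not taken from the study. It shows how a model can pair high sensitivity with a low positive predictive value and F1 score when the positive class is rare.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall: share of true cases that are detected
    specificity = tn / (tn + fp)   # share of non-cases correctly ruled out
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Hypothetical cohort (assumed for illustration): 1000 patients, 70 with metastasis.
metrics = classification_metrics(tp=63, fp=223, tn=707, fn=7)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, 90% of true cases are detected, yet most positive calls are false alarms because non-cases greatly outnumber cases.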
Affiliation(s)
- Qin‐Hua Guo
- Jiangxi Province Key Laboratory of Laboratory Medicine, Jiangxi Provincial Clinical Research Center for Laboratory Medicine, Department of Clinical Laboratory, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, Jiangxi, China
- Department of Clinical Laboratory, The First Hospital of Nanchang (The Third Affiliated Hospital of Nanchang University), Nanchang, Jiangxi, China
- School of Public Health, Nanchang University, Nanchang, Jiangxi, China
| | - Feng‐Chun Xie
- Department of Clinical Laboratory, Nanchang Renai Obstetrics and Gynecology Hospital, Nanchang, Jiangxi, China
| | - Fang‐Min Zhong
- Jiangxi Province Key Laboratory of Laboratory Medicine, Jiangxi Provincial Clinical Research Center for Laboratory Medicine, Department of Clinical Laboratory, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, Jiangxi, China
| | - Wen Wen
- Jiangxi Province Key Laboratory of Laboratory Medicine, Jiangxi Provincial Clinical Research Center for Laboratory Medicine, Department of Clinical Laboratory, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, Jiangxi, China
- School of Public Health, Nanchang University, Nanchang, Jiangxi, China
| | - Xue‐Ru Zhang
- Jiangxi Province Key Laboratory of Laboratory Medicine, Jiangxi Provincial Clinical Research Center for Laboratory Medicine, Department of Clinical Laboratory, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, Jiangxi, China
- School of Public Health, Nanchang University, Nanchang, Jiangxi, China
| | - Xia‐Jing Yu
- Jiangxi Province Key Laboratory of Laboratory Medicine, Jiangxi Provincial Clinical Research Center for Laboratory Medicine, Department of Clinical Laboratory, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, Jiangxi, China
- School of Public Health, Nanchang University, Nanchang, Jiangxi, China
| | - Xin‐Lu Wang
- Jiangxi Province Key Laboratory of Laboratory Medicine, Jiangxi Provincial Clinical Research Center for Laboratory Medicine, Department of Clinical Laboratory, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, Jiangxi, China
- School of Public Health, Nanchang University, Nanchang, Jiangxi, China
| | - Bo Huang
- Jiangxi Province Key Laboratory of Laboratory Medicine, Jiangxi Provincial Clinical Research Center for Laboratory Medicine, Department of Clinical Laboratory, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, Jiangxi, China
| | - Li‐Ping Li
- Department of Clinical Laboratory, The First Hospital of Nanchang (The Third Affiliated Hospital of Nanchang University), Nanchang, Jiangxi, China
| | - Xiao‐Zhong Wang
- Jiangxi Province Key Laboratory of Laboratory Medicine, Jiangxi Provincial Clinical Research Center for Laboratory Medicine, Department of Clinical Laboratory, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, Jiangxi, China
- Department of Clinical Laboratory, The First Hospital of Nanchang (The Third Affiliated Hospital of Nanchang University), Nanchang, Jiangxi, China
- School of Public Health, Nanchang University, Nanchang, Jiangxi, China
|
31
|
Jiang Z, Liu L, Du L, Lv S, Liang F, Luo Y, Wang C, Shen Q. Machine learning for the early prediction of acute respiratory distress syndrome (ARDS) in patients with sepsis in the ICU based on clinical data. Heliyon 2024; 10:e28143. [PMID: 38533071 PMCID: PMC10963609 DOI: 10.1016/j.heliyon.2024.e28143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 02/28/2024] [Accepted: 03/12/2024] [Indexed: 03/28/2024] Open
Abstract
Background Acute respiratory distress syndrome (ARDS) is a fatal outcome of severe sepsis. Machine learning models are helpful for accurately predicting ARDS in patients with sepsis at an early stage. Objective We aimed to develop a machine learning model for predicting ARDS in patients with sepsis in the intensive care unit (ICU). Methods The initial clinical data of patients with sepsis admitted to the hospital (including population characteristics, clinical diagnosis, complications, and laboratory tests) were used to predict ARDS and to screen out the crucial variables. Eight algorithms were compared, namely XGBoost, logistic regression, LightGBM, random forest, Gaussian NB, complement NB, support vector machine (SVM), and K-nearest neighbors (KNN), and the prediction model was rebuilt with the best one. When remodeling with the best algorithm, 10% of the data were randomly selected for testing, and the remainder were used for training with cross-validation. Model performance was evaluated using the area under the curve (AUC), sensitivity, accuracy, specificity, positive and negative predictive values, F1 score, kappa value, and the clinical decision curve. Finally, the application of the model was illustrated with the SHAP package. Results Ten critical features were screened using the lasso method, namely PaO2/PAO2, A-aDO2, PO2(T), CRP, gender, PO2, RDW, MCH, SG, and chlorine. The ranking of variables showed that PaO2/PAO2 was the most significant variable. Among the eight algorithms, the Gaussian NB algorithm performed significantly better than the others. After remodeling with the best algorithm, the AUCs in the training and validation sets were 0.777 and 0.770, respectively, and the algorithm performed well in the test set (AUC = 0.781, accuracy = 78.6%, sensitivity = 82.4%, F1 score = 0.824). A comparison of the overlapping factors with those of previous models revealed that the model we developed performs better. Conclusion Sepsis-associated ARDS can be accurately predicted early via a machine learning model based on existing clinical data. These findings are helpful for accurate identification and improvement of the prognosis in patients with sepsis-associated ARDS.
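The data partitioning described in the Methods (a random 10% test set, with the remainder used for cross-validation) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `holdout_then_kfold`, the number of folds (5), and the random seed are hypothetical choices, since the abstract does not report them.

```python
import random


def holdout_then_kfold(n_samples, test_fraction=0.1, k=5, seed=0):
    """Split sample indices into a held-out test set and k cross-validation folds."""
    rng = random.Random(seed)          # local generator keeps the split reproducible
    indices = list(range(n_samples))
    rng.shuffle(indices)

    n_test = round(n_samples * test_fraction)
    test_idx = indices[:n_test]        # never touched during model selection
    dev_idx = indices[n_test:]         # used for training and validation

    # Deal the development indices into k folds of near-equal size.
    folds = [dev_idx[i::k] for i in range(k)]

    splits = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train_idx, val_idx))
    return test_idx, splits


test_idx, splits = holdout_then_kfold(n_samples=100)
print(len(test_idx), [(len(tr), len(va)) for tr, va in splits])
```

Holding the test indices out before forming the folds keeps the final test estimate independent of any choices made during cross-validation.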
Affiliation(s)
- Zhenzhen Jiang
- Department of Blood Transfusion, The Third Xiangya Hospital, Central South University, Changsha, China
| | - Leping Liu
- Department of Pediatrics, The Third Xiangya Hospital, Central South University, Changsha, China
| | - Lin Du
- Department of Blood Transfusion, The Third Xiangya Hospital, Central South University, Changsha, China
| | - Shanshan Lv
- Department of Blood Transfusion, The Third Xiangya Hospital, Central South University, Changsha, China
| | - Fang Liang
- Department of Hematology and Critical Care Medicine, The Third Xiangya Hospital, Central South University, Changsha, China
| | - Yanwei Luo
- Department of Blood Transfusion, The Third Xiangya Hospital, Central South University, Changsha, China
| | - Chunjiang Wang
- Department of Pharmacy, The Third Xiangya Hospital, Central South University, Changsha, China
| | - Qin Shen
- Department of Radiology, The Second Xiangya Hospital, Central South University, Changsha, China
|
32
|
Jiang C, Zhang B, Jiang W, Liu P, Kong Y, Zhang J, Teng W. Metal ion stimulation-related gene signatures correlate with clinical and immunologic characteristics of glioma. Heliyon 2024; 10:e27189. [PMID: 38533032 PMCID: PMC10963200 DOI: 10.1016/j.heliyon.2024.e27189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 02/23/2024] [Accepted: 02/26/2024] [Indexed: 03/28/2024] Open
Abstract
Background Environmental factors are among the important pathogenic factors for gliomas. Yet attention has focused only on the pathogenic effect of electromagnetic radiation, while metals in the environment have been neglected. This study aimed to investigate the relationship between metal ion stimulation and the clinical characteristics and immune status of glioma patients. Methods First, mRNA expression profiles of glioma patients and normal subjects were obtained from the Chinese Glioma Genome Atlas (CGGA) and Gene Expression Omnibus (GEO) to identify differentially expressed metal ion stimulation-related genes (DEMISGs). Second, two molecular subtypes were identified and validated based on these DEMISGs using consensus clustering. Diagnostic and prognostic models for glioma were constructed after screening these features with machine learning. Finally, supervised classification and unsupervised clustering were combined to classify and predict the grade of glioma based on SHAP values. Results Glioma patients were divided into two different response states to metal ion stimulation, M1 and M2, which are related to the grade and IDH status of the glioma. Six genes with diagnostic value were obtained: SLC30A3, CRHBP, SYT13, DLG2, CDK1, and WNT5A. The AUC in the external validation set was higher than 0.90. The SHAP value improved the performance of classification prediction. Conclusion The gene features associated with metal ion stimulation are related to the clinical and immune characteristics of glioma patients. XGBoost/LightGBM combined with K-means has a higher classification accuracy in predicting glioma grades than purely supervised classification techniques.
Affiliation(s)
- Chengzhi Jiang
- Shandong Second Medical University, Weifang, Shandong, 261053, People's Republic of China
| | - Binbin Zhang
- Qingdao Municipal Hospital (Group), Qingdao, Shandong, 266000, People's Republic of China
| | - Wenjuan Jiang
- Qingdao Municipal Hospital (Group), Qingdao, Shandong, 266000, People's Republic of China
| | - Pengtao Liu
- Shandong Second Medical University, Weifang, Shandong, 261053, People's Republic of China
| | - Yujia Kong
- Shandong Second Medical University, Weifang, Shandong, 261053, People's Republic of China
| | - Jianhua Zhang
- Jining Medical University, Jining, Shandong, 272067, People's Republic of China
| | - Wenjie Teng
- Shandong Second Medical University, Weifang, Shandong, 261053, People's Republic of China
|
33
|
Murmu S, Sinha D, Chaurasia H, Sharma S, Das R, Jha GK, Archak S. A review of artificial intelligence-assisted omics techniques in plant defense: current trends and future directions. Frontiers in Plant Science 2024; 15:1292054. [PMID: 38504888 PMCID: PMC10948452 DOI: 10.3389/fpls.2024.1292054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2023] [Accepted: 01/24/2024] [Indexed: 03/21/2024]
Abstract
Plants intricately deploy defense systems to counter diverse biotic and abiotic stresses. Omics technologies, spanning genomics, transcriptomics, proteomics, and metabolomics, have revolutionized the exploration of plant defense mechanisms, unraveling molecular intricacies in response to various stressors. However, the complexity and scale of omics data necessitate sophisticated analytical tools for meaningful insights. This review delves into the application of artificial intelligence algorithms, particularly machine learning and deep learning, as promising approaches for deciphering complex omics data in plant defense research. The overview encompasses key omics techniques and addresses the challenges and limitations inherent in current AI-assisted omics approaches. Moreover, it contemplates potential future directions in this dynamic field. In summary, AI-assisted omics techniques present a robust toolkit, enabling a profound understanding of the molecular foundations of plant defense and paving the way for more effective crop protection strategies amidst climate change and emerging diseases.
Affiliation(s)
- Sneha Murmu
- Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
| | - Dipro Sinha
- Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
| | - Himanshushekhar Chaurasia
- Central Institute for Research on Cotton Technology, Indian Council of Agricultural Research (ICAR), Mumbai, India
| | - Soumya Sharma
- Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
| | - Ritwika Das
- Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
| | - Girish Kumar Jha
- Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
| | - Sunil Archak
- National Bureau of Plant Genetic Resources, Indian Council of Agricultural Research (ICAR), New Delhi, India
|
34
|
Li W, Zhang Y, Zhou X, Quan X, Chen B, Hou X, Xu Q, He W, Chen L, Liu X, Zhang Y, Xiang T, Li R, Liu Q, Wu SN, Wang K, Liu W, Zheng J, Luan H, Yu X, Chen A, Xu C, Luo T, Hu Z. Ensemble learning-assisted prediction of prolonged hospital length of stay after spine correction surgery: a multi-center cohort study. J Orthop Surg Res 2024; 19:112. [PMID: 38308336 PMCID: PMC10838003 DOI: 10.1186/s13018-024-04576-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Accepted: 01/23/2024] [Indexed: 02/04/2024] Open
Abstract
PURPOSE This research aimed to develop a machine learning model that predicts, before surgery, the risk of a prolonged hospital length of stay, which can be used to strengthen patient management. METHODS Patients who underwent posterior spinal deformity surgery (PSDS) at eleven medical institutions in China between 2015 and 2022 were included. Detailed preoperative patient data, including demographics, medical history, comorbidities, preoperative laboratory results, and surgery details, were collected from their electronic medical records. The cohort was randomly divided into a training dataset and a validation dataset at a ratio of 70:30. After feature selection with the Boruta algorithm, nine different machine learning algorithms and a stacking ensemble model were trained with hyperparameter tuning and visualization, and were evaluated on the area under the receiver operating characteristic curve (AUROC), precision-recall curve, calibration, and decision curve analysis. Visualization with the Shapley Additive exPlanations method was finally used to explain model predictions. RESULTS A total of 162 patients were included. The K-nearest neighbors algorithm performed best in the validation group compared with the other machine learning models (yielding an AUROC of 0.8191 and a PRAUC of 0.6175). The top five contributing variables were preoperative hemoglobin, height, body mass index, age, and preoperative white blood cells. A web-based calculator was further developed to improve the predictive model's clinical operability. CONCLUSIONS Our study established and validated a clinical predictive model for prolonged postoperative hospitalization duration in patients who underwent PSDS, which offers clinicians valuable prognostic information for preoperative planning and postoperative care. Trial registration ClinicalTrials.gov identifier NCT05867732, retrospectively registered May 22, 2023, https://classic.clinicaltrials.gov/ct2/show/NCT05867732.
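The AUROC used to compare the models has a direct probabilistic reading: it is the chance that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below is a minimal illustration of that definition, not the study's evaluation code; the function name `auroc` and the toy labels and scores are hypothetical.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy example (assumed data): 1 = prolonged stay, 0 = normal stay.
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
print(auroc(labels, scores))  # 3 of the 4 positive-negative pairs are ordered correctly
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a cohort of this size; a rank-based formula would be the usual choice for larger datasets.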
Affiliation(s)
- Wenle Li
- State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics and Center for Molecular Imaging and Translational Medicine, School of Public Health, Xiamen University, Xiamen, China
- Key Laboratory of Neurological Diseases, The Second Affiliated Hospital of Xuzhou Medical University, Xuzhou, Jiangsu, China
- Department of Spinal Surgery, Guangxi Medical University Affiliated Liuzhou People's Hospital, Liuzhou, China
| | - Yusi Zhang
- Cancer Center, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- Precision Medicine Center, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- Department of Medical Oncology, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
| | - Xin Zhou
- Third Hospital of Shanxi Medical University, Shanxi Bethune Hospital, Shanxi Academy of Medical Sciences, Tongji Shanxi Hospital, Taiyuan, 030032, China
| | - Xubin Quan
- Department of Spinal Surgery, Guangxi Medical University Affiliated Liuzhou People's Hospital, Liuzhou, China
| | - Binghao Chen
- Department of Spinal Surgery, Guangxi Medical University Affiliated Liuzhou People's Hospital, Liuzhou, China
| | - Xuewen Hou
- Department of Radiology, The First Dongguan Affiliated Hospital, Guangdong Medical University, Dongguan, China
| | - Qizhong Xu
- Department of Radiology, The First Affiliated Hospital of Shenzhen University, Shenzhen Second People's Hospital, Shenzhen, China
| | - Weiheng He
- Department of Radiology, People's Hospital of Ningxia Hui Autonomous Region, Yinchuan, China
| | - Liang Chen
- Department of Radiology, Hubei Provincial Hospital of Traditional Chinese Medicine, Wuhan, China
| | - Xiaozhu Liu
- Department of Critical Care Medicine, Beijing Shijitan Hospital, Capital Medical University, Beijing, China
- Department of Cardiology, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
| | - Yang Zhang
- College of Medical Informatics, Chongqing Medical University, Chongqing, China
- Medical Data Science Academy, Chongqing Medical University, Chongqing, China
| | - Tianyu Xiang
- Information Center, The University-Town Hospital of Chongqing Medical University, Chongqing, China
| | - Runmin Li
- Department of Foot and Ankle Surgery, Honghui Hospital, Xi'an Jiaotong University, Xi'an, Shaanxi Province, China
| | - Qiang Liu
- Department of Orthopedics, Xianyang Central Hospital, Xianyang, Shaanxi, China
| | - Shi-Nan Wu
- Eye Institute of Xiamen University, School of Medicine, Xiamen University, Xiamen, Fujian, China
| | - Kai Wang
- Key Laboratory of Neurological Diseases, The Second Affiliated Hospital of Xuzhou Medical University, Xuzhou, Jiangsu, China
| | - Wencai Liu
- Department of Orthopedics, Shanghai Sixth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, 200233, China
| | - Jialiang Zheng
- Cancer Research Center, School of Medicine, Xiamen University, Xiamen, China
| | - Haopeng Luan
- Department of Spine Surgery, The Sixth Affiliated Hospital of Xinjiang Medical University, Urumqi, Xinjiang, China
| | - Xiaolin Yu
- Department of Orthopedics, Affiliated Hospital of Guizhou Medical University, Guiyang, Guizhou, China
| | - Anfa Chen
- Department of Orthopedics, Jiangxi Province Hospital of Integrated Chinese and Western Medicine, Nanchang, China
| | - Chan Xu
- State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics and Center for Molecular Imaging and Translational Medicine, School of Public Health, Xiamen University, Xiamen, China
| | - Tongqing Luo
- Department of Spinal Surgery, Guangxi Medical University Affiliated Liuzhou People's Hospital, Liuzhou, China
| | - Zhaohui Hu
- Department of Spinal Surgery, Guangxi Medical University Affiliated Liuzhou People's Hospital, Liuzhou, China
|
35
|
Chen J, Tan C, Zhu M, Zhang C, Wang Z, Ni X, Liu Y, Wei T, Wei X, Fang X, Xu Y, Huang X, Qiu J, Liu H. CropGS-Hub: a comprehensive database of genotype and phenotype resources for genomic prediction in major crops. Nucleic Acids Res 2024; 52:D1519-D1529. [PMID: 38000385 PMCID: PMC10767954 DOI: 10.1093/nar/gkad1062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 10/15/2023] [Accepted: 10/25/2023] [Indexed: 11/26/2023] Open
Abstract
The explosive growth of multi-omics data has brought a paradigm shift both in academic research and in applications in the life sciences. However, managing and reusing the growing resources of genomic and phenotype data presents considerable challenges for the research community. There is an urgent need for an integrated database that combines genome-wide association studies (GWAS) with genomic selection (GS). Here, we present CropGS-Hub, a comprehensive database comprising genotype, phenotype, and GWAS signals, as well as a one-stop platform with built-in algorithms for genomic prediction and crossing design. This database encompasses a collection of over 224 billion genotype data points and 434 thousand phenotype data points generated from >30 000 individuals in 14 representative populations belonging to 7 major crop species. Moreover, the platform implements three complete genomic selection-related functional modules, namely phenotype prediction, user model training, and crossing design, as well as a fast SNP genotyper plug-in called SNPGT built specifically for CropGS-Hub, aiming to assist crop scientists and breeders without requiring coding skills. CropGS-Hub can be accessed at https://iagr.genomics.cn/CropGS/.
Affiliation(s)
- Jiaxin Chen
- Shanghai Key Laboratory of Plant Molecular Sciences, Shanghai Collaborative Innovation Center of Plant Germplasm Resources, College of Life Sciences, Shanghai Normal University, Shanghai, China
| | - Cong Tan
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen 518083, China
- BGI Research, Wuhan 430074, China
| | - Min Zhu
- Shanghai Key Laboratory of Plant Molecular Sciences, Shanghai Collaborative Innovation Center of Plant Germplasm Resources, College of Life Sciences, Shanghai Normal University, Shanghai, China
| | - Chenyang Zhang
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen 518083, China
- BGI Bioverse, Shenzhen 518083, China
| | - Zhihan Wang
- Shanghai Key Laboratory of Plant Molecular Sciences, Shanghai Collaborative Innovation Center of Plant Germplasm Resources, College of Life Sciences, Shanghai Normal University, Shanghai, China
| | - Xuemei Ni
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen 518083, China
- BGI Bioverse, Shenzhen 518083, China
| | - Yanlin Liu
- Shanghai Key Laboratory of Plant Molecular Sciences, Shanghai Collaborative Innovation Center of Plant Germplasm Resources, College of Life Sciences, Shanghai Normal University, Shanghai, China
| | - Tong Wei
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen 518083, China
- BGI Research, Wuhan 430074, China
| | - XiaoFeng Wei
- China National GeneBank, BGI, Shenzhen 518120, China
| | - Xiaodong Fang
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen 518083, China
- BGI Research, Sanya 572025, China
| | - Yang Xu
- Agricultural College, Yangzhou University, Yangzhou 225009, China
| | - Xuehui Huang
- Shanghai Key Laboratory of Plant Molecular Sciences, Shanghai Collaborative Innovation Center of Plant Germplasm Resources, College of Life Sciences, Shanghai Normal University, Shanghai, China
| | - Jie Qiu
- Shanghai Key Laboratory of Plant Molecular Sciences, Shanghai Collaborative Innovation Center of Plant Germplasm Resources, College of Life Sciences, Shanghai Normal University, Shanghai, China
| | - Huan Liu
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen 518083, China
- BGI Bioverse, Shenzhen 518083, China
|
36
|
An F, Ge Y, Ye W, Ji L, Chen K, Wang Y, Zhang X, Dong S, Shen Y, Zhao J, Gao X, Junankar S, Chan RB, Christodoulou D, Wen W, Lu P, Zhan Q. Machine learning identifies a 5-serum cytokine panel for the early detection of chronic atrophy gastritis patients. Cancer Biomark 2024; 41:25-40. [PMID: 39269824 PMCID: PMC11495322 DOI: 10.3233/cbm-240023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Accepted: 07/19/2024] [Indexed: 09/15/2024]
Abstract
BACKGROUND Chronic atrophy gastritis (CAG) is a high-risk pre-cancerous lesion for gastric cancer (GC). The early and accurate detection and discrimination of CAG from benign forms of gastritis (e.g., chronic superficial gastritis, CSG) is critical for optimal management of GC. However, accurate non-invasive methods for the diagnosis of CAG are currently lacking. Cytokines cause inflammation and drive cancer transformation in GC, but their utility as a diagnostic for CAG is poorly characterized. METHODS Blood samples were collected from 247 patients undergoing screening via endoscopy, and 40 cytokines were quantified using a multiplexed immunoassay. Patients were divided into discovery and validation sets. The importance of each cytokine was ranked using the feature selection algorithm Boruta. The cytokines with the highest feature importance were selected for machine learning (ML) using the LightGBM algorithm. RESULTS Five serum cytokines (IL-10, TNF-α, Eotaxin, IP-10 and SDF-1a) that could discriminate between CAG and CSG patients were identified and used for predictive model construction. This model was robust and could identify CAG patients with high performance (AUC = 0.88, Accuracy = 0.78). This compared favorably to the conventional approach using the PGI/PGII ratio (AUC = 0.59). CONCLUSION Using state-of-the-art ML and a blood-based immunoassay, we developed an improved non-invasive screening method for the detection of precancerous GC lesions. FUNDING Supported in part by grants from: Jiangsu Science and Technology Project (no. BK20211039); Top Talent Support Program for young and middle-aged people of Wuxi Health Committee (BJ2023008); Medical Key Discipline Program of Wuxi Health Commission (ZDXK2021010); Wuxi Science and Technology Bureau Project (no. N20201004); Scientific Research Program of Wuxi Health Commission (Z202208, J202104).
Affiliation(s)
- Fangmei An
- Department of Gastroenterology, Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi People’s Hospital, Wuxi Medical Center, Nanjing Medical University, National Clinical Research Center for Digestive Diseases (Xi’an) Jiangsu Branch, Wuxi, Jiangsu, China
| | - Yan Ge
- Department of Gastroenterology, Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi People’s Hospital, Wuxi Medical Center, Nanjing Medical University, National Clinical Research Center for Digestive Diseases (Xi’an) Jiangsu Branch, Wuxi, Jiangsu, China
- AliveX Biotech, Shanghai, China
| | - Wei Ye
- AliveX Biotech, Shanghai, China
| | - Lin Ji
- Department of Gastroenterology, Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi People’s Hospital, Wuxi Medical Center, Nanjing Medical University, National Clinical Research Center for Digestive Diseases (Xi’an) Jiangsu Branch, Wuxi, Jiangsu, China
| | - Ke Chen
- Department of Gastroenterology, Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi People’s Hospital, Wuxi Medical Center, Nanjing Medical University, National Clinical Research Center for Digestive Diseases (Xi’an) Jiangsu Branch, Wuxi, Jiangsu, China
| | - Yunfei Wang
- Department of Gastroenterology, Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi People’s Hospital, Wuxi Medical Center, Nanjing Medical University, National Clinical Research Center for Digestive Diseases (Xi’an) Jiangsu Branch, Wuxi, Jiangsu, China
| | - Xiaoxue Zhang
- Department of Gastroenterology, Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi People’s Hospital, Wuxi Medical Center, Nanjing Medical University, National Clinical Research Center for Digestive Diseases (Xi’an) Jiangsu Branch, Wuxi, Jiangsu, China
| | - Shengrong Dong
- Department of Gastroenterology, Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi People’s Hospital, Wuxi Medical Center, Nanjing Medical University, National Clinical Research Center for Digestive Diseases (Xi’an) Jiangsu Branch, Wuxi, Jiangsu, China
| | - Yao Shen
- Department of Gastroenterology, Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi People’s Hospital, Wuxi Medical Center, Nanjing Medical University, National Clinical Research Center for Digestive Diseases (Xi’an) Jiangsu Branch, Wuxi, Jiangsu, China
| | - Jiamin Zhao
- Department of Gastroenterology, Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi People’s Hospital, Wuxi Medical Center, Nanjing Medical University, National Clinical Research Center for Digestive Diseases (Xi’an) Jiangsu Branch, Wuxi, Jiangsu, China
| | - Xiaojuan Gao
- Department of Gastroenterology, Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi People’s Hospital, Wuxi Medical Center, Nanjing Medical University, National Clinical Research Center for Digestive Diseases (Xi’an) Jiangsu Branch, Wuxi, Jiangsu, China
| | - Wen Wen
- AliveX Biotech, Shanghai, China
| | - Peihua Lu
- Department of Medical Oncology, Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi People’s Hospital, Wuxi Medical Center, Nanjing Medical University, Wuxi, Jiangsu, China
| | - Qiang Zhan
- Department of Gastroenterology, Affiliated Wuxi People’s Hospital of Nanjing Medical University, Wuxi People’s Hospital, Wuxi Medical Center, Nanjing Medical University, National Clinical Research Center for Digestive Diseases (Xi’an) Jiangsu Branch, Wuxi, Jiangsu, China
|
37
|
Huang Y, Qi Z, Li J, You J, Zhang X, Wang M. Genetic interrogation of phenotypic plasticity informs genome-enabled breeding in cotton. J Genet Genomics 2023; 50:971-982. [PMID: 37211312 DOI: 10.1016/j.jgg.2023.05.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/20/2023] [Revised: 04/19/2023] [Accepted: 05/04/2023] [Indexed: 05/23/2023]
Abstract
Phenotypic plasticity, or the ability to adapt to and thrive in changing climates and variable environments, is essential for developmental programs in plants. Despite its importance, the genetic underpinnings of phenotypic plasticity for key agronomic traits remain poorly understood in many crops. In this study, we aim to fill this gap by using genome-wide association studies to identify genetic variations associated with phenotypic plasticity in upland cotton (Gossypium hirsutum L.). We identified 73 additive quantitative trait loci (QTLs), 32 dominant QTLs, and 6799 epistatic QTLs associated with 20 traits. We also identified 117 additive QTLs, 28 dominant QTLs, and 4691 epistatic QTLs associated with phenotypic plasticity in 19 traits. Our findings reveal new genetic factors, including additive, dominant, and epistatic QTLs, that are linked to phenotypic plasticity and agronomic traits. Meanwhile, we find that the genetic factors controlling the mean phenotype and phenotypic plasticity are largely independent in upland cotton, indicating the potential for simultaneous improvement. Additionally, we envision a genomic design strategy by utilizing the identified QTLs to facilitate cotton breeding. Taken together, our study provides new insights into the genetic basis of phenotypic plasticity in cotton, which should be valuable for future breeding.
Affiliation(s)
- Yuefan Huang
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, Hubei 430070, China
| | - Zhengyang Qi
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, Hubei 430070, China
| | - Jianying Li
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, Hubei 430070, China
| | - Jiaqi You
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, Hubei 430070, China
| | - Xianlong Zhang
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, Hubei 430070, China
| | - Maojun Wang
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan, Hubei 430070, China
|
38
|
Wu C, Zhang Y, Ying Z, Li L, Wang J, Yu H, Zhang M, Feng X, Wei X, Xu X. A transformer-based genomic prediction method fused with knowledge-guided module. Brief Bioinform 2023; 25:bbad438. [PMID: 38058185 PMCID: PMC10701102 DOI: 10.1093/bib/bbad438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 10/15/2023] [Accepted: 11/03/2023] [Indexed: 12/08/2023] Open
Abstract
Genomic prediction (GP) uses single nucleotide polymorphisms (SNPs) to establish associations between markers and phenotypes. Selection of early individuals by genomic estimated breeding value shortens the generation interval and speeds up the breeding process. Recently, methods based on deep learning (DL) have gained great attention in the field of GP. In this study, we explore the application of Transformer-based structures to GP and develop a novel deep-learning model named GPformer. GPformer obtains a global view by gleaning beneficial information from all relevant SNPs regardless of the physical distance between SNPs. Comprehensive experimental results on five different crop datasets show that GPformer outperforms ridge regression-based linear unbiased prediction (RR-BLUP), support vector regression (SVR), light gradient boosting machine (LightGBM) and deep neural network genomic prediction (DNNGP) in terms of mean absolute error, Pearson's correlation coefficient and the proposed metric consistent index. Furthermore, we introduce a knowledge-guided module (KGM) to extract genome-wide association studies-based information, which is fused into GPformer as prior knowledge. KGM is very flexible and can be plugged into any DL network. Ablation studies of KGM on three datasets illustrate the efficiency of KGM adequately. Moreover, GPformer is robust and stable to hyperparameters and can generalize to each phenotype of every dataset, which is suitable for practical application scenarios.
Affiliation(s)
- Cuiling Wu
- Institute of Intelligent Computing, Zhejiang Lab, Hangzhou 311121, China
| | - Yiyi Zhang
- Institute of Intelligent Computing, Zhejiang Lab, Hangzhou 311121, China
| | - Zhiwen Ying
- Institute of Intelligent Computing, Zhejiang Lab, Hangzhou 311121, China
| | - Ling Li
- Institute of Intelligent Computing, Zhejiang Lab, Hangzhou 311121, China
| | - Jun Wang
- Institute of Intelligent Computing, Zhejiang Lab, Hangzhou 311121, China
| | - Hui Yu
- Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130012, China
| | - Mengchen Zhang
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou 310006, China
| | - Xianzhong Feng
- Institute of Intelligent Computing, Zhejiang Lab, Hangzhou 311121, China
- Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130012, China
| | - Xinghua Wei
- Institute of Intelligent Computing, Zhejiang Lab, Hangzhou 311121, China
- State Key Laboratory of Rice Biology, China National Rice Research Institute, Hangzhou 310006, China
| | - Xiaogang Xu
- School of Computer and Information Engineering, Zhejiang Gongshang University, Hangzhou 310018, China
|
39
|
Wang H, Lin YN, Yan S, Hong JP, Tan JR, Chen YQ, Cao YS, Fang W. NRTPredictor: identifying rice root cell state in single-cell RNA-seq via ensemble learning. Plant Methods 2023; 19:119. [PMID: 37925413 PMCID: PMC10625708 DOI: 10.1186/s13007-023-01092-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Accepted: 10/15/2023] [Indexed: 11/06/2023]
Abstract
BACKGROUND Single-cell RNA sequencing (scRNA-seq) measurements of gene expression show great promise for studying the cellular heterogeneity of rice roots. Precisely annotating cell identity remains a major unresolved problem in plant scRNA-seq analysis due to the inherent high dimensionality and sparsity of the data. RESULTS To address this challenge, we present NRTPredictor, an ensemble-learning system, to predict rice root cell stage and mine biomarkers through complete model interpretability. The performance of NRTPredictor was evaluated using a test dataset, with 98.01% accuracy and 95.45% recall. With the power of interpretability provided by NRTPredictor, our model recognizes 110 marker genes partially involved in phenylpropanoid biosynthesis. Expression patterns of rice root could be mapped by the above-mentioned candidate genes, showing the superiority of NRTPredictor. Integrated analysis of scRNA and bulk RNA-seq data revealed aberrant expression of Epidermis cell subpopulations in flooding, Pi, and salt stresses. CONCLUSION Taken together, our results demonstrate that NRTPredictor is a useful tool for automated prediction of rice root cell stage and provides a valuable resource for deciphering the rice root cellular heterogeneity and the molecular mechanisms of flooding, Pi, and salt stresses. Based on the proposed model, a free webserver has been established, which is available at https://www.cgris.net/nrtp.
Affiliation(s)
- Hao Wang
- The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
| | - Yu-Nan Lin
- The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
| | - Shen Yan
- The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
| | - Jing-Peng Hong
- The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
| | - Jia-Rui Tan
- The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
| | - Yan-Qing Chen
- The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
| | - Yong-Sheng Cao
- The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
| | - Wei Fang
- The Innovation Team of Crop Germplasm Resources Preservation and Information, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100081, China
|
40
|
Akutsu H, Na’iem M, Widiyatno, Indrioko S, Sawitri, Purnomo S, Uchiyama K, Tsumura Y, Tani N. Comparing modeling methods of genomic prediction for growth traits of a tropical timber species, Shorea macrophylla. Frontiers in Plant Science 2023; 14:1241908. [PMID: 38023878 PMCID: PMC10644202 DOI: 10.3389/fpls.2023.1241908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2023] [Accepted: 09/13/2023] [Indexed: 12/01/2023]
Abstract
Introduction Shorea macrophylla is a commercially important tropical tree species grown for timber and oil. It is amenable to plantation forestry due to its fast initial growth. Genomic selection (GS) has been used in tree breeding studies to shorten long breeding cycles but has not previously been applied to S. macrophylla. Methods To build genomic prediction models for GS, leaves and growth trait data were collected from a half-sib progeny population of S. macrophylla in Sari Bumi Kusuma forest concession, central Kalimantan, Indonesia. In total, 18037 SNP markers were identified in two ddRAD-seq libraries. Genomic prediction models based on these SNPs were then generated for diameter at breast height and total height in the 7th year from planting (D7 and H7). Results and discussion These traits were chosen because of their relatively high narrow-sense genomic heritability and because seven years was considered long enough to assess initial growth. Genomic prediction models were built using 6 methods and their derivatives with the full set of identified SNPs and subsets of 48, 96, and 192 SNPs selected based on the results of a genome-wide association study (GWAS). The GBLUP and RKHS methods gave the highest predictive ability for D7 and H7 with the sets of selected SNPs and showed that D7 has an additive genetic architecture while H7 has an epistatic genetic architecture. LightGBM and CNN1D also achieved high predictive abilities for D7 with 48 and 96 selected SNPs, and for H7 with 96 and 192 selected SNPs, showing that gradient boosting decision trees and deep learning can be useful in genomic prediction. Predictive abilities were higher in H7 when smaller SNP subsets selected by GWAS p-value were used. However, D7 showed the contrary tendency, which might have originated from the difference in genetic architecture between primary and secondary growth of the species. This study suggests that GS with GWAS-based SNP selection can be used in breeding for non-cultivated tree species to improve initial growth and reduce genotyping costs for next-generation seedlings.
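The GWAS-based SNP subset selection mentioned in this abstract can be illustrated with a minimal sketch. It assumes the GWAS results are available as a mapping from SNP name to p-value; the function name `select_top_snps` and the example values are hypothetical and are not taken from the study. In practice the ranking would normally be computed on training data only, so that the held-out individuals do not influence which SNPs are kept.

```python
def select_top_snps(p_values, k):
    """Return the names of the k SNPs with the smallest GWAS p-values.

    p_values: dict mapping SNP name -> p-value (hypothetical input format).
    Ties are broken by SNP name so the result is deterministic.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    ranked = sorted(p_values.items(), key=lambda item: (item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Illustrative p-values only; a real analysis would rank thousands of SNPs
# and keep subsets such as 48, 96, or 192 markers.
gwas_p = {"snp_a": 3e-7, "snp_b": 0.04, "snp_c": 1e-9, "snp_d": 0.5, "snp_e": 2e-5}
subset = select_top_snps(gwas_p, 3)
print(subset)
```

The selected names would then be used to subset the genotype matrix before fitting any of the prediction models.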
Affiliation(s)
- Haruto Akutsu
- Graduate School of Science and Technology, University of Tsukuba, Tsukuba, Ibaraki, Japan
| | - Mohammad Na’iem
- Faculty of Forestry, Gadjah Mada University, Yogyakarta, Indonesia
| | - Widiyatno
- Faculty of Forestry, Gadjah Mada University, Yogyakarta, Indonesia
| | - Sapto Indrioko
- Faculty of Forestry, Gadjah Mada University, Yogyakarta, Indonesia
| | - Sawitri
- Faculty of Forestry, Gadjah Mada University, Yogyakarta, Indonesia
| | - Susilo Purnomo
- PT. Sari Bumi Kusuma, Pontianak, West Kalimantan, Indonesia
| | - Kentaro Uchiyama
- Department of Forest Molecular Genetics and Biotechnology, Forestry and Forest Products Research Institute, Tsukuba, Ibaraki, Japan
| | - Yoshihiko Tsumura
- Faculty of Life and Environmental Sciences, University of Tsukuba, Tsukuba, Ibaraki, Japan
| | - Naoki Tani
- Faculty of Life and Environmental Sciences, University of Tsukuba, Tsukuba, Ibaraki, Japan
- Forestry Division, Japan International Research Center for Agricultural Sciences, Tsukuba, Ibaraki, Japan
|
41
|
Fang K, Zheng X, Lin X, Dai Z. Unveiling Osteoporosis Through Radiomics Analysis of Hip CT Imaging. Acad Radiol 2023; 31:S1076-6332(23)00544-5. [PMID: 39492007 DOI: 10.1016/j.acra.2023.10.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2023] [Revised: 10/02/2023] [Accepted: 10/03/2023] [Indexed: 11/05/2024]
Abstract
RATIONALE AND OBJECTIVES This study aims to investigate the use of radiomics analysis of hip CT imaging to unveil osteoporosis. MATERIALS AND METHODS The researchers analyzed hip CT scans from a cohort of patients, including both osteoporotic and healthy individuals. Radiomics techniques were employed to extract a comprehensive array of features from these images, encompassing texture, shape, and intensity alterations. Radiomics analysis using the 10 most commonly used machine learning models was employed to identify screened radiomics features for the detection of osteoporosis in patients. In addition to radiomics features, the basic information of patients was also utilized as training data for these machine learning models to accurately identify the presence of osteoporosis. A comparison was made between the efficiency of recognizing radiomics features and the efficiency of recognizing patient basic information. The machine learning model that achieved the highest performance was chosen to integrate patient basic information and radiomics features for the development of clinical nomograms. RESULTS After a thorough screening process, 16 radiomics features were selected as input parameters for the machine learning model. In the test group, the highest accuracy achieved using radiomics features was 0.849, with an area under the curve (AUC) of 0.919. Evaluation of clinical features identified age and gender as closely associated with osteoporosis. Among these features, the KNN model exhibited the highest accuracy of 0.731 and an AUC of 0.658 in the test group. Comparing the performance of radiomics and clinical features, radiomics features demonstrated superior AUC values in the machine learning models. Ultimately, the XGBoost model, utilizing both radiomics and clinical features, was selected as the final Nomogram prediction model. In the test group, this model achieved an accuracy of 0.882 and an AUC of 0.886 in screening for osteoporosis. CONCLUSION Radiomics features derived from hip CT scans exhibit strong screening capabilities for osteoporosis. Furthermore, when combined with easily obtainable clinical features like patient age and gender, an effective screening efficacy for osteoporosis can be achieved.
Affiliation(s)
- Kaibin Fang
- Department of Orthopaedic Surgery, The Second Affiliated Hospital of Fujian Medical University, No. 34, Zhongshanbeilu, Quanzhou, 362000, China (K.F., X.L., Z.D.)
| | - Xiaoling Zheng
- Liming Vocational University, Quanzhou, 362000, China (X.Z.)
| | - Xiaocong Lin
- Department of Orthopaedic Surgery, The Second Affiliated Hospital of Fujian Medical University, No. 34, Zhongshanbeilu, Quanzhou, 362000, China (K.F., X.L., Z.D.)
| | - Zhangsheng Dai
- Department of Orthopaedic Surgery, The Second Affiliated Hospital of Fujian Medical University, No. 34, Zhongshanbeilu, Quanzhou, 362000, China (K.F., X.L., Z.D.)
|
42
|
Liang J, Bian M, Chen H, Yan K, Li Z, Qin Y, Wang D, Zhu C, Huang W, Yi L, Sun J, Mao Y, Hao Z. Gradient boosting DD-MLP Net: An ensemble learning model using near-infrared spectroscopy to classify after-stroke dyskinesia degree during exercise. Journal of Biophotonics 2023; 16:e202300029. [PMID: 37280169 DOI: 10.1002/jbio.202300029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2023] [Revised: 04/25/2023] [Accepted: 06/02/2023] [Indexed: 06/08/2023]
Abstract
This study aims to develop an automatic assessment of after-stroke dyskinesias degree by combining machine learning and near-infrared spectroscopy (NIRS). Thirty-five subjects were divided into five stages (healthy, patient: Brunnstrom stages 3, 4, 5, 6). NIRS was used to record the muscular hemodynamic responses from bilateral femoris (biceps brachii) muscles during passive and active upper (lower) limbs circular exercise. We used the D-S evidence theory to conduct feature information fusion and established a Gradient Boosting DD-MLP Net model, combining the dendrite network and multilayer perceptron, to realize automatic dyskinesias degree evaluation. Our model classified the upper limb dyskinesias with high accuracy: 98.91% under the passive mode and 98.69% under the active mode, and classified the lower limb dyskinesias with high accuracy: 99.45% and 99.63% under the passive and active modes, respectively. Our model combined with NIRS has great potential in monitoring the after-stroke dyskinesias degree and guiding rehabilitation training.
Affiliation(s)
- Jianbin Liang
- School of Mechatronic Engineering and Automation, Foshan University, Foshan, China
| | - Minjie Bian
- Department of Rehabilitation Medicine, The Seventh Affiliated Hospital, Sun Yat-sen University, Shenzhen, China
| | - Hucheng Chen
- School of Mechatronic Engineering and Automation, Foshan University, Foshan, China
| | - Kecheng Yan
- School of Mechatronic Engineering and Automation, Foshan University, Foshan, China
| | - Zhihao Li
- School of Medicine, Foshan University, Foshan, China
| | - Yanmei Qin
- School of Medicine, Foshan University, Foshan, China
| | - Dongyang Wang
- School of Mechatronic Engineering and Automation, Foshan University, Foshan, China
| | - Chunjie Zhu
- School of Mechatronic Engineering and Automation, Foshan University, Foshan, China
| | - Wenzhu Huang
- The Fifth Affiliated Hospital of Foshan, Foshan University, Foshan, China
| | - Li Yi
- School of Mechatronic Engineering and Automation, Foshan University, Foshan, China
| | - Jinyan Sun
- School of Medicine, Foshan University, Foshan, China
| | - Yurong Mao
- Department of Rehabilitation Medicine, The Seventh Affiliated Hospital, Sun Yat-sen University, Shenzhen, China
| | - Zhifeng Hao
- College of Science, Shantou University, Shantou, China
|
43
|
Liu XW, Shi TY, Gao D, Ma CY, Lin H, Yan D, Deng KJ. iPADD: A Computational Tool for Predicting Potential Antidiabetic Drugs Using Machine Learning Algorithms. J Chem Inf Model 2023; 63:4960-4969. [PMID: 37499224 DOI: 10.1021/acs.jcim.3c00564] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Diabetes mellitus is a chronic metabolic disease, which causes an imbalance in blood glucose homeostasis and further leads to severe complications. With the increasing population of diabetes, there is an urgent need to develop drugs to treat diabetes. The development of artificial intelligence provides a powerful tool for accelerating the discovery of antidiabetic drugs. This work aims to establish a predictor called iPADD for discovering potential antidiabetic drugs. In the predictor, we used four kinds of molecular fingerprints and their combinations to encode the drugs and then adopted minimum-redundancy-maximum-relevance (mRMR) combined with an incremental feature selection strategy to screen optimal features. Based on the optimal feature subset, eight machine learning algorithms were applied to train models by using 5-fold cross-validation. The best model could produce an accuracy (Acc) of 0.983 with the area under the receiver operating characteristic curve (auROC) value of 0.989 on an independent test set. To further validate the performance of iPADD, we selected 65 natural products for case analysis, including 13 natural products in clinical trials as positive samples and 52 natural products as negative samples. Except for abscisic acid, our model can give correct prediction results. Molecular docking illustrated that quercetin and resveratrol stably bound with the diabetes target NR1I2. These results are consistent with the model prediction results of iPADD, indicating that the machine learning model has a strong generalization ability. The source code of iPADD is available at https://github.com/llllxw/iPADD.
Affiliation(s)
- Xiao-Wei Liu
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Tian-Yu Shi
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Dong Gao
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Cai-Yi Ma
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Hao Lin
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Dan Yan
- Beijing Friendship Hospital, Capital Medical University, Beijing 100050, China
- Beijing Institute of Clinical Pharmacy, Beijing 100050, China
| | - Ke-Jun Deng
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
|
44
|
Li L, Pu C, Jin N, Zhu L, Hu Y, Cascone P, Tao Y, Zhang H. Prediction of 5-year overall survival of tongue cancer based machine learning. BMC Oral Health 2023; 23:567. [PMID: 37574562 PMCID: PMC10423415 DOI: 10.1186/s12903-023-03255-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Accepted: 07/27/2023] [Indexed: 08/15/2023] Open
Abstract
OBJECTIVE We aimed to develop a 5-year overall survival prediction model for patients with oral tongue squamous cell carcinoma based on machine learning methods. SUBJECTS AND METHODS The data were obtained from electronic medical records of 224 OTSCC patients at the PLA General Hospital. A five-year overall survival prediction model was constructed using logistic regression, Support Vector Machines, Decision Tree, Random Forest, Extreme Gradient Boosting, and Light Gradient Boosting Machine. Model performance was evaluated according to the area under the curve (AUC) of the receiver operating characteristic curve. The output of the optimal model was explained using the Python package (SHapley Additive exPlanations, SHAP). RESULTS After passing through the grid search and secondary modeling, the Light Gradient Boosting Machine was the best prediction model (AUC = 0.860). As explained by SHapley Additive exPlanations, N-stage, age, systemic inflammation response index, positive lymph nodes, plasma fibrinogen, lymphocyte-to-monocyte ratio, neutrophil percentage, and T-stage could perform a 5-year overall survival prediction for OTSCC. The 5-year survival rate was 42%. CONCLUSION The Light Gradient Boosting Machine prediction model predicted 5-year overall survival in OTSCC patients, and this predictive tool has potential prognostic implications for patients with OTSCC.
Affiliation(s)
- Liangbo Li
- Medical School of Chinese PLA, Beijing, China
- Department of Stomatology, Chinese PLA General Hospital, 28 Fuxing Road, Haidian District, Beijing, 100853, China
| | - Cheng Pu
- Key Laboratory of Animal Disease and Human Health of Sichuan Province, Chengdu, China
- College of Veterinary Medicine, Sichuan Agricultural University, Sichuan, China
| | - Nenghao Jin
- Medical School of Chinese PLA, Beijing, China
- Department of Stomatology, Chinese PLA General Hospital, 28 Fuxing Road, Haidian District, Beijing, 100853, China
| | - Liang Zhu
- Medical School of Chinese PLA, Beijing, China
- Department of Stomatology, Chinese PLA General Hospital, 28 Fuxing Road, Haidian District, Beijing, 100853, China
| | - Yanchun Hu
- Key Laboratory of Animal Disease and Human Health of Sichuan Province, Chengdu, China
- College of Veterinary Medicine, Sichuan Agricultural University, Sichuan, China
| | - Piero Cascone
- Unicamillus International Medical University, Rome, Italy
| | - Ye Tao
- Department of Stomatology, Chinese PLA General Hospital, 28 Fuxing Road, Haidian District, Beijing, 100853, China
| | - Haizhong Zhang
- Department of Stomatology, Chinese PLA General Hospital, 28 Fuxing Road, Haidian District, Beijing, 100853, China
|
45
|
Wang Q, Jiang S, Li T, Qiu Z, Yan J, Fu R, Ma C, Wang X, Jiang S, Cheng Q. G2P Provides an Integrative Environment for Multi-model genomic selection analysis to improve genotype-to-phenotype prediction. Frontiers in Plant Science 2023; 14:1207139. [PMID: 37600179 PMCID: PMC10437076 DOI: 10.3389/fpls.2023.1207139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 07/21/2023] [Indexed: 08/22/2023]
Abstract
Genotype-to-phenotype (G2P) prediction has become a mainstream paradigm to facilitate genomic selection (GS)-assisted breeding in the seed industry. Many methods have been introduced for building GS models, but their prediction precision may vary depending on species and specific traits. Therefore, evaluation of multiple models and selection of the appropriate one is crucial to effective GS analysis. Here, we present the G2P container developed for the Singularity platform, which contains a library of 16 state-of-the-art GS models and 13 evaluation metrics. G2P works as an integrative environment offering comprehensive, unbiased evaluation analyses of the 16 GS models, which may be run in parallel on high-performance computing clusters. Based on the evaluation outcome, G2P performs auto-ensemble algorithms that not only can automatically select the most precise models but also can integrate prediction results from multiple models. This functionality should further improve the precision of G2P prediction. Another noteworthy function is the refinement design of the training set, in which G2P optimizes the training set based on the genetic diversity analysis of a studied population. Although the training samples in the optimized set are fewer than in the original set, the prediction precision is almost equivalent to that obtained when using the whole set. This functionality is quite useful in practice, as it reduces the cost of phenotyping when constructing a training population. The G2P container and source codes are freely accessible at https://g2p-env.github.io/.
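The idea of selecting the most precise models and then integrating their predictions can be sketched generically as follows. This is not the G2P implementation, whose auto-ensemble algorithms are not described in the abstract: the sketch assumes Pearson's correlation as the evaluation metric and a plain average as the integration rule, and the names `pearson`, `auto_ensemble`, and the model labels are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def auto_ensemble(predictions, observed, top_n):
    """Keep the top_n models by correlation with the observed phenotypes
    and average their predictions (an assumed, simple integration rule)."""
    scores = {name: pearson(pred, observed) for name, pred in predictions.items()}
    best = sorted(scores, key=scores.get, reverse=True)[:top_n]
    averaged = [
        sum(predictions[name][i] for name in best) / len(best)
        for i in range(len(observed))
    ]
    return best, averaged


# Illustrative validation-set values only.
observed = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "model_a": [1.1, 1.9, 3.2, 3.8],
    "model_b": [4.0, 3.0, 2.0, 1.0],
    "model_c": [1.0, 2.0, 2.5, 4.5],
}
best, averaged = auto_ensemble(predictions, observed, top_n=2)
print(best, averaged)
```

Model ranking should be done on validation data that is separate from the data used to report the final ensemble accuracy, otherwise the reported precision is optimistic.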
Affiliation(s)
- Qian Wang
- Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
- National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
| | - Shan Jiang
- Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
- National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
| | - Tong Li
- Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
- National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
| | - Zhixu Qiu
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi, China
- State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Shaanxi, Yangling, China
| | - Jun Yan
- Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
- National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
| | - Ran Fu
- Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
- National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
| | - Chuang Ma
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi, China
- State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Shaanxi, Yangling, China
| | - Xiangfeng Wang
- Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
- National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
| | - Shuqin Jiang
- Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
- National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
| | - Qian Cheng
- Frontiers Science Center for Molecular Design Breeding, China Agricultural University, Beijing, China
- National Maize Improvement Center of China, College of Agriculture and Biotechnology, China Agricultural University, Beijing, China
|
46
|
Heilmann PG, Frisch M, Abbadi A, Kox T, Herzog E. Stacked ensembles on basis of parentage information can predict hybrid performance with an accuracy comparable to marker-based GBLUP. Frontiers in Plant Science 2023; 14:1178902. [PMID: 37546247 PMCID: PMC10401275 DOI: 10.3389/fpls.2023.1178902] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/03/2023] [Accepted: 06/26/2023] [Indexed: 08/08/2023]
Abstract
Testcross factorials in newly established hybrid breeding programs are often highly unbalanced, incomplete, and characterized by predominance of special combining ability (SCA) over general combining ability (GCA). This results in a low efficiency of GCA-based selection. Machine learning algorithms might improve prediction of hybrid performance in such testcross factorials, as they have been successfully applied to find complex underlying patterns in sparse data. Our objective was to compare the prediction accuracy of machine learning algorithms to that of GCA-based prediction and genomic best linear unbiased prediction (GBLUP) in six unbalanced incomplete factorials from hybrid breeding programs of rapeseed, wheat, and corn. We investigated a range of machine learning algorithms with three different types of predictor variables: (a) information on parentage of hybrids, (b) in addition hybrid performance of crosses of the parental lines with other crossing partners, and (c) genotypic marker data. In two highly incomplete and unbalanced factorials from rapeseed, in which the SCA variance contributed considerably to the genetic variance, stacked ensembles of gradient boosting machines based on parentage information outperformed GCA prediction. The stacked ensembles increased prediction accuracy from 0.39 to 0.45, and from 0.48 to 0.54 compared to GCA prediction. The prediction accuracy of stacked ensembles without marker data was comparable to that of GBLUP, which requires marker data. We conclude that hybrid prediction with stacked ensembles of gradient boosting machines based on parentage information is a promising approach that is worth further investigations with other data sets in which SCA variance is high.
Affiliation(s)
| | - Matthias Frisch
- Institute of Agronomy and Plant Breeding II, Justus Liebig University, Gießen, Germany
| | - Eva Herzog
- Institute of Agronomy and Plant Breeding II, Justus Liebig University, Gießen, Germany
|
47
|
Kusunose M, Inui A, Nishimoto H, Mifune Y, Yoshikawa T, Shinohara I, Furukawa T, Kato T, Tanaka S, Kuroda R. Measurement of Shoulder Abduction Angle with Posture Estimation Artificial Intelligence Model. Sensors (Basel, Switzerland) 2023; 23:6445. [PMID: 37514738 PMCID: PMC10416158 DOI: 10.3390/s23146445] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/25/2023] [Revised: 07/10/2023] [Accepted: 07/14/2023] [Indexed: 07/30/2023]
Abstract
Substantial advancements in markerless motion capture accuracy exist, but discrepancies persist when measuring joint angles compared to those taken with a goniometer. This study integrates machine learning techniques with markerless motion capture, with an aim to enhance this accuracy. Two artificial intelligence-based libraries, MediaPipe and LightGBM, were employed in executing markerless motion capture and shoulder abduction angle estimation. The motion of ten healthy volunteers was captured using smartphone cameras with right shoulder abduction angles ranging from 10° to 160°. The cameras were set diagonally at 45°, 30°, 15°, 0°, -15°, or -30° relative to the participant situated at a distance of 3 m. To estimate the abduction angle, machine learning models were developed considering the angle data from the goniometer as the ground truth. The model performance was evaluated using the coefficient of determination (R²) and mean absolute percentage error, which were 0.988 and 1.539%, respectively, for the trained model. This approach could estimate the shoulder abduction angle, even if the camera was positioned diagonally with respect to the object. Thus, the proposed models can be utilized for the real-time estimation of shoulder motion during rehabilitation or sports motion.
Affiliation(s)
- Atsuyuki Inui
- Department of Orthopaedic Surgery, Kobe University Graduate School of Medicine, Kobe 650-0017, Japan; (M.K.); (H.N.); (Y.M.); (T.Y.); (I.S.); (T.F.); (T.K.); (S.T.); (R.K.)
|
48
|
Guo S, Huang X, Situ Y, Huang Q, Guan K, Huang J, Wang W, Bai X, Liu Z, Wu Y, Qiao Z. Interpretable Machine-Learning and Big Data Mining to Predict Gas Diffusivity in Metal-Organic Frameworks. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2023; 10:e2301461. [PMID: 37166040 PMCID: PMC10375163 DOI: 10.1002/advs.202301461] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 03/05/2023] [Revised: 04/14/2023] [Indexed: 05/12/2023]
Abstract
For gas separation and catalysis by metal-organic frameworks (MOFs), gas diffusion has a substantial impact on the overall process rate, so it is necessary to determine the molecular diffusion behavior within the MOFs. In this study, an interpretable machine learning (ML) model, light gradient boosting machine (LGBM), is trained to predict the molecular diffusivity and selectivity of 9 gases (Kr, Xe, CH4, N2, H2S, O2, CO2, H2, and He). For these 9 gases, LGBM displays high accuracy (average R² = 0.962) and superior extrapolation for the diffusivity of C2H6. The model calculation is also five orders of magnitude faster than molecular dynamics (MD) simulations. Subsequently, using the trained LGBM model, an interactive desktop application is developed that can help researchers quickly and accurately calculate the diffusion of molecules in porous crystal materials. Finally, by combining the trained LGBM model with the Shapley additive explanation (SHAP), the authors find that the difference in molecular polarizability (ΔPol) is the key factor governing the diffusion selectivity. Using the interpretable ML calculations, the optimal MOFs are selected for separating binary gas mixtures and for CO2 methanation. This work provides a new direction for exploring the structure-property relationships of MOFs and realizing the rapid calculation of molecular diffusivity.
Affiliation(s)
- Shuya Guo
- Guangzhou Key Laboratory for New Energy and Green Catalysis, School of Chemistry and Chemical Engineering, Guangzhou University, Guangzhou 510006, China
- Xiaoshan Huang
- Guangzhou Key Laboratory for New Energy and Green Catalysis, School of Chemistry and Chemical Engineering, Guangzhou University, Guangzhou 510006, China
- Yizhen Situ
- Guangzhou Key Laboratory for New Energy and Green Catalysis, School of Chemistry and Chemical Engineering, Guangzhou University, Guangzhou 510006, China
- Qiuhong Huang
- Guangzhou Key Laboratory for New Energy and Green Catalysis, School of Chemistry and Chemical Engineering, Guangzhou University, Guangzhou 510006, China
- Kexin Guan
- Guangzhou Key Laboratory for New Energy and Green Catalysis, School of Chemistry and Chemical Engineering, Guangzhou University, Guangzhou 510006, China
- Jiaxin Huang
- Guangzhou Key Laboratory for New Energy and Green Catalysis, School of Chemistry and Chemical Engineering, Guangzhou University, Guangzhou 510006, China
- Wei Wang
- Guangzhou Key Laboratory for New Energy and Green Catalysis, School of Chemistry and Chemical Engineering, Guangzhou University, Guangzhou 510006, China
- Xiangning Bai
- Guangzhou Key Laboratory for New Energy and Green Catalysis, School of Chemistry and Chemical Engineering, Guangzhou University, Guangzhou 510006, China
- Zili Liu
- Guangzhou Key Laboratory for New Energy and Green Catalysis, School of Chemistry and Chemical Engineering, Guangzhou University, Guangzhou 510006, China
- Yufang Wu
- Guangzhou Key Laboratory for New Energy and Green Catalysis, School of Chemistry and Chemical Engineering, Guangzhou University, Guangzhou 510006, China
- Zhiwei Qiao
- Guangzhou Key Laboratory for New Energy and Green Catalysis, School of Chemistry and Chemical Engineering, Guangzhou University, Guangzhou 510006, China
- Joint Institute of Guangzhou University & Institute of Corrosion Science and Technology, Guangzhou University, Guangzhou 510006, China
|
49
|
He Q, Tang S, Zhi H, Chen J, Zhang J, Liang H, Alam O, Li H, Zhang H, Xing L, Li X, Zhang W, Wang H, Shi J, Du H, Wu H, Wang L, Yang P, Xing L, Yan H, Song Z, Liu J, Wang H, Tian X, Qiao Z, Feng G, Guo R, Zhu W, Ren Y, Hao H, Li M, Zhang A, Guo E, Yan F, Li Q, Liu Y, Tian B, Zhao X, Jia R, Feng B, Zhang J, Wei J, Lai J, Jia G, Purugganan M, Diao X. A graph-based genome and pan-genome variation of the model plant Setaria. Nat Genet 2023:10.1038/s41588-023-01423-w. [PMID: 37291196 DOI: 10.1038/s41588-023-01423-w] [Citation(s) in RCA: 38] [Impact Index Per Article: 38.0] [Received: 07/23/2022] [Accepted: 05/08/2023] [Indexed: 06/10/2023]
Abstract
Setaria italica (foxtail millet), a founder crop of East Asian agriculture, is a model plant for C4 photosynthesis and developing approaches to adaptive breeding across multiple climates. Here we established the Setaria pan-genome by assembling 110 representative genomes from a worldwide collection. The pan-genome is composed of 73,528 gene families, of which 23.8%, 42.9%, 29.4% and 3.9% are core, soft core, dispensable and private genes, respectively; 202,884 nonredundant structural variants were also detected. The characterization of pan-genomic variants suggests their importance during foxtail millet domestication and improvement, as exemplified by the identification of the yield gene SiGW3, where a 366-bp presence/absence promoter variant accompanies gene expression variation. We developed a graph-based genome and performed large-scale genetic studies for 68 traits across 13 environments, identifying potential genes for millet improvement at different geographic sites. These can be used in marker-assisted breeding, genomic selection and genome editing to accelerate crop improvement under different climatic conditions.
Affiliation(s)
- Qiang He
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Sha Tang
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Hui Zhi
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Jinfeng Chen
- State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Jun Zhang
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Hongkai Liang
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Ornob Alam
- Center for Genomics and Systems Biology, New York University, New York City, NY, USA
- Hongbo Li
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Hui Zhang
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- College of Agronomy, Northwest A & F University, Yangling, China
- Lihe Xing
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Xukai Li
- College of Life Sciences, Shanxi Agricultural University, Taigu, China
- Wei Zhang
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Hailong Wang
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Junpeng Shi
- State Key Laboratory of Plant Physiology and Biochemistry & National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing, China
- Huilong Du
- School of Life Sciences, Institute of Life Sciences and Green Development, Hebei University, Baoding, China
- Hongpo Wu
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Liwei Wang
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Ping Yang
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Lu Xing
- Anyang Academy of Agriculture Sciences, Anyang, China
- Hongshan Yan
- Anyang Academy of Agriculture Sciences, Anyang, China
- Jinrong Liu
- Anyang Academy of Agriculture Sciences, Anyang, China
- Haigang Wang
- Center for Agricultural Genetic Resources Research, Shanxi Agricultural University, Taiyuan, China
- Xiang Tian
- Center for Agricultural Genetic Resources Research, Shanxi Agricultural University, Taiyuan, China
- Zhijun Qiao
- Center for Agricultural Genetic Resources Research, Shanxi Agricultural University, Taiyuan, China
- Guojun Feng
- Research Institute of Cereal Crops, Xinjiang Academy of Agricultural Sciences, Urumqi, China
- Ruifeng Guo
- Institute of High Latitude Crops, Shanxi Agricultural University, Datong, China
- Wenjuan Zhu
- Institute of High Latitude Crops, Shanxi Agricultural University, Datong, China
- Yuemei Ren
- Institute of High Latitude Crops, Shanxi Agricultural University, Datong, China
- Hongbo Hao
- Institute of Dry-Land Farming, Hebei Academy of Agricultural and Forestry Sciences, Hengshui, China
- Mingzhe Li
- Institute of Dry-Land Farming, Hebei Academy of Agricultural and Forestry Sciences, Hengshui, China
- Aiying Zhang
- Millet Research Institute, Shanxi Agricultural University, Changzhi, China
- Erhu Guo
- Millet Research Institute, Shanxi Agricultural University, Changzhi, China
- Feng Yan
- Qiqihar Sub-Academy of Heilongjiang Academy of Agricultural Sciences, Qiqihar, China
- Qingquan Li
- Qiqihar Sub-Academy of Heilongjiang Academy of Agricultural Sciences, Qiqihar, China
- Yanli Liu
- Cangzhou Academy of Agriculture and Forestry Sciences, Cangzhou, China
- Bohong Tian
- Cangzhou Academy of Agriculture and Forestry Sciences, Cangzhou, China
- Xiaoqin Zhao
- Dingxi Academy of Agricultural Sciences, Dingxi, China
- Ruiling Jia
- Dingxi Academy of Agricultural Sciences, Dingxi, China
- Baili Feng
- College of Agronomy, Northwest A & F University, Yangling, China
- Jiewei Zhang
- Beijing Key Laboratory of Agricultural Genetic Resources and Biotechnology, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China
- Jianhua Wei
- Beijing Key Laboratory of Agricultural Genetic Resources and Biotechnology, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China
- Jinsheng Lai
- State Key Laboratory of Plant Physiology and Biochemistry & National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing, China
- Guanqing Jia
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Michael Purugganan
- Center for Genomics and Systems Biology, New York University, New York City, NY, USA
- Center for Genomics and Systems Biology, New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
- Xianmin Diao
- Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
|
50
|
Hong X, Lv J, Li Z, Xiong Y, Zhang J, Chen HF. Sequence-based machine learning method for predicting the effects of phosphorylation on protein-protein interactions. Int J Biol Macromol 2023; 243:125233. [PMID: 37290543 DOI: 10.1016/j.ijbiomac.2023.125233] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/07/2023] [Revised: 06/02/2023] [Accepted: 06/03/2023] [Indexed: 06/10/2023]
Abstract
Protein phosphorylation, catalyzed by kinases, is an important biochemical process that plays an essential role in multiple cell signaling pathways. Meanwhile, protein-protein interactions (PPI) constitute the signaling pathways. Abnormal phosphorylation status of a protein can regulate protein functions through PPI and evoke severe diseases, such as cancer and Alzheimer's disease. Because experimental evidence is limited and experimentally identifying novel cases of phosphorylation regulation on PPI is costly, it is necessary to develop a high-accuracy and user-friendly artificial intelligence method to predict the effect of phosphorylation on PPI. Here, we proposed a novel sequence-based machine learning method named PhosPPI, which achieved better identification performance (accuracy and AUC) than the competing predictive methods Betts, HawkDock and FoldX. PhosPPI is now freely available as a web server (https://phosppi.sjtu.edu.cn/). This tool can help the user to identify functional phosphorylation sites affecting PPI and explore phosphorylation-associated disease mechanisms and drug development.
Affiliation(s)
- Xiaokun Hong
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Jiyang Lv
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Zhengxin Li
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Jian Zhang
- Department of Pathophysiology, Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Shanghai Jiao-Tong University School of Medicine (SJTU-SM), Shanghai 200025, China
- Hai-Feng Chen
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, Department of Bioinformatics and Biostatistics, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
|