Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Azodi CB, Tang J, Shiu SH. Opening the Black Box: Interpretable Machine Learning for Geneticists. Trends Genet 2020;36:442-455. [PMID: 32396837 DOI: 10.1016/j.tig.2020.03.005] [Citation(s) in RCA: 104] [Impact Index Per Article: 26.0] [Received: 01/03/2020] [Revised: 03/12/2020] [Accepted: 03/16/2020] [Indexed: 01/16/2023]

Cited by Other Article(s)

Bi X, Wang J, Xue B, He C, Liu F, Chen H, Lin LL, Dong B, Li B, Jin C, Pan J, Xue W, Ye J. SERSomes for metabolic phenotyping and prostate cancer diagnosis. Cell Rep Med 2024;5:101579. [PMID: 38776910 PMCID: PMC11228451 DOI: 10.1016/j.xcrm.2024.101579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 03/08/2024] [Accepted: 04/29/2024] [Indexed: 05/25/2024]

Affiliation(s)

Xinyuan Bi: State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, P.R. China
Jiayi Wang: Department of Urology, Ren Ji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, P.R. China
Bingsen Xue: State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, P.R. China; Shanghai Artificial Intelligence Laboratory, Shanghai, China
Chang He: State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, P.R. China
Fugang Liu: State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, P.R. China
Haoran Chen: State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, P.R. China
Linley Li Lin: State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, P.R. China
Baijun Dong: Department of Urology, Jiading District Central Hospital Affiliated Shanghai University of Medicine & Health Science, Shanghai, P.R. China
Butang Li: Department of Urology, Ningbo Hangzhou Bay Hospital, Ningbo, Zhejiang, P.R. China
Cheng Jin: State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, P.R. China; Shanghai Artificial Intelligence Laboratory, Shanghai, China; Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, P.R. China
Jiahua Pan: Department of Urology, Ren Ji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, P.R. China
Wei Xue: Department of Urology, Ren Ji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, P.R. China
Jian Ye: State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, P.R. China; Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, P.R. China; Shanghai Key Laboratory of Gynecologic Oncology, Ren Ji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, P.R. China

Fronk AD, Manzanares MA, Zheng P, Geier A, Anderson K, Stanton S, Zumrut H, Gera S, Munch R, Frederick V, Dhingra P, Arun G, Akerman M. Development and validation of AI/ML derived splice-switching oligonucleotides. Mol Syst Biol 2024;20:676-701. [PMID: 38664594 PMCID: PMC11148135 DOI: 10.1038/s44320-024-00034-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 04/03/2024] [Accepted: 04/09/2024] [Indexed: 06/05/2024]

Li X, Wang P, Zhu Y, Zhao W, Pan H, Wang D. Interpretable machine learning model for predicting acute kidney injury in critically ill patients. BMC Med Inform Decis Mak 2024;24:148. [PMID: 38822285 PMCID: PMC11140965 DOI: 10.1186/s12911-024-02537-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Accepted: 05/17/2024] [Indexed: 06/02/2024]

Wen C, Zhang X, Li Y, Xiao W, Hu Q, Lei X, Xu T, Liang S, Gao X, Zhang C, Yu Z, Lü M. An interpretable machine learning model for predicting 28-day mortality in patients with sepsis-associated liver injury. PLoS One 2024;19:e0303469. [PMID: 38768153 PMCID: PMC11104601 DOI: 10.1371/journal.pone.0303469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 04/25/2024] [Indexed: 05/22/2024]

Abstract

Sepsis-Associated Liver Injury (SALI) is an independent risk factor for death from sepsis. The aim of this study was to develop an interpretable machine learning model for early prediction of 28-day mortality in patients with SALI. Data from the Medical Information Mart for Intensive Care (MIMIC-IV, v2.2, and MIMIC-III, v1.4) were used in this study. The study cohort from MIMIC-IV was randomly split into a training set (70%) and an internal validation set (30%), with MIMIC-III (2001 to 2008) serving as external validation. Features with more than 20% missing values were deleted, and missing values in the remaining features were filled by multiple imputation. Lasso-CV, a lasso linear model with iterative fitting along a regularization path in which the best model is selected by cross-validation, was used to select important features for model development. Eight machine learning models were developed: Random Forest (RF), Logistic Regression, Decision Tree, Extreme Gradient Boosting (XGBoost), K Nearest Neighbor, Support Vector Machine, Generalized Linear Models in which the best model is selected by cross-validation (CV_glmnet), and Linear Discriminant Analysis (LDA). SHapley Additive exPlanations (SHAP) was used to improve the interpretability of the optimal model. In total, 1043 patients were included, of whom 710 were from MIMIC-IV and 333 from MIMIC-III. Twenty-four clinically relevant parameters were selected for model construction. For the prediction of 28-day mortality of SALI in the internal validation set, RF performed best, with an area under the curve (AUC) of 0.79 (95% CI: 0.73-0.86). RF also outperformed the traditional disease severity scores, including the Oxford Acute Severity of Illness Score (OASIS), Sequential Organ Failure Assessment (SOFA), Simplified Acute Physiology Score II (SAPS II), Logistic Organ Dysfunction Score (LODS), Systemic Inflammatory Response Syndrome (SIRS), and Acute Physiology Score III (APS III). SHAP analysis found that urine output, Charlson Comorbidity Index (CCI), minimum Glasgow Coma Scale (GCS_min), blood urea nitrogen (BUN), and admission_age were the five most important features affecting the RF model. Therefore, RF has good predictive ability for 28-day mortality in SALI. Urine output, CCI, GCS_min, BUN, and age at admission (admission_age) within 24 h after intensive care unit (ICU) admission contribute significantly to model prediction.
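
The preprocessing described in this abstract (dropping features with more than 20% missing values, then a 70/30 random split) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' pipeline: the function names, the column names, and the synthetic cohort are all assumptions, and the imputation and Lasso-CV steps are left out.

```python
import random

def drop_sparse_features(rows, max_missing=0.20):
    """Drop every column whose fraction of missing (None) values exceeds max_missing."""
    columns = list(rows[0].keys())
    kept = [
        col for col in columns
        if sum(row[col] is None for row in rows) / len(rows) <= max_missing
    ]
    return [{col: row[col] for col in kept} for row in rows]

def split_train_validation(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and validation parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Synthetic cohort of 710 rows (the MIMIC-IV cohort size quoted above); values are made up.
rng = random.Random(42)
cohort = [
    {
        "patient_id": i,
        "bun": rng.uniform(5, 60),
        # Hypothetical lab value that is missing for every third patient (about 33%).
        "rare_lab": None if i % 3 == 0 else rng.uniform(0, 1),
    }
    for i in range(710)
]

filtered = drop_sparse_features(cohort)
train, validation = split_train_validation(filtered)
print(sorted(filtered[0].keys()), len(train), len(validation))
```

Fixing the shuffle seed keeps the split reproducible, which matters when several models are compared on the same validation set.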

Li C, Jia J, Wu F, Zuo L, Cui X. County-level intensity of carbon emissions from crop farming in China during 2000-2019. Sci Data 2024;11:457. [PMID: 38710695 DOI: 10.1038/s41597-024-03296-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Accepted: 04/23/2024] [Indexed: 05/08/2024]

Khattak A, Zhang J, Chan PW, Chen F, Matara CM. AI-supported estimation of safety critical wind shear-induced aircraft go-around events utilizing pilot reports. Heliyon 2024;10:e28569. [PMID: 38560193 PMCID: PMC10981122 DOI: 10.1016/j.heliyon.2024.e28569] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2023] [Revised: 03/20/2024] [Accepted: 03/20/2024] [Indexed: 04/04/2024]

Wu L, Xu J, Tong W. PERform: assessing model performance with predictivity and explainability readiness formula. Journal of Environmental Science and Health. Part C, Toxicology and Carcinogenesis 2024:1-16. [PMID: 38619534 DOI: 10.1080/26896583.2024.2340391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/16/2024]

Scalzitti N, Miralavy I, Korenchan DE, Farrar CT, Gilad AA, Banzhaf W. Computational peptide discovery with a genetic programming approach. J Comput Aided Mol Des 2024;38:17. [PMID: 38570405 DOI: 10.1007/s10822-024-00558-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Accepted: 03/07/2024] [Indexed: 04/05/2024]

Oh SW, Byun SS, Kim JK, Jeong CW, Kwak C, Hwang EC, Kang SH, Chung J, Kim YJ, Ha YS, Hong SH. Machine learning models for predicting the onset of chronic kidney disease after surgery in patients with renal cell carcinoma. BMC Med Inform Decis Mak 2024;24:85. [PMID: 38519947 PMCID: PMC10960396 DOI: 10.1186/s12911-024-02473-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Accepted: 03/03/2024] [Indexed: 03/25/2024]

Wu TY, Li YR, Chang KJ, Fang JC, Urano D, Liu MJ. Modeling alternative translation initiation sites in plants reveals evolutionarily conserved cis-regulatory codes in eukaryotes. Genome Res 2024;34:272-285. [PMID: 38479836 PMCID: PMC10984385 DOI: 10.1101/gr.278100.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Accepted: 02/15/2024] [Indexed: 03/22/2024]

Mota LFM, Arikawa LM, Santos SWB, Fernandes Júnior GA, Alves AAC, Rosa GJM, Mercadante MEZ, Cyrillo JNSG, Carvalheiro R, Albuquerque LG. Benchmarking machine learning and parametric methods for genomic prediction of feed efficiency-related traits in Nellore cattle. Sci Rep 2024;14:6404. [PMID: 38493207 PMCID: PMC10944497 DOI: 10.1038/s41598-024-57234-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Accepted: 03/15/2024] [Indexed: 03/18/2024]

Tang AS, Rankin KP, Cerono G, Miramontes S, Mills H, Roger J, Zeng B, Nelson C, Soman K, Woldemariam S, Li Y, Lee A, Bove R, Glymour M, Aghaeepour N, Oskotsky TT, Miller Z, Allen IE, Sanders SJ, Baranzini S, Sirota M. Leveraging electronic health records and knowledge networks for Alzheimer's disease prediction and sex-specific biological insights. Nature Aging 2024;4:379-395. [PMID: 38383858 PMCID: PMC10950787 DOI: 10.1038/s43587-024-00573-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Accepted: 01/19/2024] [Indexed: 02/23/2024]

Affiliation(s)

Alice S Tang: Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA; Graduate Program in Bioengineering, University of California, San Francisco and University of California, Berkeley, San Francisco and Berkeley, CA, USA
Katherine P Rankin: Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA; Memory and Aging Center, Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
Gabriel Cerono: Weill Institute for Neuroscience, Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
Silvia Miramontes: Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
Hunter Mills: Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
Jacquelyn Roger: Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
Billy Zeng: Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
Charlotte Nelson: Weill Institute for Neuroscience, Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
Karthik Soman: Weill Institute for Neuroscience, Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
Sarah Woldemariam: Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
Yaqiao Li: Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
Albert Lee: Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
Riley Bove: Weill Institute for Neuroscience, Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
Maria Glymour: Department of Anesthesiology, Pain, and Perioperative Medicine, Stanford University, Palo Alto, CA, USA
Nima Aghaeepour: Department of Anesthesiology, Pain, and Perioperative Medicine, Stanford University, Palo Alto, CA, USA; Department of Pediatrics, Stanford University, Palo Alto, CA, USA; Department of Biomedical Data Science, Stanford University, Palo Alto, CA, USA
Tomiko T Oskotsky: Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
Zachary Miller: Memory and Aging Center, Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
Isabel E Allen: Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA, USA
Stephan J Sanders: Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA; Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, UK; Department of Psychiatry and Behavioral Sciences, Weill Institute for Neurosciences, University of California, San Francisco, CA, USA
Sergio Baranzini: Weill Institute for Neuroscience, Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
Marina Sirota: Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA; Department of Pediatrics, University of California, San Francisco, CA, USA

Ghosh SK, Khandoker AH. Investigation on explainable machine learning models to predict chronic kidney diseases. Sci Rep 2024;14:3687. [PMID: 38355876 PMCID: PMC10866953 DOI: 10.1038/s41598-024-54375-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Accepted: 02/12/2024] [Indexed: 02/16/2024]

Abstract

Chronic kidney disease (CKD) is a major worldwide health problem, affecting a large proportion of the world's population and leading to higher morbidity and death rates. The early stages of CKD sometimes present without visible symptoms, leaving patients unaware of their condition. Early detection and treatment are critical in reducing complications and improving the overall quality of life for people afflicted. In this work, we investigate the use of an explainable artificial intelligence (XAI)-based strategy, leveraging clinical characteristics, to predict CKD. This study collected clinical data from 491 patients, comprising 56 with CKD and 435 without CKD, encompassing clinical, laboratory, and demographic variables. To develop the predictive model, five machine learning (ML) methods, namely logistic regression (LR), random forest (RF), decision tree (DT), Naïve Bayes (NB), and extreme gradient boosting (XGBoost), were employed. The optimal model was selected based on accuracy and area under the curve (AUC). Additionally, the SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) algorithms were utilized to demonstrate the influence of the features on the optimal model. Among the five models developed, the XGBoost model achieved the best performance with an AUC of 0.9689 and an accuracy of 93.29%. The analysis of feature importance revealed that creatinine, glycosylated hemoglobin type A1C (HgbA1C), and age were the three most influential features in the XGBoost model. SHAP force plots further visualized individualized CKD predictions. For further insights into individual predictions, we also utilized the LIME algorithm. This study presents an interpretable ML-based approach for the early prediction of CKD. The SHAP and LIME methods enhance the interpretability of ML models and help clinicians better understand the rationale behind the predicted outcomes.
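
The two selection metrics named in this abstract, accuracy and AUC, can be computed by hand as in the sketch below. It is an illustration only: the labels and scores are invented, the function names are hypothetical, and AUC is computed through its pairwise-ranking definition (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting one half) rather than with any particular library.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

def accuracy(labels, scores, threshold=0.5):
    """Share of examples whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Invented labels and predicted probabilities, for illustration only.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc_value = roc_auc(labels, scores)
acc_value = accuracy(labels, scores)

# Accuracy of always predicting "no CKD" on the cohort sizes quoted in the abstract.
majority_baseline = 435 / (435 + 56)
print(auc_value, acc_value, round(majority_baseline, 3))
```

With 56 CKD cases among 491 patients, always predicting "no CKD" already reaches about 88.6% accuracy, which is one reason to read accuracy alongside AUC on an imbalanced cohort like this one.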

Wang H, Moghe GD, Kovaleski AP, Keller M, Martinson TE, Wright AH, Franklin JL, Hébert-Haché A, Provost C, Reinke M, Atucha A, North MG, Russo JP, Helwi P, Centinari M, Londo JP. NYUS.2: an automated machine learning prediction model for the large-scale real-time simulation of grapevine freezing tolerance in North America. Horticulture Research 2024;11:uhad286. [PMID: 38487294 PMCID: PMC10939402 DOI: 10.1093/hr/uhad286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Accepted: 12/17/2023] [Indexed: 03/17/2024]

Affiliation(s)

Hongrui Wang: School of Integrative Plant Science, Horticulture Section, Cornell AgriTech, Cornell University, Geneva, NY 14456, USA
Gaurav D Moghe: School of Integrative Plant Science, Plant Biology Section, Cornell University, Ithaca, NY 14850, USA
Al P Kovaleski: Plant and Agroecosystem Sciences Department, University of Wisconsin–Madison, Madison, WI 53706, USA
Markus Keller: Department of Viticulture and Enology, Irrigated Agriculture Research and Extension Center, Washington State University, Prosser, WA 99350, USA
Timothy E Martinson: School of Integrative Plant Science, Horticulture Section, Cornell AgriTech, Cornell University, Geneva, NY 14456, USA
A Harrison Wright: Kentville Research and Development Centre, Agriculture and Agri-Food Canada, Kentville, Nova Scotia, B4N 1J5, Canada
Jeffrey L Franklin: Kentville Research and Development Centre, Agriculture and Agri-Food Canada, Kentville, Nova Scotia, B4N 1J5, Canada
Andréanne Hébert-Haché: Centre de Recherche Agroalimentaire de Mirabel, Mirabel, Québec, J7N 2X8, Canada
Caroline Provost: Centre de Recherche Agroalimentaire de Mirabel, Mirabel, Québec, J7N 2X8, Canada
Michael Reinke: Southwest Michigan Research and Extension Center, Michigan State University, Benton Harbor, MI 49022, USA
Amaya Atucha: Plant and Agroecosystem Sciences Department, University of Wisconsin–Madison, Madison, WI 53706, USA
Michael G North: Plant and Agroecosystem Sciences Department, University of Wisconsin–Madison, Madison, WI 53706, USA
Jennifer P Russo: School of Integrative Plant Science, Horticulture Section, Cornell AgriTech, Cornell University, Geneva, NY 14456, USA
Pierre Helwi: Martell & Co., 7 place Edouard Martell, Cognac 16100, France
Michela Centinari: Department of Plant Science, The Pennsylvania State University, University Park, PA 16802, USA
Jason P Londo: School of Integrative Plant Science, Horticulture Section, Cornell AgriTech, Cornell University, Geneva, NY 14456, USA

Hu J, Xu J, Li M, Jiang Z, Mao J, Feng L, Miao K, Li H, Chen J, Bai Z, Li X, Lu G, Li Y. Identification and validation of an explainable prediction model of acute kidney injury with prognostic implications in critically ill children: a prospective multicenter cohort study. EClinicalMedicine 2024;68:102409. [PMID: 38273888 PMCID: PMC10809096 DOI: 10.1016/j.eclinm.2023.102409] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/29/2023] [Revised: 12/19/2023] [Accepted: 12/19/2023] [Indexed: 01/27/2024]

Abstract

Background

Acute kidney injury (AKI) is a common and serious organ dysfunction in critically ill children. Early identification and prediction of AKI are of great significance. However, current AKI criteria are insufficiently sensitive and specific, and AKI heterogeneity limits the clinical value of AKI biomarkers. This study aimed to establish and validate an explainable prediction model based on the machine learning (ML) approach for AKI, and assess its prognostic implications in children admitted to the pediatric intensive care unit (PICU).

Methods

This multicenter prospective study in China was conducted on critically ill children for the derivation and validation of the prediction model. The derivation cohort, consisting of 957 children admitted to four independent PICUs from September 2020 to January 2021, was separated for training and internal validation, and an external data set of 866 children admitted from February 2021 to February 2022 was employed for external validation. AKI was defined based on serum creatinine and urine output using the Kidney Disease: Improving Global Outcome (KDIGO) criteria. With 33 medical characteristics easily obtained or evaluated during the first 24 h after PICU admission, 11 ML algorithms were used to construct prediction models. Several evaluation indexes, including the area under the receiver-operating-characteristic curve (AUC), were used to compare the predictive performance. The SHapley Additive exPlanation method was used to rank the feature importance and explain the final model. A probability threshold for the final model was identified for AKI prediction and subgrouping. Clinical outcomes were evaluated in various subgroups determined by a combination of the final model and KDIGO criteria.

Findings

The random forest (RF) model performed best in discriminative ability among the 11 ML models. After reducing features according to feature importance rank, an explainable final RF model was established with 8 features. The final model could accurately predict AKI in both internal (AUC = 0.929) and external (AUC = 0.910) validations, and has been translated into a convenient tool to facilitate its utility in clinical settings. Critically ill children with a probability exceeding or equal to the threshold in the final model had a higher risk of death and multiple organ dysfunctions, regardless of whether they met the KDIGO criteria for AKI.

Interpretation

Our explainable ML model was not only successfully developed to accurately predict AKI but was also highly relevant to adverse outcomes in individual children at an early stage of PICU admission, and it mitigated the concern of the "black-box" issue with an indirect interpretation of the ML technique.

Funding

The National Natural Science Foundation of China, Jiangsu Province Science and Technology Support Program, Key talent of women's and children's health of Jiangsu Province, and Postgraduate Research & Practice Innovation Program of Jiangsu Province.

Ciobanu-Caraus O, Aicher A, Kernbach JM, Regli L, Serra C, Staartjes VE. A critical moment in machine learning in medicine: on reproducible and interpretable learning. Acta Neurochir (Wien) 2024;166:14. [PMID: 38227273 DOI: 10.1007/s00701-024-05892-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Accepted: 12/14/2023] [Indexed: 01/17/2024]

Abstract

Over the past two decades, advances in computational power and data availability combined with increased accessibility to pre-trained models have led to an exponential rise in machine learning (ML) publications. While ML may have the potential to transform healthcare, this sharp increase in ML research output without focus on methodological rigor and standard reporting guidelines has fueled a reproducibility crisis. In addition, the rapidly growing complexity of these models compromises their interpretability, which currently impedes their successful and widespread clinical adoption. In medicine, where failure of such models may have severe implications for patients' health, the high requirements for accuracy, robustness, and interpretability confront ML researchers with a unique set of challenges. In this review, we discuss the semantics of reproducibility and interpretability, as well as related issues and challenges, and outline possible solutions to counteract the "black box". To foster reproducibility, standard reporting guidelines need to be further developed and data or code sharing encouraged. Editors and reviewers may equally play a critical role by establishing high methodological standards and thus preventing the dissemination of low-quality ML publications. To foster interpretable learning, the use of simpler models more suitable for medical data can inform the clinician how results are generated based on input data. Model-agnostic explanation tools, sensitivity analysis, and hidden layer representations constitute further promising approaches to increase interpretability. Balancing model performance and interpretability is important to ensure clinical applicability. We have now reached a critical moment for ML in medicine, where addressing these issues and implementing appropriate solutions will be vital for the future evolution of the field.

Huang X, Rymbekova A, Dolgova O, Lao O, Kuhlwilm M. Harnessing deep learning for population genetic inference. Nat Rev Genet 2024;25:61-78. [PMID: 37666948 DOI: 10.1038/s41576-023-00636-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Accepted: 07/11/2023] [Indexed: 09/06/2023]

Lang FF, Liu LY, Wang SW. Predictive modeling of perioperative blood transfusion in lumbar posterior interbody fusion using machine learning. Front Physiol 2023;14:1306453. [PMID: 38187137 PMCID: PMC10767743 DOI: 10.3389/fphys.2023.1306453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Accepted: 11/06/2023] [Indexed: 01/09/2024]

Abstract

Background: Accurate estimation of perioperative blood transfusion risk in lumbar posterior interbody fusion is essential to reduce the number, cost, and complications associated with blood transfusions. Machine learning algorithms have the potential to outperform traditional prediction methods in predicting perioperative blood transfusion. This study aimed to construct a machine learning-based perioperative transfusion risk prediction model for lumbar posterior interbody fusion in order to improve the efficacy of surgical decision-making. Methods: We retrospectively collected clinical data on 1905 patients who underwent lumbar posterior interbody fusion surgery at the Second Hospital of Shanxi Medical University between January 2021 and March 2023. The data were randomly divided into a training set and a validation set, and the "feature_importances" method provided by the eXtreme Gradient Boosting (XGBoost) algorithm was applied to select statistically significant features on the training set to establish five machine learning prediction models. The optimal model was identified by utilizing the area under the curve (AUC) and the probability calibration curve on the validation set. Shapley additive explanations (SHAP) and local interpretable model-agnostic explanations (LIME) were employed for interpretable analysis of the optimal model. Results: Postoperatively, the hospital stay was longer in the transfusion group than in the non-transfusion group. Additionally, the transfusion group experienced higher total hospital costs, 90-day readmission rates, and complication rates within 90 days after surgery than the non-transfusion group. A total of 9 features were selected for the models. The XGBoost model performed best with an AUC value of 0.958. The SHAP values showed that intraoperative blood loss, intraoperative fluid infusion, and number of fused segments were the top 3 most important features affecting perioperative blood transfusion in lumbar posterior interbody fusion. The LIME algorithm was used to interpret the individualized prediction. Conclusion: Surgery, ASA class, levels fused, total intraoperative blood loss, operative time, and preoperative Hb are viable predictors of perioperative blood transfusion in lumbar posterior interbody fusion. The XGBoost model has demonstrated superior predictive efficacy compared to the traditional logistic regression model, making it a more effective decision-making tool for perioperative blood transfusion.
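
SHAP attributions such as the ones reported here are built on Shapley values. The sketch below computes exact Shapley values for a single prediction by enumerating every feature subset, under stated assumptions: the toy scoring function, its coefficients, and the feature names are hypothetical, and absent features are replaced by a single all-zero baseline. SHAP implementations typically average over a background dataset and use faster approximations, so this shows the underlying idea rather than the study's analysis.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one prediction; absent features take baseline values."""
    n = len(instance)
    features = range(n)

    def value(subset):
        # Features in `subset` come from the instance, all others from the baseline.
        mixed = [instance[j] if j in subset else baseline[j] for j in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi

# Hypothetical transfusion-risk score with one interaction term (not the study's model).
def toy_model(x):
    blood_loss, fluid_infusion, fused_segments = x
    return (0.4 * blood_loss + 0.2 * fluid_infusion + 0.1 * fused_segments
            + 0.3 * blood_loss * fused_segments)

instance = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, instance, baseline)
print([round(v, 6) for v in phi])
```

The attributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is split evenly between the two features that take part in it. Enumeration costs grow exponentially with the number of features, so this approach is only practical for very small models.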

Alexander Pyron R. Unsupervised machine learning for species delimitation, integrative taxonomy, and biodiversity conservation. Mol Phylogenet Evol 2023;189:107939. [PMID: 37804960 DOI: 10.1016/j.ympev.2023.107939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Revised: 09/25/2023] [Accepted: 10/04/2023] [Indexed: 10/09/2023]

Abstract

Integrative taxonomy, combining data from multiple axes of biologically relevant variation, is a major goal of systematics. Ideally, such taxonomies will derive from similarly integrative species-delimitation analyses. Yet, most current methods rely solely or primarily on molecular data, with other layers often incorporated only in a post hoc qualitative or comparative manner. A major limitation is the difficulty of devising quantitative parametric models linking different datasets in a unified ecological and evolutionary framework. Machine Learning (ML) methods offer flexibility in this arena by easily learning high-dimensional associations between observations (e.g., individual specimens) across a wide array of input features (e.g., genetics, geography, environment, and phenotype) to delimit statistically meaningful clusters. Here, I implement an unsupervised method using Self-Organizing (or "Kohonen") Maps (SOMs) for such purposes. Recent extensions called "SuperSOMs" can integrate multiple layers, each of which exerts independent influence on a two-dimensional output grid via empirically estimated weights. The grid cells are then delimited into K distinct units that can be interpreted as species or other entities. I show empirical examples in salamanders (Desmognathus) and snakes (Storeria) with layers representing alleles, space, climate, and traits. Simulations reveal that the SuperSOM approach can detect K = 1, tends not to over-split, reflects contributions from all layers, and limits large layers (e.g., genetic matrices) from overwhelming other datasets, desirable properties addressing major concerns from previous studies. Finally, I suggest that these and similar methods could integrate conservation-relevant layers such as population trends and human encroachment to delimit management units from an explicitly quantitative framework grounded in the ecology and evolution of species limits and boundaries.

Sadria M, Layton A, Bader GD. Adversarial training improves model interpretability in single-cell RNA-seq analysis. Bioinformatics Advances 2023;3:vbad166. [PMID: 38099262 PMCID: PMC10719216 DOI: 10.1093/bioadv/vbad166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 09/28/2023] [Accepted: 11/22/2023] [Indexed: 12/17/2023]

Jia X, Wang T, Zhu H. Advancing Computational Toxicology by Interpretable Machine Learning. Environmental Science & Technology 2023;57:17690-17706. [PMID: 37224004 PMCID: PMC10666545 DOI: 10.1021/acs.est.3c00653] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 01/25/2023] [Revised: 05/05/2023] [Accepted: 05/05/2023] [Indexed: 05/26/2023]

Karim MR, Islam T, Shajalal M, Beyan O, Lange C, Cochez M, Rebholz-Schuhmann D, Decker S. Explainable AI for Bioinformatics: Methods, Tools and Applications. Brief Bioinform 2023;24:bbad236. [PMID: 37478371 DOI: 10.1093/bib/bbad236] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/10/2023] [Revised: 05/10/2023] [Accepted: 05/26/2023] [Indexed: 07/23/2023]

Abstract

Artificial intelligence (AI) systems utilizing deep neural networks and machine learning (ML) algorithms are widely used for solving critical problems in bioinformatics, biomedical informatics and precision medicine. However, complex ML models that are often perceived as opaque and black-box methods make it difficult to understand the reasoning behind their decisions. This lack of transparency can be a challenge for both end-users and decision-makers, as well as AI developers. In sensitive areas such as healthcare, explainability and accountability are not only desirable properties but also legally required for AI systems that can have a significant impact on human lives. Fairness is another growing concern, as algorithmic decisions should not show bias or discrimination towards certain groups or individuals based on sensitive attributes. Explainable AI (XAI) aims to overcome the opaqueness of black-box models and to provide transparency in how AI systems make decisions. Interpretable ML models can explain how they make predictions and identify factors that influence their outcomes. However, the majority of the state-of-the-art interpretable ML methods are domain-agnostic and have evolved from fields such as computer vision, automated reasoning or statistics, making direct application to bioinformatics problems challenging without customization and domain adaptation. In this paper, we discuss the importance of explainability and algorithmic transparency in the context of bioinformatics. We provide an overview of model-specific and model-agnostic interpretable ML methods and tools and outline their potential limitations. We discuss how existing interpretable ML methods can be customized and fit to bioinformatics research problems. Further, through case studies in bioimaging, cancer genomics and text mining, we demonstrate how XAI methods can improve transparency and decision fairness. 
Our review aims at providing valuable insights and serving as a starting point for researchers wanting to enhance explainability and decision transparency while solving bioinformatics problems. GitHub: https://github.com/rezacsedu/XAI-for-bioinformatics.

Lee NY, Hum M, Tan GP, Seah AC, Kin PT, Tan NC, Law HY, Lee ASG. Degradation of methylation signals in cryopreserved DNA. Clin Epigenetics 2023;15:147. [PMID: 37697422 PMCID: PMC10496221 DOI: 10.1186/s13148-023-01565-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Accepted: 09/06/2023] [Indexed: 09/13/2023]

Abstract

BACKGROUND

Blood-based DNA methylation has shown great promise as a biomarker in a wide variety of diseases. Studies of DNA methylation in blood often utilize samples which have been cryopreserved for years or even decades. Therefore, changes in DNA methylation associated with long-term cryopreservation can introduce biases or otherwise mislead methylation analyses of cryopreserved DNA. However, previous studies have presented conflicting results with studies reporting hypomethylation, no effect, or even hypermethylation of DNA following long-term cryopreservation. These studies may have been limited by insufficient sample sizes, or by their profiling of methylation only on an aggregate global scale, or profiling of only a few CpGs.

RESULTS

We analyzed two large prospective cohorts: a discovery (n = 126) and a validation (n = 136) cohort, where DNA was cryopreserved for up to four years. In both cohorts there was no detectable change in mean global methylation across increasing storage durations as DNA. However, when analysis was performed on the level of individual CpG methylation both cohorts exhibited a greater number of hypomethylated than hypermethylated CpGs at q-value < 0.05 (4049 hypomethylated but only 50 hypermethylated CpGs in discovery, and 63 hypomethylated but only 6 hypermethylated CpGs in validation). The results were the same even after controlling for age, storage duration as buffy coat prior to DNA extraction, and estimated cell type composition. Furthermore, we find that in both cohorts, CpGs have a greater likelihood to be hypomethylated the closer they are to a CpG island; except for CpGs at the CpG islands themselves which are less likely to be hypomethylated.

CONCLUSION

Cryopreservation of DNA after a few years results in a detectable bias toward hypomethylation at the level of individual CpG methylation, though when analyzed in aggregate there is no detectable change in mean global methylation. Studies profiling methylation in cryopreserved DNA should be mindful of this hypomethylation bias, and more attention should be directed at developing more stable methods of DNA cryopreservation for biomedical research or clinical use.

Li W, Wang T, Ng WWY. Population-Based Hyperparameter Tuning With Multitask Collaboration. IEEE Transactions on Neural Networks and Learning Systems 2023;34:5719-5731. [PMID: 34878983 DOI: 10.1109/tnnls.2021.3130896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]

Wang S, Ren Y, Xia B. Estimation of urban AQI based on interpretable machine learning. Environmental Science and Pollution Research International 2023;30:96562-96574. [PMID: 37580474 DOI: 10.1007/s11356-023-29336-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Accepted: 08/10/2023] [Indexed: 08/16/2023]

Aida H, Ying BW. Efforts to Minimise the Bacterial Genome as a Free-Living Growing System. Biology 2023;12:1170. [PMID: 37759570 PMCID: PMC10525146 DOI: 10.3390/biology12091170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2023] [Revised: 08/17/2023] [Accepted: 08/24/2023] [Indexed: 09/29/2023]

Ruperao P, Rangan P, Shah T, Thakur V, Kalia S, Mayes S, Rathore A. The Progression in Developing Genomic Resources for Crop Improvement. Life (Basel) 2023;13:1668. [PMID: 37629524 PMCID: PMC10455509 DOI: 10.3390/life13081668] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 07/21/2023] [Accepted: 07/25/2023] [Indexed: 08/27/2023]

Zhao A, Wu Y. Future implications of ChatGPT in pharmaceutical industry: drug discovery and development. Front Pharmacol 2023;14:1194216. [PMID: 37529703 PMCID: PMC10390092 DOI: 10.3389/fphar.2023.1194216] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 03/26/2023] [Accepted: 06/08/2023] [Indexed: 08/03/2023]

Karlsen ST, Rau MH, Sánchez BJ, Jensen K, Zeidan AA. From genotype to phenotype: computational approaches for inferring microbial traits relevant to the food industry. FEMS Microbiol Rev 2023;47:fuad030. [PMID: 37286882 PMCID: PMC10337747 DOI: 10.1093/femsre/fuad030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 05/31/2023] [Accepted: 06/06/2023] [Indexed: 06/09/2023]

Jiang Z, Xie W, Zhou X, Pan W, Jiang S, Zhang X, Zhang M, Zhang Z, Lu Y, Wang D. A virtual biopsy study of microsatellite instability in gastric cancer based on deep learning radiomics. Insights Imaging 2023;14:104. [PMID: 37286810 DOI: 10.1186/s13244-023-01438-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Accepted: 04/15/2023] [Indexed: 06/09/2023]

Abstract

OBJECTIVES

This study aims to develop and validate a virtual biopsy model to predict microsatellite instability (MSI) status in preoperative gastric cancer (GC) patients based on clinical information and the radiomics of deep learning algorithms.

METHODS

A total of 223 GC patients with MSI status detected by postoperative immunohistochemical staining (IHC) were retrospectively recruited and randomly assigned to the training (n = 167) and testing (n = 56) sets in a 3:1 ratio. In the training set, 982 high-throughput radiomic features were extracted from preoperative abdominal dynamic contrast-enhanced CT (CECT) and screened. Using a deep learning multilayer perceptron (MLP), 15 optimal features were selected to establish the radiomic feature score (Rad-score), and LASSO regression was used to screen out clinically independent predictors. Based on logistic regression, the Rad-score and clinically independent predictors were integrated to build the clinical radiomics model, which was visualized as a nomogram and independently verified in the testing set. The performance and clinical applicability of the hybrid model in identifying MSI status were evaluated by the area under the receiver operating characteristic curve (AUC), calibration curve, and decision curve analysis (DCA).

RESULTS

The AUCs of the clinical image model in the training set and testing set were 0.883 [95% CI: 0.822-0.945] and 0.802 [95% CI: 0.666-0.937], respectively. This hybrid model showed good consistency in the calibration curve and clinical applicability in the DCA curve.

CONCLUSIONS

Using preoperative imaging and clinical information, we developed a deep-learning-based radiomics model for the non-invasive evaluation of MSI in GC patients. This model may support clinical treatment decision making for GC patients.

Aarons MF, Young CM, Bruce L, Dwyer DB. Real time prediction of match outcomes in Australian football. J Sports Sci 2023;41:1115-1125. [PMID: 37733399 DOI: 10.1080/02640414.2023.2259266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Accepted: 08/09/2023] [Indexed: 09/22/2023]

Ahlquist KD, Sugden LA, Ramachandran S. Enabling interpretable machine learning for biological data with reliability scores. PLoS Comput Biol 2023;19:e1011175. [PMID: 37235578 PMCID: PMC10249903 DOI: 10.1371/journal.pcbi.1011175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 06/08/2023] [Accepted: 05/10/2023] [Indexed: 05/28/2023]

Abstract

Machine learning tools have proven useful across biological disciplines, allowing researchers to draw conclusions from large datasets, and opening up new opportunities for interpreting complex and heterogeneous biological data. Alongside the rapid growth of machine learning, there have also been growing pains: some models that appear to perform well have later been revealed to rely on features of the data that are artifactual or biased; this feeds into the general criticism that machine learning models are designed to optimize model performance over the creation of new biological insights. A natural question arises: how do we develop machine learning models that are inherently interpretable or explainable? In this manuscript, we describe the SWIF(r) reliability score (SRS), a method building on the SWIF(r) generative framework that reflects the trustworthiness of the classification of a specific instance. The concept of the reliability score has the potential to generalize to other machine learning methods. We demonstrate the utility of the SRS when faced with common challenges in machine learning including: 1) an unknown class present in testing data that was not present in training data, 2) systemic mismatch between training and testing data, and 3) instances of testing data that have missing values for some attributes. We explore these applications of the SRS using a range of biological datasets, from agricultural data on seed morphology, to 22 quantitative traits in the UK Biobank, and population genetic simulations and 1000 Genomes Project data. With each of these examples, we demonstrate how the SRS can allow researchers to interrogate their data and training approach thoroughly, and to pair their domain-specific knowledge with powerful machine-learning frameworks. We also compare the SRS to related tools for outlier and novelty detection, and find that it has comparable performance, with the advantage of being able to operate when some data are missing. 
The SRS, and the broader discussion of interpretable scientific machine learning, will aid researchers in the biological machine learning space as they seek to harness the power of machine learning without sacrificing rigor and biological insight.

Wang S, Ren Y, Xia B, Liu K, Li H. Prediction of atmospheric pollutants in urban environment based on coupled deep learning model and sensitivity analysis. Chemosphere 2023;331:138830. [PMID: 37137395 DOI: 10.1016/j.chemosphere.2023.138830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Revised: 04/11/2023] [Accepted: 04/30/2023] [Indexed: 05/05/2023]

Wang Q, Xu T, Xu K, Lu Z, Ying J. Prediction of transport proteins from sequence information with the deep learning approach. Comput Biol Med 2023;160:106974. [PMID: 37167658 DOI: 10.1016/j.compbiomed.2023.106974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Revised: 04/17/2023] [Accepted: 04/22/2023] [Indexed: 05/13/2023]

Patterson A, Elbasir A, Tian B, Auslander N. Computational Methods Summarizing Mutational Patterns in Cancer: Promise and Limitations for Clinical Applications. Cancers (Basel) 2023;15:1958. [PMID: 37046619 PMCID: PMC10093138 DOI: 10.3390/cancers15071958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 02/24/2023] [Accepted: 03/09/2023] [Indexed: 03/29/2023]

Artificial Intelligence for Antimicrobial Resistance Prediction: Challenges and Opportunities towards Practical Implementation. Antibiotics (Basel) 2023;12:523. [PMID: 36978390 PMCID: PMC10044311 DOI: 10.3390/antibiotics12030523] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 02/07/2023] [Revised: 03/01/2023] [Accepted: 03/03/2023] [Indexed: 03/08/2023]

Macedo Mota LF, Bisutti V, Vanzin A, Pegolo S, Toscano A, Schiavon S, Tagliapietra F, Gallo L, Ajmone Marsan P, Cecchinato A. Predicting milk protein fractions using infrared spectroscopy and a gradient boosting machine for breeding purposes in Holstein cattle. J Dairy Sci 2023;106:1853-1873. [PMID: 36710177 DOI: 10.3168/jds.2022-22119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2022] [Accepted: 10/10/2022] [Indexed: 01/29/2023]

Abstract

In recent years, increasing attention has been focused on the genetic evaluation of protein fractions in cow milk with the aim of improving milk quality and technological characteristics. In this context, advances in high-throughput phenotyping by Fourier transform infrared (FTIR) spectroscopy offer the opportunity for large-scale, efficient measurement of novel traits that can be exploited in breeding programs as indicator traits. We took milk samples from 2,558 Holstein cows belonging to 38 herds in northern Italy, operating under different production systems. Fourier transform infrared spectra were collected on the same day as milk sampling and stored for subsequent analysis. Two sets of data (i.e., phenotypes and FTIR spectra) collected in 2 different years (2013 and 2019-2020) were compiled. The following traits were assessed using HPLC: true protein, major casein fractions [α_S1-casein (CN), α_S2-CN, β-CN, κ-CN, and glycosylated-κ-CN], and major whey proteins (β-lactoglobulin and α-lactalbumin), all of which were measured both in grams per liter (g/L) and proportion of total nitrogen (% N). The FTIR predictions were calculated using the gradient boosting machine technique and tested by 3 different cross-validation (CRV) methods. 
We used the following CRV scenarios: (1) random 10-fold, which randomly split the whole data set into 10 folds of equal size (9 folds for training and 1 fold for validation); (2) herd/date-out CRV, which assigned 80% of herd/date as the training set, independent of the 20% of herd/date assigned as the validation set; (3) forward/backward CRV, which split the data set into training and validation sets according to the year of milk sampling (FTIR and gold standard data assessed in 2013 or 2019-2020) using the "old" and "new" databases for training and validation, and vice-versa, with independence among them; (4) the CRV for genetic parameters (CRV-gen), where animals without pedigree were assigned as a fixed training population and animals with pedigree information were split into 5 folds, in which 1 fold was assigned to the fixed training population, and 4 folds were assigned to the validation set (independent from the training set). The results (i.e., measures and predictions) of CRV-gen were used to infer the genetic parameters for gold standard laboratory measurements (i.e., proteins assessed with HPLC) and FTIR-based predictions considering the CRV-gen scenario from a bi-trait animal model using single-step genomic BLUP. We found that the prediction accuracies of the gradient boosting machine equations differed according to the way in which the proteins were expressed, achieving higher accuracy when expressed in g/L than when expressed as % N in all CRV scenarios. Concerning the reproducibility of the equations over the different years, the results showed no relevant differences in predictive ability between using "old" data as the training set and "new" data as the validation set and vice-versa. Comparing the additive genetic variance estimates for milk protein fractions between the FTIR predicted and HPLC measures, we found reductions of -19.7% for milk protein fractions expressed in g/L, and -21.19% expressed as % N. 
Although we found reductions in the heritability estimates, they were small, with values ranging from -1.9 to -7.25% for g/L, and -1.6 to -7.9% for % N. The posterior distributions of the additive genetic correlations (r_a) between the FTIR predictions and the laboratory measurements were generally high (>0.8), even when the milk protein fractions were expressed as % N. Our results show the potential of using FTIR predictions in breeding programs as indicator traits for the selection of animals to enhance milk protein fraction contents. We expect acceptable responses to selection due to the high genetic correlations between HPLC measurements and FTIR predictions.
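
The difference between the random 10-fold and herd/date-out schemes described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy herd labels, and the fixed seed are all assumptions made for illustration, and only the Python standard library is used.

```python
import random


def random_kfold(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def group_holdout(groups, train_fraction=0.8, seed=0):
    """Hold out whole groups so no group appears on both sides of the split."""
    unique_groups = sorted(set(groups))
    random.Random(seed).shuffle(unique_groups)
    n_train = round(train_fraction * len(unique_groups))
    train_groups = set(unique_groups[:n_train])
    train = [i for i, g in enumerate(groups) if g in train_groups]
    valid = [i for i, g in enumerate(groups) if g not in train_groups]
    return train, valid


# Hypothetical toy data: 50 samples spread over 10 made-up herd/date labels.
herd_dates = [f"herd{i % 10}" for i in range(50)]
folds = random_kfold(len(herd_dates), k=10)
train_idx, valid_idx = group_holdout(herd_dates, train_fraction=0.8)
```

In the random scheme, samples from one herd/date can land in both training and validation folds, whereas the group hold-out keeps each herd/date entirely on one side, which is the independence the abstract refers to.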

Zhao X, Yoshida N, Ueda T, Sugano H, Tanaka T. Epileptic seizure detection by using interpretable machine learning models. J Neural Eng 2023;20. [PMID: 36603215 DOI: 10.1088/1741-2552/acb089] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/04/2022] [Accepted: 01/05/2023] [Indexed: 01/06/2023]

Abstract

Objective. Accurate detection of epileptic seizures using electroencephalogram (EEG) data is essential for epilepsy diagnosis, but the visual diagnostic process for clinical experts is a time-consuming task. To improve efficiency, some seizure detection methods have been proposed. Regardless of traditional or machine learning methods, the results identify only seizures and non-seizures. Our goal is not only to detect seizures but also to explain the basis for detection and provide reference information to clinical experts. Approach. In this study, we follow the visual diagnosis mechanism used by clinical experts that directly processes plotted EEG image data and apply some commonly used models of LeNet, VGG, deep residual network (ResNet), and vision transformer (ViT) to the EEG image classification task. Before using these models, we propose a data augmentation method using random channel ordering (RCO), which adjusts the channel order to generate new images. The Gradient-weighted class activation mapping (Grad-CAM) and attention layer methods are used to interpret the models. Main results. The RCO method can balance the dataset in seizure and non-seizure classes. The models achieved good performance in the seizure detection task. Moreover, the Grad-CAM and attention layer methods explained the detection basis of the model very well and calculated a value that measures the seizure degree. Significance. Processing EEG data in the form of images provides the flexibility to use a variety of machine learning models. The imbalance problem that exists widely in clinical practice is well solved by the RCO method. Since the method follows the visual diagnosis mechanism of clinical experts, the model interpretation results can be presented to clinical experts intuitively, and the quantitative information provided by the model is also a good diagnostic reference.

Lee YY, Endale M, Wu G, Ruben MD, Francey LJ, Morris AR, Choo NY, Anafi RC, Smith DF, Liu AC, Hogenesch JB. Integration of genome-scale data identifies candidate sleep regulators. Sleep 2023;46:zsac279. [PMID: 36462188 PMCID: PMC9905783 DOI: 10.1093/sleep/zsac279] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 06/21/2022] [Revised: 09/02/2022] [Indexed: 12/05/2022]

Affiliation(s)

Yin Yeng Lee Divisions of Human Genetics and Immunobiology, Department of Pediatrics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, 45229, USA; Department of Pharmacology and Systems Physiology, University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA
Mehari Endale Department of Physiology and Aging, University of Florida College of Medicine, Gainesville, FL 32610, USA
Gang Wu Divisions of Human Genetics and Immunobiology, Department of Pediatrics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, 45229, USA
Marc D Ruben Divisions of Human Genetics and Immunobiology, Department of Pediatrics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, 45229, USA
Lauren J Francey Divisions of Human Genetics and Immunobiology, Department of Pediatrics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, 45229, USA
Andrew R Morris Department of Physiology and Aging, University of Florida College of Medicine, Gainesville, FL 32610, USA
Natalie Y Choo Division of Pediatric Otolaryngology-Head and Neck Surgery, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
Ron C Anafi Department of Medicine, Chronobiology and Sleep Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
David F Smith Division of Pediatric Otolaryngology-Head and Neck Surgery, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA; Division of Pulmonary Medicine and the Sleep Center, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA; Center for Circadian Medicine, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA; Department of Otolaryngology - Head and Neck Surgery, University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA
Andrew C Liu Department of Physiology and Aging, University of Florida College of Medicine, Gainesville, FL 32610, USA
John B Hogenesch Divisions of Human Genetics and Immunobiology, Department of Pediatrics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, 45229, USA; Center for Circadian Medicine, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA

Ren S, Wu S, Weng Q. Physics-informed machine learning methods for biomass gasification modeling by considering monotonic relationships. Bioresource Technology 2023;369:128472. [PMID: 36509306 DOI: 10.1016/j.biortech.2022.128472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2022] [Revised: 12/05/2022] [Accepted: 12/06/2022] [Indexed: 06/17/2023]

Blankestijn JM, Lopez-Rincon A, Neerincx AH, Vijverberg SJH, Hashimoto S, Gorenjak M, Sardón Prado O, Corcuera-Elosegui P, Korta-Murua J, Pino-Yanes M, Potočnik U, Bang C, Franke A, Wolff C, Brandstetter S, Toncheva AA, Kheiroddin P, Harner S, Kabesch M, Kraneveld AD, Abdel-Aziz MI, Maitland-van der Zee AH. Classifying asthma control using salivary and fecal bacterial microbiome in children with moderate-to-severe asthma. Pediatr Allergy Immunol 2023;34:e13919. [PMID: 36825736 DOI: 10.1111/pai.13919] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/14/2022] [Revised: 01/23/2023] [Accepted: 01/24/2023] [Indexed: 02/24/2023]

Affiliation(s)

Jelle M Blankestijn Department of Pulmonary Medicine, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands; Amsterdam Institute for Infection and Immunity, Amsterdam, The Netherlands; Amsterdam Public Health, Amsterdam, The Netherlands
Alejandro Lopez-Rincon Division of Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, Utrecht University, Utrecht, The Netherlands; Department of Data Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
Anne H Neerincx Department of Pulmonary Medicine, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
Susanne J H Vijverberg Department of Pulmonary Medicine, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
Simone Hashimoto Department of Pulmonary Medicine, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands; Department of Pediatric Respiratory Medicine, Emma Children's Hospital, Amsterdam UMC, Amsterdam, The Netherlands
Mario Gorenjak Center for Human Molecular Genetics and Pharmacogenomics, Faculty of Medicine, University of Maribor, Maribor, Slovenia
Olaia Sardón Prado Division of Pediatric Respiratory Medicine, Hospital Universitario Donostia, San Sebastián, Spain; Department of Pediatrics, University of the Basque Country (UPV/EHU), San Sebastián, Spain
Paula Corcuera-Elosegui Division of Pediatric Respiratory Medicine, Hospital Universitario Donostia, San Sebastián, Spain
Javier Korta-Murua Division of Pediatric Respiratory Medicine, Hospital Universitario Donostia, San Sebastián, Spain
Maria Pino-Yanes Genomics and Health Group, Department of Biochemistry, Microbiology, Cell Biology and Genetics, Universidad de La Laguna (ULL), San Cristóbal de La Laguna, Spain; CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, Madrid, Spain; Instituto de Tecnologías Biomédicas (ITB), Universidad de La Laguna, La Laguna, Spain
Uroš Potočnik Center for Human Molecular Genetics and Pharmacogenomics, Faculty of Medicine, University of Maribor, Maribor, Slovenia; Laboratory for Biochemistry, Molecular Biology and Genomics, Faculty of Chemistry and Chemical Engineering, University of Maribor, Maribor, Slovenia
Corinna Bang Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Kiel, Germany
Andre Franke Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Kiel, Germany
Christine Wolff Science and Development Campus Regensburg (WECARE), University Children's Hospital Regensburg (KUNO) at the Hospital St. Hedwig of the Order of St. John, University of Regensburg, Regensburg, Germany
Susanne Brandstetter Science and Development Campus Regensburg (WECARE), University Children's Hospital Regensburg (KUNO) at the Hospital St. Hedwig of the Order of St. John, University of Regensburg, Regensburg, Germany
Antoaneta A Toncheva Science and Development Campus Regensburg (WECARE), University Children's Hospital Regensburg (KUNO) at the Hospital St. Hedwig of the Order of St. John, University of Regensburg, Regensburg, Germany
Parastoo Kheiroddin Science and Development Campus Regensburg (WECARE), University Children's Hospital Regensburg (KUNO) at the Hospital St. Hedwig of the Order of St. John, University of Regensburg, Regensburg, Germany
Susanne Harner Department of Pediatric Pneumology and Allergy, University Children's Hospital Regensburg (KUNO) at the Hospital St. Hedwig of the Order of St. John, University of Regensburg, Regensburg, Germany
Michael Kabesch Science and Development Campus Regensburg (WECARE), University Children's Hospital Regensburg (KUNO) at the Hospital St. Hedwig of the Order of St. John, University of Regensburg, Regensburg, Germany; Department of Pediatric Pneumology and Allergy, University Children's Hospital Regensburg (KUNO) at the Hospital St. Hedwig of the Order of St. John, University of Regensburg, Regensburg, Germany
Aletta D Kraneveld Division of Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, Utrecht University, Utrecht, The Netherlands
Mahmoud I Abdel-Aziz Department of Pulmonary Medicine, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands; Amsterdam Institute for Infection and Immunity, Amsterdam, The Netherlands; Amsterdam Public Health, Amsterdam, The Netherlands; Department of Clinical Pharmacy, Faculty of Pharmacy, Assiut University, Assiut, Egypt
Anke H Maitland-van der Zee Department of Pulmonary Medicine, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands; Amsterdam Institute for Infection and Immunity, Amsterdam, The Netherlands; Amsterdam Public Health, Amsterdam, The Netherlands; Department of Pediatric Respiratory Medicine, Emma Children's Hospital, Amsterdam UMC, Amsterdam, The Netherlands

Fritzsche MC, Akyüz K, Cano Abadía M, McLennan S, Marttinen P, Mayrhofer MT, Buyx AM. Ethical layering in AI-driven polygenic risk scores-New complexities, new challenges. Front Genet 2023;14:1098439. [PMID: 36816027 PMCID: PMC9933509 DOI: 10.3389/fgene.2023.1098439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2022] [Accepted: 01/04/2023] [Indexed: 01/27/2023]

Abstract

Researchers aim to develop polygenic risk scores as a tool to prevent and more effectively treat serious diseases, disorders and conditions such as breast cancer, type 2 diabetes mellitus and coronary heart disease. Recently, machine learning techniques, in particular deep neural networks, have been increasingly developed to create polygenic risk scores using electronic health records as well as genomic and other health data. While the use of artificial intelligence for polygenic risk scores may enable greater accuracy, performance and prediction, it also presents a range of increasingly complex ethical challenges. The ethical and social issues of many polygenic risk score applications in medicine have been widely discussed. However, in the literature and in practice, the ethical implications of their confluence with the use of artificial intelligence have not yet been sufficiently considered. Based on a comprehensive review of the existing literature, we argue that this stands in need of urgent consideration for research and subsequent translation into the clinical setting. Considering the many ethical layers involved, we will first give a brief overview of the development of artificial intelligence-driven polygenic risk scores, associated ethical and social implications, challenges in artificial intelligence ethics, and finally, explore potential complexities of polygenic risk scores driven by artificial intelligence. We point out emerging complexity regarding fairness, challenges in building trust, explaining and understanding artificial intelligence and polygenic risk scores as well as regulatory uncertainties and further challenges. We strongly advocate taking a proactive approach to embedding ethics in research and implementation processes for polygenic risk scores driven by artificial intelligence.

Li MP, Liu WC, Sun BL, Zhong NS, Liu ZL, Huang SH, Zhang ZH, Liu JM. Prediction of bone metastasis in non-small cell lung cancer based on machine learning. Front Oncol 2023;12:1054300. [PMID: 36698411 PMCID: PMC9869148 DOI: 10.3389/fonc.2022.1054300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2022] [Accepted: 11/21/2022] [Indexed: 01/12/2023]

Abstract

Objective

The purpose of this paper was to develop a machine learning algorithm with good performance in predicting bone metastasis (BM) in non-small cell lung cancer (NSCLC) and establish a simple web predictor based on the algorithm.

Methods

Patients diagnosed with NSCLC between 2010 and 2018 in the Surveillance, Epidemiology and End Results (SEER) database were included. To increase the extensibility of the research, data from patients first diagnosed with NSCLC at the First Affiliated Hospital of Nanchang University between January 2007 and December 2016 were also included in this study. Independent risk factors for BM in NSCLC were screened by univariate and multivariate logistic regression. On this basis, we chose six commonly used machine learning algorithms to build predictive models: Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Gradient Boosting Machine (GBM), Naive Bayes classifiers (NBC) and eXtreme gradient boosting (XGB). Then, the best model was identified and used to build the web predictor of BM in NSCLC patients. Finally, area under the receiver operating characteristic curve (AUC), accuracy, sensitivity and specificity were used to evaluate the performance of these models.

Results

A total of 50581 NSCLC patients were included in this study, and 5087 (10.06%) of them developed BM. Sex, grade, laterality, histology, T stage, N stage, and chemotherapy were independent risk factors for BM in NSCLC. Of these six models, the model built with the XGB algorithm performed best in both internal and external data set validation, with AUC scores of 0.808 and 0.841, respectively. The XGB algorithm was then used to build a web predictor of BM from NSCLC.

Conclusion

This study developed a web predictor based on the XGB algorithm for predicting the risk of BM in NSCLC patients, which may assist doctors in clinical decision-making.

Mahmood U, Li X, Fan Y, Chang W, Niu Y, Li J, Qu C, Lu K. Multi-omics revolution to promote plant breeding efficiency. Front Plant Sci 2022;13:1062952. [PMID: 36570904 PMCID: PMC9773847 DOI: 10.3389/fpls.2022.1062952] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/06/2022] [Accepted: 11/24/2022] [Indexed: 06/17/2023]

Affiliation(s)

Umer Mahmood Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China
Xiaodong Li Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China
Yonghai Fan Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China
Wei Chang Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China
Yue Niu Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China
Jiana Li Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China; Academy of Agricultural Sciences, Southwest University, Chongqing, China; Engineering Research Center of South Upland Agriculture, Ministry of Education, Chongqing, China
Cunmin Qu Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China; Academy of Agricultural Sciences, Southwest University, Chongqing, China; Engineering Research Center of South Upland Agriculture, Ministry of Education, Chongqing, China
Kun Lu Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China; Academy of Agricultural Sciences, Southwest University, Chongqing, China; Engineering Research Center of South Upland Agriculture, Ministry of Education, Chongqing, China

Non-Invasive Biomarkers for Early Lung Cancer Detection. Cancers (Basel) 2022;14:5782. [PMID: 36497263 PMCID: PMC9739091 DOI: 10.3390/cancers14235782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2022] [Revised: 07/18/2022] [Accepted: 07/20/2022] [Indexed: 11/27/2022]

Discovery and classification of complex multimorbidity patterns: unravelling chronicity networks and their social profiles. Sci Rep 2022;12:20004. [PMID: 36411299 PMCID: PMC9678882 DOI: 10.1038/s41598-022-23617-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2022] [Accepted: 11/02/2022] [Indexed: 11/23/2022]

Samal BR, Loers JU, Vermeirssen V, De Preter K. Opportunities and challenges in interpretable deep learning for drug sensitivity prediction of cancer cells. Front Bioinform 2022;2:1036963. [PMID: 36466148 PMCID: PMC9714662 DOI: 10.3389/fbinf.2022.1036963] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2022] [Accepted: 11/03/2022] [Indexed: 01/02/2024]

Xu Y, Zhang X, Li H, Zheng H, Zhang J, Olsen MS, Varshney RK, Prasanna BM, Qian Q. Smart breeding driven by big data, artificial intelligence, and integrated genomic-enviromic prediction. Mol Plant 2022;15:1664-1695. [PMID: 36081348 DOI: 10.1016/j.molp.2022.09.001] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/04/2022] [Revised: 08/20/2022] [Accepted: 09/02/2022] [Indexed: 05/12/2023]

Abstract

The first paradigm of plant breeding involves direct selection-based phenotypic observation, followed by predictive breeding using statistical models for quantitative traits constructed based on genetic experimental design and, more recently, by incorporation of molecular marker genotypes. However, plant performance or phenotype (P) is determined by the combined effects of genotype (G), envirotype (E), and genotype by environment interaction (GEI). Phenotypes can be predicted more precisely by training a model using data collected from multiple sources, including spatiotemporal omics (genomics, phenomics, and enviromics across time and space). Integration of 3D information profiles (G-P-E), each with multidimensionality, provides predictive breeding with both tremendous opportunities and great challenges. Here, we first review innovative technologies for predictive breeding. We then evaluate multidimensional information profiles that can be integrated with a predictive breeding strategy, particularly envirotypic data, which have largely been neglected in data collection and are nearly untouched in model construction. We propose a smart breeding scheme, integrated genomic-enviromic prediction (iGEP), as an extension of genomic prediction, using integrated multiomics information, big data technology, and artificial intelligence (mainly focused on machine and deep learning). We discuss how to implement iGEP, including spatiotemporal models, environmental indices, factorial and spatiotemporal structure of plant breeding data, and cross-species prediction. A strategy is then proposed for prediction-based crop redesign at both the macro (individual, population, and species) and micro (gene, metabolism, and network) scales. Finally, we provide perspectives on translating smart breeding into genetic gain through integrative breeding platforms and open-source breeding initiatives. 
We call for coordinated efforts in smart breeding through iGEP, institutional partnerships, and innovative technological support.

Wang F, Wang ZR, Ding XS, Yang H, Guo Y, Su H, Wan XR, Wang LJ, Jiang XY, Xu YH, Chen F, Cui W, Feng FZ. Combining serum peptide signatures with International Federation of Gynecology and Obstetrics (FIGO) risk score to predict the outcomes of patients with gestational trophoblastic neoplasia (GTN) after first-line chemotherapy. Front Oncol 2022;12:982806. [PMID: 36338720 PMCID: PMC9634134 DOI: 10.3389/fonc.2022.982806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2022] [Accepted: 10/06/2022] [Indexed: 11/21/2022]

Abstract

Background

Gestational trophoblastic neoplasia (GTN) is a group of clinically rare tumors that develop in the uterus from placental tissue. Currently, its satisfactory curability derives from timely and accurate classification and refined management of patients. This study aimed to discover biomarkers that could predict the outcomes of GTN patients after first-line chemotherapy.

Methods

A total of 65 GTN patients were included in the study. Patients were divided into the good or poor outcome group and the clinical characteristics of the patients in the two groups were compared. Furthermore, the serum peptide profiles of all patients were uncovered by using weak cation exchange magnetic beads and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Feature peaks were identified by three machine learning algorithms and then models were constructed and compared using five machine learning methods. Additionally, liquid chromatography mass spectrometry was used to identify the feature peptides.

Results

Multivariate logistic regression analysis showed that the International Federation of Gynecology and Obstetrics (FIGO) risk score was associated with poor outcomes. Eight feature peaks (m/z = 1287, 2042, 2862, 2932, 2950, 3240, 3277 and 6626) were selected for model construction and validation by the three algorithms. Based on the panel combining FIGO risk score and peptide serum signatures, the neural network (nnet) model showed promising performance in both the training (AUC = 0.9635) and validation (AUC = 0.8788) cohorts. Peaks at m/z 2042, 2862, 2932, 3240 were identified as the partial sequences of transthyretin, fibrinogen alpha chain (FGA), beta-globin and FGA, respectively.

Conclusion

We combined the FIGO risk score and serum peptide signatures using the nnet method to construct a model that can accurately predict the outcome of GTN patients after first-line chemotherapy. With this model, patients can be further classified and managed, and those with poor predicted outcomes can be monitored more closely for treatment failure.

Affiliation(s)

Fei Wang Department of Laboratory Medicine, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
Zi-ran Wang Department of Laboratory Medicine, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
Xue-song Ding Department of Obstetrics and Gynecology, National Clinical Research Center for Obstetric & Gynecologic Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
Hua Yang Department of Obstetrics and Gynecology, National Clinical Research Center for Obstetric & Gynecologic Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
Ye Guo Department of Laboratory Medicine, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
Hao Su Department of Obstetrics and Gynecology, National Clinical Research Center for Obstetric & Gynecologic Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
Xi-run Wan Department of Obstetrics and Gynecology, National Clinical Research Center for Obstetric & Gynecologic Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
Li-juan Wang Department of Gynecological Oncology, Sun Yat-Sen Memorial Hospital of Sun Yat-Sen University, Guangzhou, China
Xiang-yang Jiang Department of Obstetrics and Gynecology, Shanxi Provincial People’s Hospital, Xian, China
Yan-hua Xu Department of Obstetrics and Gynecology, Jinan Maternity and Child Health Care Hospital, Jinan, China
Feng Chen Department of Clinical Laboratory, State Key Laboratory of Molecular Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
Wei Cui Department of Clinical Laboratory, State Key Laboratory of Molecular Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China *Correspondence: Wei Cui; Feng-zhi Feng
Feng-zhi Feng Department of Obstetrics and Gynecology, National Clinical Research Center for Obstetric & Gynecologic Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China *Correspondence: Wei Cui; Feng-zhi Feng

Towards a better understanding of TF-DNA binding prediction from genomic features. Comput Biol Med 2022;149:105993. [DOI: 10.1016/j.compbiomed.2022.105993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2022] [Revised: 07/12/2022] [Accepted: 08/14/2022] [Indexed: 11/17/2022]