1
Bi X, Wang J, Xue B, He C, Liu F, Chen H, Lin LL, Dong B, Li B, Jin C, Pan J, Xue W, Ye J. SERSomes for metabolic phenotyping and prostate cancer diagnosis. Cell Rep Med 2024; 5:101579. [PMID: 38776910 PMCID: PMC11228451 DOI: 10.1016/j.xcrm.2024.101579] [Received: 09/27/2023] [Revised: 03/08/2024] [Accepted: 04/29/2024] [Indexed: 05/25/2024]
Abstract
Molecular phenotypic variations in metabolites offer the promise of rapid profiling of physiological and pathological states for diagnosis, monitoring, and prognosis. Since present methods are expensive, time-consuming, and still not sensitive enough, there is an urgent need for approaches that can interrogate complex biological fluids at a system-wide level. Here, we introduce hyperspectral surface-enhanced Raman spectroscopy (SERS) to profile microliters of biofluidic metabolite extraction in 15 min with a spectral set, SERSome, that can be used to describe the structures and functions of various molecules produced in the biofluid at a specific time via SERS characteristics. The metabolite differences of various biofluids, including cell culture medium and human serum, are successfully profiled, showing a diagnosis accuracy of 80.8% on the internal test set and 73% on the external validation set for prostate cancer, discovering potential biomarkers, and predicting the tissue-level pathological aggressiveness. SERSomes offer a promising methodology for metabolic phenotyping.
Affiliation(s)
- Xinyuan Bi
  - State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, P.R. China
- Jiayi Wang
  - Department of Urology, Ren Ji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, P.R. China
- Bingsen Xue
  - State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, P.R. China; Shanghai Artificial Intelligence Laboratory, Shanghai, China
- Chang He
  - State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, P.R. China
- Fugang Liu
  - State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, P.R. China
- Haoran Chen
  - State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, P.R. China
- Linley Li Lin
  - State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, P.R. China
- Baijun Dong
  - Department of Urology, Jiading District Central Hospital Affiliated Shanghai University of Medicine & Health Science, Shanghai, P.R. China
- Butang Li
  - Department of Urology, Ningbo Hangzhou Bay Hospital, Ningbo, Zhejiang, P.R. China
- Cheng Jin
  - State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, P.R. China; Shanghai Artificial Intelligence Laboratory, Shanghai, China; Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, P.R. China.
- Jiahua Pan
  - Department of Urology, Ren Ji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, P.R. China.
- Wei Xue
  - Department of Urology, Ren Ji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, P.R. China.
- Jian Ye
  - State Key Laboratory of Systems Medicine for Cancer, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, P.R. China; Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, P.R. China; Shanghai Key Laboratory of Gynecologic Oncology, Ren Ji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, P.R. China.
2
Fronk AD, Manzanares MA, Zheng P, Geier A, Anderson K, Stanton S, Zumrut H, Gera S, Munch R, Frederick V, Dhingra P, Arun G, Akerman M. Development and validation of AI/ML derived splice-switching oligonucleotides. Mol Syst Biol 2024; 20:676-701. [PMID: 38664594 PMCID: PMC11148135 DOI: 10.1038/s44320-024-00034-9] [Received: 09/15/2023] [Revised: 04/03/2024] [Accepted: 04/09/2024] [Indexed: 06/05/2024] Open
Abstract
Splice-switching oligonucleotides (SSOs) are antisense compounds that act directly on pre-mRNA to modulate alternative splicing (AS). This study demonstrates the value that artificial intelligence/machine learning (AI/ML) provides for the identification of functional, verifiable, and therapeutic SSOs. We trained XGBoost tree models using splicing factor (SF) pre-mRNA binding profiles and spliceosome assembly information to identify modulatory SSO binding sites on pre-mRNA. Using Shapley and out-of-bag analyses, we also predicted the identity of specific SFs whose binding to pre-mRNA is blocked by SSOs. This step adds considerable transparency to AI/ML-driven drug discovery and informs biological insights useful in further validation steps. We applied this approach to previously established functional SSOs to retrospectively identify the SFs likely to regulate those events. We then took a prospective validation approach using a novel target in triple negative breast cancer (TNBC), NEDD4L exon 13 (NEDD4Le13). Targeting NEDD4Le13 with an AI/ML-designed SSO decreased the proliferative and migratory behavior of TNBC cells via downregulation of the TGFβ pathway. Overall, this study illustrates the ability of AI/ML to extract actionable insights from RNA-seq data.
Affiliation(s)
- Paulina Zheng
  - Envisagenics, Inc., Long Island City, NY, 11101, USA
- Adam Geier
  - Envisagenics, Inc., Long Island City, NY, 11101, USA
- Hasan Zumrut
  - Envisagenics, Inc., Long Island City, NY, 11101, USA
- Sakshi Gera
  - Envisagenics, Inc., Long Island City, NY, 11101, USA
- Robin Munch
  - Envisagenics, Inc., Long Island City, NY, 11101, USA
- Gayatri Arun
  - Envisagenics, Inc., Long Island City, NY, 11101, USA
3
Li X, Wang P, Zhu Y, Zhao W, Pan H, Wang D. Interpretable machine learning model for predicting acute kidney injury in critically ill patients. BMC Med Inform Decis Mak 2024; 24:148. [PMID: 38822285 PMCID: PMC11140965 DOI: 10.1186/s12911-024-02537-9] [Received: 11/03/2023] [Accepted: 05/17/2024] [Indexed: 06/02/2024] Open
Abstract
BACKGROUND This study aimed to create a method for promptly predicting acute kidney injury (AKI) in intensive care patients by applying interpretable, explainable artificial intelligence techniques. METHODS Data on intensive care patients were derived from the Medical Information Mart for Intensive Care IV database from 2008 to 2019. Six machine learning (ML) methods were used to construct prediction models for AKI. The performance of each ML model was evaluated by comparing the areas under the curve (AUC). The Local Interpretable Model-Agnostic Explanations (LIME) method and Shapley Additive exPlanation values were used to interpret the best model. RESULTS According to the inclusion and exclusion criteria, 53,150 critically ill patients were included in the present study, of whom 42,520 (80%) were assigned to the training group and 10,630 (20%) to the validation group. Compared with the other five ML models, the eXtreme Gradient Boosting (XGBoost) model best predicted AKI following ICU admission, with an AUC of 0.816. The top four contributing variables of the XGBoost model were SOFA score, weight, mechanical ventilation, and the Simplified Acute Physiology Score II. AKI and non-AKI cases were predicted separately using the LIME algorithm. CONCLUSION Overall, the constructed clinical feature-based ML models are excellent in predicting AKI in intensive care patients. They could help physicians provide early support and timely intervention to intensive care patients at risk of AKI.
Affiliation(s)
- Xunliang Li
  - Department of Nephrology, The Second Affiliated Hospital of Anhui Medical University, Hefei, China
- Peng Wang
  - Teaching Center for Preventive Medicine, School of Public Health, Anhui Medical University, Hefei, China
- Yuke Zhu
  - Department of Nephrology, The Second Affiliated Hospital of Anhui Medical University, Hefei, China
- Wenman Zhao
  - Department of Nephrology, The Second Affiliated Hospital of Anhui Medical University, Hefei, China
- Haifeng Pan
  - Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, Hefei, China
- Deguang Wang
  - Department of Nephrology, The Second Affiliated Hospital of Anhui Medical University, Hefei, China.
4
Wen C, Zhang X, Li Y, Xiao W, Hu Q, Lei X, Xu T, Liang S, Gao X, Zhang C, Yu Z, Lü M. An interpretable machine learning model for predicting 28-day mortality in patients with sepsis-associated liver injury. PLoS One 2024; 19:e0303469. [PMID: 38768153 PMCID: PMC11104601 DOI: 10.1371/journal.pone.0303469] [Received: 09/12/2023] [Accepted: 04/25/2024] [Indexed: 05/22/2024] Open
Abstract
Sepsis-Associated Liver Injury (SALI) is an independent risk factor for death from sepsis. The aim of this study was to develop an interpretable machine learning model for early prediction of 28-day mortality in patients with SALI. Data from the Medical Information Mart for Intensive Care (MIMIC-IV, v2.2, MIMIC-III, v1.4) were used in this study. The study cohort from MIMIC-IV was randomized to the training set (0.7) and the internal validation set (0.3), with MIMIC-III (2001 to 2008) as external validation. Features with more than 20% missing values were deleted, and the remaining features were filled in by multiple interpolation. Lasso-CV, a lasso linear model with iterative fitting along a regularization path in which the best model is selected by cross-validation, was used to select important features for model development. Eight machine learning models were developed: Random Forest (RF), Logistic Regression, Decision Tree, Extreme Gradient Boost (XGBoost), K Nearest Neighbor, Support Vector Machine, Generalized Linear Models in which the best model is selected by cross-validation (CV_glmnet), and Linear Discriminant Analysis (LDA). SHapley Additive exPlanations (SHAP) was used to improve the interpretability of the optimal model. In total, 1043 patients were included, of whom 710 were from MIMIC-IV and 333 from MIMIC-III. Twenty-four clinically relevant parameters were selected for model construction. For the prediction of 28-day mortality of SALI in the internal validation set, RF performed best, with an area under the curve (AUC) of 0.79 (95% CI: 0.73-0.86). RF also outperformed the traditional disease severity scores, including the Oxford Acute Severity of Illness Score (OASIS), Sequential Organ Failure Assessment (SOFA), Simplified Acute Physiology Score II (SAPS II), Logistic Organ Dysfunction Score (LODS), Systemic Inflammatory Response Syndrome (SIRS), and Acute Physiology Score III (APS III). SHAP analysis found that urine output, Charlson Comorbidity Index (CCI), minimal Glasgow Coma Scale (GCS_min), blood urea nitrogen (BUN), and age at admission (admission_age) were the five most important features affecting the RF model. Therefore, RF has good predictive ability for 28-day mortality in SALI. Urine output, CCI, GCS_min, BUN, and admission_age within 24 h after intensive care unit (ICU) admission contribute significantly to model prediction.
Affiliation(s)
- Chengli Wen
  - Department of Intensive Care Medicine, Department of Critical Care Medicine, The Affiliated Hospital, Southwest Medical University, Luzhou, China
- Xu Zhang
  - Luzhou Key Laboratory of Human Microecology and Precision Diagnosis and Treatment, Luzhou, China
- Yong Li
  - Southwest Medical University, Luzhou, China
- Wanmeng Xiao
  - Luzhou Key Laboratory of Human Microecology and Precision Diagnosis and Treatment, Luzhou, China
  - Department of Gastroenterology, The Affiliated Hospital, Southwest Medical University, Luzhou, China
- Qinxue Hu
  - Department of Intensive Care Medicine, Department of Critical Care Medicine, The Affiliated Hospital, Southwest Medical University, Luzhou, China
- Xianying Lei
  - Department of Intensive Care Medicine, Department of Critical Care Medicine, The Affiliated Hospital, Southwest Medical University, Luzhou, China
- Tao Xu
  - Department of Intensive Care Medicine, Department of Critical Care Medicine, The Affiliated Hospital, Southwest Medical University, Luzhou, China
- Sicheng Liang
  - Luzhou Key Laboratory of Human Microecology and Precision Diagnosis and Treatment, Luzhou, China
  - Department of Gastroenterology, The Affiliated Hospital, Southwest Medical University, Luzhou, China
- Xiaolan Gao
  - Department of Intensive Care Medicine, Department of Critical Care Medicine, The Affiliated Hospital, Southwest Medical University, Luzhou, China
- Chao Zhang
  - Luzhou Key Laboratory of Human Microecology and Precision Diagnosis and Treatment, Luzhou, China
- Zehui Yu
  - Laboratory Animal Center, Southwest Medical University, Luzhou, China
- Muhan Lü
  - Luzhou Key Laboratory of Human Microecology and Precision Diagnosis and Treatment, Luzhou, China
  - Department of Gastroenterology, The Affiliated Hospital, Southwest Medical University, Luzhou, China
5
Li C, Jia J, Wu F, Zuo L, Cui X. County-level intensity of carbon emissions from crop farming in China during 2000-2019. Sci Data 2024; 11:457. [PMID: 38710695 DOI: 10.1038/s41597-024-03296-y] [Received: 11/03/2023] [Accepted: 04/23/2024] [Indexed: 05/08/2024] Open
Abstract
Agriculture is an important contributor to global carbon emissions. With the implementation of the Sustainable Development Goals of the United Nations and China's carbon neutral strategy, accurate estimation of carbon emissions from crop farming is essential to reduce agricultural carbon emissions and promote sustainable food production systems in China. However, previous long-term time series estimates in China have mainly focused on the national and provincial levels, which are insufficient to characterize regional heterogeneity. Here, we selected the county-level administrative district as the basic geographical unit and then generated a county-level dataset on the intensity of carbon emissions from crop farming in China during 2000-2019, using random forest regression with multi-source data. This dataset can be used to delineate spatio-temporal changes in carbon emissions from crop farming in China, providing an important basis for decision makers and researchers to design agricultural carbon reduction strategies in China.
Affiliation(s)
- Cheng Li
  - Department of Ecology, School of Plant Protection / Joint International Research Laboratory of Agriculture and Agri-Product Safety of the Ministry of Education, Yangzhou University, Yangzhou, 225009, China
- Junwen Jia
  - School of Systems Science, Beijing Normal University, Beijing, 100875, China
  - School of Earth and Environmental Sciences, Cardiff University, Cardiff, CF10 3AT, United Kingdom
- Fang Wu
  - School of Systems Science, Beijing Normal University, Beijing, 100875, China
- Lijun Zuo
  - Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, 100093, China
- Xuefeng Cui
  - School of Systems Science, Beijing Normal University, Beijing, 100875, China.
6
Khattak A, Zhang J, Chan PW, Chen F, Matara CM. AI-supported estimation of safety critical wind shear-induced aircraft go-around events utilizing pilot reports. Heliyon 2024; 10:e28569. [PMID: 38560193 PMCID: PMC10981122 DOI: 10.1016/j.heliyon.2024.e28569] [Received: 10/15/2023] [Revised: 03/20/2024] [Accepted: 03/20/2024] [Indexed: 04/04/2024] Open
Abstract
The occurrence of wind shear and severe thunderstorms during the final approach phase contributes to nearly half of all aviation accidents. Pilots usually employ the go-around procedure in order to lower the likelihood of an unsafe landing. However, multiple factors influence the go-arounds induced by wind shear. In order to predict the wind shear-induced go-around, this study utilized a cutting-edge AI-based Combined Kernel and Tree Boosting (KTBoost) framework with various data augmentation strategies. First, the KTBoost model was trained, tested, and compared to other machine learning models using the data extracted from Hong Kong International Airport (HKIA)-based Pilot Reports for the years 2017-2021. The performance evaluation revealed that the KTBoost model with Synthetic Minority Oversampling Technique - Edited Nearest Neighbor (SMOTE-ENN)-augmented data demonstrated superior performance as measured by the F1-Score (94.37%) and G-Mean (94.87%). Subsequently, the SHapley Additive exPlanations (SHAP) approach was employed to interpret the KTBoost model trained on the SMOTE-ENN-treated data. According to the findings, flight type, wind shear magnitude, and approach runway contributed the most to the wind shear-induced go-around. Compared to international flights, Hong Kong-based airlines endured the highest number of wind shear-induced go-arounds. Shear due to the tailwind contributed more to the go-around than the headwinds. The runways with the most wind shear-induced go-arounds were 07C and 07R.
Affiliation(s)
- Afaq Khattak
  - Key Laboratory of Infrastructure Durability and Operation Safety in Airfield of CAAC, College of Transportation Engineering, Tongji University, 4800 Cao'an Road, Jiading, Shanghai, 201804, China
- Jianping Zhang
  - Second Research Institute of Civil Aviation Administration of China, Civil Unmanned Aircraft Traffic Management Key Laboratory of Sichuan Province, China
- Pak-Wai Chan
  - Hong Kong Observatory, 134A Nathan Road, Kowloon, Hong Kong, China
- Feng Chen
  - Key Laboratory of Infrastructure Durability and Operation Safety in Airfield of CAAC, College of Transportation Engineering, Tongji University, 4800 Cao'an Road, Jiading, Shanghai, 201804, China
- Caroline Mongina Matara
  - Department of Civil and Construction Engineering, University of Nairobi, P.O. Box 30197-00100, Nairobi, Kenya
7
Wu L, Xu J, Tong W. PERform: assessing model performance with predictivity and explainability readiness formula. Journal of Environmental Science and Health. Part C, Toxicology and Carcinogenesis 2024:1-16. [PMID: 38619534 DOI: 10.1080/26896583.2024.2340391] [Indexed: 04/16/2024]
Abstract
In the rapidly evolving field of artificial intelligence (AI), explainability has traditionally been assessed in a post-modeling process and is often subjective. In contrast, many quantitative metrics have been routinely used to assess a model's performance. We proposed a unified formula named PERForm, which incorporates explainability as a weight into the existing statistical metrics to provide an integrated and quantitative measure of both predictivity and explainability to guide model selection, application, and evaluation. PERForm was designed as a generic formula and can be applied to any data type. We applied PERForm to a range of diverse datasets, including DILIst, Tox21, and three MAQC-II benchmark datasets, using various modeling algorithms to predict a total of 73 distinct endpoints. For example, AdaBoost algorithms exhibited superior performance in DILIst prediction (PERForm AUC of 0.129 for AdaBoost versus 0 for linear regression), whereas linear regression outperformed other models in the majority of Tox21 endpoints (average PERForm AUC of 0.301 for linear regression versus 0.283 for AdaBoost). This research marks a significant step toward comprehensively evaluating the utility of an AI model to advance transparency and interpretability, where the tradeoff between a model's performance and its interpretability can have profound implications.
Affiliation(s)
- Leihong Wu
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, FDA, Jefferson, AR, USA
- Joshua Xu
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, FDA, Jefferson, AR, USA
- Weida Tong
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, FDA, Jefferson, AR, USA
8
Scalzitti N, Miralavy I, Korenchan DE, Farrar CT, Gilad AA, Banzhaf W. Computational peptide discovery with a genetic programming approach. J Comput Aided Mol Des 2024; 38:17. [PMID: 38570405 DOI: 10.1007/s10822-024-00558-0] [Received: 12/08/2023] [Accepted: 03/07/2024] [Indexed: 04/05/2024]
Abstract
The development of peptides for therapeutic targets or biomarkers for disease diagnosis is a challenging task in protein engineering. Current approaches are tedious, often time-consuming and require complex laboratory data due to the vast search spaces that need to be considered. In silico methods can accelerate research and substantially reduce costs. Evolutionary algorithms are a promising approach for exploring large search spaces and can facilitate the discovery of new peptides. This study presents the development and use of a new variant of the genetic-programming-based POET algorithm, called POETRegex, where individuals are represented by a list of regular expressions. This algorithm was trained on a small curated dataset and employed to generate new peptides improving the sensitivity of peptides in magnetic resonance imaging with chemical exchange saturation transfer (CEST). The resulting model achieves a performance gain of 20% over the initial POET models and is able to predict a candidate peptide with a 58% performance increase compared to the gold-standard peptide. By combining the power of genetic programming with the flexibility of regular expressions, new peptide targets were identified that improve the sensitivity of detection by CEST. This approach provides a promising research direction for the efficient identification of peptides with therapeutic or diagnostic potential.
Affiliation(s)
- Nicolas Scalzitti
  - BEACON Center of Evolution in Action, Michigan State University, East Lansing, MI, USA
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA
- Iliya Miralavy
  - BEACON Center of Evolution in Action, Michigan State University, East Lansing, MI, USA
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA
- David E Korenchan
  - Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Christian T Farrar
  - Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Assaf A Gilad
  - BEACON Center of Evolution in Action, Michigan State University, East Lansing, MI, USA.
  - Department of Chemical Engineering, Michigan State University, East Lansing, MI, USA.
  - Department of Radiology, Michigan State University, East Lansing, MI, USA.
- Wolfgang Banzhaf
  - BEACON Center of Evolution in Action, Michigan State University, East Lansing, MI, USA.
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA.
9
Oh SW, Byun SS, Kim JK, Jeong CW, Kwak C, Hwang EC, Kang SH, Chung J, Kim YJ, Ha YS, Hong SH. Machine learning models for predicting the onset of chronic kidney disease after surgery in patients with renal cell carcinoma. BMC Med Inform Decis Mak 2024; 24:85. [PMID: 38519947 PMCID: PMC10960396 DOI: 10.1186/s12911-024-02473-8] [Received: 09/28/2023] [Accepted: 03/03/2024] [Indexed: 03/25/2024] Open
Abstract
BACKGROUND Patients with renal cell carcinoma (RCC) have an elevated risk of chronic kidney disease (CKD) following nephrectomy. Therefore, continuous monitoring and subsequent interventions are necessary, and postoperative evaluation of renal function is recommended. A tool to predict CKD onset is thus essential for postoperative follow-up and management. METHODS We constructed a cohort using data from eight tertiary hospitals from the Korean Renal Cell Carcinoma (KORCC) database. A dataset of 4389 patients with RCC was constructed for analysis from the collected data. Nine machine learning (ML) models were used to classify the occurrence and nonoccurrence of CKD after surgery. The final model was selected based on the area under the receiver operating characteristic curve (AUROC), and the importance of the variables constituting the model was confirmed using the Shapley additive explanation (SHAP) value and Kaplan-Meier survival analyses. RESULTS The gradient boost algorithm was the most effective among the various ML models tested, with an AUROC of 0.826. The SHAP value confirmed that preoperative eGFR, albumin level, and tumor size had a significant impact on the occurrence of CKD after surgery. CONCLUSIONS We developed a model to predict CKD onset after surgery in patients with RCC. This predictive model is a quantitative approach to evaluate post-surgical CKD risk in patients with RCC, facilitating improved prognosis through personalized postoperative care.
Affiliation(s)
- Seol Whan Oh
  - Department of Medical Informatics, College of Medicine, The Catholic University of Korea, 06591, Seoul, Korea
  - Department of Biomedicine & Health Sciences, The Catholic University of Korea, 06591, Seoul, Korea
- Seok-Soo Byun
  - Department of Urology, Seoul National University College of Medicine, Seoul National University Bundang Hospital, 13620, Seongnam, Korea
- Jung Kwon Kim
  - Department of Urology, Seoul National University College of Medicine, Seoul National University Bundang Hospital, 13620, Seongnam, Korea
- Chang Wook Jeong
  - Department of Urology, Seoul National University College of Medicine, Seoul National University Hospital, 03080, Seoul, Korea
- Cheol Kwak
  - Department of Urology, Seoul National University College of Medicine, Seoul National University Hospital, 03080, Seoul, Korea
- Eu Chang Hwang
  - Department of Urology, Chonnam National University Medical School, 61469, Gwangju, Korea
- Seok Ho Kang
  - Department of Urology, Korea University School of Medicine, 02841, Seoul, Korea
- Jinsoo Chung
  - Department of Urology, National Cancer Center, 10408, Goyang, Korea
- Yong-June Kim
  - Department of Urology, Chungbuk National University College of Medicine, 28644, Cheongju, Korea
  - Department of Urology, College of Medicine, Chungbuk National University, 28644, Cheongju, Korea
- Yun-Sok Ha
  - Department of Urology, School of Medicine, Kyungpook National University Chilgok Hospital, Kyungpook National University, 41404, Daegu, Korea
- Sung-Hoo Hong
  - Department of Urology, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea.
10
Wu TY, Li YR, Chang KJ, Fang JC, Urano D, Liu MJ. Modeling alternative translation initiation sites in plants reveals evolutionarily conserved cis-regulatory codes in eukaryotes. Genome Res 2024; 34:272-285. [PMID: 38479836 PMCID: PMC10984385 DOI: 10.1101/gr.278100.123] [Received: 05/15/2023] [Accepted: 02/15/2024] [Indexed: 03/22/2024]
Abstract
mRNA translation relies on identifying translation initiation sites (TISs) in mRNAs. Alternative TISs are prevalent across plant transcriptomes, but the mechanisms for their recognition are unclear. Using ribosome profiling and machine learning, we developed models for predicting alternative TISs in the tomato (Solanum lycopersicum). Distinct feature sets were predictive of AUG and nonAUG TISs in 5' untranslated regions and coding sequences, including a novel CU-rich sequence that promoted plant TIS activity, a translational enhancer found across dicots and monocots, and humans and viruses. Our results elucidate the mechanistic and evolutionary basis of TIS recognition, whereby cis-regulatory RNA signatures affect start site selection. The TIS prediction model provides global estimates of TISs to discover neglected protein-coding genes across plant genomes. The prevalence of cis-regulatory signatures across plant species, humans, and viruses suggests their broad and critical roles in reprogramming the translational landscape.
Affiliation(s)
- Ting-Ying Wu
  - Institute of Plant and Microbial Biology, Academia Sinica, Taipei 11529, Taiwan
- Ya-Ru Li
  - Biotechnology Center in Southern Taiwan, Academia Sinica, Tainan 711, Taiwan
- Kai-Jyun Chang
  - Biotechnology Center in Southern Taiwan, Academia Sinica, Tainan 711, Taiwan
  - Institute of Tropical Plant Sciences, National Cheng Kung University, Tainan 701, Taiwan
- Jhen-Cheng Fang
  - Biotechnology Center in Southern Taiwan, Academia Sinica, Tainan 711, Taiwan
- Daisuke Urano
  - Temasek Life Sciences Laboratory, Singapore 117604, Singapore
  - Department of Biological Sciences, National University of Singapore, Singapore 117558, Singapore
- Ming-Jung Liu
  - Biotechnology Center in Southern Taiwan, Academia Sinica, Tainan 711, Taiwan
  - Institute of Tropical Plant Sciences, National Cheng Kung University, Tainan 701, Taiwan
  - Agricultural Biotechnology Research Center, Academia Sinica, Taipei 115, Taiwan
11
Mota LFM, Arikawa LM, Santos SWB, Fernandes Júnior GA, Alves AAC, Rosa GJM, Mercadante MEZ, Cyrillo JNSG, Carvalheiro R, Albuquerque LG. Benchmarking machine learning and parametric methods for genomic prediction of feed efficiency-related traits in Nellore cattle. Sci Rep 2024; 14:6404. [PMID: 38493207 PMCID: PMC10944497 DOI: 10.1038/s41598-024-57234-4] [Received: 07/03/2023] [Accepted: 03/15/2024] [Indexed: 03/18/2024] Open
Abstract
Genomic selection (GS) offers a promising opportunity for selecting animals that use consumed energy more efficiently for maintenance and growth functions, impacting profitability and environmental sustainability. Here, we compared the prediction accuracy of multi-layer neural network (MLNN) and support vector regression (SVR) against single-trait (STGBLUP) and multi-trait (MTGBLUP) genomic best linear unbiased prediction and Bayesian regression (BayesA, BayesB, BayesC, BRR, and BLasso) for feed efficiency (FE) traits. FE-related traits were measured in 1156 Nellore cattle from an experimental breeding program genotyped for ~300K markers after quality control. Prediction accuracy (Acc) was evaluated using forward validation, splitting the dataset by birth year and treating the phenotypes adjusted for the fixed effects and covariates as pseudo-phenotypes. The MLNN and SVR approaches were trained by randomly splitting the training population into five folds to select the best hyperparameters. The results show that the machine learning methods (MLNN and SVR) and MTGBLUP outperformed STGBLUP and the Bayesian regression approaches, increasing the Acc by approximately 8.9%, 14.6%, and 13.7% using MLNN, SVR, and MTGBLUP, respectively. Acc for SVR and MTGBLUP differed only slightly, ranging from 0.62 to 0.69 and 0.62 to 0.68, respectively, and both models were empirically unbiased (0.97 and 1.09). Our results indicated that the SVR and MTGBLUP approaches were more accurate in predicting FE-related traits than Bayesian regression and STGBLUP and seemed competitive for GS of complex phenotypes with various degrees of inheritance.
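The forward validation described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the record fields, the cutoff year, and the toy values are hypothetical, and accuracy is taken here as the Pearson correlation between predictions and pseudo-phenotypes, which the abstract does not spell out.

```python
from math import sqrt

def forward_split(records, cutoff_year):
    """Split by birth year: older animals train, younger animals are held out."""
    train = [r for r in records if r["birth_year"] <= cutoff_year]
    test = [r for r in records if r["birth_year"] > cutoff_year]
    return train, test

def pearson(x, y):
    """Pearson correlation, used here as an assumed accuracy measure."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy records; "pseudo_pheno" stands for an adjusted phenotype.
records = [
    {"id": 1, "birth_year": 2010, "pseudo_pheno": 0.8},
    {"id": 2, "birth_year": 2011, "pseudo_pheno": 1.1},
    {"id": 3, "birth_year": 2012, "pseudo_pheno": 0.5},
    {"id": 4, "birth_year": 2014, "pseudo_pheno": 0.9},
    {"id": 5, "birth_year": 2015, "pseudo_pheno": 1.3},
    {"id": 6, "birth_year": 2015, "pseudo_pheno": 0.4},
]

train, test = forward_split(records, cutoff_year=2012)

# Stand-in predictions for the held-out animals (no real model is fitted here).
predicted = [1.0, 1.2, 0.6]
observed = [r["pseudo_pheno"] for r in test]
acc = pearson(predicted, observed)
```

Splitting on birth year rather than at random keeps younger animals out of training, which mimics predicting future selection candidates from older relatives.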
Affiliation(s)
- Lucio F M Mota
- School of Agricultural and Veterinarian Sciences, São Paulo State University (UNESP), Jaboticabal, SP, 14884-900, Brazil.
| | - Leonardo M Arikawa
- School of Agricultural and Veterinarian Sciences, São Paulo State University (UNESP), Jaboticabal, SP, 14884-900, Brazil
| | - Samuel W B Santos
- School of Agricultural and Veterinarian Sciences, São Paulo State University (UNESP), Jaboticabal, SP, 14884-900, Brazil
| | - Gerardo A Fernandes Júnior
- School of Agricultural and Veterinarian Sciences, São Paulo State University (UNESP), Jaboticabal, SP, 14884-900, Brazil
| | - Anderson A C Alves
- School of Agricultural and Veterinarian Sciences, São Paulo State University (UNESP), Jaboticabal, SP, 14884-900, Brazil
| | - Guilherme J M Rosa
- Department of Animal and Dairy Sciences, University of Wisconsin, Madison, WI, 53706, USA
| | - Maria E Z Mercadante
- Institute of Animal Science, Beef Cattle Research Center, Sertãozinho, SP, 14174-000, Brazil
- National Council for Science and Technological Development, Brasilia, DF, 71605-001, Brazil
| | - Joslaine N S G Cyrillo
- Institute of Animal Science, Beef Cattle Research Center, Sertãozinho, SP, 14174-000, Brazil
| | - Roberto Carvalheiro
- School of Agricultural and Veterinarian Sciences, São Paulo State University (UNESP), Jaboticabal, SP, 14884-900, Brazil
- National Council for Science and Technological Development, Brasilia, DF, 71605-001, Brazil
| | - Lucia G Albuquerque
- School of Agricultural and Veterinarian Sciences, São Paulo State University (UNESP), Jaboticabal, SP, 14884-900, Brazil.
- National Council for Science and Technological Development, Brasilia, DF, 71605-001, Brazil.
| |
|
12
|
Tang AS, Rankin KP, Cerono G, Miramontes S, Mills H, Roger J, Zeng B, Nelson C, Soman K, Woldemariam S, Li Y, Lee A, Bove R, Glymour M, Aghaeepour N, Oskotsky TT, Miller Z, Allen IE, Sanders SJ, Baranzini S, Sirota M. Leveraging electronic health records and knowledge networks for Alzheimer's disease prediction and sex-specific biological insights. NATURE AGING 2024; 4:379-395. [PMID: 38383858 PMCID: PMC10950787 DOI: 10.1038/s43587-024-00573-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Accepted: 01/19/2024] [Indexed: 02/23/2024]
Abstract
Identification of Alzheimer's disease (AD) onset risk can facilitate interventions before irreversible disease progression. We demonstrate that electronic health records from the University of California, San Francisco, followed by knowledge networks (for example, SPOKE), allow for (1) prediction of AD onset, (2) prioritization of biological hypotheses, and (3) contextualization of sex dimorphism. We trained random forest models and predicted AD onset on a cohort of 749 individuals with AD and 250,545 controls, with a mean area under the receiver operating characteristic curve of 0.72 (7 years prior) to 0.81 (1 day prior). We further harnessed matched cohort models to identify conditions with predictive power before AD onset. Knowledge networks highlight shared genes between multiple top predictors and AD (for example, APOE, ACTB, IL6 and INS). Genetic colocalization analysis supports AD association with hyperlipidemia at the APOE locus, as well as a stronger female AD association with osteoporosis at a locus near MS4A6A. We therefore show how clinical data can be utilized for early AD prediction and identification of personalized biological hypotheses.
Affiliation(s)
- Alice S Tang
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA.
- Graduate Program in Bioengineering, University of California, San Francisco and University of California, Berkeley, San Francisco and Berkeley, CA, USA.
| | - Katherine P Rankin
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Memory and Aging Center, Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Gabriel Cerono
- Weill Institute for Neuroscience. Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Silvia Miramontes
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
| | - Hunter Mills
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
| | - Jacquelyn Roger
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
| | - Billy Zeng
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
| | - Charlotte Nelson
- Weill Institute for Neuroscience. Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Karthik Soman
- Weill Institute for Neuroscience. Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Sarah Woldemariam
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
| | - Yaqiao Li
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
| | - Albert Lee
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
| | - Riley Bove
- Weill Institute for Neuroscience. Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Maria Glymour
- Department of Anesthesiology, Pain, and Perioperative Medicine, Stanford University, Palo Alto, CA, USA
| | - Nima Aghaeepour
- Department of Anesthesiology, Pain, and Perioperative Medicine, Stanford University, Palo Alto, CA, USA
- Department of Pediatrics, Stanford University, Palo Alto, CA, USA
- Department of Biomedical Data Science, Stanford University, Palo Alto, CA, USA
| | - Tomiko T Oskotsky
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
| | - Zachary Miller
- Memory and Aging Center, Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Isabel E Allen
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA, USA
| | - Stephan J Sanders
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Institute of Developmental and Regenerative Medicine, Department of Paediatrics, University of Oxford, Oxford, UK
- Department of Psychiatry and Behavioral Sciences, Weill Institute for Neurosciences, University of California, San Francisco, CA, USA
| | - Sergio Baranzini
- Weill Institute for Neuroscience. Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Marina Sirota
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA.
- Department of Pediatrics, University of California, San Francisco, CA, USA.
| |
|
13
|
Ghosh SK, Khandoker AH. Investigation on explainable machine learning models to predict chronic kidney diseases. Sci Rep 2024; 14:3687. [PMID: 38355876 PMCID: PMC10866953 DOI: 10.1038/s41598-024-54375-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Accepted: 02/12/2024] [Indexed: 02/16/2024] Open
Abstract
Chronic kidney disease (CKD) is a major worldwide health problem, affecting a large proportion of the world's population and leading to higher morbidity and death rates. The early stages of CKD sometimes present without visible symptoms, leaving patients unaware of the condition. Early detection and treatment are critical in reducing complications and improving the overall quality of life for people afflicted. In this work, we investigate the use of an explainable artificial intelligence (XAI)-based strategy, leveraging clinical characteristics, to predict CKD. This study collected clinical data from 491 patients, comprising 56 with CKD and 435 without CKD, encompassing clinical, laboratory, and demographic variables. To develop the predictive model, five machine learning (ML) methods, namely logistic regression (LR), random forest (RF), decision tree (DT), Naïve Bayes (NB), and extreme gradient boosting (XGBoost), were employed. The optimal model was selected based on accuracy and area under the curve (AUC). Additionally, the SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) algorithms were utilized to demonstrate the influence of the features on the optimal model. Among the five models developed, the XGBoost model achieved the best performance with an AUC of 0.9689 and an accuracy of 93.29%. The analysis of feature importance revealed that creatinine, glycosylated hemoglobin type A1C (HgbA1C), and age were the three most influential features in the XGBoost model. The SHAP force analysis further illustrated the model's visualization of individualized CKD predictions. For further insights into individual predictions, we also utilized the LIME algorithm. This study presents an interpretable ML-based approach for the early prediction of CKD. The SHAP and LIME methods enhance the interpretability of ML models and help clinicians better understand the rationale behind the predicted outcomes.
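The two selection metrics named in this abstract, accuracy and AUC, can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: only the class counts (56 with CKD, 435 without) come from the abstract, and the constant predictions are hypothetical stand-ins for a model. It shows why AUC is a useful companion to accuracy on an imbalanced cohort: a rule that always predicts "no CKD" already reaches about 88.6% accuracy while its AUC stays at 0.5.

```python
def accuracy(labels, predictions):
    """Fraction of cases where the predicted class equals the true class."""
    correct = sum(1 for l, p in zip(labels, predictions) if l == p)
    return correct / len(labels)

def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties count as half a win (the rank-based Mann-Whitney formulation).
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Class counts taken from the abstract: 56 with CKD, 435 without.
labels = [1] * 56 + [0] * 435

# Hypothetical trivial rule: always predict "no CKD" with a constant score.
majority_predictions = [0] * len(labels)
constant_scores = [0.5] * len(labels)

baseline_accuracy = accuracy(labels, majority_predictions)  # 435 / 491
baseline_auc = auc(labels, constant_scores)                 # all ties
```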
Affiliation(s)
- Samit Kumar Ghosh
- Department of Biomedical Engineering & Biotechnology, Khalifa University, Abu Dhabi, United Arab Emirates.
| | - Ahsan H Khandoker
- Department of Biomedical Engineering & Biotechnology, Khalifa University, Abu Dhabi, United Arab Emirates
| |
|
14
|
Wang H, Moghe GD, Kovaleski AP, Keller M, Martinson TE, Wright AH, Franklin JL, Hébert-Haché A, Provost C, Reinke M, Atucha A, North MG, Russo JP, Helwi P, Centinari M, Londo JP. NYUS.2: an automated machine learning prediction model for the large-scale real-time simulation of grapevine freezing tolerance in North America. HORTICULTURE RESEARCH 2024; 11:uhad286. [PMID: 38487294 PMCID: PMC10939402 DOI: 10.1093/hr/uhad286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Accepted: 12/17/2023] [Indexed: 03/17/2024]
Abstract
Accurate and real-time monitoring of grapevine freezing tolerance is crucial for the sustainability of the grape industry in cool climate viticultural regions. However, on-site data are limited due to the complexity of measurement. Current prediction models underperform under diverse climate conditions, which limits the large-scale deployment of these methods. We combined grapevine freezing tolerance data from multiple regions in North America and generated a predictive model based on hourly temperature-derived features and cultivar features using AutoGluon, an automated machine learning engine. Feature importance was quantified by AutoGluon and by SHAP (SHapley Additive exPlanations) values. The final model was evaluated and compared with previous models for its performance under different climate conditions. The final model achieved an overall root-mean-square error of 1.36°C during model testing and outperformed two previous models using three test cultivars at all testing regions. The two feature importance quantification methods identified five shared essential features. Detailed analysis of the features indicates that the model has adequately extracted some biological mechanisms during training. The final model, named NYUS.2, was deployed along with two previous models as an R Shiny-based application in the 2022-23 dormancy season, enabling large-scale and real-time simulation of grapevine freezing tolerance in North America for the first time.
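For readers unfamiliar with the error metric quoted above, root-mean-square error can be computed as in the following sketch. The temperature values are invented for illustration and are not data from the study.

```python
from math import sqrt

def rmse(observed, predicted):
    """Root-mean-square error between paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical freezing tolerance values in degrees Celsius.
observed = [-20.0, -22.0, -18.0]
predicted = [-21.0, -20.0, -18.0]

error = rmse(observed, predicted)  # errors of 1, 2 and 0 degrees
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones, and the result stays in the same unit as the measurements.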
Affiliation(s)
- Hongrui Wang
- School of Integrative Plant Science, Horticulture Section, Cornell AgriTech, Cornell University, Geneva, NY 14456, USA
| | - Gaurav D Moghe
- School of Integrative Plant Science, Plant Biology Section, Cornell University, Ithaca, NY 14850, USA
| | - Al P Kovaleski
- Plant and Agroecosystem Sciences Department, University of Wisconsin–Madison, Madison, WI 53706, USA
| | - Markus Keller
- Department of Viticulture and Enology, Irrigated Agriculture Research and Extension Center, Washington State University, Prosser, WA 99350, USA
| | - Timothy E Martinson
- School of Integrative Plant Science, Horticulture Section, Cornell AgriTech, Cornell University, Geneva, NY 14456, USA
| | - A Harrison Wright
- Kentville Research and Development Centre, Agriculture and Agri-Food Canada, Kentville, Nova Scotia, B4N 1J5, Canada
| | - Jeffrey L Franklin
- Kentville Research and Development Centre, Agriculture and Agri-Food Canada, Kentville, Nova Scotia, B4N 1J5, Canada
| | | | - Caroline Provost
- Centre de Recherche Agroalimentaire de Mirabel, Mirabel, Québec, J7N 2X8, Canada
| | - Michael Reinke
- Southwest Michigan Research and Extension Center, Michigan State University, Benton Harbor, MI 49022, USA
| | - Amaya Atucha
- Plant and Agroecosystem Sciences Department, University of Wisconsin–Madison, Madison, WI 53706, USA
| | - Michael G North
- Plant and Agroecosystem Sciences Department, University of Wisconsin–Madison, Madison, WI 53706, USA
| | - Jennifer P Russo
- School of Integrative Plant Science, Horticulture Section, Cornell AgriTech, Cornell University, Geneva, NY 14456, USA
| | - Pierre Helwi
- Martell & Co., 7 place Edouard Martell, Cognac 16100, France
| | - Michela Centinari
- Department of Plant Science, The Pennsylvania State University, University Park, PA 16802, USA
| | - Jason P Londo
- School of Integrative Plant Science, Horticulture Section, Cornell AgriTech, Cornell University, Geneva, NY 14456, USA
| |
|
15
|
Hu J, Xu J, Li M, Jiang Z, Mao J, Feng L, Miao K, Li H, Chen J, Bai Z, Li X, Lu G, Li Y. Identification and validation of an explainable prediction model of acute kidney injury with prognostic implications in critically ill children: a prospective multicenter cohort study. EClinicalMedicine 2024; 68:102409. [PMID: 38273888 PMCID: PMC10809096 DOI: 10.1016/j.eclinm.2023.102409] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/29/2023] [Revised: 12/19/2023] [Accepted: 12/19/2023] [Indexed: 01/27/2024] Open
Abstract
Background: Acute kidney injury (AKI) is a common and serious organ dysfunction in critically ill children. Early identification and prediction of AKI are of great significance. However, current AKI criteria are insufficiently sensitive and specific, and AKI heterogeneity limits the clinical value of AKI biomarkers. This study aimed to establish and validate an explainable prediction model based on the machine learning (ML) approach for AKI, and assess its prognostic implications in children admitted to the pediatric intensive care unit (PICU). Methods: This multicenter prospective study in China was conducted on critically ill children for the derivation and validation of the prediction model. The derivation cohort, consisting of 957 children admitted to four independent PICUs from September 2020 to January 2021, was separated for training and internal validation, and an external data set of 866 children admitted from February 2021 to February 2022 was employed for external validation. AKI was defined based on serum creatinine and urine output using the Kidney Disease: Improving Global Outcomes (KDIGO) criteria. With 33 medical characteristics easily obtained or evaluated during the first 24 h after PICU admission, 11 ML algorithms were used to construct prediction models. Several evaluation indexes, including the area under the receiver-operating-characteristic curve (AUC), were used to compare the predictive performance. The SHapley Additive exPlanation method was used to rank the feature importance and explain the final model. A probability threshold for the final model was identified for AKI prediction and subgrouping. Clinical outcomes were evaluated in various subgroups determined by a combination of the final model and KDIGO criteria. Findings: The random forest (RF) model performed best in discriminative ability among the 11 ML models. After reducing features according to feature importance rank, an explainable final RF model was established with 8 features. The final model could accurately predict AKI in both internal (AUC = 0.929) and external (AUC = 0.910) validations, and has been translated into a convenient tool to facilitate its utility in clinical settings. Critically ill children with a probability exceeding or equal to the threshold in the final model had a higher risk of death and multiple organ dysfunctions, regardless of whether they met the KDIGO criteria for AKI. Interpretation: Our explainable ML model was not only successfully developed to accurately predict AKI but was also highly relevant to adverse outcomes in individual children at an early stage of PICU admission, and it mitigated the concern of the "black-box" issue with an indirect interpretation of the ML technique. Funding: The National Natural Science Foundation of China, Jiangsu Province Science and Technology Support Program, Key talent of women's and children's health of Jiangsu Province, and Postgraduate Research & Practice Innovation Program of Jiangsu Province.
Affiliation(s)
- Junlong Hu
- Department of Nephrology and Immunology, Children’s Hospital of Soochow University, Suzhou, Jiangsu province, China
| | - Jing Xu
- Department of Nephrology and Immunology, Children’s Hospital of Soochow University, Suzhou, Jiangsu province, China
| | - Min Li
- Pediatric Intensive Care Unit, Anhui Provincial Children’s Hospital, Hefei, Anhui province, China
| | - Zhen Jiang
- Pediatric Intensive Care Unit, Xuzhou Children’s Hospital, Xuzhou, Jiangsu province, China
| | - Jie Mao
- Department of Nephrology and Immunology, Children’s Hospital of Soochow University, Suzhou, Jiangsu province, China
| | - Lian Feng
- Department of Nephrology and Immunology, Children’s Hospital of Soochow University, Suzhou, Jiangsu province, China
| | - Kexin Miao
- Department of Nephrology and Immunology, Children’s Hospital of Soochow University, Suzhou, Jiangsu province, China
| | - Huiwen Li
- Department of Nephrology and Immunology, Children’s Hospital of Soochow University, Suzhou, Jiangsu province, China
| | - Jiao Chen
- Pediatric Intensive Care Unit, Children’s Hospital of Soochow University, Suzhou, Jiangsu province, China
| | - Zhenjiang Bai
- Pediatric Intensive Care Unit, Children’s Hospital of Soochow University, Suzhou, Jiangsu province, China
| | - Xiaozhong Li
- Department of Nephrology and Immunology, Children’s Hospital of Soochow University, Suzhou, Jiangsu province, China
| | - Guoping Lu
- Pediatric Intensive Care Unit, Children’s Hospital of Fudan University, Shanghai, China
| | - Yanhong Li
- Department of Nephrology and Immunology, Children’s Hospital of Soochow University, Suzhou, Jiangsu province, China
- Institute of Pediatric Research, Children’s Hospital of Soochow University, Suzhou, Jiangsu province, China
| |
|
16
|
Ciobanu-Caraus O, Aicher A, Kernbach JM, Regli L, Serra C, Staartjes VE. A critical moment in machine learning in medicine: on reproducible and interpretable learning. Acta Neurochir (Wien) 2024; 166:14. [PMID: 38227273 DOI: 10.1007/s00701-024-05892-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Accepted: 12/14/2023] [Indexed: 01/17/2024]
Abstract
Over the past two decades, advances in computational power and data availability combined with increased accessibility to pre-trained models have led to an exponential rise in machine learning (ML) publications. While ML may have the potential to transform healthcare, this sharp increase in ML research output without focus on methodological rigor and standard reporting guidelines has fueled a reproducibility crisis. In addition, the rapidly growing complexity of these models compromises their interpretability, which currently impedes their successful and widespread clinical adoption. In medicine, where failure of such models may have severe implications for patients' health, the high requirements for accuracy, robustness, and interpretability confront ML researchers with a unique set of challenges. In this review, we discuss the semantics of reproducibility and interpretability, as well as related issues and challenges, and outline possible solutions to counteract the "black box". To foster reproducibility, standard reporting guidelines need to be further developed and data or code sharing encouraged. Editors and reviewers may equally play a critical role by establishing high methodological standards and thus preventing the dissemination of low-quality ML publications. To foster interpretable learning, the use of simpler models more suitable for medical data can inform the clinician how results are generated based on input data. Model-agnostic explanation tools, sensitivity analysis, and hidden layer representations constitute further promising approaches to increase interpretability. Balancing model performance and interpretability is important to ensure clinical applicability. We have now reached a critical moment for ML in medicine, where addressing these issues and implementing appropriate solutions will be vital for the future evolution of the field.
Affiliation(s)
- Olga Ciobanu-Caraus
- Machine Intelligence in Clinical Neuroscience & Microsurgical Neuroanatomy (MICN) Laboratory, Department of Neurosurgery, Clinical Neuroscience Center, University Hospital Zurich, University of Zurich, Zurich, Switzerland
| | - Anatol Aicher
- Machine Intelligence in Clinical Neuroscience & Microsurgical Neuroanatomy (MICN) Laboratory, Department of Neurosurgery, Clinical Neuroscience Center, University Hospital Zurich, University of Zurich, Zurich, Switzerland
| | - Julius M Kernbach
- Department of Neuroradiology, University Hospital Heidelberg, Heidelberg, Germany
| | - Luca Regli
- Machine Intelligence in Clinical Neuroscience & Microsurgical Neuroanatomy (MICN) Laboratory, Department of Neurosurgery, Clinical Neuroscience Center, University Hospital Zurich, University of Zurich, Zurich, Switzerland
| | - Carlo Serra
- Machine Intelligence in Clinical Neuroscience & Microsurgical Neuroanatomy (MICN) Laboratory, Department of Neurosurgery, Clinical Neuroscience Center, University Hospital Zurich, University of Zurich, Zurich, Switzerland
| | - Victor E Staartjes
- Machine Intelligence in Clinical Neuroscience & Microsurgical Neuroanatomy (MICN) Laboratory, Department of Neurosurgery, Clinical Neuroscience Center, University Hospital Zurich, University of Zurich, Zurich, Switzerland.
| |
|
17
|
Huang X, Rymbekova A, Dolgova O, Lao O, Kuhlwilm M. Harnessing deep learning for population genetic inference. Nat Rev Genet 2024; 25:61-78. [PMID: 37666948 DOI: 10.1038/s41576-023-00636-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Accepted: 07/11/2023] [Indexed: 09/06/2023]
Abstract
In population genetics, the emergence of large-scale genomic data for various species and populations has provided new opportunities to understand the evolutionary forces that drive genetic diversity using statistical inference. However, the era of population genomics presents new challenges in analysing the massive amounts of genomes and variants. Deep learning has demonstrated state-of-the-art performance for numerous applications involving large-scale data. Recently, deep learning approaches have gained popularity in population genetics; facilitated by the advent of massive genomic data sets, powerful computational hardware and complex deep learning architectures, they have been used to identify population structure, infer demographic history and investigate natural selection. Here, we introduce common deep learning architectures and provide comprehensive guidelines for implementing deep learning models for population genetic inference. We also discuss current challenges and future directions for applying deep learning in population genetics, focusing on efficiency, robustness and interpretability.
Affiliation(s)
- Xin Huang
- Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria.
- Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Vienna, Austria.
| | - Aigerim Rymbekova
- Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria
- Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Vienna, Austria
| | - Olga Dolgova
- Integrative Genomics Laboratory, CIC bioGUNE - Centro de Investigación Cooperativa en Biociencias, Derio, Biscaya, Spain
| | - Oscar Lao
- Institute of Evolutionary Biology, CSIC-Universitat Pompeu Fabra, Barcelona, Spain.
| | - Martin Kuhlwilm
- Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria.
- Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Vienna, Austria.
| |
|
18
|
Lang FF, Liu LY, Wang SW. Predictive modeling of perioperative blood transfusion in lumbar posterior interbody fusion using machine learning. Front Physiol 2023; 14:1306453. [PMID: 38187137 PMCID: PMC10767743 DOI: 10.3389/fphys.2023.1306453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Accepted: 11/06/2023] [Indexed: 01/09/2024] Open
Abstract
Background: Accurate estimation of perioperative blood transfusion risk in lumbar posterior interbody fusion is essential to reduce the number, cost, and complications associated with blood transfusions. Machine learning algorithms have the potential to outperform traditional prediction methods in predicting perioperative blood transfusion. This study aimed to construct a machine learning-based perioperative transfusion risk prediction model for lumbar posterior interbody fusion in order to improve the efficacy of surgical decision-making. Methods: We retrospectively collected clinical data on 1905 patients who underwent lumbar posterior interbody fusion surgery at the Second Hospital of Shanxi Medical University between January 2021 and March 2023. All the data was randomly divided into a training set and a validation set, and the "feature_importances" method provided by eXtreme Gradient Boosting (XGBoost) algorithm was applied to select statistically significant features on the training set to establish five machine learning prediction models. The optimal model was identified by utilizing the area under the curve (AUC) and the probability calibration curve on the validation set. Shapley additive explanations (SHAP) and local interpretable model-agnostic explanations (LIME) were employed for interpretable analysis of the optimal model. Results: In the postoperative outcomes of patients, the number of hospital days in the transfusion group was longer than that in the non-transfusion group. Additionally, the transfusion group experienced higher total hospital costs, 90-day readmission rates, and complication rates within 90 days after surgery than the non-transfusion group. A total of 9 features were selected for the models. The XGBoost model performed best with an AUC value of 0.958. 
The SHAP values showed that intraoperative blood loss, intraoperative fluid infusion, and number of fused segments were the top 3 most important features affecting perioperative blood transfusion in lumbar posterior interbody fusion. The LIME algorithm was used to interpret the individualized prediction. Conclusion: Surgery, ASA class, levels fused, total intraoperative blood loss, operative time, and preoperative Hb are viable predictors of perioperative blood transfusion in lumbar posterior interbody fusion. The XGBoost model has demonstrated superior predictive efficacy compared to the traditional logistic regression model, making it a more effective decision-making tool for perioperative blood transfusion.
Affiliation(s)
- Fang-Fang Lang
- School of Public Health, Shanxi Medical University, Taiyuan, China
| | - Li-Ying Liu
- School of Public Health, Shanxi Medical University, Taiyuan, China
| | - Shao-Wei Wang
- Department of Orthopedics, The Second Hospital of Shanxi Medical University, Taiyuan, China
| |
|
19
|
Alexander Pyron R. Unsupervised machine learning for species delimitation, integrative taxonomy, and biodiversity conservation. Mol Phylogenet Evol 2023; 189:107939. [PMID: 37804960 DOI: 10.1016/j.ympev.2023.107939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Revised: 09/25/2023] [Accepted: 10/04/2023] [Indexed: 10/09/2023]
Abstract
Integrative taxonomy, combining data from multiple axes of biologically relevant variation, is a major goal of systematics. Ideally, such taxonomies will derive from similarly integrative species-delimitation analyses. Yet, most current methods rely solely or primarily on molecular data, with other layers often incorporated only in a post hoc qualitative or comparative manner. A major limitation is the difficulty of devising quantitative parametric models linking different datasets in a unified ecological and evolutionary framework. Machine Learning (ML) methods offer flexibility in this arena by easily learning high-dimensional associations between observations (e.g., individual specimens) across a wide array of input features (e.g., genetics, geography, environment, and phenotype) to delimit statistically meaningful clusters. Here, I implement an unsupervised method using Self-Organizing (or "Kohonen") Maps (SOMs) for such purposes. Recent extensions called "SuperSOMs" can integrate multiple layers, each of which exerts independent influence on a two-dimensional output grid via empirically estimated weights. The grid cells are then delimited into K distinct units that can be interpreted as species or other entities. I show empirical examples in salamanders (Desmognathus) and snakes (Storeria) with layers representing alleles, space, climate, and traits. Simulations reveal that the SuperSOM approach can detect K = 1, tends not to over-split, reflects contributions from all layers, and limits large layers (e.g., genetic matrices) from overwhelming other datasets, desirable properties addressing major concerns from previous studies. Finally, I suggest that these and similar methods could integrate conservation-relevant layers such as population trends and human encroachment to delimit management units from an explicitly quantitative framework grounded in the ecology and evolution of species limits and boundaries.
Affiliation(s)
- R Alexander Pyron
- Department of Biological Sciences, The George Washington University, Washington, DC 20052 USA.
20
|
Sadria M, Layton A, Bader GD. Adversarial training improves model interpretability in single-cell RNA-seq analysis. BIOINFORMATICS ADVANCES 2023; 3:vbad166. [PMID: 38099262 PMCID: PMC10719216 DOI: 10.1093/bioadv/vbad166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 09/28/2023] [Accepted: 11/22/2023] [Indexed: 12/17/2023]
Abstract
Motivation Predictive computational models must be accurate, robust, and interpretable to be considered reliable in important areas such as biology and medicine. A sufficiently robust model should not have its output affected significantly by a slight change in the input. Also, these models should be able to explain how a decision is made to support user trust in the results. Efforts have been made to improve the robustness and interpretability of predictive computational models independently; however, the interaction of robustness and interpretability is poorly understood. Results As an example task, we explore the computational prediction of cell type based on single-cell RNA-seq data and show that it can be made more robust by adversarially training a deep learning model. Surprisingly, we find this also leads to improved model interpretability, as measured by identifying genes important for classification using a range of standard interpretability methods. Our results suggest that adversarial training may be generally useful to improve deep learning robustness and interpretability and that it should be evaluated on a range of tasks. Availability and implementation Our Python implementation of all analysis in this publication can be found at: https://github.com/MehrshadSD/robustness-interpretability. The analysis was conducted using numPy 0.2.5, pandas 2.0.3, scanpy 1.9.3, tensorflow 2.10.0, matplotlib 3.7.1, seaborn 0.12.2, sklearn 1.1.1, shap 0.42.0, lime 0.2.0.1, matplotlib_venn 0.11.9.
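The robustness notion in this abstract, that a slight change in the input should not significantly change the output, can be illustrated with a toy example. The sketch below is not the authors' TensorFlow implementation (that lives in the linked repository). It assumes a fast-gradient-sign perturbation, a standard attack chosen here only for illustration since the abstract does not name one, and applies it to a hypothetical three-feature logistic classifier whose weights and input are made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Probability of the positive class under a logistic model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def fgsm_perturb(weights, bias, x, y, eps):
    """Shift each feature by +/- eps in the direction that increases the loss.

    For binary cross-entropy with a logistic model, the gradient of the
    loss with respect to the input is (p - y) * w.
    """
    p = predict(weights, bias, x)
    grad = [(p - y) * w for w in weights]
    sign = [(g > 0) - (g < 0) for g in grad]
    return [xi + eps * s for xi, s in zip(x, sign)]


# Hypothetical model parameters and input (illustrative values only).
weights = [1.5, -2.0, 0.5]
bias = 0.1
x = [0.8, 0.2, 0.5]
y = 1  # true label

p_clean = predict(weights, bias, x)
x_adv = fgsm_perturb(weights, bias, x, y, eps=0.1)
p_adv = predict(weights, bias, x_adv)

print(f"clean probability:     {p_clean:.3f}")
print(f"perturbed probability: {p_adv:.3f}")
```

A small, bounded change to every feature lowers the model's confidence in the correct class. Adversarial training, in general terms, adds such perturbed inputs with their true labels to the training data so that the fitted model is less sensitive to them.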
Affiliation(s)
- Mehrshad Sadria
- Department of Applied Mathematics, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Anita Layton
- Department of Applied Mathematics, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Department of Biology, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- School of Pharmacy, University of Waterloo, Waterloo, Ontario N2G 1C5, Canada
- Gary D Bader
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- The Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada
- The Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario M5G 1X5, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario M5G 2M9, Canada
21
|
Jia X, Wang T, Zhu H. Advancing Computational Toxicology by Interpretable Machine Learning. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2023; 57:17690-17706. [PMID: 37224004 PMCID: PMC10666545 DOI: 10.1021/acs.est.3c00653] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 01/25/2023] [Revised: 05/05/2023] [Accepted: 05/05/2023] [Indexed: 05/26/2023]
Abstract
Chemical toxicity evaluations for drugs, consumer products, and environmental chemicals have a critical impact on human health. Traditional animal models to evaluate chemical toxicity are expensive, time-consuming, and often fail to detect toxicants in humans. Computational toxicology is a promising alternative approach that utilizes machine learning (ML) and deep learning (DL) techniques to predict the toxicity potentials of chemicals. Although the applications of ML- and DL-based computational models in chemical toxicity predictions are attractive, many toxicity models are "black boxes" in nature and difficult to interpret by toxicologists, which hampers the chemical risk assessments using these models. The recent progress of interpretable ML (IML) in the computer science field meets this urgent need to unveil the underlying toxicity mechanisms and elucidate the domain knowledge of toxicity models. In this review, we focused on the applications of IML in computational toxicology, including toxicity feature data, model interpretation methods, use of knowledge base frameworks in IML development, and recent applications. The challenges and future directions of IML modeling in toxicology are also discussed. We hope this review can encourage efforts in developing interpretable models with new IML algorithms that can assist new chemical assessments by illustrating toxicity mechanisms in humans.
Affiliation(s)
- Xuelian Jia
- Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, United States
- Tong Wang
- Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, United States
- Hao Zhu
- Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, United States
22
|
Karim MR, Islam T, Shajalal M, Beyan O, Lange C, Cochez M, Rebholz-Schuhmann D, Decker S. Explainable AI for Bioinformatics: Methods, Tools and Applications. Brief Bioinform 2023; 24:bbad236. [PMID: 37478371 DOI: 10.1093/bib/bbad236] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/10/2023] [Revised: 05/10/2023] [Accepted: 05/26/2023] [Indexed: 07/23/2023] Open
Abstract
Artificial intelligence (AI) systems utilizing deep neural networks and machine learning (ML) algorithms are widely used for solving critical problems in bioinformatics, biomedical informatics and precision medicine. However, complex ML models that are often perceived as opaque and black-box methods make it difficult to understand the reasoning behind their decisions. This lack of transparency can be a challenge for both end-users and decision-makers, as well as AI developers. In sensitive areas such as healthcare, explainability and accountability are not only desirable properties but also legally required for AI systems that can have a significant impact on human lives. Fairness is another growing concern, as algorithmic decisions should not show bias or discrimination towards certain groups or individuals based on sensitive attributes. Explainable AI (XAI) aims to overcome the opaqueness of black-box models and to provide transparency in how AI systems make decisions. Interpretable ML models can explain how they make predictions and identify factors that influence their outcomes. However, the majority of the state-of-the-art interpretable ML methods are domain-agnostic and have evolved from fields such as computer vision, automated reasoning or statistics, making direct application to bioinformatics problems challenging without customization and domain adaptation. In this paper, we discuss the importance of explainability and algorithmic transparency in the context of bioinformatics. We provide an overview of model-specific and model-agnostic interpretable ML methods and tools and outline their potential limitations. We discuss how existing interpretable ML methods can be customized and fit to bioinformatics research problems. Further, through case studies in bioimaging, cancer genomics and text mining, we demonstrate how XAI methods can improve transparency and decision fairness. 
Our review aims at providing valuable insights and serving as a starting point for researchers wanting to enhance explainability and decision transparency while solving bioinformatics problems. GitHub: https://github.com/rezacsedu/XAI-for-bioinformatics.
Affiliation(s)
- Md Rezaul Karim
- Computer Science 5 - Information Systems and Databases, RWTH Aachen University, Germany
- Department of Data Science and Artificial Intelligence, Fraunhofer FIT, Germany
- Tanhim Islam
- Computer Science 9 - Process and Data Science, RWTH Aachen University, Germany
- Oya Beyan
- Computer Science 5 - Information Systems and Databases, RWTH Aachen University, Germany
- University of Cologne, Faculty of Medicine and University Hospital Cologne, Institute for Medical Informatics, Germany
- Christoph Lange
- Computer Science 5 - Information Systems and Databases, RWTH Aachen University, Germany
- Department of Data Science and Artificial Intelligence, Fraunhofer FIT, Germany
- Michael Cochez
- Department of Computer Science, Vrije Universiteit Amsterdam, the Netherlands
- Elsevier Discovery Lab, Amsterdam, the Netherlands
- Dietrich Rebholz-Schuhmann
- ZBMED - Information Center for Life Sciences, Cologne, Germany
- Faculty of Medicine, University of Cologne, Germany
- Stefan Decker
- Computer Science 5 - Information Systems and Databases, RWTH Aachen University, Germany
- Department of Data Science and Artificial Intelligence, Fraunhofer FIT, Germany
23
|
Lee NY, Hum M, Tan GP, Seah AC, Kin PT, Tan NC, Law HY, Lee ASG. Degradation of methylation signals in cryopreserved DNA. Clin Epigenetics 2023; 15:147. [PMID: 37697422 PMCID: PMC10496221 DOI: 10.1186/s13148-023-01565-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Accepted: 09/06/2023] [Indexed: 09/13/2023] Open
Abstract
BACKGROUND Blood-based DNA methylation has shown great promise as a biomarker in a wide variety of diseases. Studies of DNA methylation in blood often utilize samples which have been cryopreserved for years or even decades. Therefore, changes in DNA methylation associated with long-term cryopreservation can introduce biases or otherwise mislead methylation analyses of cryopreserved DNA. However, previous studies have presented conflicting results with studies reporting hypomethylation, no effect, or even hypermethylation of DNA following long-term cryopreservation. These studies may have been limited by insufficient sample sizes, or by their profiling of methylation only on an aggregate global scale, or profiling of only a few CpGs. RESULTS We analyzed two large prospective cohorts: a discovery (n = 126) and a validation (n = 136) cohort, where DNA was cryopreserved for up to four years. In both cohorts there was no detectable change in mean global methylation across increasing storage durations as DNA. However, when analysis was performed on the level of individual CpG methylation both cohorts exhibited a greater number of hypomethylated than hypermethylated CpGs at q-value < 0.05 (4049 hypomethylated but only 50 hypermethylated CpGs in discovery, and 63 hypomethylated but only 6 hypermethylated CpGs in validation). The results were the same even after controlling for age, storage duration as buffy coat prior to DNA extraction, and estimated cell type composition. Furthermore, we find that in both cohorts, CpGs have a greater likelihood to be hypomethylated the closer they are to a CpG island; except for CpGs at the CpG islands themselves which are less likely to be hypomethylated. CONCLUSION Cryopreservation of DNA after a few years results in a detectable bias toward hypomethylation at the level of individual CpG methylation, though when analyzed in aggregate there is no detectable change in mean global methylation. 
Studies profiling methylation in cryopreserved DNA should be mindful of this hypomethylation bias, and more attention should be directed at developing more stable methods of DNA cryopreservation for biomedical research or clinical use.
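The size of the hypomethylation bias follows directly from the CpG counts reported in this abstract. As a small worked example, using only those reported counts (the function and variable names are illustrative), the share of significant CpGs that are hypomethylated in each cohort can be computed as:

```python
# CpG counts at q-value < 0.05 as reported in the abstract:
# (hypomethylated, hypermethylated)
reported_counts = {
    "discovery": (4049, 50),
    "validation": (63, 6),
}


def hypomethylated_fraction(hypo, hyper):
    """Share of significantly changed CpGs that lost methylation."""
    return hypo / (hypo + hyper)


fractions = {
    cohort: hypomethylated_fraction(hypo, hyper)
    for cohort, (hypo, hyper) in reported_counts.items()
}

for cohort, frac in fractions.items():
    print(f"{cohort}: {frac:.1%} of significant CpGs are hypomethylated")
# discovery: 98.8% of significant CpGs are hypomethylated
# validation: 91.3% of significant CpGs are hypomethylated
```

With no directional bias, one would expect roughly equal numbers of hypomethylated and hypermethylated CpGs among the significant sites. This calculation is a descriptive summary only and does not reproduce the authors' statistical analysis.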
Affiliation(s)
- Ning Yuan Lee
- Division of Cellular and Molecular Research, National Cancer Centre Singapore, 30 Hospital Boulevard, Singapore, 168583, Singapore
- Melissa Hum
- Division of Cellular and Molecular Research, National Cancer Centre Singapore, 30 Hospital Boulevard, Singapore, 168583, Singapore
- Guek Peng Tan
- DNA Diagnostic and Research Laboratory, KK Women's and Children's Hospital, 100 Bukit Timah Rd, Singapore, 229899, Singapore
- Ai Choo Seah
- SingHealth Polyclinics, 167 Jalan Bukit Merah, Singapore, 150167, Singapore
- Patricia T Kin
- SingHealth Polyclinics, 167 Jalan Bukit Merah, Singapore, 150167, Singapore
- Ngiap Chuan Tan
- SingHealth Polyclinics, 167 Jalan Bukit Merah, Singapore, 150167, Singapore
- SingHealth Duke-NUS Family Medicine Academic Clinical Programme, Duke-NUS Medical School, 8 College Road, Singapore, 169857, Singapore
- Hai-Yang Law
- DNA Diagnostic and Research Laboratory, KK Women's and Children's Hospital, 100 Bukit Timah Rd, Singapore, 229899, Singapore
- Ann S G Lee
- Division of Cellular and Molecular Research, National Cancer Centre Singapore, 30 Hospital Boulevard, Singapore, 168583, Singapore.
- SingHealth Duke-NUS Oncology Academic Clinical Programme (ONCO ACP), Duke-NUS Medical School, 8 College Road, Singapore, 169857, Singapore.
- Department of Physiology, Yong Loo Lin School of Medicine, National University of Singapore, 2 Medical Drive, Singapore, 117593, Singapore.
24
|
Li W, Wang T, Ng WWY. Population-Based Hyperparameter Tuning With Multitask Collaboration. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:5719-5731. [PMID: 34878983 DOI: 10.1109/tnnls.2021.3130896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Population-based optimization methods are widely used for hyperparameter (HP) tuning for a given specific task. In this work, we propose the population-based hyperparameter tuning with multitask collaboration (PHTMC), which is a general multitask collaborative framework with parallel and sequential phases for population-based HP tuning methods. In the parallel HP tuning phase, a shared population for all tasks is kept and the intertask relatedness is considered to both yield a better generalization ability and avoid data bias to a single task. In the sequential HP tuning phase, a surrogate model is built for each newly added task so that the metainformation from the existing tasks can be extracted and used to help the initialization for the new task. Experimental results show significant improvements in generalization abilities yielded by neural networks trained using the PHTMC and better performances achieved by multitask metalearning. Moreover, visualizations of the solution distribution and the autoencoder's reconstruction for both the PHTMC and a single-task population-based HP tuning method are compared to analyze the properties of multitask collaboration.
25
|
Wang S, Ren Y, Xia B. Estimation of urban AQI based on interpretable machine learning. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2023; 30:96562-96574. [PMID: 37580474 DOI: 10.1007/s11356-023-29336-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Accepted: 08/10/2023] [Indexed: 08/16/2023]
Abstract
Air pollution is an increasingly serious problem. Accurate and efficient prediction of air quality can effectively prevent air pollution and improve the quality of human life. The air quality index (AQI) is a dimensionless measure that describes air quality quantitatively. In this study, machine learning (ML) methods were used to estimate AQI, with Shijiazhuang, China, as the research object and pollutants and meteorological factors as model inputs. Specifically, eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Random Forest (RF) models were used. The experimental results show that the XGBoost model captures the AQI variation trend well, and the R2 of the XGBoost model is 0.929, which is 0.3% and 2.3% higher than the R2 of the RF model and the LightGBM model, respectively. In addition, through the SHAP-based model interpretation method, the study reveals that the key factors of AQI variation, PM2.5 and PM10, play positive roles in the variation of AQI, and that AQI is less sensitive to meteorological factors. Finally, Beijing, Shanghai, Xi'an, and Guangzhou were selected to test the model's validity, and the model performance remained good. Our study shows that applying an ML approach to air quality prediction is beneficial for efficiently assessing cities' future air quality.
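The R2 values quoted in this abstract are coefficients of determination. As a reminder of what that metric measures, here is a minimal sketch of the standard definition, R2 = 1 - SS_res / SS_tot. The AQI values below are hypothetical and are not data from the study, and the function name is illustrative.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical daily AQI observations and model estimates.
observed_aqi = [50.0, 80.0, 120.0, 150.0, 100.0]
predicted_aqi = [55.0, 75.0, 110.0, 160.0, 100.0]

score = r_squared(observed_aqi, predicted_aqi)
print(f"R2 = {score:.3f}")
```

An R2 of 1 means the estimates match the observations exactly, while a model that always predicts the observed mean scores 0.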
Affiliation(s)
- Siyuan Wang
- School of Mathematics and Computer Science, Yan'an University, Yan'an, 716000, China
- Ying Ren
- School of Mathematics and Computer Science, Yan'an University, Yan'an, 716000, China
- Bisheng Xia
- School of Mathematics and Computer Science, Yan'an University, Yan'an, 716000, China.
26
|
Aida H, Ying BW. Efforts to Minimise the Bacterial Genome as a Free-Living Growing System. BIOLOGY 2023; 12:1170. [PMID: 37759570 PMCID: PMC10525146 DOI: 10.3390/biology12091170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2023] [Revised: 08/17/2023] [Accepted: 08/24/2023] [Indexed: 09/29/2023]
Abstract
Exploring the minimal genetic requirements for cells to maintain free living is an exciting topic in biology. Multiple approaches are employed to address the question of the minimal genome. In addition to constructing the synthetic genome in the test tube, reducing the size of the wild-type genome is a practical approach for obtaining the essential genomic sequence for living cells. The well-studied Escherichia coli has been used as a model organism for genome reduction owing to its fast growth and easy manipulation. Extensive studies have reported how to reduce the bacterial genome and the collections of genomic disturbed strains acquired, which were sufficiently reviewed previously. However, the common issue of growth decrease caused by genetic disturbance remains largely unaddressed. This mini-review discusses the considerable efforts made to improve growth fitness, which was decreased due to genome reduction. The proposal and perspective are clarified for further accumulated genetic deletion to minimise the Escherichia coli genome in terms of genome reduction, experimental evolution, medium optimization, and machine learning.
Affiliation(s)
- Bei-Wen Ying
- School of Life and Environmental Sciences, University of Tsukuba, Tsukuba 305-8572, Ibaraki, Japan
27
|
Ruperao P, Rangan P, Shah T, Thakur V, Kalia S, Mayes S, Rathore A. The Progression in Developing Genomic Resources for Crop Improvement. Life (Basel) 2023; 13:1668. [PMID: 37629524 PMCID: PMC10455509 DOI: 10.3390/life13081668] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 07/21/2023] [Accepted: 07/25/2023] [Indexed: 08/27/2023] Open
Abstract
Sequencing technologies have rapidly evolved over the past two decades, and new technologies are being continually developed and commercialized. The emerging sequencing technologies target generating more data with fewer inputs and at lower costs. This has also translated to an increase in the number and type of corresponding applications in genomics besides enhanced computational capacities (both hardware and software). Alongside the evolving DNA sequencing landscape, bioinformatics research teams have also evolved to accommodate the increasingly demanding techniques used to combine and interpret data, leading to many researchers moving from the lab to the computer. The rich history of DNA sequencing has paved the way for new insights and the development of new analysis methods. Understanding and learning from past technologies can help with the progress of future applications. This review focuses on the evolution of sequencing technologies, their significant enabling role in generating plant genome assemblies and downstream applications, and the parallel development of bioinformatics tools and skills, filling the gap in data analysis techniques.
Affiliation(s)
- Pradeep Ruperao
- Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502324, India
- Parimalan Rangan
- ICAR-National Bureau of Plant Genetic Resources, PUSA Campus, New Delhi 110012, India
- Trushar Shah
- International Institute of Tropical Agriculture (IITA), Nairobi 30709-00100, Kenya
- Vivek Thakur
- Department of Systems & Computational Biology, School of Life Sciences, University of Hyderabad, Hyderabad 500046, India
- Sanjay Kalia
- Department of Biotechnology, Ministry of Science and Technology, Government of India, New Delhi 110003, India
- Sean Mayes
- Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502324, India
- Abhishek Rathore
- Excellence in Breeding, International Maize and Wheat Improvement Center (CIMMYT), Hyderabad 502324, India
28
|
Zhao A, Wu Y. Future implications of ChatGPT in pharmaceutical industry: drug discovery and development. Front Pharmacol 2023; 14:1194216. [PMID: 37529703 PMCID: PMC10390092 DOI: 10.3389/fphar.2023.1194216] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 03/26/2023] [Accepted: 06/08/2023] [Indexed: 08/03/2023] Open
Affiliation(s)
- Ailin Zhao
- Department of Hematology, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Yijun Wu
- Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan, China
29
|
Karlsen ST, Rau MH, Sánchez BJ, Jensen K, Zeidan AA. From genotype to phenotype: computational approaches for inferring microbial traits relevant to the food industry. FEMS Microbiol Rev 2023; 47:fuad030. [PMID: 37286882 PMCID: PMC10337747 DOI: 10.1093/femsre/fuad030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 05/31/2023] [Accepted: 06/06/2023] [Indexed: 06/09/2023] Open
Abstract
When selecting microbial strains for the production of fermented foods, various microbial phenotypes need to be taken into account to achieve target product characteristics, such as biosafety, flavor, texture, and health-promoting effects. Through continuous advances in sequencing technologies, microbial whole-genome sequences of increasing quality can now be obtained both cheaper and faster, which increases the relevance of genome-based characterization of microbial phenotypes. Prediction of microbial phenotypes from genome sequences makes it possible to quickly screen large strain collections in silico to identify candidates with desirable traits. Several microbial phenotypes relevant to the production of fermented foods can be predicted using knowledge-based approaches, leveraging our existing understanding of the genetic and molecular mechanisms underlying those phenotypes. In the absence of this knowledge, data-driven approaches can be applied to estimate genotype-phenotype relationships based on large experimental datasets. Here, we review computational methods that implement knowledge- and data-driven approaches for phenotype prediction, as well as methods that combine elements from both approaches. Furthermore, we provide examples of how these methods have been applied in industrial biotechnology, with special focus on the fermented food industry.
Affiliation(s)
- Signe T Karlsen
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Martin H Rau
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Benjamín J Sánchez
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Kristian Jensen
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Ahmad A Zeidan
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
30
|
Jiang Z, Xie W, Zhou X, Pan W, Jiang S, Zhang X, Zhang M, Zhang Z, Lu Y, Wang D. A virtual biopsy study of microsatellite instability in gastric cancer based on deep learning radiomics. Insights Imaging 2023; 14:104. [PMID: 37286810 DOI: 10.1186/s13244-023-01438-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Accepted: 04/15/2023] [Indexed: 06/09/2023] Open
Abstract
OBJECTIVES This study aims to develop and validate a virtual biopsy model to predict microsatellite instability (MSI) status in preoperative gastric cancer (GC) patients based on clinical information and deep learning radiomics. METHODS A total of 223 GC patients with MSI status detected by postoperative immunohistochemical staining (IHC) were retrospectively recruited and randomly assigned to the training (n = 167) and testing (n = 56) sets in a 3:1 ratio. In the training set, 982 high-throughput radiomic features were extracted from preoperative abdominal dynamic contrast-enhanced CT (CECT) and screened. Using a deep learning multilayer perceptron (MLP), 15 optimal features were selected to establish the radiomic feature score (Rad-score), and LASSO regression was used to screen out independent clinical predictors. Based on logistic regression, the Rad-score and the independent clinical predictors were integrated to build the clinical radiomics model, which was visualized as a nomogram and independently verified in the testing set. The performance and clinical applicability of the hybrid model in identifying MSI status were evaluated by the area under the receiver operating characteristic curve (AUC), the calibration curve, and decision curve analysis (DCA). RESULTS The AUCs of the clinical image model in the training set and testing set were 0.883 [95% CI: 0.822-0.945] and 0.802 [95% CI: 0.666-0.937], respectively. This hybrid model showed good consistency in the calibration curve and clinical applicability in the DCA curve. CONCLUSIONS Using preoperative imaging and clinical information, we developed a deep-learning-based radiomics model for the non-invasive evaluation of MSI in GC patients. This model may support clinical treatment decision making for GC patients.
Affiliation(s)
- Zinian Jiang
- Qingdao Medical College, Qingdao University, Qingdao, Shandong, China
- Wentao Xie
- Department of Gastrointestinal Surgery, The Affiliated Hospital of Qingdao University, No. 1677, Wutaishan Road, Qingdao, 266000, Shandong, China
- Xiaoming Zhou
- Department of Radiology, The Affiliated Hospital of Qingdao University, Qingdao, Shandong, China
- Wenjun Pan
- Qingdao Medical College, Qingdao University, Qingdao, Shandong, China
- Sheng Jiang
- Qingdao Medical College, Qingdao University, Qingdao, Shandong, China
- Xianxiang Zhang
- Department of Gastrointestinal Surgery, The Affiliated Hospital of Qingdao University, No. 1677, Wutaishan Road, Qingdao, 266000, Shandong, China
- Maoshen Zhang
- Department of Gastrointestinal Surgery, The Affiliated Hospital of Qingdao University, No. 1677, Wutaishan Road, Qingdao, 266000, Shandong, China
- Zhenqi Zhang
- Department of Pathology, The Affiliated Hospital of Qingdao University, Qingdao, Shandong, China
- Yun Lu
- Qingdao Medical College, Qingdao University, Qingdao, Shandong, China.
- Department of Gastrointestinal Surgery, The Affiliated Hospital of Qingdao University, No. 1677, Wutaishan Road, Qingdao, 266000, Shandong, China.
- Shandong Key Laboratory of Digital Medicine and Computer Assisted Surgery, Qingdao, Shandong, China.
- Dongsheng Wang
- Qingdao Medical College, Qingdao University, Qingdao, Shandong, China.
- Department of Gastrointestinal Surgery, The Affiliated Hospital of Qingdao University, No. 1677, Wutaishan Road, Qingdao, 266000, Shandong, China.
31
|
Aarons MF, Young CM, Bruce L, Dwyer DB. Real time prediction of match outcomes in Australian football. J Sports Sci 2023; 41:1115-1125. [PMID: 37733399 DOI: 10.1080/02640414.2023.2259266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Accepted: 08/09/2023] [Indexed: 09/22/2023]
Abstract
This study aimed to determine whether machine learning models based on technical performance and not score margin could be used to predict end-of-match outcome of Australian football matches in real-time. If efficacious, these models could be used to generate insights about team performance and support the decision-making of coaches during matches. A database of 168 team technical performance indicators from 829 Australian Football League matches played between 2017 and 2021 was used. Two feature sets (data-driven and data-informed) were used to train and evaluate six models (generalised linear model, random forest and adaboost) on match outcome prediction (Win/Loss) over 120 epochs (a representation of normalised time during each match). All models performed well (mean classification accuracy = 73.5-75.8%) in comparison with a benchmark score-based model (mean classification accuracy = 77.4%). Data-informed feature sets performed better than data-driven in most cases. Classification accuracy was low at the start of a match (45.7-48.8%) but increased to a peak near the end of a match (87.2-92.7%). These findings suggest that any of the employed models can be used to formulate in-match decision support. The model which is best in practice will depend on factors such as time-cost trade-off, feasibility and the perceived value of its suggestions.
Affiliation(s)
- Chris M Young
- Centre for Sport Research, Deakin University, Geelong, Australia
- Lyndell Bruce
- Centre for Sport Research, Deakin University, Geelong, Australia
- Dan B Dwyer
- Centre for Sport Research, Deakin University, Geelong, Australia
32
|
Ahlquist KD, Sugden LA, Ramachandran S. Enabling interpretable machine learning for biological data with reliability scores. PLoS Comput Biol 2023; 19:e1011175. [PMID: 37235578 PMCID: PMC10249903 DOI: 10.1371/journal.pcbi.1011175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Revised: 06/08/2023] [Accepted: 05/10/2023] [Indexed: 05/28/2023] Open
Abstract
Machine learning tools have proven useful across biological disciplines, allowing researchers to draw conclusions from large datasets, and opening up new opportunities for interpreting complex and heterogeneous biological data. Alongside the rapid growth of machine learning, there have also been growing pains: some models that appear to perform well have later been revealed to rely on features of the data that are artifactual or biased; this feeds into the general criticism that machine learning models are designed to optimize model performance over the creation of new biological insights. A natural question arises: how do we develop machine learning models that are inherently interpretable or explainable? In this manuscript, we describe the SWIF(r) reliability score (SRS), a method building on the SWIF(r) generative framework that reflects the trustworthiness of the classification of a specific instance. The concept of the reliability score has the potential to generalize to other machine learning methods. We demonstrate the utility of the SRS when faced with common challenges in machine learning including: 1) an unknown class present in testing data that was not present in training data, 2) systemic mismatch between training and testing data, and 3) instances of testing data that have missing values for some attributes. We explore these applications of the SRS using a range of biological datasets, from agricultural data on seed morphology, to 22 quantitative traits in the UK Biobank, and population genetic simulations and 1000 Genomes Project data. With each of these examples, we demonstrate how the SRS can allow researchers to interrogate their data and training approach thoroughly, and to pair their domain-specific knowledge with powerful machine-learning frameworks. We also compare the SRS to related tools for outlier and novelty detection, and find that it has comparable performance, with the advantage of being able to operate when some data are missing. 
The SRS, and the broader discussion of interpretable scientific machine learning, will aid researchers in the biological machine learning space as they seek to harness the power of machine learning without sacrificing rigor and biological insight.
Affiliation(s)
- K. D. Ahlquist
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, Rhode Island, United States of America
| | - Lauren A. Sugden
- Department of Mathematics and Computer Science, Duquesne University, Pittsburgh, Pennsylvania, United States of America
| | - Sohini Ramachandran
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island, United States of America
- Department of Ecology, Evolution and Organismal Biology, Brown University, Providence, Rhode Island, United States of America
- Data Science Initiative, Brown University, Providence, Rhode Island, United States of America
|
33
|
Wang S, Ren Y, Xia B, Liu K, Li H. Prediction of atmospheric pollutants in urban environment based on coupled deep learning model and sensitivity analysis. Chemosphere 2023; 331:138830. [PMID: 37137395 DOI: 10.1016/j.chemosphere.2023.138830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Revised: 04/11/2023] [Accepted: 04/30/2023] [Indexed: 05/05/2023]
Abstract
Accurate and efficient predictions of pollutants in the atmosphere provide a reliable basis for the scientific management of atmospheric pollution. This study develops a model that combines an attention mechanism, convolutional neural network (CNN), and long short-term memory (LSTM) unit to predict the O3 and PM2.5 levels in the atmosphere, as well as an air quality index (AQI). The prediction results given by the proposed model are compared with those from CNN-LSTM and LSTM models as well as random forest and support vector regression models. The proposed model achieves a correlation coefficient between the predicted and observed values of more than 0.90, outperforming the other four models. The model errors are also consistently lower when using the proposed approach. Sobol-based sensitivity analysis is applied to identify the variables that make the greatest contribution to the model prediction results. Taking the COVID-19 outbreak as the time boundary, we find some homology in the interactions among the pollutants and meteorological factors in the atmosphere during different periods. Solar irradiance is the most important factor for O3, CO is the most important factor for PM2.5, and particulate matter has the most significant effect on AQI. The key influencing factors are the same over the whole phase and before the COVID-19 outbreak, indicating that the impact of COVID-19 restrictions on AQI gradually stabilized. Removing variables that contribute the least to the prediction results without affecting the model prediction performance improves the modeling efficiency and reduces the computational costs.
Affiliation(s)
- Siyuan Wang
- School of Environment, Nanjing Normal University, Nanjing, 210023, PR China; School of Mathematics and Computer Science, Yan'an University, Yan'an, 716000, PR China
| | - Ying Ren
- School of Mathematics and Computer Science, Yan'an University, Yan'an, 716000, PR China
| | - Bisheng Xia
- School of Mathematics and Computer Science, Yan'an University, Yan'an, 716000, PR China
| | - Kai Liu
- School of Environment, Nanjing Normal University, Nanjing, 210023, PR China
| | - Huiming Li
- School of Environment, Nanjing Normal University, Nanjing, 210023, PR China.
|
34
|
Wang Q, Xu T, Xu K, Lu Z, Ying J. Prediction of transport proteins from sequence information with the deep learning approach. Comput Biol Med 2023; 160:106974. [PMID: 37167658 DOI: 10.1016/j.compbiomed.2023.106974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Revised: 04/17/2023] [Accepted: 04/22/2023] [Indexed: 05/13/2023]
Abstract
Transport proteins (TPs) are vital to the growth and life of all living things and are especially relevant to microbial pathogenesis and the drug resistance of tumor cells. Accurately identifying potential TPs remains an important challenge for the advancement of functional genomics. This study aimed to develop a tool for predicting TPs using a deep learning approach. Here, we propose DeepTP, a convolutional neural network model that uses parallel subnetworks to extract features from protein sequences and fully connected layers for TP classification. To train the model and evaluate its performance, datasets were collected from the UniProtKB/Swiss-Prot database. The test results revealed that the proposed model could successfully identify TPs, with an AUCROC, accuracy, F-value, and Matthews correlation coefficient of 0.9719, 0.9513, 0.8982, and 0.8679, respectively. In further comparisons, DeepTP achieved better performance than other commonly used methods. Analysis of the gradients of the prediction score with respect to the input suggested that DeepTP makes predictions by recognizing the functional domains of TPs. We anticipate that DeepTP will serve as a useful tool for predicting TPs in large-scale genome projects, which will facilitate the discovery of novel TPs.
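The accuracy, F-value, and Matthews correlation coefficient quoted above can all be derived from a binary confusion matrix. The following is a minimal sketch, assuming that "F-value" refers to the F1 score; the function name `binary_metrics` and the counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, F1, MCC) computed from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, f1, mcc


# Hypothetical counts for illustration only (not results from the study).
acc, f1, mcc = binary_metrics(tp=90, fp=10, tn=880, fn=20)
print(f"accuracy={acc:.4f} F1={f1:.4f} MCC={mcc:.4f}")
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, which is one reason it is often reported for imbalanced classification tasks such as this one.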
Affiliation(s)
- Qian Wang
- Department of Clinical Laboratory, Wenzhou People's Hospital, The Third Affiliated Hospital of Shanghai University, The Third Clinical Institute Affiliated to Wenzhou Medical University, Wenzhou, China
| | - Teng Xu
- Institute of Translational Medicine, Baotou Central Hospital, Baotou, China
| | - Kai Xu
- Department of Clinical Laboratory, Wenzhou People's Hospital, The Third Affiliated Hospital of Shanghai University, The Third Clinical Institute Affiliated to Wenzhou Medical University, Wenzhou, China
| | - Zhongqiu Lu
- Wenzhou Key Laboratory of Emergency, Critical Care, and Disaster Medicine, Department of Emergency, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China.
| | - Jianchao Ying
- Central Laboratory, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China; Wenzhou Key Laboratory of Emergency, Critical Care, and Disaster Medicine, Department of Emergency, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China.
|
35
|
Patterson A, Elbasir A, Tian B, Auslander N. Computational Methods Summarizing Mutational Patterns in Cancer: Promise and Limitations for Clinical Applications. Cancers (Basel) 2023; 15:1958. [PMID: 37046619 PMCID: PMC10093138 DOI: 10.3390/cancers15071958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 02/24/2023] [Accepted: 03/09/2023] [Indexed: 03/29/2023] Open
Abstract
Since the rise of next-generation sequencing technologies, the catalogue of mutations in cancer has been continuously expanding. To address the complexity of the cancer-genomic landscape and extract meaningful insights, numerous computational approaches have been developed over the last two decades. In this review, we survey the current leading computational methods to derive intricate mutational patterns in the context of clinical relevance. We begin with mutation signatures, explaining first how mutation signatures were developed and then examining the utility of studies using mutation signatures to correlate environmental effects on the cancer genome. Next, we examine current clinical research that employs mutation signatures and discuss the potential use cases and challenges of mutation signatures in clinical decision-making. We then examine computational studies developing tools to investigate complex patterns of mutations beyond the context of mutational signatures. We survey methods to identify cancer-driver genes, from single-driver studies to pathway and network analyses. In addition, we review methods inferring complex combinations of mutations for clinical tasks and using mutations integrated with multi-omics data to better predict cancer phenotypes. We examine the use of these tools for either discovery or prediction, including prediction of tumor origin, treatment outcomes, prognosis, and cancer typing. We further discuss the main limitations preventing widespread clinical integration of computational tools for the diagnosis and treatment of cancer. We end by proposing solutions to address these challenges using recent advances in machine learning.
Affiliation(s)
- Andrew Patterson
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- The Wistar Institute, Philadelphia, PA 19104, USA
| | | | - Bin Tian
- The Wistar Institute, Philadelphia, PA 19104, USA
| | - Noam Auslander
- The Wistar Institute, Philadelphia, PA 19104, USA
- Department of Cancer Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
|
36
|
Artificial Intelligence for Antimicrobial Resistance Prediction: Challenges and Opportunities towards Practical Implementation. Antibiotics (Basel) 2023; 12:523. [PMID: 36978390 PMCID: PMC10044311 DOI: 10.3390/antibiotics12030523] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 02/07/2023] [Revised: 03/01/2023] [Accepted: 03/03/2023] [Indexed: 03/08/2023] Open
Abstract
Antimicrobial resistance (AMR) is emerging as a potential threat to many lives worldwide. From a medical treatment point of view, it is very important to understand and apply effective strategies to counter the impact of AMR and its mutation. The application of artificial intelligence (AI), especially deep learning and machine learning, has opened a new direction in antimicrobial identification. Furthermore, the availability of huge amounts of data from multiple sources has made it more effective to use these techniques to gain insights into AMR genes, such as new genes, mutations, drug identification, and conditions favorable to spread. This paper therefore presents a review of state-of-the-art challenges and opportunities. These include input features that pose challenges in use, state-of-the-art deep learning and machine learning models for robustness and high accuracy, and the challenges and prospects of applying these techniques for practical purposes. The paper concludes by encouraging the application of AI to the AMR sector for practical diagnosis and treatment, since most studies are currently at early stages with minimal application in the practice of diagnosis and treatment of disease.
|
37
|
Macedo Mota LF, Bisutti V, Vanzin A, Pegolo S, Toscano A, Schiavon S, Tagliapietra F, Gallo L, Ajmone Marsan P, Cecchinato A. Predicting milk protein fractions using infrared spectroscopy and a gradient boosting machine for breeding purposes in Holstein cattle. J Dairy Sci 2023; 106:1853-1873. [PMID: 36710177 DOI: 10.3168/jds.2022-22119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2022] [Accepted: 10/10/2022] [Indexed: 01/29/2023]
Abstract
In recent years, increasing attention has been focused on the genetic evaluation of protein fractions in cow milk with the aim of improving milk quality and technological characteristics. In this context, advances in high-throughput phenotyping by Fourier transform infrared (FTIR) spectroscopy offer the opportunity for large-scale, efficient measurement of novel traits that can be exploited in breeding programs as indicator traits. We took milk samples from 2,558 Holstein cows belonging to 38 herds in northern Italy, operating under different production systems. Fourier transform infrared spectra were collected on the same day as milk sampling and stored for subsequent analysis. Two sets of data (i.e., phenotypes and FTIR spectra) collected in 2 different years (2013 and 2019-2020) were compiled. The following traits were assessed using HPLC: true protein, major casein fractions [αS1-casein (CN), αS2-CN, β-CN, κ-CN, and glycosylated-κ-CN], and major whey proteins (β-lactoglobulin and α-lactalbumin), all of which were measured both in grams per liter (g/L) and proportion of total nitrogen (% N). The FTIR predictions were calculated using the gradient boosting machine technique and tested by 3 different cross-validation (CRV) methods. 
We used the following CRV scenarios: (1) random 10-fold, which randomly split the whole data set into 10 folds of equal size (9 folds for training and 1 fold for validation); (2) herd/date-out CRV, which assigned 80% of the herd/date combinations to the training set and the remaining, independent 20% to the validation set; (3) forward/backward CRV, which split the data set into training and validation sets according to the year of milk sampling (FTIR and gold standard data assessed in 2013 or 2019-2020), using the "old" and "new" databases for training and validation and vice versa, with independence between them; (4) the CRV for genetic parameters (CRV-gen), where animals without pedigree were assigned as a fixed training population and animals with pedigree information were split into 5 folds, of which 1 fold was assigned to the fixed training population and 4 folds were assigned to the validation set (independent from the training set). The results (i.e., measures and predictions) of CRV-gen were used to infer the genetic parameters for gold standard laboratory measurements (i.e., proteins assessed with HPLC) and FTIR-based predictions from a bi-trait animal model using single-step genomic BLUP. We found that the prediction accuracies of the gradient boosting machine equations differed according to the way in which the proteins were expressed, achieving higher accuracy when expressed in g/L than when expressed as % N in all CRV scenarios. Concerning the reproducibility of the equations over the different years, the results showed no relevant differences in predictive ability between using "old" data as the training set and "new" data as the validation set and vice versa. Comparing the additive genetic variance estimates for milk protein fractions between the FTIR predictions and the HPLC measures, we found reductions of -19.7% for milk protein fractions expressed in g/L, and -21.19% expressed as % N.
Although we found reductions in the heritability estimates, they were small, with values ranging from -1.9 to -7.25% for g/L, and -1.6 to -7.9% for % N. The posterior distributions of the additive genetic correlations (ra) between the FTIR predictions and the laboratory measurements were generally high (>0.8), even when the milk protein fractions were expressed as % N. Our results show the potential of using FTIR predictions in breeding programs as indicator traits for the selection of animals to enhance milk protein fraction contents. We expect acceptable responses to selection due to the high genetic correlations between HPLC measurements and FTIR predictions.
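The herd/date-out scenario described in this abstract keeps whole groups together, so no herd/date combination contributes records to both training and validation. A minimal sketch of such a split follows, assuming an 80/20 division of groups; the function name `group_out_split`, the seed, and the herd/date labels are hypothetical and do not come from the study.

```python
import random


def group_out_split(groups, train_fraction=0.8, seed=0):
    """Split record indices so that no group appears in both sets."""
    unique_groups = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique_groups)
    # Assign whole groups, not individual records, to the training set.
    n_train = int(round(train_fraction * len(unique_groups)))
    train_groups = set(unique_groups[:n_train])
    train_idx = [i for i, g in enumerate(groups) if g in train_groups]
    valid_idx = [i for i, g in enumerate(groups) if g not in train_groups]
    return train_idx, valid_idx


# Hypothetical herd/date labels for ten milk records (illustration only).
herd_dates = ["A-1", "A-1", "B-1", "B-2", "C-1", "C-1", "D-1", "E-1", "E-1", "E-2"]
train_idx, valid_idx = group_out_split(herd_dates)
print(len(train_idx), "training records;", len(valid_idx), "validation records")
```

Splitting by group rather than by record is what makes the validation set independent of the training set: records from the same herd and sampling date tend to be correlated, so a purely random split can overstate predictive ability.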
Affiliation(s)
- L F Macedo Mota
- Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, Viale dell' Università 16, 35020 Legnaro, Italy
| | - V Bisutti
- Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, Viale dell' Università 16, 35020 Legnaro, Italy
| | - A Vanzin
- Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, Viale dell' Università 16, 35020 Legnaro, Italy
| | - S Pegolo
- Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, Viale dell' Università 16, 35020 Legnaro, Italy.
| | - A Toscano
- Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, Viale dell' Università 16, 35020 Legnaro, Italy
| | - S Schiavon
- Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, Viale dell' Università 16, 35020 Legnaro, Italy
| | - F Tagliapietra
- Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, Viale dell' Università 16, 35020 Legnaro, Italy
| | - L Gallo
- Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, Viale dell' Università 16, 35020 Legnaro, Italy
| | - P Ajmone Marsan
- Department of Animal Science, Food and Nutrition (DIANA) and Research Center Romeo and Enrica Invernizzi for Sustainable Dairy Production (CREI), Faculty of Agricultural, Food and Environmental Sciences, Università Cattolica del Sacro Cuore, 29122 Piacenza, Italy
| | - A Cecchinato
- Department of Agronomy, Food, Natural Resources, Animals and Environment (DAFNAE), University of Padova, Viale dell' Università 16, 35020 Legnaro, Italy
|
38
|
Zhao X, Yoshida N, Ueda T, Sugano H, Tanaka T. Epileptic seizure detection by using interpretable machine learning models. J Neural Eng 2023; 20. [PMID: 36603215 DOI: 10.1088/1741-2552/acb089] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/04/2022] [Accepted: 01/05/2023] [Indexed: 01/06/2023]
Abstract
Objective. Accurate detection of epileptic seizures using electroencephalogram (EEG) data is essential for epilepsy diagnosis, but visual diagnosis by clinical experts is a time-consuming task. To improve efficiency, several seizure detection methods have been proposed. Whether traditional or based on machine learning, these methods only distinguish seizures from non-seizures. Our goal is not only to detect seizures but also to explain the basis for detection and provide reference information to clinical experts. Approach. In this study, we follow the visual diagnosis mechanism used by clinical experts, which directly processes plotted EEG image data, and apply the commonly used LeNet, VGG, deep residual network (ResNet), and vision transformer (ViT) models to the EEG image classification task. Before using these models, we propose a data augmentation method using random channel ordering (RCO), which adjusts the channel order to generate new images. The gradient-weighted class activation mapping (Grad-CAM) and attention layer methods are used to interpret the models. Main results. The RCO method can balance the dataset between the seizure and non-seizure classes. The models achieved good performance in the seizure detection task. Moreover, the Grad-CAM and attention layer methods explained the detection basis of the models very well and calculated a value that measures the seizure degree. Significance. Processing EEG data in the form of images provides the flexibility to use a variety of machine learning models. The class imbalance problem that exists widely in clinical practice is well solved by the RCO method. Since the method follows the visual diagnosis mechanism of clinical experts, the model interpretation results can be presented to clinical experts intuitively, and the quantitative information provided by the model is also a good diagnostic reference.
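The random channel ordering (RCO) idea can be illustrated with a short sketch: each augmented copy of a multichannel EEG segment contains the same channels in a shuffled order, so generating more copies for the rarer class helps balance the dataset. This is only an assumption-laden illustration; the function name `random_channel_ordering`, the toy segment, and the omission of the image-plotting step are choices made here, not details from the paper.

```python
import random


def random_channel_ordering(eeg_segment, n_copies, seed=0):
    """Return n_copies versions of a multichannel segment with shuffled channel order."""
    rng = random.Random(seed)
    augmented = []
    for _ in range(n_copies):
        order = list(range(len(eeg_segment)))
        rng.shuffle(order)
        # Reorder whole channels; the samples within each channel are untouched.
        augmented.append([eeg_segment[ch] for ch in order])
    return augmented


# Hypothetical 3-channel segment with 4 samples per channel (illustration only).
segment = [[0.1, 0.2, 0.3, 0.4], [1.0, 1.1, 1.2, 1.3], [2.0, 2.1, 2.2, 2.3]]
copies = random_channel_ordering(segment, n_copies=5)
print(len(copies), "augmented segments")
```

In the setting described above, each reordered segment would then be plotted as an image before being passed to the classifier.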
Affiliation(s)
- Xuyang Zhao
- Department of Electrical Engineering and Computer Science, Tokyo University of Agriculture and Technology, Tokyo, Japan
| | | | - Tetsuya Ueda
- Faculty of Medicine, Juntendo University, Tokyo, Japan
| | | | - Toshihisa Tanaka
- Department of Electrical Engineering and Computer Science, Tokyo University of Agriculture and Technology, Tokyo, Japan
|
39
|
Lee YY, Endale M, Wu G, Ruben MD, Francey LJ, Morris AR, Choo NY, Anafi RC, Smith DF, Liu AC, Hogenesch JB. Integration of genome-scale data identifies candidate sleep regulators. Sleep 2023; 46:zsac279. [PMID: 36462188 PMCID: PMC9905783 DOI: 10.1093/sleep/zsac279] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 06/21/2022] [Revised: 09/02/2022] [Indexed: 12/05/2022] Open
Abstract
STUDY OBJECTIVES Genetics impacts sleep, yet the molecular mechanisms underlying sleep regulation remain elusive. In this study, we built machine learning models to predict sleep genes based on their similarity to genes that are known to regulate sleep. METHODS We trained a prediction model on thousands of published datasets, representing circadian, immune, sleep deprivation, and many other processes, using a manually curated list of 109 sleep genes. RESULTS Our predictions fit with prior knowledge of sleep regulation and identified key genes and pathways to pursue in follow-up studies. As an example, we focused on the NF-κB pathway and showed that chronic activation of NF-κB in a genetic mouse model impacted sleep-wake patterns. CONCLUSION Our study highlights the power of machine learning in integrating prior knowledge and genome-wide data to study genetic regulation of complex behaviors such as sleep.
Affiliation(s)
- Yin Yeng Lee
- Divisions of Human Genetics and Immunobiology, Department of Pediatrics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, 45229, USA
- Department of Pharmacology and Systems Physiology, University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA
| | - Mehari Endale
- Department of Physiology and Aging, University of Florida College of Medicine, Gainesville, FL 32610, USA
| | - Gang Wu
- Divisions of Human Genetics and Immunobiology, Department of Pediatrics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, 45229, USA
| | - Marc D Ruben
- Divisions of Human Genetics and Immunobiology, Department of Pediatrics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, 45229, USA
| | - Lauren J Francey
- Divisions of Human Genetics and Immunobiology, Department of Pediatrics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, 45229, USA
| | - Andrew R Morris
- Department of Physiology and Aging, University of Florida College of Medicine, Gainesville, FL 32610, USA
| | - Natalie Y Choo
- Division of Pediatric Otolaryngology-Head and Neck Surgery, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
| | - Ron C Anafi
- Department of Medicine, Chronobiology and Sleep Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - David F Smith
- Division of Pediatric Otolaryngology-Head and Neck Surgery, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
- Division of Pulmonary Medicine and the Sleep Center, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
- Center for Circadian Medicine, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
- Department of Otolaryngology - Head and Neck Surgery, University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA
| | - Andrew C Liu
- Department of Physiology and Aging, University of Florida College of Medicine, Gainesville, FL 32610, USA
| | - John B Hogenesch
- Divisions of Human Genetics and Immunobiology, Department of Pediatrics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, 45229, USA
- Center for Circadian Medicine, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
|
40
|
Ren S, Wu S, Weng Q. Physics-informed machine learning methods for biomass gasification modeling by considering monotonic relationships. Bioresour Technol 2023; 369:128472. [PMID: 36509306 DOI: 10.1016/j.biortech.2022.128472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2022] [Revised: 12/05/2022] [Accepted: 12/06/2022] [Indexed: 06/17/2023]
Abstract
Machine learning methods have recently shown broad application prospects in biomass gasification modeling. However, a significant drawback of machine learning approaches is their poor physical interpretability when relying on limited experimental data. In the present work, a physics-informed neural network (PINN) method is developed to predict biomass gasification products (N2, H2, CO, CO2, and CH4). PINN simultaneously considers regression, structure, and physical monotonicity constraints in the loss function, providing physically feasible predictions. Specifically, the PINN models show better prediction capability (average test R2 0.91-0.97) than five other machine learning methods across 50 random partitions of the samples. Furthermore, it is demonstrated that the developed models can maintain correct monotonicity even if the feedstock characteristics or gasification conditions are outside the training data. By using a reliable physical mechanism to guide machine learning, the model can ensure better generalizability and scientific interpretability.
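One common way to encode a monotonicity constraint in a loss function is to add a penalty whenever the model output moves in the wrong direction as an input increases. The sketch below is only an illustration of that general idea under stated assumptions: it uses a finite-difference check, covers the regression and monotonicity terms but not the structure term, and every name in it (`monotonic_loss`, `delta`, `weight`, the toy models) is hypothetical rather than the formulation used in the paper.

```python
def monotonic_loss(model, inputs, targets, feature, delta=0.1, weight=1.0, increasing=True):
    """Mean squared error plus a finite-difference penalty for monotonicity violations."""
    n = len(inputs)
    mse = sum((model(x) - y) ** 2 for x, y in zip(inputs, targets)) / n
    penalty = 0.0
    for x in inputs:
        shifted = list(x)
        shifted[feature] += delta
        change = model(shifted) - model(x)
        # A violation is a decrease when the output should increase, or vice versa.
        violation = -change if increasing else change
        penalty += max(0.0, violation)
    return mse + weight * penalty / n


def rising_model(x):
    # Toy model whose output increases with the first input.
    return 2.0 * x[0]


def falling_model(x):
    # Toy model whose output decreases with the first input.
    return -x[0]


inputs = [[0.0], [1.0]]
loss_ok = monotonic_loss(rising_model, inputs, [0.0, 2.0], feature=0)
loss_bad = monotonic_loss(falling_model, inputs, [0.0, -1.0], feature=0)
print(loss_ok, loss_bad)
```

Both toy models fit their targets exactly, so the difference between the two losses comes entirely from the monotonicity penalty. Because the penalty is evaluated on shifted inputs, it can in principle also be applied at points outside the training data.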
Affiliation(s)
- Shaojun Ren
- Key Laboratory of Energy Thermal Conversion and Control of Ministry of Education, School of Energy and Environment, Southeast University, Nanjing 210096, PR China.
| | - Shiliang Wu
- Key Laboratory of Energy Thermal Conversion and Control of Ministry of Education, School of Energy and Environment, Southeast University, Nanjing 210096, PR China
| | - Qihang Weng
- Key Laboratory of Energy Thermal Conversion and Control of Ministry of Education, School of Energy and Environment, Southeast University, Nanjing 210096, PR China
|
41
|
Blankestijn JM, Lopez-Rincon A, Neerincx AH, Vijverberg SJH, Hashimoto S, Gorenjak M, Sardón Prado O, Corcuera-Elosegui P, Korta-Murua J, Pino-Yanes M, Potočnik U, Bang C, Franke A, Wolff C, Brandstetter S, Toncheva AA, Kheiroddin P, Harner S, Kabesch M, Kraneveld AD, Abdel-Aziz MI, Maitland-van der Zee AH. Classifying asthma control using salivary and fecal bacterial microbiome in children with moderate-to-severe asthma. Pediatr Allergy Immunol 2023; 34:e13919. [PMID: 36825736 DOI: 10.1111/pai.13919] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/14/2022] [Revised: 01/23/2023] [Accepted: 01/24/2023] [Indexed: 02/24/2023]
Abstract
BACKGROUND Uncontrolled asthma can lead to severe exacerbations and reduced quality of life. Research has shown that the microbiome may be linked with asthma characteristics; however, its association with asthma control has not been explored. We aimed to investigate whether the gastrointestinal microbiome can be used to discriminate between uncontrolled and controlled asthma in children. METHODS 143 saliva and 103 feces samples were obtained from 143 children with moderate-to-severe asthma aged 6 to 17 years from the SysPharmPediA study. Patients were classified as controlled or uncontrolled asthmatics, and their microbiome at species level was compared using global (alpha/beta) diversity, conventional differential abundance analysis (DAA, analysis of compositions of microbiomes with bias correction), and machine learning [Recursive Ensemble Feature Selection (REFS)]. RESULTS Global diversity and DAA did not find significant differences between controlled and uncontrolled pediatric asthmatics. REFS detected a set of taxa, including Haemophilus and Veillonella, differentiating uncontrolled and controlled asthma with an average classification accuracy of 81% (saliva) and 86% (feces). These taxa showed enrichment in taxa previously associated with inflammatory diseases for both sampling compartments, and with COPD for the saliva samples. CONCLUSION Children with controlled and uncontrolled asthma can be differentiated based on their gastrointestinal microbiome using machine learning, specifically REFS. Our results show an association between asthma control and the gastrointestinal microbiome. This suggests that the gastrointestinal microbiome may be a potential biomarker for treatment responsiveness and thereby help to improve asthma control in children.
Affiliation(s)
- Jelle M Blankestijn
- Department of Pulmonary Medicine, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- Amsterdam Institute for Infection and Immunity, Amsterdam, The Netherlands
- Amsterdam Public Health, Amsterdam, The Netherlands
| | - Alejandro Lopez-Rincon
- Division of Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, Utrecht University, Utrecht, The Netherlands
- Department of Data Science, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
| | - Anne H Neerincx
- Department of Pulmonary Medicine, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
| | - Susanne J H Vijverberg
- Department of Pulmonary Medicine, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
| | - Simone Hashimoto
- Department of Pulmonary Medicine, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- Department of Pediatric Respiratory Medicine, Emma Children's Hospital, Amsterdam UMC, Amsterdam, The Netherlands
| | - Mario Gorenjak
- Center for Human Molecular Genetics and Pharmacogenomics, Faculty of Medicine, University of Maribor, Maribor, Slovenia
| | - Olaia Sardón Prado
- Division of Pediatric Respiratory Medicine, Hospital Universitario Donostia, San Sebastián, Spain
- Department of Pediatrics, University of the Basque Country (UPV/EHU), San Sebastián, Spain
| | - Paula Corcuera-Elosegui
- Division of Pediatric Respiratory Medicine, Hospital Universitario Donostia, San Sebastián, Spain
| | - Javier Korta-Murua
- Division of Pediatric Respiratory Medicine, Hospital Universitario Donostia, San Sebastián, Spain
| | - Maria Pino-Yanes
- Genomics and Health Group, Department of Biochemistry, Microbiology, Cell Biology and Genetics, Universidad de La Laguna (ULL), San Cristóbal de La Laguna, Spain
- CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, Madrid, Spain
- Instituto de Tecnologías Biomédicas (ITB), Universidad de La Laguna, La Laguna, Spain
| | - Uroš Potočnik
- Center for Human Molecular Genetics and Pharmacogenomics, Faculty of Medicine, University of Maribor, Maribor, Slovenia
- Laboratory for Biochemistry, Molecular Biology and Genomics, Faculty of Chemistry and Chemical Engineering, University of Maribor, Maribor, Slovenia
| | - Corinna Bang
- Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Kiel, Germany
| | - Andre Franke
- Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Kiel, Germany
| | - Christine Wolff
- Science and Development Campus Regensburg (WECARE), University Children's Hospital Regensburg (KUNO) at the Hospital St. Hedwig of the Order of St. John, University of Regensburg, Regensburg, Germany
| | - Susanne Brandstetter
- Science and Development Campus Regensburg (WECARE), University Children's Hospital Regensburg (KUNO) at the Hospital St. Hedwig of the Order of St. John, University of Regensburg, Regensburg, Germany
| | - Antoaneta A Toncheva
- Science and Development Campus Regensburg (WECARE), University Children's Hospital Regensburg (KUNO) at the Hospital St. Hedwig of the Order of St. John, University of Regensburg, Regensburg, Germany
| | - Parastoo Kheiroddin
- Science and Development Campus Regensburg (WECARE), University Children's Hospital Regensburg (KUNO) at the Hospital St. Hedwig of the Order of St. John, University of Regensburg, Regensburg, Germany
| | - Susanne Harner
- Department of Pediatric Pneumology and Allergy, University Children's Hospital Regensburg (KUNO) at the Hospital St. Hedwig of the Order of St. John, University of Regensburg, Regensburg, Germany
| | - Michael Kabesch
- Science and Development Campus Regensburg (WECARE), University Children's Hospital Regensburg (KUNO) at the Hospital St. Hedwig of the Order of St. John, University of Regensburg, Regensburg, Germany
- Department of Pediatric Pneumology and Allergy, University Children's Hospital Regensburg (KUNO) at the Hospital St. Hedwig of the Order of St. John, University of Regensburg, Regensburg, Germany
| | - Aletta D Kraneveld
- Division of Pharmacology, Utrecht Institute for Pharmaceutical Sciences, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Mahmoud I Abdel-Aziz
- Department of Pulmonary Medicine, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- Amsterdam Institute for Infection and Immunity, Amsterdam, The Netherlands
- Amsterdam Public Health, Amsterdam, The Netherlands
- Department of Clinical Pharmacy, Faculty of Pharmacy, Assiut University, Assiut, Egypt
| | - Anke H Maitland-van der Zee
- Department of Pulmonary Medicine, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- Amsterdam Institute for Infection and Immunity, Amsterdam, The Netherlands
- Amsterdam Public Health, Amsterdam, The Netherlands
- Department of Pediatric Respiratory Medicine, Emma Children's Hospital, Amsterdam UMC, Amsterdam, The Netherlands
|
42
|
Fritzsche MC, Akyüz K, Cano Abadía M, McLennan S, Marttinen P, Mayrhofer MT, Buyx AM. Ethical layering in AI-driven polygenic risk scores-New complexities, new challenges. Front Genet 2023; 14:1098439. [PMID: 36816027 PMCID: PMC9933509 DOI: 10.3389/fgene.2023.1098439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Accepted: 01/04/2023] [Indexed: 01/27/2023] Open
Abstract
Researchers aim to develop polygenic risk scores as a tool to prevent and more effectively treat serious diseases, disorders and conditions such as breast cancer, type 2 diabetes mellitus and coronary heart disease. Recently, machine learning techniques, in particular deep neural networks, have been increasingly developed to create polygenic risk scores using electronic health records as well as genomic and other health data. While the use of artificial intelligence for polygenic risk scores may enable greater accuracy, performance and prediction, it also presents a range of increasingly complex ethical challenges. The ethical and social issues of many polygenic risk score applications in medicine have been widely discussed. However, in the literature and in practice, the ethical implications of their confluence with the use of artificial intelligence have not yet been sufficiently considered. Based on a comprehensive review of the existing literature, we argue that this stands in need of urgent consideration for research and subsequent translation into the clinical setting. Considering the many ethical layers involved, we will first give a brief overview of the development of artificial intelligence-driven polygenic risk scores, associated ethical and social implications, challenges in artificial intelligence ethics, and finally, explore potential complexities of polygenic risk scores driven by artificial intelligence. We point out emerging complexity regarding fairness, challenges in building trust, explaining and understanding artificial intelligence and polygenic risk scores as well as regulatory uncertainties and further challenges. We strongly advocate taking a proactive approach to embedding ethics in research and implementation processes for polygenic risk scores driven by artificial intelligence.
Affiliation(s)
- Marie-Christine Fritzsche
- Institute of History and Ethics in Medicine, TUM School of Medicine, Technical University of Munich, Munich, Germany
- Department of Science, Technology and Society (STS), School of Social Sciences and Technology, Technical University of Munich, Munich, Germany
- *Correspondence: Marie-Christine Fritzsche
| | - Kaya Akyüz
- Biobanking and Biomolecular Resources Research Infrastructure Consortium - European Research Infrastructure Consortium (BBMRI-ERIC), Graz, Austria
- Department of Science and Technology Studies, University of Vienna, Vienna, Austria
| | - Mónica Cano Abadía
- Biobanking and Biomolecular Resources Research Infrastructure Consortium - European Research Infrastructure Consortium (BBMRI-ERIC), Graz, Austria
| | - Stuart McLennan
- Institute of History and Ethics in Medicine, TUM School of Medicine, Technical University of Munich, Munich, Germany
- Department of Science, Technology and Society (STS), School of Social Sciences and Technology, Technical University of Munich, Munich, Germany
| | - Pekka Marttinen
- Helsinki Institute for Information Technology HIIT, Aalto University, Helsinki, Finland
| | - Michaela Th. Mayrhofer
- Biobanking and Biomolecular Resources Research Infrastructure Consortium - European Research Infrastructure Consortium (BBMRI-ERIC), Graz, Austria
| | - Alena M. Buyx
- Institute of History and Ethics in Medicine, TUM School of Medicine, Technical University of Munich, Munich, Germany
- Department of Science, Technology and Society (STS), School of Social Sciences and Technology, Technical University of Munich, Munich, Germany
|
43
|
Li MP, Liu WC, Sun BL, Zhong NS, Liu ZL, Huang SH, Zhang ZH, Liu JM. Prediction of bone metastasis in non-small cell lung cancer based on machine learning. Front Oncol 2023; 12:1054300. [PMID: 36698411 PMCID: PMC9869148 DOI: 10.3389/fonc.2022.1054300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Accepted: 11/21/2022] [Indexed: 01/12/2023] Open
Abstract
Objective: The purpose of this paper was to develop a machine learning algorithm with good performance in predicting bone metastasis (BM) in non-small cell lung cancer (NSCLC) and to establish a simple web predictor based on the algorithm. Methods: Patients who were diagnosed with NSCLC between 2010 and 2018 in the Surveillance, Epidemiology and End Results (SEER) database were included. To increase the extensibility of the research, data of patients who were first diagnosed with NSCLC at the First Affiliated Hospital of Nanchang University between January 2007 and December 2016 were also included in this study. Independent risk factors for BM in NSCLC were screened by univariate and multivariate logistic regression. On this basis, we chose six commonly used machine learning algorithms to build predictive models: Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Gradient Boosting Machine (GBM), Naive Bayes classifiers (NBC) and eXtreme gradient boosting (XGB). Then, the best model was identified to build the web predictor for predicting BM in NSCLC patients. Finally, area under the receiver operating characteristic curve (AUC), accuracy, sensitivity and specificity were used to evaluate the performance of these models. Results: A total of 50581 NSCLC patients were included in this study, and 5087 (10.06%) of them developed BM. Sex, grade, laterality, histology, T stage, N stage, and chemotherapy were independent risk factors for BM in NSCLC. Of these six models, the machine learning model built by the XGB algorithm performed best in both internal and external data set validation, with AUC scores of 0.808 and 0.841, respectively. Then, the XGB algorithm was used to build a web predictor of BM from NSCLC. Conclusion: This study developed a web predictor based on the XGB algorithm for predicting the risk of BM in NSCLC patients, which may assist doctors in clinical decision making.
Affiliation(s)
- Meng-Pan Li
- Department of Orthopedic Surgery, The First Affiliated Hospital of Nanchang University, Nanchang, China
- The First Clinical Medical College of Nanchang University, Nanchang, China
| | - Wen-Cai Liu
- Department of Orthopedic Surgery, The First Affiliated Hospital of Nanchang University, Nanchang, China
- The First Clinical Medical College of Nanchang University, Nanchang, China
- Department of Orthopaedics, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
| | - Bo-Lin Sun
- Department of Orthopedic Surgery, The First Affiliated Hospital of Nanchang University, Nanchang, China
- Institute of Spine and Spinal Cord, Nanchang University, Nanchang, China
| | - Nan-Shan Zhong
- Department of Orthopedic Surgery, The First Affiliated Hospital of Nanchang University, Nanchang, China
- Institute of Spine and Spinal Cord, Nanchang University, Nanchang, China
| | - Zhi-Li Liu
- Department of Orthopedic Surgery, The First Affiliated Hospital of Nanchang University, Nanchang, China
- Institute of Spine and Spinal Cord, Nanchang University, Nanchang, China
| | - Shan-Hu Huang
- Department of Orthopedic Surgery, The First Affiliated Hospital of Nanchang University, Nanchang, China
- Institute of Spine and Spinal Cord, Nanchang University, Nanchang, China
| | - Zhi-Hong Zhang
- Department of Orthopedic Surgery, The First Affiliated Hospital of Nanchang University, Nanchang, China
- Institute of Spine and Spinal Cord, Nanchang University, Nanchang, China
- *Correspondence: Jia-Ming Liu; Zhi-Hong Zhang
| | - Jia-Ming Liu
- Department of Orthopedic Surgery, The First Affiliated Hospital of Nanchang University, Nanchang, China
- Institute of Spine and Spinal Cord, Nanchang University, Nanchang, China
- *Correspondence: Jia-Ming Liu; Zhi-Hong Zhang
|
44
|
Mahmood U, Li X, Fan Y, Chang W, Niu Y, Li J, Qu C, Lu K. Multi-omics revolution to promote plant breeding efficiency. Front Plant Sci 2022; 13:1062952. [PMID: 36570904 PMCID: PMC9773847 DOI: 10.3389/fpls.2022.1062952] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/06/2022] [Accepted: 11/24/2022] [Indexed: 06/17/2023]
Abstract
Crop production is the primary goal of agricultural activities, which is always taken into consideration. However, global agricultural systems are coming under increasing pressure from the rising food demand of the rapidly growing world population and the changing climate. To address these issues, improving traits related to high yield and climate resilience in crop breeding is an effective strategy. In recent years, advances in omics techniques, including genomics, transcriptomics, proteomics, and metabolomics, have paved the way for accelerating plant and crop breeding to cope with the changing climate and enhance food production. Optimized integration of omics and phenotypic plasticity platforms, exploited by evolving machine learning algorithms, will aid in the development of biological interpretations for complex crop traits. The precise and progressive assembly of desired alleles using precise genome editing approaches and enhanced breeding strategies would enable future crops to excel in combating the changing climate. Furthermore, plant breeding and genetic engineering ensure an exclusive approach to developing nutrient-sufficient and climate-resilient crops, the productivity of which can sustainably and adequately meet the world's food, nutrition, and energy needs. This review provides an overview of how the integration of omics approaches could be exploited to select crop varieties with desired traits.
Affiliation(s)
- Umer Mahmood
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China
| | - Xiaodong Li
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China
| | - Yonghai Fan
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China
| | - Wei Chang
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China
| | - Yue Niu
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China
| | - Jiana Li
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China
- Academy of Agricultural Sciences, Southwest University, Chongqing, China
- Engineering Research Center of South Upland Agriculture, Ministry of Education, Chongqing, China
| | - Cunmin Qu
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China
- Academy of Agricultural Sciences, Southwest University, Chongqing, China
- Engineering Research Center of South Upland Agriculture, Ministry of Education, Chongqing, China
| | - Kun Lu
- Integrative Science Center of Germplasm Creation in Western China (Chongqing) Science City and Southwest University, College of Agronomy and Biotechnology, Southwest University, Chongqing, China
- Academy of Agricultural Sciences, Southwest University, Chongqing, China
- Engineering Research Center of South Upland Agriculture, Ministry of Education, Chongqing, China
|
45
|
Non-Invasive Biomarkers for Early Lung Cancer Detection. Cancers (Basel) 2022; 14:cancers14235782. [PMID: 36497263 PMCID: PMC9739091 DOI: 10.3390/cancers14235782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2022] [Revised: 07/18/2022] [Accepted: 07/20/2022] [Indexed: 11/27/2022] Open
Abstract
Worldwide, lung cancer (LC) is the most common cause of cancer death, and any delay in the detection of new and relapsed disease serves as a major factor for a significant proportion of LC morbidity and mortality. Though invasive methods such as tissue biopsy are considered the gold standard for diagnosis and disease monitoring, they have several limitations. Therefore, there is an urgent need to identify and validate non-invasive biomarkers for the early diagnosis, prognosis, and treatment of lung cancer for improved patient management. Despite recent progress in the identification of non-invasive biomarkers, currently, there is a shortage of reliable and accessible biomarkers demonstrating high sensitivity and specificity for LC detection. In this review, we aim to cover the latest developments in the field, including the utility of biomarkers that are currently used in LC screening and diagnosis. We comment on their limitations and summarise the findings and developmental stages of potential molecular contenders such as microRNAs, circulating tumour DNA, and methylation markers. Furthermore, we summarise research challenges in the development of biomarkers used for screening purposes and the potential clinical applications of newly discovered biomarkers.
|
46
|
Discovery and classification of complex multimorbidity patterns: unravelling chronicity networks and their social profiles. Sci Rep 2022; 12:20004. [PMID: 36411299 PMCID: PMC9678882 DOI: 10.1038/s41598-022-23617-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2022] [Accepted: 11/02/2022] [Indexed: 11/23/2022] Open
Abstract
Multimorbidity can be defined as the presence of two or more chronic diseases in an individual. This condition is associated with reduced quality of life, increased disability, greater functional impairment, increased health care utilisation, greater fragmentation of care and complexity of treatment, and increased mortality. Thus, understanding its epidemiology and inherent complexity is essential to improve the quality of life of patients and to reduce the costs associated with multi-pathology. In this paper, using data from the European Health Survey, we explore the application of Mixed Graphical Models and their combination with social network analysis techniques for the discovery and classification of complex multimorbidity patterns. The results show the usefulness and versatility of this graph-based approach for the study of multimorbidity, offering the researcher a holistic view of the relational structure of high-dimensional data with variables of different types.
|
47
|
Samal BR, Loers JU, Vermeirssen V, De Preter K. Opportunities and challenges in interpretable deep learning for drug sensitivity prediction of cancer cells. Front Bioinform 2022; 2:1036963. [PMID: 36466148 PMCID: PMC9714662 DOI: 10.3389/fbinf.2022.1036963] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/05/2022] [Accepted: 11/03/2022] [Indexed: 01/02/2024] Open
Abstract
In precision oncology, therapy stratification is done based on the patients' tumor molecular profile. Modeling and prediction of the drug response for a given tumor molecular type will further improve therapeutic decision-making for cancer patients. Indeed, deep learning methods hold great potential for drug sensitivity prediction, but a major problem is that these models are black box algorithms and do not clarify the mechanisms of action. This puts a limitation on their clinical implementation. To address this concern, many recent studies attempt to overcome these issues by developing interpretable deep learning methods that facilitate the understanding of the logic behind the drug response prediction. In this review, we discuss strengths and limitations of recent approaches, and suggest future directions that could guide further improvement of interpretable deep learning in drug sensitivity prediction in cancer research.
Affiliation(s)
- Bikash Ranjan Samal
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Center for Medical Genetics Ghent (CMGG), Ghent University, Ghent, Belgium
- Cancer Research Institute Ghent (CRIG), Ghent, Belgium
| | - Jens Uwe Loers
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Center for Medical Genetics Ghent (CMGG), Ghent University, Ghent, Belgium
- Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium
| | - Vanessa Vermeirssen
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Center for Medical Genetics Ghent (CMGG), Ghent University, Ghent, Belgium
- Cancer Research Institute Ghent (CRIG), Ghent, Belgium
- Department of Biomedical Molecular Biology, Ghent University, Ghent, Belgium
| | - Katleen De Preter
- Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Center for Medical Genetics Ghent (CMGG), Ghent University, Ghent, Belgium
- Cancer Research Institute Ghent (CRIG), Ghent, Belgium
|
48
|
Xu Y, Zhang X, Li H, Zheng H, Zhang J, Olsen MS, Varshney RK, Prasanna BM, Qian Q. Smart breeding driven by big data, artificial intelligence, and integrated genomic-enviromic prediction. Mol Plant 2022; 15:1664-1695. [PMID: 36081348 DOI: 10.1016/j.molp.2022.09.001] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Received: 04/04/2022] [Revised: 08/20/2022] [Accepted: 09/02/2022] [Indexed: 05/12/2023]
Abstract
The first paradigm of plant breeding involves direct selection-based phenotypic observation, followed by predictive breeding using statistical models for quantitative traits constructed based on genetic experimental design and, more recently, by incorporation of molecular marker genotypes. However, plant performance or phenotype (P) is determined by the combined effects of genotype (G), envirotype (E), and genotype by environment interaction (GEI). Phenotypes can be predicted more precisely by training a model using data collected from multiple sources, including spatiotemporal omics (genomics, phenomics, and enviromics across time and space). Integration of 3D information profiles (G-P-E), each with multidimensionality, provides predictive breeding with both tremendous opportunities and great challenges. Here, we first review innovative technologies for predictive breeding. We then evaluate multidimensional information profiles that can be integrated with a predictive breeding strategy, particularly envirotypic data, which have largely been neglected in data collection and are nearly untouched in model construction. We propose a smart breeding scheme, integrated genomic-enviromic prediction (iGEP), as an extension of genomic prediction, using integrated multiomics information, big data technology, and artificial intelligence (mainly focused on machine and deep learning). We discuss how to implement iGEP, including spatiotemporal models, environmental indices, factorial and spatiotemporal structure of plant breeding data, and cross-species prediction. A strategy is then proposed for prediction-based crop redesign at both the macro (individual, population, and species) and micro (gene, metabolism, and network) scales. Finally, we provide perspectives on translating smart breeding into genetic gain through integrative breeding platforms and open-source breeding initiatives. 
We call for coordinated efforts in smart breeding through iGEP, institutional partnerships, and innovative technological support.
Affiliation(s)
- Yunbi Xu
- Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China; CIMMYT-China Tropical Maize Research Center, School of Food Science and Engineering, Foshan University, Foshan, Guangdong 528231, China; Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China.
| | - Xingping Zhang
- Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China
| | - Huihui Li
- Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China; National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, Hainan 572024, China
| | - Hongjian Zheng
- CIMMYT-China Specialty Maize Research Center, Shanghai Academy of Agricultural Sciences, Shanghai 201400, China
| | - Jianan Zhang
- MolBreeding Biotechnology Co., Ltd., Shijiazhuang, Hebei 050035, China
| | - Michael S Olsen
- CIMMYT (International Maize and Wheat Improvement Center), ICRAF Campus, United Nations Avenue, Nairobi, Kenya
| | - Rajeev K Varshney
- State Agricultural Biotechnology Centre, Centre for Crop and Food Innovation, Food Futures Institute, Murdoch University, Murdoch, Australia
| | - Boddupalli M Prasanna
- CIMMYT (International Maize and Wheat Improvement Center), ICRAF Campus, United Nations Avenue, Nairobi, Kenya
| | - Qian Qian
- Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China
|
49
|
Wang F, Wang ZR, Ding XS, Yang H, Guo Y, Su H, Wan XR, Wang LJ, Jiang XY, Xu YH, Chen F, Cui W, Feng FZ. Combining serum peptide signatures with International Federation of Gynecology and Obstetrics (FIGO) risk score to predict the outcomes of patients with gestational trophoblastic neoplasia (GTN) after first-line chemotherapy. Front Oncol 2022; 12:982806. [PMID: 36338720 PMCID: PMC9634134 DOI: 10.3389/fonc.2022.982806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Accepted: 10/06/2022] [Indexed: 11/21/2022] Open
Abstract
Background: Gestational trophoblastic neoplasia (GTN) is a group of clinically rare tumors that develop in the uterus from placental tissue. Currently, its satisfactory curability derives from the timely and accurate classification and refined management of patients. This study aimed to discover biomarkers that could predict the outcomes of GTN patients after first-line chemotherapy. Methods: A total of 65 GTN patients were included in the study. Patients were divided into good and poor outcome groups, and the clinical characteristics of the patients in the two groups were compared. Furthermore, the serum peptide profiles of all patients were obtained using weak cation exchange magnetic beads and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Feature peaks were identified by three machine learning algorithms, and models were then constructed and compared using five machine learning methods. Additionally, liquid chromatography mass spectrometry was used to identify the feature peptides. Results: Multivariate logistic regression analysis showed that the International Federation of Gynecology and Obstetrics (FIGO) risk score was associated with poor outcomes. Eight feature peaks (m/z = 1287, 2042, 2862, 2932, 2950, 3240, 3277 and 6626) were selected by the three algorithms for model construction and validation. Based on the panel combining FIGO risk score and serum peptide signatures, the neural network (nnet) model showed promising performance in both the training (AUC = 0.9635) and validation (AUC = 0.8788) cohorts. Peaks at m/z 2042, 2862, 2932 and 3240 were identified as partial sequences of transthyretin, fibrinogen alpha chain (FGA), beta-globin and FGA, respectively. Conclusion: We combined FIGO risk score and serum peptide signatures using the nnet method to construct a model that can accurately predict the outcome of GTN patients after first-line chemotherapy. With this model, patients can be further classified and managed, and those with poor predicted outcomes can be given closer attention for the risk of treatment failure.
Affiliation(s)
- Fei Wang
- Department of Laboratory Medicine, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
| | - Zi-ran Wang
- Department of Laboratory Medicine, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
| | - Xue-song Ding
- Department of Obstetrics and Gynecology, National Clinical Research Center for Obstetric & Gynecologic Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
| | - Hua Yang
- Department of Obstetrics and Gynecology, National Clinical Research Center for Obstetric & Gynecologic Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
| | - Ye Guo
- Department of Laboratory Medicine, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
| | - Hao Su
- Department of Obstetrics and Gynecology, National Clinical Research Center for Obstetric & Gynecologic Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
| | - Xi-run Wan
- Department of Obstetrics and Gynecology, National Clinical Research Center for Obstetric & Gynecologic Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
| | - Li-juan Wang
- Department of Gynecological Oncology, Sun Yat-Sen Memorial Hospital of Sun Yat-Sen University, Guangzhou, China
| | - Xiang-yang Jiang
- Department of Obstetrics and Gynecology, Shanxi Provincial People’s Hospital, Xian, China
| | - Yan-hua Xu
- Department of Obstetrics and Gynecology, Jinan Maternity and Child Health Care Hospital, Jinan, China
| | - Feng Chen
- Department of Clinical Laboratory, State Key Laboratory of Molecular Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
| | - Wei Cui
- Department of Clinical Laboratory, State Key Laboratory of Molecular Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- *Correspondence: Wei Cui; Feng-zhi Feng
| | - Feng-zhi Feng
- Department of Obstetrics and Gynecology, National Clinical Research Center for Obstetric & Gynecologic Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- *Correspondence: Wei Cui; Feng-zhi Feng
|
50
|
Towards a better understanding of TF-DNA binding prediction from genomic features. Comput Biol Med 2022; 149:105993. [DOI: 10.1016/j.compbiomed.2022.105993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Revised: 07/12/2022] [Accepted: 08/14/2022] [Indexed: 11/17/2022]
|