1
Song B, Ning X, Guo L, Liu W, Jin H. Comparative Proteomics Analysis Reveals Distinct Molecular Phenotype and Biomarkers in Patients with Erythrodermic Atopic Dermatitis and Erythrodermic Psoriasis. Inflammation 2024:10.1007/s10753-024-02078-3. [PMID: 38877357 DOI: 10.1007/s10753-024-02078-3] [Received: 03/18/2024] [Revised: 06/03/2024] [Accepted: 06/03/2024] [Indexed: 06/16/2024]
Abstract
Erythrodermic atopic dermatitis (EAD) and erythrodermic psoriasis (EP) are rare yet debilitating inflammatory skin disorders that pose challenges for diagnosis and for the discovery of effective therapeutic targets. Despite their clinical and histological similarities, the underlying molecular mechanisms and systemic biomarkers of these diseases remain largely unclear. In this study, we sought to investigate the differential serum proteome of EP and EAD patients and to identify biomarkers for these two subtypes of erythroderma. We recruited 14 EAD patients, 14 EP patients, and 14 healthy controls. Serum samples were collected and analyzed using the Olink high-throughput platform to assess the levels of 269 inflammation-, immune response-, and cardiovascular-related biomarkers. Both EAD and EP patients exhibited enhanced immune activation and dysregulated cardiovascular profiles compared with healthy controls. EAD demonstrated a more pronounced inflammatory tone, characterized by Th1/Th2/Th22/IL-1-dominant patterns, as well as increased TNF superfamily, Th17, and apoptosis markers. Conversely, EP displayed inflammation with Th1/Th17/TNF skewing and mild Th2 upregulation, along with notable increases in epidermal-development markers. In EAD, disease severity was strongly correlated with apoptosis/Th2 markers, whereas in EP it correlated with Th17 markers. Furthermore, a panel of eight markers (IL-17A/IL-17C/PI3/CCL20/SH2D1A/SIRT2/DFFA/IL-13) was identified that effectively discriminated between EP and EAD, with an area under the curve (AUC) greater than 0.8. Our study comprehensively characterizes the circulating molecular profiles of EAD and EP patients, providing insights into the similarities and complexities of their inflammatory phenotypes. The identified serum biomarkers have the potential to differentiate between EP and EAD, which could aid diagnosis and guide tailored therapy.
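The discrimination result above is summarized as an area under the ROC curve (AUC). Here is a minimal sketch for a single marker, assuming a higher level points to EP. The AUC then equals the probability that a randomly chosen EP sample scores above a randomly chosen EAD sample (the Mann-Whitney formulation, with ties counted as one half). The marker values are invented for illustration and are not study data. The abstract does not say how the eight-marker panel was combined or which software computed its AUC.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    labels: 1 = positive class, 0 = negative class; ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical serum levels of one marker; 1 = EP, 0 = EAD (illustrative only).
levels = [5.1, 4.8, 3.9, 6.2, 2.0, 2.5, 4.0, 1.8]
group = [1, 1, 1, 1, 0, 0, 0, 0]
print(roc_auc(levels, group))  # 0.9375 (15 of 16 EP/EAD pairs ranked correctly)
```

An AUC of 0.5 corresponds to chance ranking and 1.0 to perfect separation. The pairwise loop is quadratic, which is harmless at a group size of 14.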
Affiliation(s)
- Biao Song
- Department of Dermatology, Peking Union Medical College Hospital, Chinese Academy of Medical Science and Peking Union Medical College, Beijing, China
- State Key Laboratory for Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Beijing, 100730, China
- National Clinical Research Center for Dermatologic and Immunologic Diseases, Beijing, China
- Xin Ning
- Department of Dermatology, Peking Union Medical College Hospital, Chinese Academy of Medical Science and Peking Union Medical College, Beijing, China
- State Key Laboratory for Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Beijing, 100730, China
- National Clinical Research Center for Dermatologic and Immunologic Diseases, Beijing, China
- Lan Guo
- Department of Dermatology, Peking Union Medical College Hospital, Chinese Academy of Medical Science and Peking Union Medical College, Beijing, China
- State Key Laboratory for Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Beijing, 100730, China
- National Clinical Research Center for Dermatologic and Immunologic Diseases, Beijing, China
- Weida Liu
- State Key Laboratory for Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Beijing, 100730, China
- Medical Research Center, Peking Union Medical College Hospital, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, 100730, China
- Hongzhong Jin
- Department of Dermatology, Peking Union Medical College Hospital, Chinese Academy of Medical Science and Peking Union Medical College, Beijing, China.
- State Key Laboratory for Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Beijing, 100730, China.
- National Clinical Research Center for Dermatologic and Immunologic Diseases, Beijing, China.
2
Wu Z, Yu W, Luo J, Shen G, Cui Z, Ni W, Wang H. Comprehensive transcriptomic analysis unveils macrophage-associated genes for establishing an abdominal aortic aneurysm diagnostic model and molecular therapeutic framework. Eur J Med Res 2024; 29:323. [PMID: 38867262 PMCID: PMC11167832 DOI: 10.1186/s40001-024-01900-w] [Received: 10/09/2023] [Accepted: 05/22/2024] [Indexed: 06/14/2024]
Abstract
BACKGROUND Abdominal aortic aneurysm (AAA) is a highly lethal cardiovascular disease. The aim of this research was to identify new biomarkers and therapeutic targets for this disease. METHODS Single-sample gene set enrichment analysis (ssGSEA) and CIBERSORT algorithms were used to identify differences in immune cell infiltration between AAA and normal abdominal aortas. Single-cell RNA sequencing data were used to analyse the hallmark genes of AAA-associated macrophage subsets. Six macrophage-related hub genes were identified through weighted gene co-expression network analysis (WGCNA), and their expression was validated in clinical samples and AAA mouse models. We screened potential therapeutic drugs for AAA using the online Connectivity Map (CMap) database. A network-based approach was used to explore the relationships between the candidate genes and transcription factors (TFs), lncRNAs, and miRNAs. We also used a variety of machine learning algorithms to identify hub genes that can effectively distinguish AAA from atherosclerosis (AS). RESULTS We obtained six macrophage hub genes (IL-1B, CXCL1, SOCS3, SLC2A3, G0S2, and CCL3) that can effectively diagnose AAA. Receiver operating characteristic (ROC) curves and decision curve analysis (DCA) together confirmed the good diagnostic efficacy of the hub genes. Further analysis revealed that the expression of all six hub genes was significantly increased in AAA patients and mice. We also constructed TF regulatory networks and competing endogenous RNA (ceRNA) networks to reveal potential mechanisms of disease occurrence. In addition, the machine learning algorithms yielded two key genes (ZNF652 and UBR5) that can effectively distinguish AAA from atherosclerosis. CONCLUSION Our findings depict the molecular pharmaceutical network in AAA, providing new ideas for its effective diagnosis and treatment.
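Decision curve analysis, mentioned in the RESULTS, compares a model's net benefit with treat-all and treat-none strategies across threshold probabilities. The sketch below implements the commonly used net-benefit formula, NB = TP/n − (FP/n) · p_t / (1 − p_t), where p_t is the threshold probability. The probabilities and outcomes are hypothetical. The abstract does not state which DCA implementation the authors used.

```python
from typing import Sequence


def net_benefit(probs: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Net benefit of treating everyone whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    # False positives are penalized by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels: Sequence[int], threshold: float) -> float:
    """Reference strategy: treat every patient regardless of the model."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical predicted probabilities and outcomes (1 = AAA), not study data.
probs = [0.9, 0.8, 0.7, 0.3, 0.2, 0.6]
labels = [1, 1, 0, 1, 0, 0]
print(net_benefit(probs, labels, 0.25))      # 3/6 - (2/6)(1/3) = 7/18, about 0.389
print(net_benefit_treat_all(labels, 0.25))   # about 0.333
```

At a given threshold, the model is useful when its net benefit exceeds both the treat-all value and zero (the treat-none value). A decision curve plots these three quantities across a range of thresholds.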
Affiliation(s)
- Zhen Wu
- Department of Vascular and Interventional Surgery, The First Affiliated Hospital of Harbin Medical University, Harbin, 150001, Heilongjiang, China
- Weiming Yu
- Department of Vascular and Interventional Surgery, The First Affiliated Hospital of Harbin Medical University, Harbin, 150001, Heilongjiang, China
- General Surgery, Thyroid Surgery, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, 510000, Guangdong, China
- Jie Luo
- Department of Vascular and Interventional Surgery, The First Affiliated Hospital of Harbin Medical University, Harbin, 150001, Heilongjiang, China
- Department of Clinical Laboratory, Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, 200072, China
- Guanghui Shen
- Department of Vascular and Interventional Surgery, The First Affiliated Hospital of Harbin Medical University, Harbin, 150001, Heilongjiang, China
- Zhongqi Cui
- Department of Clinical Laboratory, Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, 200072, China
- Wenxuan Ni
- Department of Clinical Laboratory, Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, 200072, China.
- Haiyang Wang
- Department of Vascular and Interventional Surgery, The First Affiliated Hospital of Harbin Medical University, Harbin, 150001, Heilongjiang, China.
3
Feng C, Wei H, Li X, Feng B, Xu C, Zhu X, Liu R. A stacking-based algorithm for antifreeze protein identification using combined physicochemical, pseudo amino acid composition, and reduction property features. Comput Biol Med 2024; 176:108534. [PMID: 38754217 DOI: 10.1016/j.compbiomed.2024.108534] [Received: 02/04/2024] [Revised: 04/03/2024] [Accepted: 04/28/2024] [Indexed: 05/18/2024]
Abstract
Antifreeze proteins have wide applications in the medical and food industries. In this study, we propose a stacking-based classifier that can effectively identify antifreeze proteins. First, features were extracted from three aspects: reduction properties, scalable pseudo amino acid composition, and physicochemical properties. The information from these three categories was then combined into a hybrid feature set. Next, base classifiers were trained on the training set using the LightGBM, XGBoost, and RandomForest algorithms, and their outputs were passed to a logistic regression classifier, thereby establishing a stacking algorithm. The proposed algorithm was evaluated on the test set and on an independent validation set. It achieved a recognition accuracy of 98.3% and an accuracy of 98.5% on the validation set. We then analyzed, from multiple aspects, why the numerical features achieved such high recognition capability. Dimensionality reduction and analysis in two- and three-dimensional views revealed separability between positive and negative samples, and the three-dimensional protein structures further demonstrated significant differences in related features between the two classes. Analysis of the classifier revealed that Hr*Hr, HrHr, and Sc-PseAAC_1, 188D(152,116,57,183) were among the seven most important numerical features affecting recognition. For Hr*Hr and HrHr, supporting sequence-level evidence for the reduction dictionary was found through conserved-region analysis, multiple sequence alignment, and conservative amino acid substitution. Moreover, a comparison of feature importance before and after reduction confirmed that the reduction dictionary improves feature importance. A decision tree model was used to discern the differences between dipeptides associated with the physicochemical properties of His (H), Ile (I), Leu (L), and Lys (K) and other dipeptides. Finally, we analyzed the remaining seven important features. The data analysis confirmed that hydrophobicity, secondary structure, charge properties, van der Waals forces, and solvent accessibility also affect the antifreeze capability of proteins.
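The two-level stacking design described above can be sketched in plain Python. This is a minimal, dependency-free illustration, not the authors' pipeline. Two toy base learners (a centroid scorer and single-feature threshold stumps) stand in for LightGBM, XGBoost, and Random Forest, and the logistic meta-learner is fitted by simple gradient descent. Feeding the meta-learner out-of-fold base-learner scores is a common stacking convention, but the abstract does not confirm that the authors did so. The data are synthetic.

```python
import math
import random
from typing import Callable, List

Vector = List[float]
Model = Callable[[Vector], float]                     # feature vector -> score in (0, 1)
Trainer = Callable[[List[Vector], List[int]], Model]  # (X, y) -> fitted Model


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def class_mean(X: List[Vector], y: List[int], label: int) -> Vector:
    rows = [x for x, t in zip(X, y) if t == label]
    return [sum(col) / len(rows) for col in zip(*rows)]


def centroid_trainer(X: List[Vector], y: List[int]) -> Model:
    """Toy base learner: relative closeness to the positive-class centroid."""
    pos, neg = class_mean(X, y, 1), class_mean(X, y, 0)

    def model(x: Vector) -> float:
        dp, dn = math.dist(x, pos), math.dist(x, neg)
        return dn / (dp + dn) if dp + dn > 0 else 0.5

    return model


def stump_trainer(j: int) -> Trainer:
    """Toy base learner: soft threshold on feature j, halfway between class means."""
    def train(X: List[Vector], y: List[int]) -> Model:
        mp, mn = class_mean(X, y, 1)[j], class_mean(X, y, 0)[j]
        thr, sign = (mp + mn) / 2, (1.0 if mp >= mn else -1.0)
        return lambda x: sigmoid(sign * (x[j] - thr))

    return train


def fit_logistic(Z: List[Vector], y: List[int], lr: float = 0.5, epochs: int = 2000) -> Model:
    """Meta-learner: logistic regression fitted by batch gradient descent."""
    w = [0.0] * (len(Z[0]) + 1)  # last entry is the intercept
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for z, t in zip(Z, y):
            err = sigmoid(sum(wk * zk for wk, zk in zip(w, z)) + w[-1]) - t
            for k, zk in enumerate(z):
                grad[k] += err * zk
            grad[-1] += err
        w = [wk - lr * g / len(Z) for wk, g in zip(w, grad)]
    return lambda z: sigmoid(sum(wk * zk for wk, zk in zip(w, z)) + w[-1])


def fit_stacking(X: List[Vector], y: List[int], trainers: List[Trainer],
                 k: int = 5, seed: int = 0) -> Model:
    """Level 0: out-of-fold base-learner scores. Level 1: logistic regression on them."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    oof = [[0.0] * len(trainers) for _ in X]
    for f in range(k):
        held = idx[f::k]
        held_set = set(held)
        train_idx = [i for i in idx if i not in held_set]
        Xt, yt = [X[i] for i in train_idx], [y[i] for i in train_idx]
        for m, trainer in enumerate(trainers):
            model = trainer(Xt, yt)
            for i in held:
                oof[i][m] = model(X[i])
    meta = fit_logistic(oof, y)
    bases = [trainer(X, y) for trainer in trainers]  # refit base learners on all data
    return lambda x: meta([b(x) for b in bases])


# Synthetic two-class data (not antifreeze-protein features).
rng = random.Random(1)
X = [[rng.gauss(2, 0.5), rng.gauss(2, 0.5)] for _ in range(20)]
X += [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(20)]
y = [1] * 20 + [0] * 20
stack = fit_stacking(X, y, [centroid_trainer, stump_trainer(0), stump_trainer(1)])
acc = sum((stack(x) >= 0.5) == (t == 1) for x, t in zip(X, y)) / len(y)
print(f"training accuracy: {acc:.2f}")
```

The meta-learner is trained on out-of-fold scores rather than in-sample scores. This keeps it from simply trusting whichever base learner overfits the training set most.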
Affiliation(s)
- Changli Feng
- Department of Information Science and Technology, Taishan University, Taian, 271000, China.
- Haiyan Wei
- Department of Information Science and Technology, Taishan University, Taian, 271000, China.
- Xin Li
- Department of Information Science and Technology, Taishan University, Taian, 271000, China.
- Bin Feng
- Department of Information Science and Technology, Taishan University, Taian, 271000, China.
- Chugui Xu
- Department of Information Science and Technology, Taishan University, Taian, 271000, China.
- Xiaorong Zhu
- Department of Information Science and Technology, Taishan University, Taian, 271000, China.
- Ruijun Liu
- School of Software, Beihang University, Beijing, 100191, China.
4
Traini E, Portengen L, Ohanyan H, van Vorstenbosch R, Vermeulen R, Huss A. A prospective exploration of the urban exposome in relation to headache in the Dutch population-based Occupational and environmental health cohort study (AMIGO). Environ Int 2024; 188:108776. [PMID: 38810494 DOI: 10.1016/j.envint.2024.108776] [Received: 03/25/2024] [Revised: 05/03/2024] [Accepted: 05/24/2024] [Indexed: 05/31/2024]
Abstract
OBJECTIVE Headache is one of the most prevalent and disabling health conditions globally. We prospectively explored the urban exposome in relation to the weekly occurrence of headache episodes using data from the Dutch population-based Occupational and Environmental Health Cohort Study (AMIGO). MATERIAL AND METHODS Participants (N = 7,339) completed baseline and follow-up questionnaires in 2011 and 2015, reporting headache frequency. Information on the urban exposome covered 80 exposures across 10 domains, such as air pollution, electromagnetic fields, and lifestyle and socio-demographic characteristics. We first identified all relevant exposures using the Boruta algorithm. Then, for each exposure separately, we estimated the average treatment effect (ATE) and its standard error (SE) by training causal forests adjusted for age, depression diagnosis, painkiller use, a general health indicator, a sleep disturbance index, and weekly occurrence of headache episodes at baseline. RESULTS The occurrence of weekly headache was 12.5% at baseline and 11.1% at follow-up. Boruta selected five air pollutants (NO2, NOx, PM10, silicon in PM10, and iron in PM2.5) and one urban temperature measure (heat island effect) as factors contributing to the occurrence of weekly headache episodes at follow-up. The estimated causal effects of all six exposures on weekly headache were positive. NO2 showed the largest effect (ATE = 0.007 per interquartile range (IQR) increase; SE = 0.004), followed by PM10 (ATE = 0.006 per IQR increase; SE = 0.004), heat island effect (ATE = 0.006 per 1 °C increase; SE = 0.007), NOx (ATE = 0.004 per IQR increase; SE = 0.004), iron in PM2.5 (ATE = 0.003 per IQR increase; SE = 0.004), and silicon in PM10 (ATE = 0.003 per IQR increase; SE = 0.004). CONCLUSION Our results suggest that exposure to air pollution and heat island effects contributed to the reporting of weekly headache episodes in the study population.
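Boruta, used above to screen the 80 exposures, compares each real feature's importance against "shadow" features built by permuting copies of the real columns. The following minimal sketch of that idea rests on two loudly stated simplifications. First, absolute Pearson correlation stands in for the random-forest importance that Boruta actually uses. Second, the final call is a pair of one-sided binomial tests on the hit counts, without the multiple-testing correction and iterative removal of rejected features found in the reference implementation. The data are synthetic.

```python
import math
import random
from typing import Callable, List, Tuple

Matrix = List[List[float]]
Importance = Callable[[Matrix, List[int]], List[float]]


def abs_correlation_importance(X: Matrix, y: List[int]) -> List[float]:
    """Stand-in importance: |Pearson r| of each column with the outcome.
    Boruta proper uses random-forest importances; this keeps the sketch dependency-free."""
    n = len(y)
    my = sum(y) / n
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    scores = []
    for col in zip(*X):
        mx = sum(col) / n
        sx = math.sqrt(sum((v - mx) ** 2 for v in col))
        cov = sum((a - mx) * (b - my) for a, b in zip(col, y))
        scores.append(abs(cov) / (sx * sy) if sx > 0 and sy > 0 else 0.0)
    return scores


def upper_tail(k: int, n: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n


def boruta_sketch(X: Matrix, y: List[int], importance: Importance,
                  n_iter: int = 20, alpha: float = 0.05,
                  seed: int = 0) -> Tuple[List[int], List[str]]:
    rng = random.Random(seed)
    n_feat = len(X[0])
    hits = [0] * n_feat
    for _ in range(n_iter):
        # Shadow features: independently permuted copies of every column,
        # which keep each marginal distribution but break any link to y.
        shadows = []
        for j in range(n_feat):
            col = [row[j] for row in X]
            rng.shuffle(col)
            shadows.append(col)
        X_ext = [row + [s[i] for s in shadows] for i, row in enumerate(X)]
        imp = importance(X_ext, y)
        best_shadow = max(imp[n_feat:])
        for j in range(n_feat):
            if imp[j] > best_shadow:
                hits[j] += 1
    decisions = []
    for h in hits:
        if upper_tail(h, n_iter) < alpha:             # beats shadows more often than chance
            decisions.append("confirmed")
        elif upper_tail(n_iter - h, n_iter) < alpha:  # P(X <= h), by symmetry at p = 0.5
            decisions.append("rejected")
        else:
            decisions.append("tentative")
    return hits, decisions


# Synthetic data: feature 0 carries signal, features 1 and 2 are pure noise.
rng = random.Random(42)
y = [i % 2 for i in range(100)]
X = [[t + rng.gauss(0, 0.3), rng.gauss(0, 1), rng.gauss(0, 1)] for t in y]
hits, decisions = boruta_sketch(X, y, abs_correlation_importance)
print(hits, decisions)
```

The key design choice is to compare against the *best* shadow rather than a fixed cutoff. A feature therefore has to beat the strongest association that pure noise produced in the same run.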
Affiliation(s)
- Eugenio Traini
- Institute for Risk Assessment Sciences, Utrecht University, Utrecht, the Netherlands.
- Lützen Portengen
- Institute for Risk Assessment Sciences, Utrecht University, Utrecht, the Netherlands
- Haykanush Ohanyan
- Institute for Risk Assessment Sciences, Utrecht University, Utrecht, the Netherlands
- Roel Vermeulen
- Institute for Risk Assessment Sciences, Utrecht University, Utrecht, the Netherlands
- Anke Huss
- Institute for Risk Assessment Sciences, Utrecht University, Utrecht, the Netherlands
5
Silva L, da Motta LG, Eberly L. Prediction of tuberculosis clusters in the riverine municipalities of the Brazilian Amazon with machine learning. Rev Bras Epidemiol 2024; 27:e240024. [PMID: 38747742 PMCID: PMC11093519 DOI: 10.1590/1980-549720240024] [Received: 10/17/2023] [Revised: 02/17/2024] [Accepted: 03/06/2024] [Indexed: 05/19/2024]
Abstract
OBJECTIVE Tuberculosis (TB) is the second deadliest infectious disease globally and poses a significant burden in Brazil and its Amazonian region. This study focused on the "riverine municipalities" and hypothesized the presence of TB clusters in the area. We also aimed to train a machine learning model to differentiate municipalities classified as hot spots from non-hot spots, using disease surveillance variables as predictors. METHODS Data on TB incidence from 2019 to 2022 in the riverine municipalities were collected from the Brazilian Health Ministry Informatics Department. Moran's I was used to assess global spatial autocorrelation, and the Getis-Ord Gi* method was used to detect high- and low-incidence clusters. A Random Forest model was trained on surveillance variables related to TB cases to distinguish hot-spot from non-hot-spot municipalities. RESULTS Our analysis revealed distinct geographical clusters of high and low TB incidence following a west-to-east distribution pattern. The Random Forest classification model used six surveillance variables to predict hot vs. non-hot spots and achieved an area under the receiver operating characteristic curve (AUC-ROC) of 0.81. CONCLUSION Higher percentages of recurrent cases, deaths due to TB, antibiotic regimen changes, new cases, and cases with a smoking history were the best predictors of hot-spot municipalities. This prediction method can be leveraged to identify the municipalities at the highest risk of being hot spots for the disease, giving policymakers an evidence-based tool to direct resource allocation for disease control in the riverine municipalities.
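Global Moran's I, used above to test for spatial autocorrelation, is I = (n / S0) · Σ_i Σ_j w_ij z_i z_j / Σ_i z_i². Here z_i are deviations of each area's value from the mean, and S0 is the sum of all spatial weights. The sketch below assumes a simple binary contiguity matrix over four hypothetical areas. The abstract does not say which spatial weights the authors used, and significance testing (for example by permutation) is omitted.

```python
from typing import Sequence


def morans_i(values: Sequence[float], weights: Sequence[Sequence[float]]) -> float:
    """Global Moran's I; weights[i][j] is the spatial weight between areas i and j (0 on the diagonal)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in z)
    return (n / s0) * (num / den)


# Four hypothetical areas in a row (1-2-3-4); neighbours share a border.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(morans_i([1.0, 2.0, 3.0, 4.0], w))  # about 0.333: similar values sit next to each other
print(morans_i([1.0, 4.0, 1.0, 4.0], w))  # about -1.0: neighbours alternate high/low
```

Values above the null expectation E[I] = −1/(n − 1) (here −1/3) indicate positive spatial clustering. A local statistic such as Getis-Ord Gi* is then needed to locate where the high- and low-incidence clusters are.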
Affiliation(s)
- Luis Silva
- University of Minnesota, Minneapolis – Minneapolis (MN), United States
- Lynn Eberly
- University of Minnesota, Minneapolis – Minneapolis (MN), United States
6
Campler MR, Cheng TY, Lee CW, Hofacre CL, Lossie G, Silva GS, El-Gazzar MM, Arruda AG. Investigating the uses of machine learning algorithms to inform risk factor analyses: The example of avian infectious bronchitis virus (IBV) in broiler chickens. Res Vet Sci 2024; 171:105201. [PMID: 38442531 DOI: 10.1016/j.rvsc.2024.105201] [Received: 03/21/2023] [Revised: 11/16/2023] [Accepted: 02/24/2024] [Indexed: 03/07/2024]
Abstract
Infectious bronchitis virus (IBV) is a contagious coronavirus that causes respiratory and urogenital disease in chickens and is responsible for significant economic losses in both the broiler and table-egg layer industries. Although IBV is regularly monitored using standard epidemiologic surveillance practices, knowledge and evidence of risk factors associated with IBV transmission remain limited. The study objective was to compare risk factor modeling outcomes between a traditional stepwise variable selection approach and a machine learning-based random forest Boruta algorithm, using routinely collected IBV antibody titer data from broiler flocks. IBV antibody sampling events (n = 1111) from 166 broiler sites between 2016 and 2021 were accessed. Ninety-two geospatial and poultry-density variables were obtained using a geographic information system and publicly available data sets. The manual selection and the machine learning algorithm screened 17 and 27 candidate variables, respectively, as potentially associated with elevated IBV antibody titers. The variables selected by each method were further investigated by constructing multivariable generalized mixed logistic regression models. Six variables were shortlisted by both screening methods, including year, distance to urban areas, main roads, land cover, and density of layer sites; however, the final models for the two approaches shared only year as an important predictor. Despite the limited significance of the clinical outcomes, this work showcases the potential of a novel exploratory modeling approach. That approach combines often underutilized resources, such as publicly available geospatial data and surveillance health data, with machine learning as supplementary tools for investigating risk factors for infectious diseases.
Affiliation(s)
- Magnus R Campler
- Department of Veterinary Preventive Medicine, The Ohio State University, OH 43210, USA
- Ting-Yu Cheng
- Department of Veterinary Preventive Medicine, The Ohio State University, OH 43210, USA
- Chang-Won Lee
- Exotic and Emerging Avian Diseases, Southeast Poultry Research Laboratory, National Poultry Research Center, Agricultural Research Service, U.S. Department of Agriculture, Athens, GA 30605, USA
- Geoffrey Lossie
- Department of Comparative Pathobiology and Animal Disease Diagnostic Laboratory, College of Veterinary Medicine, Purdue University, IN 47907, USA
- Gustavo S Silva
- Department of Comparative Pathobiology and Animal Disease Diagnostic Laboratory, College of Veterinary Medicine, Purdue University, IN 47907, USA
- Mohamed M El-Gazzar
- Department of Veterinary Diagnostic and Production Animal Medicine, College of Veterinary Medicine, Iowa State University, IA 50011, USA
- Andréia G Arruda
- Department of Veterinary Preventive Medicine, The Ohio State University, OH 43210, USA.
7
Luo J, Zhou Y, Song Y, Wang D, Li M, Du X, Kang J, Ye P, Xia J. Association between the neutrophil-to-lymphocyte ratio and in-hospital mortality in patients with chronic kidney disease and coronary artery disease in the intensive care unit. Eur J Med Res 2024; 29:260. [PMID: 38689359 PMCID: PMC11059689 DOI: 10.1186/s40001-024-01850-3] [Received: 01/05/2024] [Accepted: 04/18/2024] [Indexed: 05/02/2024]
Abstract
BACKGROUND The objective of this study was to investigate the association between the neutrophil-to-lymphocyte ratio (NLR) and the risk of in-hospital death in patients admitted to the intensive care unit (ICU) with both chronic kidney disease (CKD) and coronary artery disease (CAD). METHODS The study used data from the MIMIC-IV database, which includes more than 50,000 ICU admissions between 2008 and 2019, and used the eICU-CRD for external validation. The Boruta algorithm was employed for feature selection. Univariable and multivariable logistic regression analyses and multivariable restricted cubic spline regression were used to examine the association between NLR and in-hospital mortality. Receiver operating characteristic (ROC) curves were constructed to estimate the predictive ability of NLR. RESULTS After application of the inclusion and exclusion criteria, a total of 2254 patients with CKD and CAD were included. The median NLR was 7.3 (4.4, 12.1). Multivariable logistic regression showed that, after adjustment for all relevant factors, a higher NLR was significantly associated with an increased risk of in-hospital mortality (OR 2.122, 95% confidence interval [CI] 1.542-2.921, P < 0.001). Subgroup analyses further revealed that age and Sequential Organ Failure Assessment (SOFA) score interacted with the association between NLR and in-hospital death. NLR combined with traditional cardiovascular risk factors showed relatively good predictive value for in-hospital mortality (AUC 0.750). CONCLUSION These findings indicate that NLR can serve as an indicator for predicting in-hospital mortality among ICU patients with both CAD and CKD, and that it may be a valuable tool for assessing and managing risk in this high-risk group. Further investigation is required to validate these findings and to explore the mechanisms underlying the association between NLR and mortality in individuals with CAD and CKD.
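The odds ratio reported above comes from a logistic-regression coefficient. As a worked check, assume the reported interval is a Wald 95% interval (the abstract does not say). Then OR = exp(β) and CI = exp(β ± 1.96·SE), so β and its SE can be recovered from the published numbers. The abstract also does not state which NLR scale the OR refers to (per unit, per standard deviation, or a categorized NLR), so the recovered β is interpretable only on that unreported scale.

```python
import math


def odds_ratio_ci(beta: float, se: float, z: float = 1.96):
    """Odds ratio and Wald CI from a logistic-regression coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coef_from_reported(or_: float, lower: float, upper: float, z: float = 1.96):
    """Back out beta and SE from a reported OR and 95% CI (assumes a Wald interval)."""
    beta = math.log(or_)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


beta, se = coef_from_reported(2.122, 1.542, 2.921)   # values reported in the abstract above
print(round(beta, 3), round(se, 3))                   # 0.752 0.163
print([round(v, 2) for v in odds_ratio_ci(beta, se)])  # [2.12, 1.54, 2.92]
```

Because the interval is symmetric on the log scale, log(OR) sits at the midpoint of log(lower) and log(upper). This gives a quick consistency check on any reported OR/CI pair.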
Affiliation(s)
- Jingjing Luo
- Department of Cardiovascular Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
- Yufan Zhou
- Department of Cardiovascular Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
- Yu Song
- Department of Cardiovascular Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China
- Dashuai Wang
- Department of Cardiovascular Surgery, The First Affiliated Hospital of Zhengzhou University, Henan Province, 450052, China
- Meihong Li
- Department of Physiology and Pathophysiology, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, 100038, China
- Xinling Du
- Department of Cardiovascular Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China.
- Jihong Kang
- Department of Physiology and Pathophysiology, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, 100038, China.
- Ping Ye
- Department of Cardiology, The Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430014, China.
- Jiahong Xia
- Department of Cardiovascular Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430022, China.
8
Nian Y, Su X, Yue H, Zhu Y, Li J, Wang W, Sheng Y, Ma Q, Liu J, Li X. Estimation of the rice aboveground biomass based on the first derivative spectrum and Boruta algorithm. Front Plant Sci 2024; 15:1396183. [PMID: 38726299 PMCID: PMC11079175 DOI: 10.3389/fpls.2024.1396183] [Received: 03/05/2024] [Accepted: 04/11/2024] [Indexed: 05/12/2024]
Abstract
Aboveground biomass (AGB) is a critical variable in monitoring crop growth and yield. Hyperspectral remote sensing has emerged as a viable method for rapid and precise monitoring of AGB. Because of the high dimensionality and volume of hyperspectral data, it is crucial to reduce data dimensionality effectively and to select sensitive spectral features that improve the accuracy of rice AGB estimation models. At present, derivative transforms and feature selection algorithms have become important means of addressing this problem. However, few studies have systematically evaluated the impact of derivative spectra combined with feature selection algorithms on rice AGB estimation. To this end, at the Xiaogang Village (Chuzhou City, China) Experimental Base in 2020, this study used an ASD FieldSpec HandHeld 2 ground spectrometer (Analytical Spectral Devices, Boulder, Colorado, USA) to obtain canopy spectral data at the critical growth stages of rice (tillering, jointing, booting, heading, and maturity). It then evaluated the performance of the recursive feature elimination (RFE) and Boruta feature selection algorithms with partial least squares regression (PLSR), principal component regression (PCR), support vector machine (SVM), and ridge regression (RR). Moreover, we analyzed the importance of the optimal derivative spectrum. The findings indicate the following. (1) As the growth stage progresses, the correlation between the rice canopy spectrum and AGB decreases, with the first derivative spectrum (FD) showing the strongest correlation with AGB. (2) The Boruta algorithm selects 19–35 feature bands, giving a good dimensionality reduction effect. (3) The FD-Boruta-PCR (FB-PCR) combination performed best in estimating rice AGB, increasing R² by approximately 10–20% and decreasing RMSE by approximately 0.08–14%. (4) The best estimation stage is the booting stage, with R² values between 0.60 and 0.74 and RMSE values between 1288.23 and 1554.82 kg/hm². This study confirms the accuracy of hyperspectral remote sensing in estimating vegetation biomass and further explores a theoretical foundation and future directions for monitoring rice growth dynamics.
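The first derivative spectrum (FD) referred to above is the slope of reflectance with respect to wavelength. The sketch below assumes a central difference, FD(λ_i) = [R(λ_{i+1}) − R(λ_{i−1})] / (λ_{i+1} − λ_{i−1}). The abstract does not give the authors' exact differencing scheme or any smoothing applied beforehand, and the reflectance values are synthetic.

```python
from typing import List, Sequence, Tuple


def first_derivative(wavelengths: Sequence[float],
                     reflectance: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Central-difference dR/d(lambda) at interior bands; the two edge bands are dropped."""
    if len(wavelengths) != len(reflectance) or len(wavelengths) < 3:
        raise ValueError("need matching sequences with at least three bands")
    bands, slopes = [], []
    for i in range(1, len(wavelengths) - 1):
        d_lambda = wavelengths[i + 1] - wavelengths[i - 1]
        slopes.append((reflectance[i + 1] - reflectance[i - 1]) / d_lambda)
        bands.append(wavelengths[i])
    return bands, slopes


# Synthetic red-edge-like segment sampled every 1 nm (reflectance rises with wavelength).
wl = [700.0, 701.0, 702.0, 703.0, 704.0]
refl = [0.10, 0.12, 0.15, 0.19, 0.24]
bands, fd = first_derivative(wl, refl)
print(bands)                      # [701.0, 702.0, 703.0]
print([round(v, 3) for v in fd])  # [0.025, 0.035, 0.045]
```

Differencing removes additive offsets that are constant across bands, such as a uniform background brightness. This is one common motivation for pairing derivative spectra with band-selection methods like Boruta.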
Affiliation(s)
- Ying Nian
- College of Resource and Environment, Anhui Science and Technology University, Chuzhou, China
- Xiangxiang Su
- College of Resource and Environment, Anhui Science and Technology University, Chuzhou, China
- Hu Yue
- College of Resource and Environment, Anhui Science and Technology University, Chuzhou, China
- Yongji Zhu
- College of Resource and Environment, Anhui Science and Technology University, Chuzhou, China
- Jun Li
- College of Resource and Environment, Anhui Science and Technology University, Chuzhou, China
- Weiqiang Wang
- College of Resource and Environment, Anhui Science and Technology University, Chuzhou, China
- Yali Sheng
- College of Resource and Environment, Anhui Science and Technology University, Chuzhou, China
- Qiang Ma
- College of Resource and Environment, Anhui Science and Technology University, Chuzhou, China
- Jikai Liu
- College of Resource and Environment, Anhui Science and Technology University, Chuzhou, China
- Anhui Province Crop Intelligent Planting and Processing Technology Engineering Research Center, Anhui Science and Technology University, Chuzhou, Anhui, China
- Xinwei Li
- College of Resource and Environment, Anhui Science and Technology University, Chuzhou, China
- Anhui Province Crop Intelligent Planting and Processing Technology Engineering Research Center, Anhui Science and Technology University, Chuzhou, Anhui, China
- Anhui Province Agricultural Waste Fertilizer Utilization and Cultivated Land Quality Improvement Engineering Research Center, Anhui Science and Technology University, Chuzhou, China
9
Hu J, Szymczak S. Evaluation of network-guided random forest for disease gene discovery. BioData Min 2024; 17:10. [PMID: 38627770 PMCID: PMC11020917 DOI: 10.1186/s13040-024-00361-5] [Received: 02/15/2024] [Accepted: 04/09/2024] [Indexed: 04/20/2024]
Abstract
BACKGROUND Gene network information is believed to be beneficial for disease module and pathway identification, but it has not been explicitly utilized in the standard random forest (RF) algorithm for gene expression data analysis. We investigate the performance of a network-guided RF in which the network information is summarized into sampling probabilities for the predictor variables, which are then used in the construction of the RF. RESULTS Our simulation results suggest that network-guided RF does not provide better disease prediction than standard RF. In terms of disease gene discovery, network-guided RF identifies disease genes more accurately if they form module(s). In addition, when disease status is independent of the genes in the given network, using network information can produce spurious gene selection, especially of hub genes. Our empirical analysis of two balanced microarray and RNA-Seq breast cancer datasets from The Cancer Genome Atlas (TCGA) for classification of progesterone receptor (PR) status also demonstrates that network-guided RF can identify genes from PGR-related pathways, leading to a better-connected module of identified genes. CONCLUSIONS Gene networks can provide additional information to aid gene expression analysis for disease module and pathway identification. However, they need to be used with caution, and the results need to be validated to guard against spurious gene selection. More robust approaches to incorporating such information into RF construction also warrant further study.
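The abstract describes summarizing the gene network into sampling probabilities for predictors, which then bias the variables each tree considers. The sketch below illustrates that mechanism under one loud assumption: the probability is simply proportional to node degree plus a pseudocount. This is a hypothetical summary chosen for brevity, not the one evaluated in the paper. Tree growing itself is omitted, and only the candidate draw for a single split is shown.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def degree_sampling_probs(n_genes: int, edges: Sequence[Tuple[int, int]],
                          pseudocount: float = 1.0) -> List[float]:
    """Hypothetical network summary: probability proportional to degree + pseudocount."""
    deg = [0] * n_genes
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    w = [d + pseudocount for d in deg]
    total = sum(w)
    return [v / total for v in w]


def draw_split_candidates(probs: Sequence[float], mtry: int,
                          rng: random.Random) -> List[int]:
    """Weighted sampling without replacement of mtry predictors for one node split."""
    pool = list(range(len(probs)))
    weights = list(probs)
    chosen = []
    for _ in range(mtry):
        k = rng.choices(range(len(pool)), weights=weights)[0]
        chosen.append(pool.pop(k))
        weights.pop(k)
    return chosen


edges = [(0, 1), (0, 2), (0, 3), (0, 4)]  # gene 0 is a hub; gene 5 is isolated
probs = degree_sampling_probs(6, edges)
rng = random.Random(0)
counts = Counter()
for _ in range(2000):
    counts.update(draw_split_candidates(probs, mtry=2, rng=rng))
print([round(p, 3) for p in probs])  # [0.357, 0.143, 0.143, 0.143, 0.143, 0.071]
print(sorted(counts.items()))
```

Because the hub (gene 0) is offered to splits far more often than the other genes, it also gets more chances to be chosen by chance. This mirrors the spurious hub-gene selection the authors warn about.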
Affiliation(s)
- Jianchang Hu
- Institute of Medical Biometry and Statistics, University of Lübeck, Ratzeburger Allee 160, Lübeck, 23562, Germany
- Silke Szymczak
- Institute of Medical Biometry and Statistics, University of Lübeck, Ratzeburger Allee 160, Lübeck, 23562, Germany
10
Chiu CY, Chiang MC, Chiang MH, Lien R, Fu RH, Hsu KH, Chu SM. Metabolomic Analysis Reveals the Association of Severe Bronchopulmonary Dysplasia with Gut Microbiota and Oxidative Response in Extremely Preterm Infants. Metabolites 2024; 14:219. [PMID: 38668347 PMCID: PMC11052141 DOI: 10.3390/metabo14040219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2024] [Revised: 04/10/2024] [Accepted: 04/11/2024] [Indexed: 04/28/2024]
Abstract
Bronchopulmonary dysplasia (BPD) is a chronic lung disease mainly affecting premature infants who need ventilation or oxygen for respiratory distress. This study aimed to evaluate the molecular linkages of BPD in very and extremely preterm infants using a metabolomics-based approach. A prospective case-control study enrolling preterm infants born before 32 weeks' gestational age (GA) was performed. These preterm infants were stratified into two groups for further analysis: no or mild BPD, and moderate or severe BPD, based on the 2019 NICHD criteria. Urinary metabolomic profiling was performed at a corrected age of 6 months using 1H nuclear magnetic resonance (NMR) spectroscopy coupled with partial least squares discriminant analysis (PLS-DA). Metabolites significantly associated with GA and BPD severity were identified by between-group comparison, and their roles in functional metabolic pathways were also assessed. A total of 89 preterm infants born before 32 weeks' gestation were enrolled, together with 50 infants born at term (above 37 completed weeks' gestation) who served as controls. Twenty-one and 24 urinary metabolites were significantly associated with GA and BPD severity, respectively (p < 0.05). Among them, five metabolites (N-phenylacetylglycine, hippurate, acetylsalicylate, gluconate, and indoxyl sulfate) were significantly higher and had the highest importance both in infants with GA < 28 weeks and in those with moderate to severe BPD, whereas betaine and N,N-dimethylglycine were significantly lower (p < 0.05). Furthermore, ribose and a gluconate-related pentose phosphate pathway were strongly associated with these infants (p < 0.01).
In conclusion, urinary metabolomic analysis highlights the crucial role of gut microbiota dysbiosis in the pathogenesis of BPD in preterm infants, accompanied by metabolites related to diminished antioxidative capacity, prompting an aggressive antioxidation response in extremely preterm infants with severe BPD.
Affiliation(s)
- Chih-Yung Chiu
- Division of Pediatric Pulmonology, Department of Pediatrics, Chang Gung Memorial Hospital at Linkou, and Chang Gung University, Taoyuan 333, Taiwan
- Clinical Metabolomics Core Laboratory, Chang Gung Memorial Hospital at Linkou, Taoyuan 333, Taiwan
- Ming-Chou Chiang
- Division of Pediatric Neonatology, Department of Pediatrics, Chang Gung Memorial Hospital at Linkou, and Chang Gung University, Taoyuan 333, Taiwan
- Meng-Han Chiang
- Clinical Metabolomics Core Laboratory, Chang Gung Memorial Hospital at Linkou, Taoyuan 333, Taiwan
- Reyin Lien
- Division of Pediatric Neonatology, Department of Pediatrics, Chang Gung Memorial Hospital at Linkou, and Chang Gung University, Taoyuan 333, Taiwan
- Ren-Huei Fu
- Division of Pediatric Neonatology, Department of Pediatrics, Chang Gung Memorial Hospital at Linkou, and Chang Gung University, Taoyuan 333, Taiwan
- Kai-Hsiang Hsu
- Division of Pediatric Neonatology, Department of Pediatrics, Chang Gung Memorial Hospital at Linkou, and Chang Gung University, Taoyuan 333, Taiwan
- Shih-Ming Chu
- Division of Pediatric Neonatology, Department of Pediatrics, Chang Gung Memorial Hospital at Linkou, and Chang Gung University, Taoyuan 333, Taiwan
11
Lyu C, Joehanes R, Huan T, Levy D, Li Y, Wang M, Liu X, Liu C, Ma J. Enhancing selection of alcohol consumption-associated genes by random forest. Br J Nutr 2024:1-10. [PMID: 38606596 DOI: 10.1017/s0007114524000795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/13/2024]
Abstract
Machine learning methods have been used to identify omics markers for a variety of phenotypes. We aimed to examine whether a supervised machine learning algorithm can improve the identification of alcohol-associated transcriptomic markers. In this study, we analysed array-based, whole-blood-derived expression data for 17 873 gene transcripts in 5508 Framingham Heart Study participants. Using the Boruta algorithm, a supervised random forest (RF)-based feature selection method, we selected twenty-five alcohol-associated transcripts. In a testing set (30 % of all study participants), the AUCs (area under the receiver operating characteristic curve) of these twenty-five transcripts were 0·73, 0·69 and 0·66 for non-drinkers v. moderate drinkers, non-drinkers v. heavy drinkers and moderate drinkers v. heavy drinkers, respectively. The AUCs of the transcripts selected by the Boruta method were comparable to those of transcripts identified using conventional linear regression models; for example, the AUCs of 1958 transcripts identified by conventional linear regression models (false discovery rate < 0·2) were 0·74, 0·66 and 0·65, respectively. With Bonferroni correction for the twenty-five Boruta-selected transcripts and three CVD risk factors (i.e. at P < 6·7e-4), we observed that thirteen transcripts were associated with obesity, three with type 2 diabetes and one with hypertension. For example, alcohol consumption was inversely associated with the expression of DOCK4, IL4R, and SORT1; DOCK4 and SORT1 were positively associated with obesity, and IL4R was inversely associated with hypertension. In conclusion, using a supervised machine learning method, the RF-based Boruta algorithm, we identified novel alcohol-associated gene transcripts.
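The quoted threshold is consistent with a Bonferroni correction over all 25 × 3 transcript-risk-factor tests, assuming a family-wise significance level of 0.05 (the level itself is not stated in the abstract):

```latex
\alpha_{\text{Bonferroni}} = \frac{0.05}{25 \times 3} = \frac{0.05}{75} \approx 6.7 \times 10^{-4}
```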
Affiliation(s)
- Chenglin Lyu
- Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA
- Department of Anatomy and Neurobiology, Boston University Chobanian & Avedisian School of Medicine, Boston, MA 02118, USA
- Roby Joehanes
- Framingham Heart Study and Population Sciences Branch, NHLBI, Framingham, MA 01702, USA
- Tianxiao Huan
- Framingham Heart Study and Population Sciences Branch, NHLBI, Framingham, MA 01702, USA
- Daniel Levy
- Framingham Heart Study and Population Sciences Branch, NHLBI, Framingham, MA 01702, USA
- Yi Li
- Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA
- Mengyao Wang
- Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA
- Xue Liu
- Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA
- Chunyu Liu
- Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA
- Jiantao Ma
- Nutrition Epidemiology and Data Science, Friedman School of Nutrition Science and Policy, Tufts University, Boston, MA 02111, USA
12
Shen CH, Chen CB, Chiang MH, Kuo CN, Chung WH, Lin YK, Chiu CY. Vitamin D level is inversely related to allergen sensitization for risking atopic dermatitis in early childhood. World Allergy Organ J 2024; 17:100890. [PMID: 38585333 PMCID: PMC10998224 DOI: 10.1016/j.waojou.2024.100890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 02/21/2024] [Accepted: 03/04/2024] [Indexed: 04/09/2024]
Abstract
Background There are few studies concerning the impact of serum vitamin D status on the risk of allergen sensitization and atopic dermatitis (AD) during early childhood. Method Children with AD and age-matched healthy controls (HC) were prospectively enrolled at age 0.5, 2, and 4 years. Serum 25-hydroxyvitamin D (25[OH]D) level was measured using Elecsys Vitamin D Total assay. The study utilized the ImmunoCAP assay to analyze specific IgE for food and inhalant allergens, along with total serum IgE levels. It explored the connection between vitamin D levels and allergen sensitization, as well as their influence on AD at different ages. Results A total of 222 children including 95 (59 AD and 36 HC), 66 (37 AD and 29 HC), and 61 (32 AD and 29 HC) children were classified at age 0.5, 2, and 4 years, respectively. In children with AD, there was a significantly lower vitamin D level at age 2 and 4, but a significantly higher prevalence of food and mite sensitization at all ages in comparison with HC (P < 0.001). Vitamin D level was found to be inversely related to the prevalence of allergen sensitization at age 4 (P < 0.05). However, vitamin D level appeared to have high importance for allergen sensitization at all ages and AD at age 2 and 4 years. Conclusion Vitamin D deficiency is strongly associated with heightened prevalence of allergen sensitization, potentially increasing the susceptibility to AD in early childhood.
Affiliation(s)
- Chin-Hsuan Shen
- College of Medicine, Chang Gung University, Taoyuan, Taiwan
- Chang Gung Memorial Hospital Linkou Medical Center, Taoyuan, Taiwan
- Chun-Bing Chen
- Department of Dermatology, Drug Hypersensitivity Clinical and Research Center, Chang Gung Memorial Hospital, Linkou, Taipei, Keelung, Taiwan
- Graduate Institute of Clinical Medical Sciences, College of Medicine, Chang Gung University, Taoyuan, Taiwan
- Meng-Han Chiang
- Clinical Metabolomics Core Laboratory, Chang Gung Memorial Hospital at Linkou, Taoyuan, Taiwan
- Chieh-Ni Kuo
- Chang Gung Memorial Hospital Linkou Medical Center, Taoyuan, Taiwan
- Wen-Hung Chung
- Department of Dermatology, Drug Hypersensitivity Clinical and Research Center, Chang Gung Memorial Hospital, Linkou, Taipei, Keelung, Taiwan
- Yin-Ku Lin
- Department of Traditional Chinese Medicine, Chang Gung Memorial Hospital at Keelung, Taiwan
- School of Traditional Chinese Medicine, Chang Gung University, Taoyuan, Taiwan
- Chih-Yung Chiu
- Clinical Metabolomics Core Laboratory, Chang Gung Memorial Hospital at Linkou, Taoyuan, Taiwan
- Division of Pediatric Pulmonology, Chang Gung Memorial Hospital at Linkou, College of Medicine, Chang Gung University, Taoyuan, Taiwan
13
Gotta J, Koch V, Geyer T, Martin SS, Booz C, Mahmoudi S, Eichler K, Reschke P, D'Angelo T, Klimek K, Vogl TJ, Gruenewald LD. Imaging-based risk stratification of patients with pulmonary embolism based on dual-energy CT-derived radiomics. Eur J Clin Invest 2024; 54:e14139. [PMID: 38063028 DOI: 10.1111/eci.14139] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/14/2023] [Revised: 11/13/2023] [Accepted: 11/20/2023] [Indexed: 03/13/2024]
Abstract
BACKGROUND Technological progress in the acquisition of medical images and the extraction of underlying quantitative imaging data has introduced exciting prospects for the diagnostic assessment of a wide range of conditions. This study aims to investigate the diagnostic utility of a machine learning classifier based on dual-energy computed tomography (DECT) radiomics for classifying pulmonary embolism (PE) severity and assessing the risk of early death. METHODS Patients who underwent CT pulmonary angiography (CTPA) between January 2015 and March 2022 were considered for inclusion in this study. Based on DECT imaging, 107 radiomic features were extracted for each patient using standardized image processing. After the dataset was divided into training and test sets, stepwise feature reduction based on reproducibility, variable importance and correlation analyses was performed to select the most relevant features; these were used to train and validate gradient-boosted tree models. RESULTS The trained machine learning classifier achieved a classification accuracy of .90 for identifying high-risk PE patients, with an area under the receiver operating characteristic curve of .59. This CT-based radiomics signature showed good diagnostic accuracy for risk stratification in individuals presenting with central PE, particularly within higher risk groups. CONCLUSION Models utilizing DECT-derived radiomics features can accurately stratify patients with pulmonary embolism into established clinical risk scores. This approach holds the potential to enhance patient management and optimize patient flow by assisting in the clinical decision-making process. It also offers the advantage of saving time and resources by leveraging existing imaging to eliminate the need for manual clinical scoring.
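As a generic illustration (toy data, not the study's cohort), accuracy and AUC can diverge sharply when one class dominates: a model that ranks cases no better than chance can still reach high accuracy by always predicting the majority class. The sketch below computes both metrics from first principles in plain Python; `roc_auc` uses the rank (Mann-Whitney) interpretation of the area under the ROC curve. It is not meant as an explanation of the specific figures reported above.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy data: 9 low-risk cases (0) and 1 high-risk case (1), scored by an
# uninformative model that gives every case the same score.
labels = [0] * 9 + [1]
scores = [0.1] * 10
print(accuracy(labels, scores))  # 0.9: every case predicted low-risk
print(roc_auc(labels, scores))   # 0.5: no ranking ability
```

Reporting AUC alongside accuracy is what exposes this kind of gap, because AUC does not depend on the class balance or on a single decision threshold.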
Affiliation(s)
- Jennifer Gotta
- Goethe University Hospital Frankfurt, Frankfurt am Main, Germany
- Vitali Koch
- Goethe University Hospital Frankfurt, Frankfurt am Main, Germany
- Tobias Geyer
- Goethe University Hospital Frankfurt, Frankfurt am Main, Germany
- Simon S Martin
- Goethe University Hospital Frankfurt, Frankfurt am Main, Germany
- Christian Booz
- Goethe University Hospital Frankfurt, Frankfurt am Main, Germany
- Katrin Eichler
- Goethe University Hospital Frankfurt, Frankfurt am Main, Germany
- Philipp Reschke
- Goethe University Hospital Frankfurt, Frankfurt am Main, Germany
- Tommaso D'Angelo
- Department of Biomedical Sciences and Morphological and Functional Imaging, University of Messina, Messina, Italy
- Konrad Klimek
- Goethe University Frankfurt, University Hospital, Clinic for Nuclear Medicine, Frankfurt am Main, Germany
- Thomas J Vogl
- Goethe University Hospital Frankfurt, Frankfurt am Main, Germany
14
Wu G, Zaker A, Ebrahimi A, Tripathi S, Mer AS. Text-mining-based feature selection for anticancer drug response prediction. BIOINFORMATICS ADVANCES 2024; 4:vbae047. [PMID: 38606185 PMCID: PMC11009020 DOI: 10.1093/bioadv/vbae047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 03/09/2024] [Accepted: 03/22/2024] [Indexed: 04/13/2024]
Abstract
Motivation Predicting anticancer treatment response from baseline genomic data is a critical challenge in personalized medicine. Machine learning methods are commonly used for predicting drug response from gene expression data. In constructing these machine learning models, one of the most significant challenges is identifying appropriate features among a massive number of genes. Results In this study, we use features (genes) extracted by text-mining the scientific literature. Using two independent cancer pharmacogenomic datasets, we demonstrate that text-mining-based features outperform traditional feature selection techniques in machine learning tasks. In addition, our analysis reveals that text-mining feature-based machine learning models trained on in vitro data also perform well when predicting the response of in vivo cancer models. Our results demonstrate that text-mining-based feature selection is an easy-to-implement approach that is suitable for building machine learning models for anticancer drug response prediction. Availability and implementation https://github.com/merlab/text_features.
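The abstract does not detail the text-mining pipeline (the authors' code is in the repository linked in the abstract). As a rough, hypothetical sketch only, one simple way to derive literature-based features is to count, for a given drug, how many abstracts mentioning it also mention each gene, and keep the top-ranked genes as model inputs. The scoring rule, toy abstracts and gene vocabulary below are all assumptions for illustration.

```python
import re
from collections import Counter


def text_mined_features(abstracts, drug, gene_vocab, top_k):
    """Rank genes by how many drug-mentioning abstracts also mention them.

    Hypothetical co-occurrence scheme for illustration only; tokenization
    splits on anything that is not a letter or digit, so hyphenated gene
    symbols (e.g. HLA-A) are not handled.
    """
    drug_pattern = re.compile(rf"\b{re.escape(drug)}\b", re.IGNORECASE)
    vocab = set(gene_vocab)
    counts = Counter()
    for text in abstracts:
        if not drug_pattern.search(text):
            continue
        tokens = set(re.findall(r"[A-Za-z0-9]+", text))
        for gene in vocab & tokens:
            counts[gene] += 1
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, _ in ranked[:top_k]]


toy_abstracts = [
    "Lapatinib sensitivity correlated with ERBB2 amplification and EGFR expression.",
    "ERBB2-positive cell lines responded to lapatinib.",
    "Paclitaxel response was linked to TP53 status.",
    "Lapatinib resistance involved EGFR and ERBB2 signalling.",
]
features = text_mined_features(
    toy_abstracts, "lapatinib", ["ERBB2", "EGFR", "TP53", "KRAS"], top_k=2
)
print(features)  # ['ERBB2', 'EGFR']
```

The selected gene list would then restrict the expression matrix before model training, in place of a purely data-driven filter such as variance or correlation ranking.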
Affiliation(s)
- Grace Wu
- Division of Engineering Science, University of Toronto, Toronto, M5S 2E4, Canada
- Arvin Zaker
- Department of Biochemistry, Microbiology & Immunology, University of Ottawa, Ottawa, K1H 8M5, Canada
- Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, K1H 8M5, Canada
- Amirhosein Ebrahimi
- Department of Biochemistry, Microbiology & Immunology, University of Ottawa, Ottawa, K1H 8M5, Canada
- Shivanshi Tripathi
- Department of Biochemistry, Microbiology & Immunology, University of Ottawa, Ottawa, K1H 8M5, Canada
- Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, K1H 8M5, Canada
- Arvind Singh Mer
- Department of Biochemistry, Microbiology & Immunology, University of Ottawa, Ottawa, K1H 8M5, Canada
- Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, K1H 8M5, Canada
- School of Electrical Engineering & Computer Science, University of Ottawa, Ottawa, K1N 6N5, Canada
15
Chen ZY, Turrubiates RFM, Petetin H, Lacima A, Pérez García-Pando C, Ballester J. Estimation of pan-European, daily total, fine-mode and coarse-mode Aerosol Optical Depth at 0.1° resolution to facilitate air quality assessments. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 918:170593. [PMID: 38307268 DOI: 10.1016/j.scitotenv.2024.170593] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 01/12/2024] [Accepted: 01/29/2024] [Indexed: 02/04/2024]
Abstract
Aerosol Optical Depth (AOD) data derived from satellites are crucial for estimating spatially resolved PM concentrations, but existing AOD data over land remain affected by several limitations (e.g., data gaps, coarser resolution, higher uncertainty or lack of size-fraction data), which weaken the AOD-PM relationship. We developed a 0.1° resolution daily AOD dataset over Europe for the period 2003-2020, based on two-stage Quantile Machine Learning (QML) frameworks. Our approach first fills gaps in satellite AOD data and then constructs models for three components to obtain reliable full-coverage AOD along with Fine-mode AOD (fAOD) and Coarse-mode AOD (cAOD). These models are based on AERONET (AErosol RObotic NETwork) observations, gap-filled satellite AOD, and climate and atmospheric composition reanalyses. Our QML AOD products exhibit better quality, with an out-of-sample R2 of 0.68 for AOD, 0.66 for fAOD and 0.65 for cAOD, which is 23-92 %, 11-13 % and 115-132 % higher than the corresponding satellite or reanalysis products, respectively. Over 91.6 %, 81.6 %, and 88.9 % of QML AOD, fAOD and cAOD predictions, respectively, fall within ±20 % Expected Error (EE) envelopes. Previous studies reported a weak satellite AOD-PM correlation across Europe (Pearson correlation coefficient (PCC) around 0.1). Our QML products exhibit higher correlations with ground-level PM, particularly when broadly matched by size: AOD with PM10, fAOD with PM2.5, and cAOD with coarse PM (R = 0.41, 0.45 and 0.26, respectively). Different AOD fractions distinguish PM size fractions more effectively than total AOD does. Our QML aerosol dataset and models pioneer full-coverage, daily, high-resolution monitoring of fine-mode and coarse-mode aerosols, effectively addressing existing AOD challenges for further estimation of PM exposure.
This dataset opens avenues for more in-depth exploration of the impacts of aerosols on human health, climate, visibility, and biogeochemical processes, offering valuable insights for air quality management and environmental health risk assessment.
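A minimal sketch of how the share of predictions inside a ±20 % Expected Error envelope can be computed. The abstract does not give the exact envelope definition, so this sketch assumes a purely relative band around the ground-based reference value; the AOD numbers are hypothetical.

```python
def fraction_within_ee(predicted, observed, rel_tol=0.20):
    """Share of predictions within +/- rel_tol (relative) of the reference value.

    Assumes a purely relative envelope: |pred - obs| <= rel_tol * |obs|.
    """
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    inside = sum(
        1 for p, o in zip(predicted, observed) if abs(p - o) <= rel_tol * abs(o)
    )
    return inside / len(observed)


# Hypothetical AOD values: model predictions vs. ground-based reference.
observed = [0.10, 0.20, 0.30, 0.40, 0.50]
predicted = [0.11, 0.25, 0.29, 0.40, 0.70]
print(fraction_within_ee(predicted, observed))  # 0.6
```

Under this assumption, three of the five toy predictions fall inside the band; the 0.25 and 0.70 predictions miss by more than 20 % of their reference values.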
Affiliation(s)
- Zhao-Yue Chen
- ISGLOBAL, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain.
- Carlos Pérez García-Pando
- Barcelona Supercomputing Center, Barcelona, Spain; ICREA, Catalan Institution for Research and Advanced Studies, Barcelona, Spain
16
Chen ZY, Petetin H, Méndez Turrubiates RF, Achebak H, Pérez García-Pando C, Ballester J. Population exposure to multiple air pollutants and its compound episodes in Europe. Nat Commun 2024; 15:2094. [PMID: 38480711 PMCID: PMC10937992 DOI: 10.1038/s41467-024-46103-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 02/13/2024] [Indexed: 03/17/2024]
Abstract
Air pollution remains a substantial health problem, particularly regarding the combined health risks arising from simultaneous exposure to multiple air pollutants. However, understanding these combined exposure events over long periods has been hindered by sparse and temporally inconsistent monitoring data. Here we analyze daily ambient PM2.5, PM10, NO2 and O3 concentrations at 0.1-degree resolution during 2003-2019 across 1426 contiguous regions in 35 European countries, representing 543 million people. We find that PM10 levels decline by 2.72% annually, followed by NO2 (2.45%) and PM2.5 (1.72%). In contrast, O3 increases by 0.58% in southern Europe, leading to a surge in unclean air days. Despite air quality advances, 86.3% of Europeans experience at least one compound event day per year, especially for PM2.5-NO2 and PM2.5-O3. We highlight the improvements in air quality control but emphasize the need for targeted measures addressing specific pollutants and their compound events, particularly amid rising temperatures.
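Assuming a compound event day means a day on which two pollutants both exceed their respective thresholds (the abstract does not give the exact definition), the counting step can be sketched as follows. The threshold values and the four days of data are assumptions for illustration, not the study's definitions or data.

```python
from itertools import combinations

# Illustrative daily thresholds in µg/m³ (assumed for this sketch;
# the study's own exceedance definitions are not given in the abstract).
THRESHOLDS = {"PM2.5": 15.0, "NO2": 25.0, "O3": 100.0}


def compound_event_days(daily, pair, thresholds=THRESHOLDS):
    """Count days on which both pollutants in `pair` exceed their thresholds."""
    a, b = pair
    return sum(1 for day in daily if day[a] > thresholds[a] and day[b] > thresholds[b])


def all_pair_counts(daily, thresholds=THRESHOLDS):
    """Compound-event day counts for every pollutant pair."""
    return {
        f"{a}-{b}": compound_event_days(daily, (a, b), thresholds)
        for a, b in combinations(thresholds, 2)
    }


# Four hypothetical days for one region.
daily = [
    {"PM2.5": 20, "NO2": 30, "O3": 80},
    {"PM2.5": 10, "NO2": 40, "O3": 90},
    {"PM2.5": 18, "NO2": 20, "O3": 120},
    {"PM2.5": 25, "NO2": 35, "O3": 105},
]
counts = all_pair_counts(daily)
print(counts)  # {'PM2.5-NO2': 2, 'PM2.5-O3': 2, 'NO2-O3': 1}
print(any(n > 0 for n in counts.values()))  # True: at least one compound event day
```

Aggregating the final boolean per region and year, weighted by population, is one way a statement such as "share of people experiencing at least one compound event day" could be derived.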
Affiliation(s)
- Zhao-Yue Chen
- ISGlobal, Barcelona, Spain.
- Universitat Pompeu Fabra (UPF), Barcelona, Spain.
- Hicham Achebak
- ISGlobal, Barcelona, Spain
- Inserm, France Cohortes, Paris, France
- Carlos Pérez García-Pando
- Barcelona Supercomputing Center, Barcelona, Spain
- ICREA, Catalan Institution for Research and Advanced Studies, Barcelona, Spain
17
Xu Y, Chen B, Guo Z, Chen C, Wang C, Zhou H, Zhang C, Feng Y. Identification of diagnostic markers for moyamoya disease by combining bulk RNA-sequencing analysis and machine learning. Sci Rep 2024; 14:5931. [PMID: 38467737 PMCID: PMC10928210 DOI: 10.1038/s41598-024-56367-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Accepted: 03/05/2024] [Indexed: 03/13/2024]
Abstract
Moyamoya disease (MMD) remains a chronic progressive cerebrovascular disease of unknown etiology. A growing number of reports link the development of MMD to infection or autoimmune diseases. Identifying biomarkers of MMD is important for understanding its pathogenesis and for developing novel targeted therapies, and may be key to improving patient outcomes. Here, we analyzed gene expression from two GEO datasets. To identify MMD biomarkers, weighted gene co-expression network analysis (WGCNA) and differential expression analyses were conducted, identifying 266 key genes. KEGG and GO analyses were then performed, and a protein-protein interaction (PPI) network was constructed. Three machine-learning algorithms, support vector machine-recursive feature elimination (SVM-RFE), random forest and least absolute shrinkage and selection operator (LASSO), were applied to the key genes, and the intersection of their results yielded four core genes (ACAN, FREM1, TOP2A and UCHL1) for MMD diagnosis, with highly accurate AUCs of 0.805, 0.903, 0.815 and 0.826. Gene enrichment analysis showed that the MMD samples differed in several pathways, including one carbon pool by folate, aminoacyl-tRNA biosynthesis, fat digestion and absorption, and fructose and mannose metabolism. In addition, the immune infiltration profile demonstrated that ACAN expression was associated with resting mast cells, FREM1 expression with naive CD4 T cells, TOP2A expression with memory B cells, and UCHL1 expression with activated mast cells. Finally, the four key genes were verified by qPCR. Taken together, our study analyzed the diagnostic biomarkers and immune infiltration characteristics of MMD, which may shed light on potential intervention targets for patients with moyamoya disease.
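The step of taking the intersection of the three algorithms' selections amounts to a set intersection. A minimal sketch, assuming each algorithm has already returned its set of selected genes; only the four core genes come from the abstract, and GENE_X, GENE_Y and GENE_Z are hypothetical placeholders for method-specific picks.

```python
def consensus_genes(*selections):
    """Genes selected by every method (set intersection)."""
    if not selections:
        return set()
    common = set(selections[0])
    for selected in selections[1:]:
        common &= set(selected)
    return common


# GENE_X / GENE_Y / GENE_Z are hypothetical placeholders.
svm_rfe = {"ACAN", "FREM1", "TOP2A", "UCHL1", "GENE_X"}
random_forest = {"ACAN", "FREM1", "TOP2A", "UCHL1", "GENE_Y"}
lasso = {"ACAN", "FREM1", "TOP2A", "UCHL1", "GENE_X", "GENE_Z"}
print(sorted(consensus_genes(svm_rfe, random_forest, lasso)))
# ['ACAN', 'FREM1', 'TOP2A', 'UCHL1']
```

Requiring agreement across all three methods trades sensitivity for robustness: a gene picked by only one or two selectors, such as GENE_X here, is dropped.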
Affiliation(s)
- Yifan Xu
- Department of Neurosurgery, The Affiliated Hospital of Qingdao University, 16 Jiang Su Road, Qingdao City, 266000, China
- Bing Chen
- Department of Neurosurgery, The Affiliated Hospital of Qingdao University, 16 Jiang Su Road, Qingdao City, 266000, China
- Zhongxiang Guo
- Department of Neurosurgery, The Affiliated Hospital of Qingdao University, 16 Jiang Su Road, Qingdao City, 266000, China
- Cheng Chen
- Department of Neurosurgery, The Affiliated Hospital of Qingdao University, 16 Jiang Su Road, Qingdao City, 266000, China
- Chao Wang
- Department of Neurosurgery, The Affiliated Hospital of Qingdao University, 16 Jiang Su Road, Qingdao City, 266000, China
- Han Zhou
- Department of Neurosurgery, The Affiliated Hospital of Qingdao University, 16 Jiang Su Road, Qingdao City, 266000, China
- Chonghui Zhang
- Department of Neurosurgery, The Affiliated Hospital of Qingdao University, 16 Jiang Su Road, Qingdao City, 266000, China
- Yugong Feng
- Department of Neurosurgery, The Affiliated Hospital of Qingdao University, 16 Jiang Su Road, Qingdao City, 266000, China
18
Ba Y, Liu S, Wei Z, Zhao N, Qiao T, Ren Y, Li L, Zhang Y, Weng S, Xu H, Li C, Ge X, Han X. Pyroptosis-Derived Long Noncoding RNA Profiles Reveal a Novel Signature for Evaluating the Prognosis of Patients With Lung Adenocarcinoma. JCO Precis Oncol 2024; 8:e2300405. [PMID: 38547420 PMCID: PMC10994429 DOI: 10.1200/po.23.00405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 10/11/2023] [Accepted: 02/07/2024] [Indexed: 04/02/2024]
Abstract
PURPOSE Long noncoding RNAs (lncRNAs) have recently been implicated in modifying pyroptosis. Nonetheless, pyroptosis-related lncRNAs and their possible clinical relevance remain largely uninvestigated in lung adenocarcinoma (LUAD). MATERIALS AND METHODS A total of 921 samples were collected from three independent data sets. We obtained pyroptosis-related genes from both the Molecular Signatures Database and relevant literature and used four machine learning techniques: stepwise Cox, ridge regression, least absolute shrinkage and selection operator, and random forest. Multiple bioinformatics approaches were used to further investigate the underlying mechanisms. RESULTS In total, 39 differentially expressed pyroptosis genes were identified by comparing normal and tumor samples. Correlation analysis revealed 933 pyroptosis-related lncRNAs. Furthermore, univariate Cox regression identified 11 lncRNAs with stable associations with prognosis in all three cohorts, which were used to construct the pyroptosis-derived lncRNA signature. After comparing the optimal results of the four machine learning algorithms, we ultimately selected random forest to develop the pyroptosis-derived lncRNA signature. This signature proved to be an independent prognostic factor and exhibited robust performance in the three cohorts. CONCLUSION We provide novel insight and establish a pyroptosis-derived lncRNA signature for patients with LUAD, which shows strong predictive capability in both the training and validation sets.
Affiliation(s)
- Yuhao Ba
- Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Shutong Liu
- The Medical School of Zhengzhou University, Zhengzhou University, Zhengzhou, China
- Zhengpan Wei
- The Medical School of Zhengzhou University, Zhengzhou University, Zhengzhou, China
- Nannan Zhao
- Department of Neurosurgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Tong Qiao
- Department of Thoracic Surgery, Henan Provincial People's Hospital, Zhengzhou, China
- Yuqing Ren
- Department of Respiratory Medicine, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Lifeng Li
- Internet Medical and System Applications of National Engineering Laboratory, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Yuyuan Zhang
- Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Siyuan Weng
- Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Hui Xu
- Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Chunwei Li
- Internet Medical and System Applications of National Engineering Laboratory, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Department of Pharmacy, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Xiaoyong Ge
- Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Xinwei Han
- Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- Interventional Institute of Zhengzhou University, Zhengzhou, China
- Interventional Treatment and Clinical Research Center of Henan Province, Zhengzhou, China
19
Cao S, Xu Y, Zhou T, Wu A. Predicting pragmatic functions of Chinese echo questions using prosody: evidence from acoustic analysis and data modeling. Front Psychol 2024; 15:1322482. [PMID: 38633875 PMCID: PMC11022972 DOI: 10.3389/fpsyg.2024.1322482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Accepted: 02/15/2024] [Indexed: 04/19/2024]
Abstract
Echo questions serve two pragmatic functions (recapitulatory and explicatory) and are subdivided into two types (yes-no echo questions and wh-echo questions) in verbal communication. Yet to date, most relevant studies have been conducted on European languages such as English and Spanish. It remains unknown whether the different functions of echo questions can be conveyed via prosody in spoken Chinese. Additionally, no comparison has been made among algorithmic models for predicting the functions of Chinese echo questions from prosody, a novel question in linguistic cognition. This motivated us to use different acoustic cues to predict the pragmatic functions of Chinese echo questions through an acoustic experiment and data modeling. The results showed that for yes-no echo questions, the explicatory function exhibited higher pitch and intensity patterns than the recapitulatory function, whereas for wh-echo questions, the recapitulatory function demonstrated higher pitch and intensity patterns than the explicatory function. With regard to data modeling, the Support Vector Machine (SVM) algorithm performed better than Random Forest (RF) and Logistic Regression (LR) when predicting the functions from prosodic cues in both yes-no and wh-echo questions. Taking a digitized perspective, this study adds prosodic evidence to the understanding of echo-question functions.
Affiliation(s)
- Siyi Cao
- School of Foreign Languages, Southeast University, Nanjing, China
- Department of Chinese Bilingual Studies, Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China
- Yizhong Xu
- College of Foreign Languages, Nanjing University of Aeronautics and Astronautics, Nanjing, China
- Tongquan Zhou
- School of Foreign Languages, Southeast University, Nanjing, China
- Anqi Wu
- College of Foreign Languages, Nanjing University of Aeronautics and Astronautics, Nanjing, China
20
Lin W, Zhang S, Gu C, Zhu H, Liu Y. GLIPR2: a potential biomarker and therapeutic target unveiled - Insights from extensive pan-cancer analyses, with a spotlight on lung adenocarcinoma. Front Immunol 2024; 15:1280525. [PMID: 38476239 PMCID: PMC10929020 DOI: 10.3389/fimmu.2024.1280525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2023] [Accepted: 02/12/2024] [Indexed: 03/14/2024]
Abstract
Background Glioma pathogenesis related-2 (GLIPR2), an emerging Golgi membrane protein implicated in autophagy, has received limited research attention. Methods Leveraging extensive datasets, including The Cancer Genome Atlas (TCGA), Genotype-Tissue Expression (GTEx), Human Protein Atlas (HPA), and Clinical Proteomic Tumor Analysis Consortium (CPTAC), we conducted a comprehensive investigation of GLIPR2 expression across diverse human malignancies. Using the UALCAN, OncoDB, MEXPRESS and cBioPortal databases, we examined GLIPR2 mutation patterns and methylation landscapes. Integrating bulk and single-cell RNA sequencing allowed us to elucidate the relationships among cellular heterogeneity, immune infiltration, and GLIPR2 levels across cancers. ROC and Kaplan-Meier (KM) analyses were used to evaluate the diagnostic and prognostic potential of GLIPR2 across diverse cancers. Immunohistochemistry characterized GLIPR2 expression patterns in a multicenter cohort spanning various cancer types. In vitro functional experiments, including transwell assays, wound healing analyses, and drug sensitivity testing, were used to delineate the tumor-suppressive role of GLIPR2. Results GLIPR2 expression was significantly lower in neoplastic tissues than in healthy tissues. Copy number variations (CNV) and altered methylation patterns correlated with GLIPR2 expression within tumor tissues. Moreover, GLIPR2 showed diagnostic and prognostic value and was strongly associated with the expression of numerous immune checkpoint genes and with the relative abundance of immune cells in the tumor microenvironment. These associations were evident across various cancer types and were most prominent in lung adenocarcinoma (LUAD). Notably, GLIPR2 expression was significantly decreased in clinical samples from patients with LUAD. Elevated GLIPR2 expression correlated with improved prognosis specifically in LUAD. Following radiotherapy, LUAD cases displayed an increased presence of GLIPR2+ infiltrating cells, suggesting an association with heightened sensitivity to radiotherapy. A battery of experiments validated the functional role of GLIPR2 in suppressing the malignant phenotype and enhancing treatment sensitivity. Conclusion In pan-cancer analyses, and particularly in LUAD, GLIPR2 emerges as a promising novel biomarker and tumor suppressor. Its involvement in immune cell infiltration suggests potential as an immunotherapeutic target.
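The KM analysis mentioned in this abstract estimates survival as a running product over event times. The following is a minimal sketch, not the authors' pipeline. The function name `kaplan_meier` and the toy follow-up data are hypothetical. Censored patients are simply dropped from the risk set after their last follow-up time.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival estimate (hypothetical helper, not from the paper).

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    Returns (time, survival probability) at each distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Pool every patient whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after t
        n_at_risk -= removed
    return curve


# Hypothetical follow-up times (months) and event indicators
print(kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]))
```

Comparing two such curves, for example high versus low GLIPR2 groups, would additionally need a log-rank test, which this sketch does not include.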
Affiliation(s)
- Wei Lin
- Cancer Research Center Nantong, Affiliated Tumor Hospital of Nantong University, Nantong, China
- Jiangsu Key Laboratory of Neuropsychiatric Diseases and Institute of Neuroscience, Soochow University, Suzhou, China
- Siming Zhang
- Cancer Research Center Nantong, Affiliated Tumor Hospital of Nantong University, Nantong, China
- Chunyan Gu
- Department of Pathology, Affiliated Nantong Hospital 3 of Nantong University (Nantong Third People’s Hospital), Nantong, China
- Haixia Zhu
- Cancer Research Center Nantong, Affiliated Tumor Hospital of Nantong University, Nantong, China
- Yuan Liu
- Cancer Research Center Nantong, Affiliated Tumor Hospital of Nantong University, Nantong, China
21
Zhang L, Liu Y, Zou J, Wang T, Hu H, Zhou Y, Lu Y, Qiu T, Zhou J, Liu X. The Development and Evaluation of a Prediction Model for Kidney Transplant-Based Pneumocystis carinii Pneumonia Patients Based on Hematological Indicators. Biomedicines 2024; 12:366. [PMID: 38397968 PMCID: PMC10886538 DOI: 10.3390/biomedicines12020366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2024] [Revised: 01/21/2024] [Accepted: 01/31/2024] [Indexed: 02/25/2024]
Abstract
BACKGROUND This study aimed to develop a simple predictive model for the early identification of patients at risk of adverse outcomes in kidney transplant-associated Pneumocystis carinii pneumonia (PCP). METHODS This study included 103 patients diagnosed with PCP who were treated at our hospital between 2018 and 2023. Of these, 20 were categorized as having severe PCP, and 13 of them died. Machine learning techniques and multivariate logistic regression analysis identified two key variables, which were then integrated into a nomogram. Model performance was assessed with receiver operating characteristic (ROC) curves and calibration curves. Decision curve analysis (DCA) and a clinical impact curve (CIC) were used to evaluate the clinical utility of the model, and Kaplan-Meier (KM) survival curves were used to assess its ability to stratify risk. RESULTS Two hematological markers, procalcitonin (PCT) and the C-reactive protein (CRP)-to-albumin ratio (CAR), were identified through machine learning and multivariate logistic regression and used to build a predictive model, presented as a nomogram. The model showed good predictive accuracy in both internal validation (AUC = 0.861) and external validation (AUC = 0.896). Within a specific range of threshold probabilities, both DCA and CIC showed favorable performance. The KM survival curves further supported the nomogram's ability to stratify risk. CONCLUSIONS Based on hematological parameters, especially CAR and PCT, a simple nomogram was established to stratify prognostic risk in patients with renal transplant-related PCP.
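The sketch below illustrates two quantities named in this abstract. It computes the CRP-to-albumin ratio and then an ROC AUC using the rank (Mann-Whitney) formulation. It is a hypothetical example, not the published nomogram: the function names and patient values are invented. It also scores a single marker rather than the combined PCT + CAR logistic model.

```python
from typing import Sequence


def crp_albumin_ratio(crp: float, albumin: float) -> float:
    """CRP-to-albumin ratio (CAR); units must be consistent across patients."""
    if albumin <= 0:
        raise ValueError("albumin must be positive")
    return crp / albumin


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties count one half, which equals the Mann-Whitney U statistic divided
    by n_pos * n_neg. Quadratic in cohort size, fine for small examples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical patients: (CRP, albumin) pairs and 1 = adverse outcome
patients = [(80, 25), (120, 20), (15, 40), (40, 35), (10, 42)]
outcome = [1, 1, 0, 0, 1]
car = [crp_albumin_ratio(c, a) for c, a in patients]
print(round(roc_auc(car, outcome), 3))  # 0.667
```

In this formulation, an AUC of 0.5 means the marker ranks patients no better than chance, and 1.0 means every adverse-outcome patient outranks every other patient.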
Affiliation(s)
- Long Zhang
- Department of Organ Transplantation, Renmin Hospital of Wuhan University, Wuhan 430060, China; (L.Z.); (Y.L.); (J.Z.); (T.W.); (H.H.); (Y.Z.); (Y.L.); (T.Q.)
- Department of Urology, Renmin Hospital of Wuhan University, Wuhan 430060, China
- Yiting Liu
- Department of Organ Transplantation, Renmin Hospital of Wuhan University, Wuhan 430060, China; (L.Z.); (Y.L.); (J.Z.); (T.W.); (H.H.); (Y.Z.); (Y.L.); (T.Q.)
- Department of Urology, Renmin Hospital of Wuhan University, Wuhan 430060, China
- Jilin Zou
- Department of Organ Transplantation, Renmin Hospital of Wuhan University, Wuhan 430060, China; (L.Z.); (Y.L.); (J.Z.); (T.W.); (H.H.); (Y.Z.); (Y.L.); (T.Q.)
- Department of Urology, Renmin Hospital of Wuhan University, Wuhan 430060, China
- Tianyu Wang
- Department of Organ Transplantation, Renmin Hospital of Wuhan University, Wuhan 430060, China; (L.Z.); (Y.L.); (J.Z.); (T.W.); (H.H.); (Y.Z.); (Y.L.); (T.Q.)
- Department of Urology, Renmin Hospital of Wuhan University, Wuhan 430060, China
- Haochong Hu
- Department of Organ Transplantation, Renmin Hospital of Wuhan University, Wuhan 430060, China; (L.Z.); (Y.L.); (J.Z.); (T.W.); (H.H.); (Y.Z.); (Y.L.); (T.Q.)
- Department of Urology, Renmin Hospital of Wuhan University, Wuhan 430060, China
- Yujie Zhou
- Department of Organ Transplantation, Renmin Hospital of Wuhan University, Wuhan 430060, China; (L.Z.); (Y.L.); (J.Z.); (T.W.); (H.H.); (Y.Z.); (Y.L.); (T.Q.)
- Department of Urology, Renmin Hospital of Wuhan University, Wuhan 430060, China
- Yifan Lu
- Department of Organ Transplantation, Renmin Hospital of Wuhan University, Wuhan 430060, China; (L.Z.); (Y.L.); (J.Z.); (T.W.); (H.H.); (Y.Z.); (Y.L.); (T.Q.)
- Department of Urology, Renmin Hospital of Wuhan University, Wuhan 430060, China
- Tao Qiu
- Department of Organ Transplantation, Renmin Hospital of Wuhan University, Wuhan 430060, China; (L.Z.); (Y.L.); (J.Z.); (T.W.); (H.H.); (Y.Z.); (Y.L.); (T.Q.)
- Department of Urology, Renmin Hospital of Wuhan University, Wuhan 430060, China
- Jiangqiao Zhou
- Department of Organ Transplantation, Renmin Hospital of Wuhan University, Wuhan 430060, China; (L.Z.); (Y.L.); (J.Z.); (T.W.); (H.H.); (Y.Z.); (Y.L.); (T.Q.)
- Department of Urology, Renmin Hospital of Wuhan University, Wuhan 430060, China
- Xiuheng Liu
- Department of Organ Transplantation, Renmin Hospital of Wuhan University, Wuhan 430060, China; (L.Z.); (Y.L.); (J.Z.); (T.W.); (H.H.); (Y.Z.); (Y.L.); (T.Q.)
- Department of Urology, Renmin Hospital of Wuhan University, Wuhan 430060, China
22
Liu Y, Wu J, Zhou J, Guo J, Liang C, Xing Y, Wang Z, Chen L, Ding Y, Ren D, Bai Y, Hu D. Identification of high-risk population of pneumoconiosis using deep learning segmentation of lung 3D images and radiomics texture analysis. Comput Methods Programs Biomed 2024; 244:108006. [PMID: 38215580 DOI: 10.1016/j.cmpb.2024.108006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 12/25/2023] [Accepted: 01/01/2024] [Indexed: 01/14/2024]
Abstract
OBJECTIVE The aim of this study is to develop an early-warning model for identifying populations at high risk of pneumoconiosis by combining lung 3D images and radiomics lung texture features. METHODS A retrospective study was conducted, including 600 dust-exposed workers and 300 confirmed pneumoconiosis patients. Chest computed tomography (CT) images were divided into a training set and a test set in a 2:1 ratio. Whole-lung segmentation was performed using deep learning models, and radiomics features were extracted from the segmented lungs. Two feature selection algorithms and five classification models were used. The optimal model was selected using 10-fold cross-validation, and its calibration curve and decision curve were evaluated. To verify the applicability of the model, its diagnostic efficiency and accuracy were compared with those of human interpretation. Additionally, the risk probabilities of the risk groups defined by the model were compared at different time intervals. RESULTS Four radiomics features were ultimately used to construct the predictive model. The logistic regression model was the most stable, with an area under the curve (AUC) of 0.964 (95% confidence interval [CI], 0.950-0.976) in the training set and 0.947 (95% CI, 0.925-0.964) in the test set. The Brier scores were 0.092 and 0.14 in the training and test sets, respectively, with threshold probability ranges of 2%-99% and 2%-85%. These results indicate that the model exhibits good calibration and clinical benefit. Compared with human interpretation, the model was not inferior in diagnostic efficiency or accuracy. Additionally, the population identified by the model as high risk was diagnosed with pneumoconiosis two years later. CONCLUSION This study provides a meticulous and quantifiable method for detecting and assessing the risk of pneumoconiosis, building upon accurate diagnosis. Through risk scoring and probability estimation, it not only enhances the efficiency of diagnostic physicians but also provides a valuable reference for controlling the occurrence of pneumoconiosis.
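The Brier score reported in this abstract is the mean squared difference between predicted probabilities and observed 0/1 outcomes. Lower is better, and 0 is perfect. The following is a minimal sketch with hypothetical predictions rather than data from the study. The function name `brier_score` is an assumption.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted probabilities and observed 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Hypothetical predicted risks and observed diagnoses (1 = pneumoconiosis)
predicted = [0.9, 0.2, 0.7, 0.1]
observed = [1, 0, 0, 0]
print(round(brier_score(predicted, observed), 4))  # 0.1375

# Reference point: always predicting the observed prevalence (0.25)
print(brier_score([0.25] * len(observed), observed))  # 0.1875
```

The constant-prevalence line gives a simple baseline: a model is only informative if it scores below it on the same data.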
Affiliation(s)
- Yafeng Liu
- School of Medicine, Anhui University of Science and Technology, Huainan, PR China; Anhui Province Engineering Laboratory of Occupational Health and Safety, Anhui University of Science and Technology, Huainan, PR China
- Jing Wu
- School of Medicine, Anhui University of Science and Technology, Huainan, PR China; Anhui Province Engineering Laboratory of Occupational Health and Safety, Anhui University of Science and Technology, Huainan, PR China.
- Jiawei Zhou
- School of Medicine, Anhui University of Science and Technology, Huainan, PR China; Anhui Province Engineering Laboratory of Occupational Health and Safety, Anhui University of Science and Technology, Huainan, PR China
- Jianqiang Guo
- School of Medicine, Anhui University of Science and Technology, Huainan, PR China; Anhui Province Engineering Laboratory of Occupational Health and Safety, Anhui University of Science and Technology, Huainan, PR China
- Chao Liang
- School of Medicine, Anhui University of Science and Technology, Huainan, PR China; Anhui Province Engineering Laboratory of Occupational Health and Safety, Anhui University of Science and Technology, Huainan, PR China
- Yingru Xing
- School of Medicine, Anhui University of Science and Technology, Huainan, PR China; Department of Clinical Laboratory, Anhui Zhongke Gengjiu Hospital, Hefei, PR China
- Zhongyu Wang
- Ziwei King Star Digital Technology Co., Ltd., Hefei, PR China
- Lijuan Chen
- Occupational Control Hospital of Huaihe Energy Group, Huainan, PR China
- Yan Ding
- Occupational Control Hospital of Huaihe Energy Group, Huainan, PR China
- Dingfei Ren
- Occupational Control Hospital of Huaihe Energy Group, Huainan, PR China.
- Ying Bai
- School of Medicine, Anhui University of Science and Technology, Huainan, PR China; Anhui Province Engineering Laboratory of Occupational Health and Safety, Anhui University of Science and Technology, Huainan, PR China
- Dong Hu
- School of Medicine, Anhui University of Science and Technology, Huainan, PR China; Anhui Province Engineering Laboratory of Occupational Health and Safety, Anhui University of Science and Technology, Huainan, PR China; Key Laboratory of Industrial Dust Prevention and Control & Occupational Safety and Health of the Ministry of Education, Anhui University of Science and Technology, Huainan, PR China.
23
Lalechère E, Monnet JM, Breen J, Fuhr M. Assessing the potential of remote sensing-based models to predict old-growth forests on large spatiotemporal scales. J Environ Manage 2024; 351:119865. [PMID: 38159307 DOI: 10.1016/j.jenvman.2023.119865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Revised: 11/20/2023] [Accepted: 12/13/2023] [Indexed: 01/03/2024]
Abstract
Old-growth forests provide a broad range of ecosystem services. However, poor knowledge of their spatiotemporal distribution makes implementing conservation and restoration strategies challenging. The goal of this study is to compare the predictive ability of socioecological factors and of different sources of remotely sensed data, which determine the spatiotemporal scales at which forest maturity attributes can be predicted. We evaluated various remotely sensed data covering a broad range of spatial (from local to global) and temporal (from current to decades) extents: Airborne Laser Scanning (ALS), aerial multispectral and stereo-imagery, Sentinel-1, Sentinel-2 and Landsat data. Using random forests, we related the remotely sensed data to a forest maturity index available for 688 forest plots across four ranges of the French Alps. Each model also included socioecological predictors related to topography, socioeconomy, pedology and climatology. We found that the different remotely sensed data provide information on the main forest structural characteristics as defined by ALS, except for Landsat, whose resolution is too coarse, and Sentinel-1, which responds differently to vegetation structure. Predictions were quite similar among the aerial remotely sensed data on the one hand and among the satellite remotely sensed data on the other. Socioecological variables were more important predictors than the remote sensing metrics. In conclusion, our results indicate that a wide range of remotely sensed data, not only ALS, can be used to study old-growth forests, despite their different abilities to predict forest structure. Accounting for socioecological predictors is indispensable to avoid a significant loss of predictive accuracy. Remotely sensed data allow predictions to be made at different spatiotemporal resolutions and extents. This study paves the way for large-scale monitoring of forest maturity, as well as for retrospective analyses that will show to what extent predicted maturity changes between dates.
Affiliation(s)
- Etienne Lalechère
- Université de Picardie Jules Verne, EDYSAN (UMR CNRS-UPJV 7058), 1 rue des Louvels, 80037, Amiens Cedex, France.
- Jean-Matthieu Monnet
- INRAE, UR LESSEM, 2 rue de la Papeterie, BP 76 38402, Saint Martin d'Hères Cedex, France.
- Juliette Breen
- INRAE, UR LESSEM, 2 rue de la Papeterie, BP 76 38402, Saint Martin d'Hères Cedex, France.
- Marc Fuhr
- INRAE, UR LESSEM, 2 rue de la Papeterie, BP 76 38402, Saint Martin d'Hères Cedex, France.
24
Uddin MG, Nash S, Rahman A, Dabrowski T, Olbert AI. Data-driven modelling for assessing trophic status in marine ecosystems using machine learning approaches. Environ Res 2024; 242:117755. [PMID: 38008200 DOI: 10.1016/j.envres.2023.117755] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/17/2023] [Revised: 10/05/2023] [Accepted: 11/20/2023] [Indexed: 11/28/2023]
Abstract
Assessing eutrophication in coastal and transitional waters is of utmost importance, yet existing Trophic Status Index (TSI) models face challenges such as multicollinearity, data redundancy, inappropriate aggregation methods, and complex classification schemes. To tackle these issues, we developed a novel tool that harnesses machine learning (ML) and artificial intelligence (AI) to enhance the reliability and accuracy of trophic status assessments. Our research introduces an improved data-driven methodology tailored to transitional and coastal (TrC) waters, using Cork Harbour, Ireland, as a case study. The approach, named the Assessment Trophic Status Index (ATSI) model, comprises three main components: the selection of pertinent water quality indicators, the computation of ATSI scores, and the implementation of a new classification scheme. To optimize input data and minimize redundancy, we employed ML techniques, including advanced deep learning methods. Specifically, we developed a chlorophyll (CHL) prediction model using ten algorithms, among which XGBoost performed best, with minimal errors in both the training (RMSE = 0.0, MSE = 0.0, MAE = 0.01) and testing (RMSE = 0.0, MSE = 0.0, MAE = 0.01) phases. Using a novel linear rescaling interpolation function, we calculated ATSI scores and evaluated the model's sensitivity and efficiency across diverse application domains with metrics such as R², the Nash-Sutcliffe efficiency (NSE), and the model efficiency factor (MEF). The results consistently revealed high sensitivity and efficiency across all application domains. Additionally, we introduced a new classification scheme for ranking the trophic status of transitional and coastal waters. To assess spatial sensitivity, we applied the ATSI model to four distinct waterbodies in Ireland and compared the trophic assessment outcomes with those of the Assessment of Trophic Status of Estuaries and Bays in Ireland (ATSEBI) System. Significant disparities between the ATSI and the ATSEBI System were evident in all domains except Mulroy Bay. Overall, our research significantly enhances the accuracy of trophic status assessments in marine ecosystems. The ATSI model, combined with ML techniques and the new classification scheme, represents a promising avenue for evaluating and monitoring trophic conditions in TrC waters. The study also demonstrated the effectiveness of ATSI in assessing trophic status across various other waterbodies, including lakes and rivers. These findings make substantial contributions to the field of marine ecosystem management and conservation.
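The error and efficiency metrics named in this abstract have standard definitions. The sketch below computes RMSE, MAE and the Nash-Sutcliffe efficiency (NSE = 1 - SSE/SST, with SST taken about the observed mean). It is illustrative only. The abstract does not give the ATSI rescaling function or the MEF formula, so neither is reproduced here. The observation and prediction values are hypothetical.

```python
import math
from typing import Sequence


def _check(obs: Sequence[float], pred: Sequence[float]) -> None:
    if len(obs) != len(pred) or not obs:
        raise ValueError("obs and pred must be non-empty and of equal length")


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean squared error."""
    _check(obs, pred)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error."""
    _check(obs, pred)
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    _check(obs, pred)
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    if sst == 0:
        raise ValueError("NSE is undefined when all observations are equal")
    return 1.0 - sse / sst


# Hypothetical observed vs predicted chlorophyll values
observed = [2.0, 4.0, 6.0]
predicted = [2.5, 3.5, 6.0]
print(round(rmse(observed, predicted), 3), round(mae(observed, predicted), 3), nse(observed, predicted))
# 0.408 0.333 0.9375
```

Note that RMSE and MAE carry the units of the measured variable, whereas NSE is dimensionless. This is why NSE is convenient for comparing fits across waterbodies with different concentration ranges.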
Affiliation(s)
- Md Galal Uddin
- School of Engineering, University of Galway, Ireland; Ryan Institute, University of Galway, Ireland; MaREI Research Centre, University of Galway, Ireland; Eco-HydroInformatics Research Group (EHIRG), Civil Engineering, University of Galway, Ireland.
- Stephen Nash
- School of Engineering, University of Galway, Ireland; Ryan Institute, University of Galway, Ireland; MaREI Research Centre, University of Galway, Ireland
- Azizur Rahman
- School of Computing, Mathematics and Engineering, Charles Sturt University, Wagga Wagga, Australia; The Gulbali Institute of Agriculture, Water and Environment, Charles Sturt University, Wagga Wagga, Australia
- Agnieszka I Olbert
- School of Engineering, University of Galway, Ireland; Ryan Institute, University of Galway, Ireland; MaREI Research Centre, University of Galway, Ireland; Eco-HydroInformatics Research Group (EHIRG), Civil Engineering, University of Galway, Ireland
25
Fan R, Deng Y, Du Y, Xie X. Predicting geogenic groundwater arsenic contamination risk in floodplains using interpretable machine-learning model. Environ Pollut 2024; 340:122787. [PMID: 37879555 DOI: 10.1016/j.envpol.2023.122787] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 09/17/2023] [Accepted: 10/21/2023] [Indexed: 10/27/2023]
Abstract
Long-term exposure to groundwater contaminated with geogenic arsenic (As) poses a severe threat to public health. Elevated As concentrations have generally been observed together with high ammonium concentrations in floodplain groundwater. An extreme gradient boosting (XGBoost) algorithm was used to develop a probability model based on hydrogeochemical data, which predicted the occurrence rates of groundwater As on a regional scale. Results showed that NH₄⁺, Eh, K, Cl⁻, SO₄²⁻, and NO₃⁻ were powerful predictors of As exposure. The model revealed the co-enrichment of As with NH₄⁺, suggesting that the mineralization of nitrogen-containing organic matter promoted the reduction of As-bearing iron oxides. The predicted distribution of high-As groundwater was highly consistent with the known spatial distribution of As contamination, and the model also accurately predicted As concentrations in the Jiangbei Plain of China and in typical As-affected floodplains of Southeast Asia. The model can serve as a low-cost and rapid virtual sensor for detecting As concentrations in private or newly drilled wells, thereby providing critical information for informed management decisions, environmental protection and public health safety.
Affiliation(s)
- Ruiyu Fan
- MOE Key Laboratory of Groundwater Quality and Health, China University of Geosciences, Wuhan, 430078, China; State Environmental Protection Key Laboratory of Source Apportionment and Control of Aquatic Pollution & School of Environmental Studies, China University of Geosciences, Wuhan, 430078, China
- Yamin Deng
- MOE Key Laboratory of Groundwater Quality and Health, China University of Geosciences, Wuhan, 430078, China; State Environmental Protection Key Laboratory of Source Apportionment and Control of Aquatic Pollution & School of Environmental Studies, China University of Geosciences, Wuhan, 430078, China.
- Yao Du
- MOE Key Laboratory of Groundwater Quality and Health, China University of Geosciences, Wuhan, 430078, China; State Environmental Protection Key Laboratory of Source Apportionment and Control of Aquatic Pollution & School of Environmental Studies, China University of Geosciences, Wuhan, 430078, China
- Xianjun Xie
- MOE Key Laboratory of Groundwater Quality and Health, China University of Geosciences, Wuhan, 430078, China; State Environmental Protection Key Laboratory of Source Apportionment and Control of Aquatic Pollution & School of Environmental Studies, China University of Geosciences, Wuhan, 430078, China
26
She Y, Zhou L, Li Y. Interpretable machine learning models for predicting 90-day death in patients in the intensive care unit with epilepsy. Seizure 2024; 114:23-32. [PMID: 38035490 DOI: 10.1016/j.seizure.2023.11.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 11/11/2023] [Accepted: 11/24/2023] [Indexed: 12/02/2023]
Abstract
PURPOSE This study aims to develop a machine learning-based model for predicting mortality risk in patients with epilepsy admitted to the intensive care unit (ICU), providing clinicians with an accurate prognostic tool to guide individualized treatment. METHODS We collected clinical data on epilepsy patients 24 h after ICU admission from two clinical databases (MIMIC-IV and eICU-CRD). The clinical characteristics of ICU patients with epilepsy underwent careful feature selection and processing. MIMIC-IV served as the training set and the eICU-CRD database as the test set. Six models were developed and validated; the best-performing LightGBM model was selected by performance comparison and analysed for interpretability. RESULTS The final cohort comprised 429 patients for training and 1217 for testing. The 90-day mortality rate was 9.32% in the training set, and the in-hospital 90-day mortality rate was 4.10% in the test set. The LightGBM model achieved an AUC of 0.956 in the training set. External validation yielded an accuracy of 0.898, precision of 0.975, AUC of 0.781 and F1 score of 0.945, highlighting the model's potential for guiding clinical decision-making. Significant factors influencing model performance included the severity of illness, as measured by the OASIS score, and clinical parameters such as heart rate and body temperature. CONCLUSION This study introduces a machine learning-based approach to predict mortality risk in ICU epilepsy patients, offering a valuable tool for clinicians to identify high-risk individuals and devise personalized treatment strategies, thus improving patient prognosis and treatment outcomes.
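Accuracy, precision and F1 all derive from the confusion matrix. Precision and F1 also depend on which class is treated as positive, which matters when the outcome is rare (here, 4.10% mortality in the test set). The following is a minimal sketch with hypothetical labels, not the study's data. The helper name `classification_metrics` is an assumption.

```python
from typing import Dict, Hashable, Sequence


def classification_metrics(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable = 1
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 computed from the confusion matrix."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif t == positive:
            fn += 1  # missed a true positive
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision, "recall": recall, "f1": f1}


# Hypothetical 90-day outcomes (1 = died) and model predictions
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(classification_metrics(y_true, y_pred, positive=1))  # precision 0.5, F1 0.5
print(classification_metrics(y_true, y_pred, positive=0))  # precision 0.875, F1 0.875
```

The same predictions give identical accuracy (0.8) under both conventions, while precision and F1 change markedly.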
Affiliation(s)
- Yingfang She
- Neurology Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen, China
- Liemin Zhou
- Neurology Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen, China
- Yide Li
- Department of Critical Care, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen, China
27
Dimmick HL, van Rassel CR, MacInnis MJ, Ferber R. Use of subject-specific models to detect fatigue-related changes in running biomechanics: a random forest approach. Front Sports Act Living 2023; 5:1283316. [PMID: 38186400 PMCID: PMC10768007 DOI: 10.3389/fspor.2023.1283316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Accepted: 12/08/2023] [Indexed: 01/09/2024]
Abstract
Running biomechanics are affected by fatiguing or prolonged runs. However, no evidence to date has conclusively linked this effect to running-related injury (RRI) development or to performance implications. Previous investigations using subject-specific models in running have demonstrated higher accuracy than group-based models; however, this approach has rarely been applied to fatigue. In this study, two experiments were conducted to determine whether subject-specific models outperformed group-based models in classifying running biomechanics during non-fatigued and fatigued conditions. In the first experiment, 16 participants performed four treadmill runs at or around the maximal lactate steady state. In the second experiment, nine participants performed five prolonged runs using commercial wearable devices. For each experiment, two segments were extracted from each trial, one early and one late in the run. For each participant, a random forest model with leave-one-run-out cross-validation was used to classify the early (non-fatigued) and late (fatigued) segments. Additionally, group-based classifiers with leave-one-subject-out cross-validation were constructed. For experiment 1, mean classification accuracies for the single-subject and group-based classifiers were 68.2 ± 8.2% and 57.0 ± 8.9%, respectively. For experiment 2, they were 68.9 ± 17.1% and 61.5 ± 11.7%, respectively. Variable importance rankings were consistent within participants, but each participant's rankings differed from those of the group. Although the classification accuracies were relatively low, these findings highlight the advantage of subject-specific classifiers for detecting changes in running biomechanics with fatigue. They also indicate the potential of big data and wearable technology approaches in future research to determine possible connections between biomechanics and RRI.
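Both validation schemes described in this abstract are instances of leave-one-group-out cross-validation. The group is the run for the subject-specific models and the participant for the group-based models. Below is a stdlib-only sketch with hypothetical run labels. The helper `leave_one_group_out` is an illustration, not the authors' code; scikit-learn's `LeaveOneGroupOut` provides equivalent splitting.

```python
from typing import Hashable, Iterator, List, Sequence, Tuple


def leave_one_group_out(
    groups: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held_out_group, train_indices, test_indices), one fold per distinct group.

    Subject-specific models: groups are run IDs (leave-one-run-out).
    Group-based models: groups are participant IDs (leave-one-subject-out).
    """
    for g in dict.fromkeys(groups):  # distinct groups in first-seen order
        test = [i for i, x in enumerate(groups) if x == g]
        train = [i for i, x in enumerate(groups) if x != g]
        yield g, train, test


# Hypothetical: one participant, three runs, an early and a late segment per run
runs = ["run1", "run1", "run2", "run2", "run3", "run3"]
for held_out, train_idx, test_idx in leave_one_group_out(runs):
    print(held_out, train_idx, test_idx)
```

Keeping both segments of a run in the same fold means the classifier is never trained on part of the run it is tested on. This prevents optimistic accuracy from within-run similarity.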
Affiliation(s)
- Hannah L. Dimmick
- Human Performance Laboratory, Faculty of Kinesiology, University of Calgary, Calgary, AB, Canada
- Cody R. van Rassel
- Human Performance Laboratory, Faculty of Kinesiology, University of Calgary, Calgary, AB, Canada
- Martin J. MacInnis
- Human Performance Laboratory, Faculty of Kinesiology, University of Calgary, Calgary, AB, Canada
- Reed Ferber
- Human Performance Laboratory, Faculty of Kinesiology, University of Calgary, Calgary, AB, Canada
- Running Injury Clinic, Calgary, AB, Canada
28
Quesada-Vázquez S, Castells-Nobau A, Latorre J, Oliveras-Cañellas N, Puig-Parnau I, Tejera N, Tobajas Y, Baudin J, Hildebrand F, Beraza N, Burcelin R, Martinez-Gili L, Chilloux J, Dumas ME, Federici M, Hoyles L, Caimari A, Del Bas JM, Escoté X, Fernández-Real JM, Mayneris-Perxachs J. Potential therapeutic implications of histidine catabolism by the gut microbiota in NAFLD patients with morbid obesity. Cell Rep Med 2023; 4:101341. [PMID: 38118419 PMCID: PMC10772641 DOI: 10.1016/j.xcrm.2023.101341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Revised: 07/18/2023] [Accepted: 11/22/2023] [Indexed: 12/22/2023]
Abstract
The gut microbiota contributes to the pathophysiology of non-alcoholic fatty liver disease (NAFLD). Histidine is a key energy source for the microbiota, which scavenges it from the host, but its role in NAFLD is poorly understood. Plasma metabolomics, liver transcriptomics, and fecal metagenomics were performed in three human cohorts, coupled with hepatocyte, rodent, and Drosophila models. Machine learning analyses identified plasma histidine as strongly inversely associated with steatosis and linked to a hepatic transcriptomic signature involving insulin signaling, inflammation, and trace amine-associated receptor 1. Circulating histidine was inversely associated with Proteobacteria and positively associated with bacteria lacking the histidine utilization (Hut) system. Histidine supplementation improved NAFLD in different animal models (diet-induced NAFLD in mice and flies, ob/ob mice, and ovariectomized rats) and reduced de novo lipogenesis. Fecal microbiota transplantation (FMT) from low-histidine donors and mono-colonization of germ-free flies with Enterobacter cloacae increased triglyceride accumulation and reduced histidine content. The interplay among microbiota, histidine catabolism, and NAFLD opens therapeutic opportunities.
Affiliation(s)
- Anna Castells-Nobau
- Department of Diabetes, Endocrinology, and Nutrition, Dr. Josep Trueta Hospital, Girona, Spain; Nutrition, Eumetabolism, and Health Group, Girona Biomedical Research Institute (IDIBGI), Girona, Spain; CIBER Fisiopatología de la Obesidad y Nutrición (CIBERobn), Instituto de Salud Carlos III, Madrid, Spain
- Jèssica Latorre
- Department of Diabetes, Endocrinology, and Nutrition, Dr. Josep Trueta Hospital, Girona, Spain
- Núria Oliveras-Cañellas
- Department of Diabetes, Endocrinology, and Nutrition, Dr. Josep Trueta Hospital, Girona, Spain; Nutrition, Eumetabolism, and Health Group, Girona Biomedical Research Institute (IDIBGI), Girona, Spain; CIBER Fisiopatología de la Obesidad y Nutrición (CIBERobn), Instituto de Salud Carlos III, Madrid, Spain
- Irene Puig-Parnau
- Department of Diabetes, Endocrinology, and Nutrition, Dr. Josep Trueta Hospital, Girona, Spain; Nutrition, Eumetabolism, and Health Group, Girona Biomedical Research Institute (IDIBGI), Girona, Spain; CIBER Fisiopatología de la Obesidad y Nutrición (CIBERobn), Instituto de Salud Carlos III, Madrid, Spain
- Noemi Tejera
- Microbes in the Food Chain, Institute Strategic Program, Microbes and Gut Health, Institute Strategic Program - Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
- Yaiza Tobajas
- Eurecat, Centre Tecnològic de Catalunya, Unitat de Nutrició i Salut, Reus, Spain
- Julio Baudin
- Eurecat, Centre Tecnològic de Catalunya, Unitat de Nutrició i Salut, Reus, Spain
- Falk Hildebrand
- Microbes in the Food Chain, Institute Strategic Program, Microbes and Gut Health, Institute Strategic Program - Quadram Institute Bioscience, Norwich Research Park, Norwich, UK; Digital Biology, Earlham Institute, Norwich Research Park, Norwich, Norfolk NR4 7UZ, UK
- Naiara Beraza
- Microbes in the Food Chain, Institute Strategic Program, Microbes and Gut Health, Institute Strategic Program - Quadram Institute Bioscience, Norwich Research Park, Norwich, UK
- Rémy Burcelin
- Institut National de la Santé et de la Recherche Médicale (INSERM), Toulouse, France; Université Paul Sabatier (UPS), Unité Mixte de Recherche (UMR), Toulouse, France; Institut des Maladies Métaboliques et Cardiovasculaires (I2MC), Team 2: 'Intestinal Risk Factors, Diabetes, Dyslipidemia, and Heart Failure', F-31432 Toulouse Cedex 4, France
- Laura Martinez-Gili
- Section of Biomolecular Medicine, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Imperial College London, Du Cane Road, London W12 0NN, UK
- Julien Chilloux
- Section of Biomolecular Medicine, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Imperial College London, Du Cane Road, London W12 0NN, UK
- Marc-Emmanuel Dumas
- Section of Biomolecular Medicine, Division of Systems Medicine, Department of Metabolism, Digestion, and Reproduction, Imperial College London, Du Cane Road, London W12 0NN, UK; Section of Genomic and Environmental Medicine, National Heart & Lung Institute, Imperial College London, Dovehouse Street, London SW3 6LY, UK; European Genomic Institute for Diabetes, CNRS UMR 8199, INSERM UMR 1283, Institut Pasteur de Lille, Lille University Hospital, University of Lille, 59045 Lille, France; McGill Genome Centre, McGill University, 740 Doctor Penfield Avenue, Montréal, QC H3A 0G1, Canada
- Massimo Federici
- Department of Systems Medicine, University of Rome Tor Vergata, Via Montpellier 1, 00133 Rome, Italy
- Lesley Hoyles
- Department of Biosciences, School of Science and Technology, Nottingham Trent University, Nottingham NG11 8NS, UK
- Antoni Caimari
- Eurecat, Centre Tecnològic de Catalunya, Unitat de Nutrició i Salut, Reus, Spain
- Josep M Del Bas
- Eurecat, Centre Tecnològic de Catalunya, Unitat de Nutrició i Salut, Reus, Spain
- Xavier Escoté
- Eurecat, Centre Tecnològic de Catalunya, Unitat de Nutrició i Salut, Reus, Spain
- José-Manuel Fernández-Real
- Department of Diabetes, Endocrinology, and Nutrition, Dr. Josep Trueta Hospital, Girona, Spain; Nutrition, Eumetabolism, and Health Group, Girona Biomedical Research Institute (IDIBGI), Girona, Spain; CIBER Fisiopatología de la Obesidad y Nutrición (CIBERobn), Instituto de Salud Carlos III, Madrid, Spain
- Jordi Mayneris-Perxachs
- Department of Diabetes, Endocrinology, and Nutrition, Dr. Josep Trueta Hospital, Girona, Spain; Nutrition, Eumetabolism, and Health Group, Girona Biomedical Research Institute (IDIBGI), Girona, Spain; CIBER Fisiopatología de la Obesidad y Nutrición (CIBERobn), Instituto de Salud Carlos III, Madrid, Spain
29
Diac MM, Toma GM, Damian SI, Fotache M, Romanov N, Tabian D, Sechel G, Scripcaru A, Hancianu M, Iliescu DB. Machine Learning Models for Prediction of Sex Based on Lumbar Vertebral Morphometry. Diagnostics (Basel) 2023; 13:3630. [PMID: 38132214 PMCID: PMC10742438 DOI: 10.3390/diagnostics13243630] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 12/04/2023] [Accepted: 12/06/2023] [Indexed: 12/23/2023]
Abstract
BACKGROUND Identifying skeletal remains has been, and will remain, a challenge for forensic experts and forensic anthropologists, especially in disasters with multiple victims or when remains are in an advanced stage of decomposition. This study examined the performance of two machine learning (ML) algorithms in predicting a person's sex based only on the morphometry of the L1-L5 lumbar vertebrae, using measurements recently collected from Romanian individuals. The purpose was to assess whether ML techniques can provide a reliable prediction of sex in forensic identification based only on parameters obtained from metric analysis of the lumbar spine. METHOD This paper built and tuned predictive models with two of the most popular classification techniques, random forest (RF) and XGBoost (XGB). Both series of models used cross-validation and a grid search to find the best combination of hyperparameters. The best models were selected based on the area under the receiver operating characteristic curve (ROC_AUC). RESULTS The L1-L5 lumbar vertebrae exhibit sexual dimorphism and can be used as predictors of sex. Of the eight significant predictors of sex, six were particularly important for the RF model, whereas only three were important for the XGB model. CONCLUSIONS Even though the data set was small (149 observations), both RF and XGB reliably predicted a person's sex based only on the L1-L5 measurements. This can prove valuable, especially when only skeletal remains are available. With minor adjustments, the presented ML setup could be turned into a freely accessible interactive web service in which forensic anthropologists enter the L1-L5 measurements of a body/cadaver to predict the person's sex.
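The ROC_AUC criterion used for model selection in this study can be read as the probability that a randomly chosen positive case (for example, one sex coded as 1) receives a higher predicted score than a randomly chosen negative case. The sketch below computes it directly from that pairwise (Mann-Whitney) definition, counting ties as half. It is an illustrative helper under stated assumptions (`roc_auc` is a hypothetical name and the scores are made up), not the authors' code; the quadratic pairwise loop is only practical for small data sets such as the 149 observations here.

```python
from itertools import product


def roc_auc(labels, scores):
    """Area under the ROC curve via its pairwise (Mann-Whitney) definition.

    labels: 0/1 class labels (1 = positive class).
    scores: predicted scores or probabilities, same order as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0  # positive ranked above negative
        elif p == n:
            wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities for six individuals
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(roc_auc(labels, scores))  # 8 of 9 positive-negative pairs ordered correctly
```

In a real pipeline a library routine such as scikit-learn's `roc_auc_score` would normally be used; the explicit loop is only meant to make the metric's meaning concrete.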
Affiliation(s)
- Madalina Maria Diac
- Forensic Medicine Sciences Department, Institute of Legal Medicine, University of Medicine and Pharmacy “Grigore T. Popa”, 700115 Iasi, Romania
- Gina Madalina Toma
- Forensic Medicine Department, “Sf. Ioan” Hospital Suceava, University of Medicine and Pharmacy “Grigore T. Popa”, 700115 Iasi, Romania
- Simona Irina Damian
- Forensic Medicine Sciences Department, Institute of Legal Medicine, University of Medicine and Pharmacy “Grigore T. Popa”, 700115 Iasi, Romania
- Marin Fotache
- Alexandru Ioan Cuza University, 700506 Iasi, Romania
- Nicolae Romanov
- Alexandru Ioan Cuza University, 700506 Iasi, Romania
- Daniel Tabian
- Department of Fundamental, Prophylactic and Clinical Disciplines, Medicine Faculty, Transilvania University of Brasov, 500019 Brasov, Romania
- Gabriela Sechel
- Department of Fundamental, Prophylactic and Clinical Disciplines, Medicine Faculty, Transilvania University of Brasov, 500019 Brasov, Romania
- Andrei Scripcaru
- Forensic Medicine Sciences Department, University of Medicine and Pharmacy “Grigore T. Popa”, 700115 Iasi, Romania
- Monica Hancianu
- Pharmacy Department, University of Medicine and Pharmacy “Grigore T. Popa”, 700115 Iasi, Romania
- Diana Bulgaru Iliescu
- Forensic Medicine Sciences Department, Institute of Legal Medicine, University of Medicine and Pharmacy “Grigore T. Popa”, 700115 Iasi, Romania
30
Bowe AK, Lightbody G, Staines A, Murray DM, Norman M. Prediction of 2-Year Cognitive Outcomes in Very Preterm Infants Using Machine Learning Methods. JAMA Netw Open 2023; 6:e2349111. [PMID: 38147334 PMCID: PMC10751596 DOI: 10.1001/jamanetworkopen.2023.49111] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/07/2023] [Accepted: 11/09/2023] [Indexed: 12/27/2023]
Abstract
Importance Early intervention can improve cognitive outcomes for very preterm infants but is resource intensive, so identifying those who need it most is important. Objective To evaluate a model for use in very preterm infants to predict cognitive delay at 2 years of age using routinely available clinical and sociodemographic data. Design, Setting, and Participants This prognostic study was based on the Swedish Neonatal Quality Register. Nationwide coverage of neonatal data was reached in 2011, and registration of follow-up data opened on January 1, 2015, with inclusion ending on September 31, 2022. A variety of machine learning models were trained and tested to predict cognitive delay. Surviving infants from neonatal units in Sweden born at a gestational age of less than 32 weeks and with complete data for the Bayley Scales of Infant and Toddler Development, Third Edition cognitive index or cognitive scale scores at 2 years of corrected age were assessed. Infants with major congenital anomalies were excluded. Exposures A total of 90 variables (sociodemographic and clinical information on conditions, investigations, and treatments initiated during pregnancy, delivery, and neonatal unit admission) were examined as candidate predictors. Main Outcomes and Measures The main outcome was cognitive function at 2 years, categorized as screening positive for cognitive delay (cognitive index score <90) or exhibiting typical cognitive development (score ≥90). Results A total of 1062 children (median [IQR] birth weight, 880 [720-1100] g; 566 [53.3%] male) were included in the modeling process, of whom 231 (21.8%) had cognitive delay. A logistic regression model containing 26 predictive features achieved an area under the receiver operating characteristic curve of 0.77 (95% CI, 0.71-0.83).
The 5 most important features for cognitive delay were non-Scandinavian family language, prolonged hospitalization, low birth weight, discharge to a destination other than home, and the infant not receiving breastmilk at discharge. At discharge from the neonatal unit, the full model could correctly identify 605 of 650 infants who would have cognitive delay at 24 months (sensitivity, 0.93) and 1081 of 2350 who would not (specificity, 0.46). Conclusions and Relevance The findings of this study suggest that predictive modeling in neonatal care could enable early and targeted intervention for the very preterm infants most at risk of developing cognitive impairment.
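The sensitivity and specificity reported for this model follow directly from the stated counts: sensitivity is the share of infants with later cognitive delay whom the model flags (true positives divided by all infants with delay), and specificity is the share of infants without delay whom it correctly clears (true negatives divided by all infants without delay). A minimal sketch of that arithmetic, using only the numbers given in the abstract (the function names are illustrative):

```python
def sensitivity(true_pos: int, actual_pos: int) -> float:
    """Proportion of truly positive cases the model identifies."""
    return true_pos / actual_pos


def specificity(true_neg: int, actual_neg: int) -> float:
    """Proportion of truly negative cases the model correctly rules out."""
    return true_neg / actual_neg


# Counts reported in the abstract
sens = sensitivity(605, 650)    # infants with cognitive delay at 24 months
spec = specificity(1081, 2350)  # infants without cognitive delay
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
# prints: sensitivity=0.93, specificity=0.46
```

An operating point with high sensitivity and low specificity like this one accepts many false positives in exchange for missing few infants who will go on to develop delay.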
Affiliation(s)
- Andrea K. Bowe
- INFANT Research Centre, University College Cork, Cork, Ireland
- Gordon Lightbody
- INFANT Research Centre, University College Cork, Cork, Ireland
- Department of Electrical and Electronic Engineering, University College Cork, Cork, Ireland
- Anthony Staines
- School of Nursing, Psychotherapy, and Community Health, Dublin City University, Dublin, Ireland
- Deirdre M. Murray
- INFANT Research Centre, University College Cork, Cork, Ireland
- Department of Paediatrics, Cork University Hospital, Cork, Ireland
- Mikael Norman
- Department of Clinical Science, Intervention, and Technology, Karolinska University Hospital, Karolinska Institutet, Stockholm, Sweden
- Department of Neonatal Medicine, Karolinska University Hospital, Karolinska Institutet, Stockholm, Sweden
31
Tao S, Yu L, Li J, Xie Z, Huang L, Yang D, Tan Y, Zhang W, Huang X, Xue T. Prognostic value of triglyceride-glucose index in patients with chronic coronary syndrome undergoing percutaneous coronary intervention. Cardiovasc Diabetol 2023; 22:322. [PMID: 38017540 PMCID: PMC10685592 DOI: 10.1186/s12933-023-02060-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 11/10/2023] [Indexed: 11/30/2023]
Abstract
BACKGROUND The triglyceride-glucose (TyG) index has been proposed as a reliable surrogate marker of insulin resistance and an independent predictor of major adverse cardiovascular events (MACEs). Several recent studies have shown a relationship between the TyG index and cardiovascular outcomes; however, the role of the TyG index in the progression of chronic coronary syndrome (CCS) has not been extensively assessed, especially in populations after revascularization. This study aimed to investigate the prognostic value of the TyG index in predicting MACEs in CCS patients undergoing percutaneous coronary intervention (PCI). METHODS The data for the study were taken from the Hospital Information System database of China-Japan Friendship Hospital over the period 2019-2021. Eligible participants were divided into groups according to TyG index tertiles. The Boruta algorithm was used for feature selection. Multivariable Cox proportional hazards models and restricted cubic spline (RCS) analysis were applied to examine the dose-response relationship between the TyG index and the endpoint, and the results were expressed as hazard ratios (HRs) with 95% confidence intervals (CIs). The area under the receiver operating characteristic (ROC) curve (AUC), decision curve analysis (DCA), and the clinical impact curve (CIC) were used to comprehensively evaluate the predictive accuracy and clinical value of the model. The goodness-of-fit of the models was evaluated using the calibration curve and the χ2 likelihood ratio test. RESULTS After applying the inclusion and exclusion criteria, 1353 patients with CCS undergoing PCI were enrolled in the study. After adjusting for all confounders, we found that those with the highest TyG index had a 59.5% increased risk of MACEs over the 1-year follow-up (HR 1.595, 95% CI 1.370 ~ 1.855).
Using the lowest TyG index tertile (T1) as the reference, the fully adjusted HRs (95% CIs) for the endpoint were 1.343 (1.054 ~ 1.711) in the middle tertile (T2) and 2.297 (1.842 ~ 2.864) in the highest tertile (T3) (P for trend < 0.001). The TyG index showed excellent predictive performance according to the AUC of 0.810 (0.786, 0.834) and the χ2 likelihood ratio test (χ2 = 7.474, P = 0.486). DCA and CIC analysis also suggested a good overall net benefit and clinical impact of the multivariable model. The results of the subgroup analysis were consistent with the main analyses. The RCS model demonstrated that the TyG index was nonlinearly associated with the risk of MACEs within one year (P for nonlinear < 0.001). CONCLUSION An elevated TyG index is associated with an increased risk of cardiovascular events and predicts future MACEs in patients with CCS undergoing PCI independently of known cardiovascular risk factors, indicating that the TyG index may be a potential marker for risk stratification and prognosis in CCS patients undergoing PCI.
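The abstract does not give the TyG formula; the definition commonly used in the literature, and assumed here, is ln[fasting triglycerides (mg/dL) × fasting plasma glucose (mg/dL) / 2]. The sketch below computes the index, groups values into tertiles as described in the methods, and converts a hazard ratio into the "percent increased risk" wording used in the results. All function names are hypothetical, and the tertile cut points come from Python's `statistics.quantiles` (default "exclusive" method), which may differ slightly from the cut-point convention used in the study.

```python
import math
import statistics
from bisect import bisect_right


def tyg_index(tg_mg_dl: float, fpg_mg_dl: float) -> float:
    """Assumed definition: ln(fasting triglycerides [mg/dL] * fasting glucose [mg/dL] / 2)."""
    return math.log(tg_mg_dl * fpg_mg_dl / 2)


def assign_tertiles(values):
    """Label each value 1 (T1, lowest), 2 (T2), or 3 (T3, highest)."""
    cuts = statistics.quantiles(values, n=3)  # two cut points splitting the data into thirds
    return [bisect_right(cuts, v) + 1 for v in values]


def percent_increase(ratio: float) -> float:
    """Express a hazard (or odds) ratio as a percent change relative to the reference."""
    return (ratio - 1) * 100


print(round(tyg_index(150, 100), 2))  # ln(7500) -> 8.92
print(assign_tertiles([8.1, 8.5, 8.7, 8.9, 9.0, 9.2, 9.4, 9.6, 9.9]))
print(round(percent_increase(1.595), 1))  # 59.5, matching the reported HR of 1.595
```

The tertile step uses the sample's own distribution, so the same TyG value can fall into different tertiles in different cohorts; fixed, published cut points would be needed to compare groups across studies.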
Affiliation(s)
- Shiyi Tao
- Department of Cardiology, Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China
- Graduate School, Beijing University of Chinese Medicine, Beijing, China
- Lintong Yu
- Graduate School, Beijing University of Chinese Medicine, Beijing, China
- Jun Li
- Department of Cardiology, Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China
- Zicong Xie
- Department of Cardiology, Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China
- Li Huang
- Department of Integrative Cardiology, China-Japan Friendship Hospital, Beijing, China
- Deshuang Yang
- Department of Integrative Cardiology, China-Japan Friendship Hospital, Beijing, China
- Yuqing Tan
- Department of Cardiology, Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China
- Graduate School, Beijing University of Chinese Medicine, Beijing, China
- Wenjie Zhang
- Department of Cardiology, Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China
- Graduate School, Beijing University of Chinese Medicine, Beijing, China
- Xuanchun Huang
- Department of Cardiology, Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China
- Tiantian Xue
- Department of Cardiology, Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China
32
Ng JWY, Felix JF, Olson DM. A novel approach to risk exposure and epigenetics-the use of multidimensional context to gain insights into the early origins of cardiometabolic and neurocognitive health. BMC Med 2023; 21:466. [PMID: 38012757 PMCID: PMC10683259 DOI: 10.1186/s12916-023-03168-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 11/09/2023] [Indexed: 11/29/2023]
Abstract
BACKGROUND Each mother-child dyad represents a unique combination of genetic and environmental factors. This constellation of variables affects the expression of countless genes. Numerous studies have uncovered changes in DNA methylation (DNAm), a form of epigenetic regulation, in offspring related to maternal risk factors. How these changes work together to link maternal-child risks to childhood cardiometabolic and neurocognitive traits remains unknown. This question is a key research priority because such traits predispose individuals to future non-communicable diseases (NCDs). We propose viewing risk and the genome through a multidimensional lens to identify common DNAm patterns shared among diverse risk profiles. METHODS We identified multifactorial Maternal Risk Profiles (MRPs) generated from population-based data (n = 15,454; Avon Longitudinal Study of Parents and Children [ALSPAC]). Using cord blood HumanMethylation450 BeadChip data, we identified genome-wide patterns of DNAm that co-vary with these MRPs. We tested the prospective relation of these DNAm patterns (n = 914) to future outcomes using decision tree analysis. We then tested the reproducibility of these patterns in (1) DNAm data at ages 7 and 17 years within the same cohort (n = 973 and 974, respectively) and (2) cord DNAm in an independent cohort, the Generation R Study (n = 686). RESULTS We identified twenty MRP-related DNAm patterns at birth in ALSPAC. Four were prospectively related to cardiometabolic and/or neurocognitive childhood outcomes. These patterns were replicated in DNAm data from blood collected at later ages. Three of these patterns were externally validated in cord DNAm data in Generation R. Compared with previous literature, the DNAm patterns exhibited a novel spatial distribution across the genome that intersects with chromatin functional and tissue-specific signatures. CONCLUSIONS To our knowledge, we are the first to leverage multifactorial population-wide data to detect patterns of variability in DNAm.
This context-based approach decreases biases stemming from overreliance on specific samples or variables. We discovered molecular patterns demonstrating prospective and replicable relations to complex traits. Moreover, results suggest that patterns harbour a genome-wide organisation specific to chromatin regulation and target tissues. These preliminary findings warrant further investigation to better reflect the reality of human context in molecular studies of NCDs.
Affiliation(s)
- Jane W Y Ng
- Department of Pediatrics, Cumming School of Medicine, University of Calgary, 28 Oki Drive NW, Calgary, AB, T3B 6A8, Canada
- Janine F Felix
- The Generation R Study Group, Erasmus MC University Medical Center Rotterdam, Postbus 2040, 3000 CA, Rotterdam, The Netherlands
- Department of Pediatrics, Erasmus MC University Medical Center Rotterdam, Rotterdam, The Netherlands
- David M Olson
- Departments of Obstetrics and Gynecology, Physiology, and Pediatrics, Faculty of Medicine and Dentistry, University of Alberta, 220 HMRC, Edmonton, AB, T6G 2S2, Canada
33
Alshejari A, Kodogiannis VS, Leonidis S. Combining Feature Selection Techniques and Neurofuzzy Systems for the Prediction of Total Viable Counts in Beef Fillets Using Multispectral Imaging. Sensors (Basel) 2023; 23:9451. [PMID: 38067823 PMCID: PMC10708854 DOI: 10.3390/s23239451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2023] [Revised: 11/18/2023] [Accepted: 11/26/2023] [Indexed: 12/18/2023]
Abstract
In the food industry, quality and safety issues are closely linked to consumer health. There is growing interest in applying noninvasive sensing techniques to obtain quality attributes quickly. One of them, hyperspectral/multispectral imaging, has been used extensively to inspect various food products. In this paper, a stacking-based ensemble prediction system was developed that uses multispectral imaging information to predict total viable counts of microorganisms, a major cause of meat spoilage, in beef fillet samples. Because selecting important wavelengths from the multispectral imaging system is considered an essential stage of the prediction scheme, a feature-fusion approach was also explored, combining wavelengths extracted by various feature selection techniques. The ensemble sub-components are two advanced clustering-based neurofuzzy network prediction models, one using average reflectance values and the other the standard deviation of pixel intensity per wavelength. The performance of the neurofuzzy models was compared against established regression algorithms such as the multilayer perceptron, support vector machines, and partial least squares. The results confirmed the validity of the proposed hypothesis of combining feature selection methods with neurofuzzy models to assess the microbiological quality of meat products.
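Stacking, in general terms, feeds the predictions of several base models into a second-level (meta) model that learns how to weight them. The sketch below illustrates that idea with a least-squares linear meta-learner (no intercept) over two base predictors, loosely mirroring the two neurofuzzy sub-models built on mean reflectance and on pixel-intensity standard deviation. It is an assumption-based illustration only: the abstract does not say which meta-learner the authors used, the function names and numbers are hypothetical, and in practice the meta-learner should be fit on out-of-fold base predictions to avoid leakage.

```python
def fit_linear_stacker(pred_a, pred_b, target):
    """Least-squares weights (no intercept) that combine two base-model predictions.

    Solves the 2x2 normal equations [[a, b], [b, c]] @ [w_a, w_b] = [d, e].
    """
    a = sum(p * p for p in pred_a)
    b = sum(p * q for p, q in zip(pred_a, pred_b))
    c = sum(q * q for q in pred_b)
    d = sum(p * y for p, y in zip(pred_a, target))
    e = sum(q * y for q, y in zip(pred_b, target))
    det = a * c - b * b
    if abs(det) < 1e-12:
        raise ValueError("base predictions are collinear; weights are not identifiable")
    w_a = (d * c - b * e) / det
    w_b = (a * e - b * d) / det
    return w_a, w_b


def stack_predict(weights, pred_a, pred_b):
    """Combine base-model predictions with the learned meta-weights."""
    w_a, w_b = weights
    return [w_a * p + w_b * q for p, q in zip(pred_a, pred_b)]


# Hypothetical base-model predictions (e.g. log CFU/g) and observed counts
mean_model = [5.1, 6.0, 6.8, 7.5, 8.2]  # sub-model on mean reflectance
std_model = [5.4, 5.8, 7.1, 7.2, 8.6]   # sub-model on pixel-intensity std
observed = [5.3, 5.9, 6.9, 7.4, 8.4]
weights = fit_linear_stacker(mean_model, std_model, observed)
print([round(x, 2) for x in stack_predict(weights, mean_model, std_model)])
```

A linear meta-learner keeps the combination interpretable (each weight shows how much the ensemble trusts a sub-model); more flexible meta-learners are possible but need more data to avoid overfitting.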
Affiliation(s)
- Abeer Alshejari
- Department of Mathematical Science, Princess Nourah Bint Abdulrahman University, Riyadh 11671, Saudi Arabia
- Stavros Leonidis
- Consulting & Systems Integration, Netcompany-Intrasoft, GR-57001 Thessaloniki, Greece
34
Alexander H, Hu SK, Krinos AI, Pachiadaki M, Tully BJ, Neely CJ, Reiter T. Eukaryotic genomes from a global metagenomic data set illuminate trophic modes and biogeography of ocean plankton. mBio 2023; 14:e0167623. [PMID: 37947402 PMCID: PMC10746220 DOI: 10.1128/mbio.01676-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Accepted: 09/27/2023] [Indexed: 11/12/2023]
Abstract
Metagenomics is a powerful method for interpreting the ecological roles and physiological capabilities of mixed microbial communities. Yet many tools for processing metagenomic data are neither designed to consider eukaryotes nor built for ever-increasing amounts of sequence data. EukHeist is an automated pipeline to retrieve eukaryotic and prokaryotic metagenome-assembled genomes (MAGs) from large-scale metagenomic sequence data sets. We developed the EukHeist workflow specifically to process large amounts of metagenomic and/or metatranscriptomic sequence data in an automated and reproducible fashion. Here, we applied EukHeist to the large-size fraction data (0.8-2,000 µm) from Tara Oceans to recover both eukaryotic and prokaryotic MAGs, which we refer to as TOPAZ (Tara Oceans Particle-Associated MAGs). The TOPAZ MAGs comprised >900 environmentally relevant eukaryotic MAGs and >4,000 bacterial and archaeal MAGs. The bacterial and archaeal TOPAZ MAGs expand the known phylogenetic diversity of likely particle- and host-associated taxa. We use these MAGs to demonstrate an approach for inferring the putative trophic mode of the recovered eukaryotic MAGs. We also identify ecological cohorts of co-occurring MAGs that are driven by specific environmental factors and putative host-microbe associations. Together, these data add to a growing number of resources of environmentally relevant eukaryotic genomic information. Complementary and expanded databases of MAGs, such as those provided through scalable pipelines like EukHeist, stand to advance our understanding of eukaryotic diversity through increased coverage of genomic representatives across the tree of life. IMPORTANCE Single-celled eukaryotes play ecologically significant roles in the marine environment, yet fundamental questions about their biodiversity, ecological function, and interactions remain.
Environmental sequencing enables researchers to document naturally occurring protistan communities without culturing bias, yet metagenomic and metatranscriptomic sequencing approaches cannot separate individual species from communities. To capture the genomic content of mixed protistan populations more completely, we can create bins of sequences that represent the same organism (metagenome-assembled genomes [MAGs]). We developed the EukHeist pipeline, which automates the binning of population-level eukaryotic and prokaryotic genomes from metagenomic reads. We provide new insight into which protistan communities are present in the ocean and what trophic roles they play. Scalable computational tools like EukHeist may accelerate the identification of meaningful genetic signatures from large data sets and complement researchers' efforts to leverage MAG databases to address ecological questions, resolve evolutionary relationships, and discover potentially novel biodiversity.
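Binning groups assembled contigs that likely come from the same organism. Binning tools in general combine read-coverage patterns with sequence composition, and a classic composition signal is the tetranucleotide (4-mer) frequency profile, which tends to be similar along a single genome. The sketch below computes such profiles and a simple distance between them. It is a generic illustration of one binning signal under that assumption, not a description of EukHeist's internals; the helper names are hypothetical, and real tools typically also merge reverse-complement k-mers and add coverage, which this sketch omits.

```python
from collections import Counter


def tetranucleotide_profile(seq: str) -> dict:
    """Relative frequency of each overlapping 4-mer made only of A/C/G/T."""
    seq = seq.upper()
    kmers = [seq[i:i + 4] for i in range(len(seq) - 3)]
    kmers = [k for k in kmers if set(k) <= set("ACGT")]  # skip ambiguous bases such as N
    counts = Counter(kmers)
    total = sum(counts.values())
    return {k: n / total for k, n in counts.items()} if total else {}


def profile_distance(p: dict, q: dict) -> float:
    """Euclidean distance between two 4-mer profiles (smaller = more similar composition)."""
    keys = set(p) | set(q)
    return sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys) ** 0.5


# Hypothetical short contigs; real contigs are thousands of bases long
contig_1 = "ACGTACGTACGTTTGA"
contig_2 = "acgtacgtacgtacgn"
print(profile_distance(tetranucleotide_profile(contig_1), tetranucleotide_profile(contig_2)))
```

Contigs whose profiles lie close together (and whose coverage co-varies across samples) are candidates for the same bin; composition alone is unreliable for short contigs because the 4-mer counts are too sparse.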
Affiliation(s)
- Harriet Alexander
- Biology Department, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts, USA
- Sarah K. Hu
- Marine Chemistry and Geochemistry, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts, USA
- Arianna I. Krinos
- Biology Department, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts, USA
- MIT-WHOI Joint Program in Oceanography/Applied Ocean Science and Engineering, Cambridge and Woods Hole, Massachusetts, USA
- Maria Pachiadaki
- Biology Department, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts, USA
- Benjamin J. Tully
- Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
- Christopher J. Neely
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, USA
- Taylor Reiter
- Population Health and Reproduction, University of California, Davis, Davis, California, USA
35
Tao S, Yu L, Li J, Huang L, Huang X, Zhang W, Xie Z, Tan Y, Yang D. Association between the triglyceride-glucose index and 1-year major adverse cardiovascular events in patients with coronary heart disease and hypertension. Cardiovasc Diabetol 2023; 22:305. [PMID: 37940943 PMCID: PMC10633928 DOI: 10.1186/s12933-023-02018-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Accepted: 10/09/2023] [Indexed: 11/10/2023]
Abstract
BACKGROUND The triglyceride-glucose (TyG) index has been proposed as a potential predictor of adverse prognosis in coronary heart disease (CHD). However, its prognostic value in patients with both CHD and hypertension remains unclear. This study aimed to evaluate the association between the TyG index and the 1-year risk of major adverse cardiovascular events (MACEs) in patients with CHD and hypertension. METHODS The data for the study were taken from the Hospital Information System database of China-Japan Friendship Hospital, which contained over 10,000 cardiovascular admissions from 2019 to 2022. The Boruta algorithm was used for feature selection. The study used univariable analysis, multivariable logistic regression analysis, and restricted cubic spline (RCS) regression to evaluate the association between the TyG index and the 1-year risk of MACEs in patients with CHD and hypertension. RESULTS After applying the inclusion and exclusion criteria, a total of 810 patients with CHD and hypertension were included in the study, with a median TyG index of 8.85 (8.48, 9.18). Using the lowest TyG index quartile as the reference, the fully adjusted ORs (95% CIs) for 1-year MACEs for TyG index Q2, Q3, and Q4 were 1.001 (0.986 ~ 1.016), 1.047 (1.032 ~ 1.062), and 1.760 (1.268 ~ 2.444), respectively. After adjusting for all confounders, we found that those with the highest TyG index had a 47.0% increased risk of MACEs over the 1-year follow-up (OR 1.470, 95% CI 1.071 ~ 2.018). The results of the subgroup analysis were similar to those of the main analyses. The RCS model suggested that the TyG index was nonlinearly associated with the 1-year risk of MACEs (P for nonlinear < 0.001). CONCLUSION This study shows that an elevated TyG index is a potential marker of adverse prognosis among patients with CHD and hypertension and can inform clinical decisions aimed at improving outcomes.
Affiliation(s)
- Shiyi Tao
- Department of Cardiology, Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China
- Graduate School, Beijing University of Chinese Medicine, Beijing, China
- Lintong Yu
- Graduate School, Beijing University of Chinese Medicine, Beijing, China
- Jun Li
- Department of Cardiology, Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China
- Li Huang
- Department of Integrative Cardiology, China-Japan Friendship Hospital, Beijing, China
- Xuanchun Huang
- Department of Cardiology, Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China
- Wenjie Zhang
- Department of Cardiology, Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China
- Graduate School, Beijing University of Chinese Medicine, Beijing, China
- Zicong Xie
- Department of Cardiology, Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China
- Yuqing Tan
- Department of Cardiology, Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China
- Graduate School, Beijing University of Chinese Medicine, Beijing, China
- Deshuang Yang
- Department of Integrative Cardiology, China-Japan Friendship Hospital, Beijing, China
36
Wang X, Qiao Y, Cui Y, Ren H, Zhao Y, Linghu L, Ren J, Zhao Z, Chen L, Qiu L. An explainable artificial intelligence framework for risk prediction of COPD in smokers. BMC Public Health 2023; 23:2164. [PMID: 37932692 PMCID: PMC10626705 DOI: 10.1186/s12889-023-17011-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2023] [Accepted: 10/17/2023] [Indexed: 11/08/2023]
Abstract
BACKGROUND Because the early signs of Chronic Obstructive Pulmonary Disease (COPD) are inconspicuous, affected individuals often remain unidentified, leading to missed opportunities for timely prevention and treatment. The purpose of this study was to create an explainable artificial intelligence framework combining data preprocessing methods, machine learning methods, and model interpretability methods to identify people at high risk of COPD in the smoking population and to provide a reasonable interpretation of model predictions. METHODS The data comprised questionnaire information, physical examination data and results of pulmonary function tests before and after bronchodilatation. First, the factorial analysis for mixed data (FAMD), Boruta and NRSBoundary-SMOTE resampling methods were used to address the missing data, high dimensionality and class imbalance problems. Then, seven classification models (CatBoost, NGBoost, XGBoost, LightGBM, random forest, SVM and logistic regression) were applied to model the risk level, and the best machine learning (ML) model's decisions were explained using the Shapley additive explanations (SHAP) method and partial dependence plot (PDP). RESULTS In the smoking population, age and 14 other variables were significant factors for predicting COPD. The CatBoost, random forest, and logistic regression models performed reasonably well in unbalanced datasets. CatBoost with NRSBoundary-SMOTE had the best classification performance in balanced datasets when composite indicators (the AUC, F1-score, and G-mean) were used as model comparison criteria. Age, COPD Assessment Test (CAT) score, gross annual income, body mass index (BMI), systolic blood pressure (SBP), diastolic blood pressure (DBP), anhelation, respiratory disease, central obesity, use of polluting fuel for household heating, region, use of polluting fuel for household cooking, and wheezing were important factors for predicting COPD in the smoking population. 
CONCLUSION This study combined feature screening methods, unbalanced data processing methods, and advanced machine learning methods to enable early identification of COPD risk groups in the smoking population. COPD risk factors in the smoking population were identified using SHAP and PDP, with the goal of providing theoretical support for targeted screening strategies and smoking population self-management strategies.
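The abstract compares models on composite indicators that include the F1-score and G-mean. For reference, F1 is the harmonic mean of precision and recall, and G-mean is the geometric mean of sensitivity and specificity, so both drop to zero for a model that never flags the minority class. A minimal Python sketch computing them from confusion-matrix counts; the function names and the counts in the usage line are hypothetical.

```python
import math

# Illustrative sketch; function names and example counts are hypothetical.
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (0.0 when there are no true positives)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def g_mean(tp: int, fp: int, tn: int, fn: int) -> float:
    """Geometric mean of sensitivity (recall on positives) and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)

# A classifier that predicts "no COPD" for everyone in a sample with 5% prevalence
# is 95% accurate, yet both imbalance-aware metrics are zero.
print(f1_score(tp=0, fp=0, fn=5), g_mean(tp=0, fp=0, tn=95, fn=5))  # 0.0 0.0
```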
Affiliation(s)
- Xuchun Wang
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001, P.R. China
- Yuchao Qiao
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001, P.R. China
- Yu Cui
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001, P.R. China
- Hao Ren
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001, P.R. China
- Ying Zhao
- Shanxi Centre for Disease Control and Prevention, Taiyuan, Shanxi, 030012, China
- Liqin Linghu
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001, P.R. China
- Shanxi Centre for Disease Control and Prevention, Taiyuan, Shanxi, 030012, China
- Jiahui Ren
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001, P.R. China
- Zhiyang Zhao
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001, P.R. China
- Limin Chen
- The Fifth Hospital (Shanxi People's Hospital) of Shanxi Medical University, Taiyuan, Shanxi, 030012, P.R. China.
- Lixia Qiu
- Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, 030001, P.R. China.
37
Cao Y, Zhang J, Huang L, Zhao Z, Zhang G, Ren J, Li H, Zhang H, Guo B, Wang Z, Xing Y, Zhou J. Construction of prediction model for KRAS mutation status of colorectal cancer based on CT radiomics. Jpn J Radiol 2023; 41:1236-1246. [PMID: 37311935 PMCID: PMC10613595 DOI: 10.1007/s11604-023-01458-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/02/2023] [Accepted: 06/04/2023] [Indexed: 06/15/2023]
Abstract
BACKGROUND In this study, we used computed tomography (CT)-based radiomics signatures to predict the mutation status of KRAS in patients with colorectal cancer (CRC) and to identify which phase of triphasic enhanced CT yields the most robust, best-performing radiomics signature. METHODS This study involved 447 patients who underwent KRAS mutation testing and preoperative triphasic enhanced CT. They were categorized into training (n = 313) and validation cohorts (n = 134) in a 7:3 ratio. Radiomics features were extracted using triphasic enhanced CT imaging. The Boruta algorithm was used to retain the features closely associated with KRAS mutations. The Random Forest (RF) algorithm was used to develop radiomics, clinical, and combined clinical-radiomics models for KRAS mutations. The receiver operating characteristic curve, calibration curve, and decision curve were used to evaluate the predictive performance and clinical usefulness of each model. RESULTS Age, CEA level, and clinical T stage were independent predictors of KRAS mutation status. After rigorous feature screening, four arterial phase (AP), three venous phase (VP), and seven delayed phase (DP) radiomics features were retained as the final signatures for predicting KRAS mutations. The DP models showed superior predictive performance compared to AP or VP models. The clinical-radiomics fusion model showed excellent performance, with an AUC, sensitivity, and specificity of 0.772, 0.792, and 0.646 in the training cohort, and 0.755, 0.724, and 0.684 in the validation cohort, respectively. The decision curve showed that the clinical-radiomics fusion model had more clinical practicality than the single clinical or radiomics model in predicting KRAS mutation status. 
CONCLUSION The clinical-radiomics fusion model, which combines the clinical and DP radiomics model, has the best predictive performance for predicting the mutation status of KRAS in CRC, and the constructed model has been effectively verified by an internal validation cohort.
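Boruta is named in this and several neighbouring abstracts without being explained. In the Boruta procedure as originally described, every real feature competes against randomly shuffled "shadow" copies: in each random-forest run, a feature scores a hit if its importance exceeds the largest shadow importance, and its hit count over many runs is tested against a fair-coin binomial. The Python sketch below covers only that final decision step, assuming α = 0.01 with a Bonferroni correction; forest fitting and the iterative dropping of rejected features are omitted, and the function names are hypothetical.

```python
from math import comb

# Illustrative sketch of Boruta's decision rule only; names are hypothetical.
def upper_tail(k: int, n: int) -> float:
    """P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

def lower_tail(k: int, n: int) -> float:
    """P(X <= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(0, k + 1)) / 2 ** n

def boruta_decision(hits: int, n_runs: int, alpha: float = 0.01, n_features: int = 1) -> str:
    """Classify one feature from its hit count.

    A hit is a run in which the feature's importance beat the best shadow
    (shuffled) feature; alpha is Bonferroni-corrected over n_features.
    """
    threshold = alpha / n_features
    if upper_tail(hits, n_runs) < threshold:
        return "confirmed"   # wins far more often than chance
    if lower_tail(hits, n_runs) < threshold:
        return "rejected"    # wins far less often than chance
    return "tentative"       # not yet distinguishable from noise

print(boruta_decision(20, 20), boruta_decision(0, 20), boruta_decision(10, 20))
# confirmed rejected tentative
```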
Affiliation(s)
- Yuntai Cao
- Department of Radiology, Affiliated Hospital of Qinghai University, Tongren Road No. 29, Xining, 810001, People's Republic of China.
- Department of Radiology, Lanzhou University Second Hospital, Cuiyingmen No. 82, Chengguan District, Lanzhou, 730030, People's Republic of China.
- Key Laboratory of Medical Imaging of Gansu Province, Lanzhou, 730030, People's Republic of China.
- Gansu International Scientific and Technological Cooperation Base of Medical Imaging Artificial Intelligence, Lanzhou, 730030, People's Republic of China.
- Jing Zhang
- The Fifth Affiliated Hospital of Zunyi Medical University, Zunyi, 519100, People's Republic of China
- Lele Huang
- Department of Nuclear Medicine, Lanzhou University Second Hospital, Lanzhou, China
- Zhiyong Zhao
- Department of Neurosurgery, Lanzhou University Second Hospital, Lanzhou, China
- Guojin Zhang
- Sichuan Academy of Medical Sciences & Sichuan Provincial People's Hospital, Chengdu, China
- Jialiang Ren
- Department of Pharmaceuticals Diagnosis, GE Healthcare, Beijing, China
- Hailong Li
- Affiliated Hospital of Qinghai University, Xining, China
- Hongqian Zhang
- Affiliated Hospital of Qinghai University, Xining, China
- Bin Guo
- Affiliated Hospital of Qinghai University, Xining, China
- Zhan Wang
- Affiliated Hospital of Qinghai University, Xining, China
- Yue Xing
- Xinxiang Medical University, Henan, China
- Junlin Zhou
- Department of Radiology, Lanzhou University Second Hospital, Cuiyingmen No. 82, Chengguan District, Lanzhou, 730030, People's Republic of China.
- Key Laboratory of Medical Imaging of Gansu Province, Lanzhou, 730030, People's Republic of China.
- Gansu International Scientific and Technological Cooperation Base of Medical Imaging Artificial Intelligence, Lanzhou, 730030, People's Republic of China.
38
Klontzas ME, Leventis D, Spanakis K, Karantanas AH, Kranioti EF. Post-mortem CT radiomics for the prediction of time since death. Eur Radiol 2023; 33:8387-8395. [PMID: 37329460 DOI: 10.1007/s00330-023-09746-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2023] [Revised: 03/27/2023] [Accepted: 04/22/2023] [Indexed: 06/19/2023]
Abstract
OBJECTIVES Post-mortem interval (PMI) estimation has long been relying on sequential post-mortem changes on the body as a function of extrinsic, intrinsic, and environmental factors. Such factors are difficult to account for in complicated death scenes; thus, PMI estimation can be compromised. Herein, we aimed to evaluate the use of post-mortem CT (PMCT) radiomics for the differentiation between early and late PMI. METHODS Consecutive whole-body PMCT examinations performed between 2016 and 2021 were retrospectively included (n = 120), excluding corpses without an accurately reported PMI (n = 23). Radiomics data were extracted from liver and pancreas tissue and randomly split into training and validation sets (70:30%). Following data preprocessing, significant features were selected (Boruta selection) and three XGBoost classifiers were built (liver, pancreas, combined) to differentiate between early (< 12 h) and late (> 12 h) PMI. Classifier performance was assessed with receiver operating characteristics (ROC) curves and areas under the curves (AUC), which were compared by bootstrapping. RESULTS A total of 97 PMCTs were included, representing individuals (23 females and 74 males) with a mean age of 47.1 ± 23.38 years. The combined model achieved the highest AUC reaching 75% (95%CI 58.4-91.6%) (p = 0.03 compared to liver and p = 0.18 compared to pancreas). The liver-based and pancreas-based XGBoost models achieved AUCs of 53.6% (95%CI 34.8-72.3%) and 64.3% (95%CI 46.7-81.9%) respectively (p > 0.05 for the comparison between liver- and pancreas-based models). CONCLUSION The use of radiomics analysis on PMCT examinations differentiated early from late PMI, unveiling a novel image-based method with important repercussions in forensic casework. 
CLINICAL RELEVANCE STATEMENT This paper introduces the use of radiomics in forensic diagnosis by presenting an effective automated alternative method of estimating the post-mortem interval from targeted tissues, thus paving the way for faster and higher-quality forensic investigations.
KEY POINTS
• A combined liver-pancreas radiomics model differentiated early from late post-mortem intervals (using a 12-h threshold) with an area under the curve of 75% (95%CI 58.4-91.6%).
• XGBoost models based on liver-only or pancreas-only radiomics demonstrated inferior performance to the combined model in predicting the post-mortem interval.
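The abstract reports AUCs with 95% CIs and compares them by bootstrapping. A minimal Python sketch of the two building blocks, assuming the rank (Mann-Whitney) form of the AUC and a percentile bootstrap that resamples late-PMI and early-PMI cases separately; the study's exact bootstrap comparison of two AUCs is not described here, so this is not its procedure, and the function names and scores are hypothetical.

```python
import random

# Illustrative sketch; function names and example scores are hypothetical.
def auc(scores_pos: list[float], scores_neg: list[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def bootstrap_auc_ci(scores_pos, scores_neg, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for the AUC, resampling each class with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        pos = rng.choices(scores_pos, k=len(scores_pos))
        neg = rng.choices(scores_neg, k=len(scores_neg))
        stats.append(auc(pos, neg))
    stats.sort()
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return stats[lo_idx], stats[hi_idx]

late = [0.9, 0.8, 0.65, 0.4]    # hypothetical classifier scores, late PMI
early = [0.7, 0.3, 0.2, 0.1]    # hypothetical classifier scores, early PMI
print(auc(late, early))  # 0.875
print(bootstrap_auc_ci(late, early, n_boot=1000))
```

With samples this small the interval is wide, which mirrors the broad 95% CIs reported above for a cohort of 97 examinations.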
Affiliation(s)
- Michail E Klontzas
- Department of Medical Imaging, University Hospital of Heraklion, Voutes, Heraklion, 71110, Crete, Greece
- Department of Radiology, Medical School, University of Crete, Voutes, Heraklion, 71110, Crete, Greece
- Advanced Hybrid Imaging Systems, Institute of Computer Science - FORTH, Voutes, Heraklion, 71110, Crete, Greece
- Dimitrios Leventis
- Department of Medical Imaging, University Hospital of Heraklion, Voutes, Heraklion, 71110, Crete, Greece
- Konstantinos Spanakis
- Department of Medical Imaging, University Hospital of Heraklion, Voutes, Heraklion, 71110, Crete, Greece
- Apostolos H Karantanas
- Department of Medical Imaging, University Hospital of Heraklion, Voutes, Heraklion, 71110, Crete, Greece.
- Department of Radiology, Medical School, University of Crete, Voutes, Heraklion, 71110, Crete, Greece.
- Advanced Hybrid Imaging Systems, Institute of Computer Science - FORTH, Voutes, Heraklion, 71110, Crete, Greece.
- Elena F Kranioti
- Forensic Medicine Unit, Department of Forensic Sciences, Faculty of Medicine, University of Crete, Voutes, Heraklion, 71110, Greece.
39
Zieliński K, Drabczyk D, Kunicki M, Drzyzga D, Kloska A, Rumiński J. Evaluating the risk of endometriosis based on patients' self-assessment questionnaires. Reprod Biol Endocrinol 2023; 21:102. [PMID: 37898817 PMCID: PMC10612251 DOI: 10.1186/s12958-023-01156-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/06/2023] [Accepted: 10/23/2023] [Indexed: 10/30/2023]
Abstract
BACKGROUND Endometriosis is a condition that significantly affects the quality of life of about 10% of reproductive-aged women. It is characterized by the presence of tissue similar to the uterine lining (endometrium) outside the uterus, which can lead to scarring, adhesions, pain, and fertility issues. While numerous factors associated with endometriosis are documented, a wide range of symptoms may still be undiscovered. METHODS In this study, we employed machine learning algorithms to predict endometriosis based on the patient symptoms extracted from 13,933 questionnaires. We compared the results of feature selection obtained from various algorithms (i.e., Boruta algorithm, Recursive Feature Selection) with experts' decisions. As a benchmark model architecture, we utilized a LightGBM algorithm, along with Multivariate Imputation by Chained Equations (MICE) and k-nearest neighbors (KNN), for missing data imputation. Our primary objective was to assess the model's performance and feature importance compared to existing studies. RESULTS We identified the top 20 predictors of endometriosis, uncovering previously overlooked features such as Cesarean section, ovarian cysts, and hernia. Notably, the model's performance metrics were maximized when utilizing a combination of multiple feature selection methods. Specifically, the final model achieved an area under the receiver operating characteristic curve (AUC) of 0.85 on the training dataset and an AUC of 0.82 on the testing dataset. CONCLUSIONS The application of machine learning in diagnosing endometriosis has the potential to significantly impact clinical practice, streamlining the diagnostic process and enhancing efficiency. Our questionnaire-based prediction approach empowers individuals with endometriosis to proactively identify potential symptoms, facilitating informed discussions with healthcare professionals about diagnosis and treatment options.
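The study imputed missing questionnaire answers with MICE and k-nearest neighbors. A minimal Python sketch of the KNN idea, assuming numeric answers, a root-mean-square distance over the items both respondents answered, and a plain mean of the k donors' values; library implementations (for example scikit-learn's `KNNImputer`) differ in distance scaling and weighting, MICE is not shown, and `knn_impute` is a hypothetical name.

```python
import math
from typing import Optional

# Illustrative sketch; the function name and example data are hypothetical.
def knn_impute(rows: list[list[Optional[float]]], k: int = 3) -> list[list[float]]:
    """Replace each None with the mean of that column over the k nearest donor rows.

    Distance is the root-mean-square difference over the columns that both rows
    have observed; a donor must have the missing column observed.
    """
    filled = []
    for i, row in enumerate(rows):
        new_row = []
        for j, value in enumerate(row):
            if value is not None:
                new_row.append(value)
                continue
            candidates = []
            for r, other in enumerate(rows):
                if r == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            if not candidates:
                raise ValueError(f"no donor row for row {i}, column {j}")
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            new_row.append(sum(v for _, v in nearest) / len(nearest))
        filled.append(new_row)
    return filled

# Hypothetical answers to two numeric items; None marks an unanswered item.
answers = [[1.0, 2.0], [1.1, None], [5.0, 10.0], [0.9, 2.2]]
print(knn_impute(answers, k=2)[1])  # [1.1, 2.1]: mean of the two closest donors
```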
Affiliation(s)
- Krystian Zieliński
- INVICTA, Research and Development Center, Sopot, Poland.
- Department of Biomedical Engineering, Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, Gdańsk, Poland.
- Anna Kloska
- INVICTA, Research and Development Center, Sopot, Poland.
- Department of Medical Biology and Genetics, Faculty of Biology, University of Gdańsk, Gdańsk, Poland.
- Jacek Rumiński
- Department of Biomedical Engineering, Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, Gdańsk, Poland
40
Leme DEDC, de Oliveira C. Machine Learning Models to Predict Future Frailty in Community-Dwelling Middle-Aged and Older Adults: The ELSA Cohort Study. J Gerontol A Biol Sci Med Sci 2023; 78:2176-2184. [PMID: 37209408 PMCID: PMC10613015 DOI: 10.1093/gerona/glad127] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/19/2022] [Indexed: 05/22/2023]
Abstract
BACKGROUND Machine learning (ML) models can be used to predict future frailty in the community setting. However, outcome variables for epidemiologic data sets such as frailty usually have an imbalance between categories, that is, there are far fewer individuals classified as frail than as nonfrail, adversely affecting the performance of ML models when predicting the syndrome. METHODS A retrospective cohort study with participants (50 years or older) from the English Longitudinal Study of Ageing who were nonfrail at baseline (2008-2009) and reassessed for the frailty phenotype at 4-year follow-up (2012-2013). Social, clinical, and psychosocial baseline predictors were selected to predict frailty at follow-up in ML models (Logistic Regression, Random Forest [RF], Support Vector Machine, Neural Network, K-nearest neighbor, and Naive Bayes classifier). RESULTS Of all the 4 378 nonfrail participants at baseline, 347 became frail at follow-up. The proposed combined oversampling and undersampling method to adjust imbalanced data improved the performance of the models, and RF had the best performance, with areas under the receiver-operating characteristic curve and the precision-recall curve of 0.92 and 0.97, respectively, specificity of 0.83, sensitivity of 0.88, and balanced accuracy of 85.5% for balanced data. Age, chair-rise test, household wealth, balance problems, and self-rated health were the most important frailty predictors in most of the models trained with balanced data. CONCLUSIONS ML proved useful in identifying individuals who became frail over time, and this result was made possible by balancing the data set. This study highlighted factors that may be useful in the early detection of frailty.
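The reported balanced accuracy follows from the other two figures: balanced accuracy is the mean of sensitivity and specificity, and (0.88 + 0.83) / 2 = 0.855. The abstract does not detail its combined oversampling and undersampling method, so the resampling function below is only a generic random-resampling sketch, assuming one common target size per class. The class sizes come from the abstract (347 frail, 4,378 − 347 = 4,031 nonfrail); the target of 2,000 and the function names are hypothetical.

```python
import random

# Illustrative sketch; function names and the target size are hypothetical.
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of the per-class recalls for a binary outcome."""
    return (sensitivity + specificity) / 2

def over_under_sample(majority: list, minority: list, target: int, seed: int = 0):
    """Randomly undersample the majority class (without replacement) and
    oversample the minority class (with replacement) to `target` records each."""
    if not len(minority) <= target <= len(majority):
        raise ValueError("target must lie between the minority and majority sizes")
    rng = random.Random(seed)
    kept_majority = rng.sample(majority, target)
    extra_minority = rng.choices(minority, k=target - len(minority))
    return kept_majority, minority + extra_minority

print(balanced_accuracy(0.88, 0.83))  # ≈ 0.855, i.e. the 85.5% reported above

frail = [("frail", i) for i in range(347)]
nonfrail = [("nonfrail", i) for i in range(4378 - 347)]
maj, mino = over_under_sample(nonfrail, frail, target=2000)
print(len(maj), len(mino))  # 2000 2000
```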
Affiliation(s)
- Cesar de Oliveira
- Department of Epidemiology and Public Health, University College London, London, UK
41
Wenck S, Mix T, Fischer M, Hackl T, Seifert S. Opening the Random Forest Black Box of 1H NMR Metabolomics Data by the Exploitation of Surrogate Variables. Metabolites 2023; 13:1075. [PMID: 37887402 PMCID: PMC10608983 DOI: 10.3390/metabo13101075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Revised: 10/05/2023] [Accepted: 10/10/2023] [Indexed: 10/28/2023]
Abstract
The untargeted metabolomics analysis of biological samples with nuclear magnetic resonance (NMR) provides highly complex data containing various signals from different molecules. To use these data for classification, e.g., in the context of food authentication, machine learning methods are used. These methods are usually applied as a black box, which means that no information about the complex relationships between the variables and the outcome is obtained. In this study, we show that the random forest-based approach surrogate minimal depth (SMD) can be applied for a comprehensive analysis of class-specific differences by selecting relevant variables and analyzing their mutual impact on the classification model of different truffle species. SMD allows the assignment of variables from the same metabolites as well as the detection of interactions between different metabolites that can be attributed to known biological relationships.
Affiliation(s)
- Soeren Wenck
- Institute of Food Chemistry, Hamburg School of Food Science, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany
- Thorsten Mix
- Institute of Organic Chemistry, University of Hamburg, Martin-Luther-King-Platz 6, 20146 Hamburg, Germany
- Markus Fischer
- Institute of Food Chemistry, Hamburg School of Food Science, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany
- Thomas Hackl
- Institute of Food Chemistry, Hamburg School of Food Science, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany
- Institute of Organic Chemistry, University of Hamburg, Martin-Luther-King-Platz 6, 20146 Hamburg, Germany
- Stephan Seifert
- Institute of Food Chemistry, Hamburg School of Food Science, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany
42
Loesel H, Shakiba N, Wenck S, Le Tan P, Karstens TO, Creydt M, Seifert S, Hackl T, Fischer M. Food Monitoring: Limitations of Accelerated Storage to Predict Molecular Changes in Hazelnuts (Corylus avellana L.) under Realistic Conditions Using UPLC-ESI-IM-QTOF-MS. Metabolites 2023; 13:1031. [PMID: 37887356 PMCID: PMC10608644 DOI: 10.3390/metabo13101031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2023] [Revised: 09/13/2023] [Accepted: 09/22/2023] [Indexed: 10/28/2023]
Abstract
Accelerated storage is routinely used with pharmaceuticals to predict stability and degradation patterns over time. The aim of this is to assess the shelf life and quality under harsher conditions, providing crucial insights into their long-term stability and potential storage issues. This study explores the potential of transferring this approach to food matrices for shelf-life estimation. Therefore, hazelnuts were stored under accelerated short-term and realistic long-term conditions. Subsequently, they were analyzed with high resolution mass spectrometry, focusing on the lipid profile. LC-MS analysis has shown that many unique processes take place under accelerated conditions that do not occur or occur much more slowly under realistic conditions. This mainly involved the degradation of membrane lipids such as phospholipids, ceramides, and digalactosyldiacylglycerides, while oxidation processes occurred at different rates in both conditions. It can be concluded that a food matrix is far too complex and heterogeneous compared to pharmaceuticals, so that many more processes take place during accelerated storage, which is why the results cannot be used to predict molecular changes in hazelnuts stored under realistic conditions.
Affiliation(s)
- Henri Loesel
- Hamburg School of Food Science, Institute of Food Chemistry, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany
- Navid Shakiba
- Hamburg School of Food Science, Institute of Food Chemistry, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany
- Institute of Organic Chemistry, University of Hamburg, Martin-Luther-King-Platz 6, 20146 Hamburg, Germany
- Soeren Wenck
- Hamburg School of Food Science, Institute of Food Chemistry, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany
- Phat Le Tan
- Hamburg School of Food Science, Institute of Food Chemistry, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany
- Tim-Oliver Karstens
- Hamburg School of Food Science, Institute of Food Chemistry, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany
- Marina Creydt
- Hamburg School of Food Science, Institute of Food Chemistry, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany
- Stephan Seifert
- Hamburg School of Food Science, Institute of Food Chemistry, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany
- Thomas Hackl
- Hamburg School of Food Science, Institute of Food Chemistry, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany
- Institute of Organic Chemistry, University of Hamburg, Martin-Luther-King-Platz 6, 20146 Hamburg, Germany
- Markus Fischer
- Hamburg School of Food Science, Institute of Food Chemistry, University of Hamburg, Grindelallee 117, 20146 Hamburg, Germany
43
Hu M, Zhu J, Peng G, Lu W, Wang H, Xie Z. IMOVNN: incomplete multi-omics data integration variational neural networks for gut microbiome disease prediction and biomarker identification. Brief Bioinform 2023; 24:bbad394. [PMID: 37930027 DOI: 10.1093/bib/bbad394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2023] [Revised: 09/03/2023] [Accepted: 10/14/2023] [Indexed: 11/07/2023]
Abstract
The gut microbiome has been regarded as one of the fundamental determinants regulating human health, and multi-omics data profiling has been increasingly utilized to bolster the deep understanding of this complex system. However, stemming from cost or other constraints, the integration of multi-omics often suffers from incomplete views, which poses a great challenge for the comprehensive analysis. In this work, a novel deep model named Incomplete Multi-Omics Variational Neural Networks (IMOVNN) is proposed for incomplete data integration, disease prediction application and biomarker identification. Benefiting from the information bottleneck and the marginal-to-joint distribution integration mechanism, the IMOVNN can learn the marginal latent representation of each individual omics and the joint latent representation for better disease prediction. Moreover, owing to the feature-selective layer predicated upon the concrete distribution, the model is interpretable and can identify the most relevant features. Experiments on inflammatory bowel disease multi-omics datasets demonstrate that our method outperforms several state-of-the-art methods for disease prediction. In addition, IMOVNN has identified significant biomarkers from multi-omics data sources.
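The abstract attributes IMOVNN's interpretability to a feature-selective layer built on the concrete distribution, but does not specify the layer. As a hedged illustration of the underlying distribution only, the Python sketch below draws one Concrete (Gumbel-softmax) sample: Gumbel noise is added to each feature's logit and a temperature-scaled softmax turns the result into selection weights that approach a one-hot choice of a single feature as the temperature falls. The logits, temperatures, and function name are hypothetical.

```python
import math
import random

# Illustrative sketch of the Concrete distribution, not the IMOVNN layer itself.
def concrete_sample(logits: list[float], temperature: float, rng: random.Random) -> list[float]:
    """Draw one relaxed one-hot vector from a Concrete (Gumbel-softmax) distribution."""
    # Gumbel(0, 1) noise via the inverse CDF; clamp u away from 0 to avoid log(0).
    gumbels = [-math.log(-math.log(max(rng.random(), 1e-12))) for _ in logits]
    scores = [(logit + g) / temperature for logit, g in zip(logits, gumbels)]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
logits = [2.0, 0.5, 0.0, -1.0]  # hypothetical selector logits over four features
for t in (5.0, 0.1):
    weights = concrete_sample(logits, t, rng)
    print(t, [round(w, 3) for w in weights])
```

At the higher temperature the weights are spread across features; at the lower one they are usually close to one-hot, which is what lets such a layer act as a learnable feature selector.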
Affiliation(s)
- Mingyi Hu
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
- Jinlin Zhu
- School of Food Science and Technology, Jiangnan University, Wuxi, China
- Wenwei Lu
- School of Food Science and Technology, Jiangnan University, Wuxi, China
- Hongchao Wang
- School of Food Science and Technology, Jiangnan University, Wuxi, China
- Zhenping Xie
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
44
Wies C, Miltenberger R, Grieser G, Jahn-Eimermacher A. Exploring the variable importance in random forests under correlations: a general concept applied to donor organ quality in post-transplant survival. BMC Med Res Methodol 2023; 23:209. [PMID: 37726680 PMCID: PMC10507897 DOI: 10.1186/s12874-023-02023-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Accepted: 08/23/2023] [Indexed: 09/21/2023]
Abstract
Random Forests are a powerful and frequently applied Machine Learning tool. The permutation variable importance (VIMP) has been proposed to improve the explainability of such a pure prediction model. It describes the expected increase in prediction error after randomly permuting a variable and disturbing its association with the outcome. However, VIMPs measure a variable's marginal influence only, which can make their interpretation difficult or even misleading. In the present work we address the general need for improving the explainability of prediction models by exploring VIMPs in the presence of correlated variables. In particular, we propose to use a variable's residual information for investigating whether its permutation importance partially or totally originates from correlated predictors. Hypothesis tests are derived from a resampling algorithm that can further support results by providing test decisions and p-values. In simulation studies we show that the proposed test controls type I error rates. When applying the methods to a Random Forest analysis of post-transplant survival after kidney transplantation, the importance of kidney donor quality for predicting post-transplant survival is shown to be high. However, the transplant allocation policy introduces correlations with other well-known predictors, which raises the concern that the importance of kidney donor quality may simply originate from these predictors. By using the proposed method, this concern is addressed and it is demonstrated that kidney donor quality plays an important role in post-transplant survival, regardless of correlations with other predictors.
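The permutation importance defined in this abstract (the expected increase in prediction error after randomly permuting one variable) can be sketched in a few lines of Python. This sketch works with any fitted `predict` function and uses mean squared error as the error measure, an assumption, since random forest implementations typically use out-of-bag error; it does not implement the authors' residual-based test for correlated predictors, and all names are hypothetical.

```python
import random

# Illustrative sketch; names, model, and data are hypothetical.
def mse(y_true: list[float], y_pred: list[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X: list[list[float]], y: list[float],
                           column: int, n_repeats: int = 20, seed: int = 0) -> float:
    """Mean increase in MSE after shuffling one column, breaking its link to y."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    increases = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in X]
        rng.shuffle(shuffled)
        X_perm = [row[:column] + [v] + row[column + 1:] for row, v in zip(X, shuffled)]
        increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
    return sum(increases) / n_repeats

# Hypothetical fitted model that only uses column 0.
def predict(row: list[float]) -> float:
    return 2 * row[0]

X = [[float(i), float(i % 3)] for i in range(10)]
y = [2.0 * i for i in range(10)]
print(permutation_importance(predict, X, y, column=1))      # 0.0: column 1 is ignored
print(permutation_importance(predict, X, y, column=0) > 0)  # True
```

Because the sketch permutes one column at a time, a variable correlated with an important predictor can still show high importance; that marginal behaviour is exactly what the abstract's residual-based approach is designed to disentangle.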
Affiliation(s)
- Christoph Wies
- Department of Mathematics and Natural Sciences, Darmstadt University of Applied Sciences, Schöfferstraße 3, Darmstadt, 64295, Germany
- Digital Biomarkers for Oncology, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 223, Heidelberg, 69120, Germany
- Medical Facility, University Heidelberg, Im Neuenheimer Feld 672, Heidelberg, 69120, Germany
- Robert Miltenberger
- Department of Mathematics and Natural Sciences, Darmstadt University of Applied Sciences, Schöfferstraße 3, Darmstadt, 64295, Germany
- Gunter Grieser
- Department of Computer Science, Darmstadt University of Applied Sciences, Schöfferstraße 3, Darmstadt, 64295, Germany
- Antje Jahn-Eimermacher
- Department of Mathematics and Natural Sciences, Darmstadt University of Applied Sciences, Schöfferstraße 3, Darmstadt, 64295, Germany.
45
Liu M, Zhou R, Zou W, Yang Z, Li Q, Chen Z, Jiang L, Zhang J. Machine learning-identified stemness features and constructed stemness-related subtype with prognosis, chemotherapy, and immunotherapy responses for non-small cell lung cancer patients. Stem Cell Res Ther 2023; 14:238. [PMID: 37674202 PMCID: PMC10483786 DOI: 10.1186/s13287-023-03406-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Accepted: 06/27/2023] [Indexed: 09/08/2023]
Abstract
AIM This study aimed to explore a novel subtype classification method based on the stemness characteristics of patients with non-small cell lung cancer (NSCLC). METHODS The stemness index (mRNAsi) of NSCLC patients was calculated from The Cancer Genome Atlas database, and an unsupervised consensus clustering method was used to classify patients into two subtypes and to analyze the differences in survival, somatic mutational load, copy number variation, and immune characteristics between them. Subsequently, four machine learning methods were used to construct and validate a stemness subtype classification model, and cell function experiments were performed to verify the effect of the signature gene ARTN on NSCLC. RESULTS Patients with Stemness Subtype I had better PFS and a higher somatic mutational burden and copy number alteration than patients with Stemness Subtype II. In addition, the two stemness subtypes had different patterns of tumor immune microenvironment. The immune score, stromal score, and overall score of Stemness Subtype II were higher than those of Stemness Subtype I, suggesting a relatively small benefit from immune checkpoint inhibitors. The four machine learning methods constructed and validated a classification model for the stemness subtypes and yielded multiple logistic regression equations for 22 characteristic genes. The results of cell function experiments showed that ARTN can promote the proliferation, invasion, and migration of NSCLC cells and is closely related to cancer stem cell properties. CONCLUSION This new classification method based on stemness characteristics can effectively distinguish patients' characteristics and thus provide possible directions for the selection and optimization of clinical treatment plans.
Affiliation(s)
- Mingshan Liu
- Department of Thoracic Surgery, The First Affiliated Hospital of Nanchang University, Nanchang, 330006, Jiangxi, China
- Jiangxi Hospital of China-Japan Friendship Hospital, National Regional Center for Respiratory Medicine Nanchang, Jiangxi, 330000, People's Republic of China
- Ruihao Zhou
- Department of Anesthesiology, West China Hospital, Sichuan University, Chengdu, 610041, Sichuan Province, People's Republic of China
- Wei Zou
- Department of Thoracic Surgery, The First Affiliated Hospital of Nanchang University, Nanchang, 330006, Jiangxi, China
- Jiangxi Hospital of China-Japan Friendship Hospital, National Regional Center for Respiratory Medicine Nanchang, Jiangxi, 330000, People's Republic of China
- Zhuofan Yang
- Department of Thoracic Surgery, The First Affiliated Hospital of Nanchang University, Nanchang, 330006, Jiangxi, China
- Jiangxi Hospital of China-Japan Friendship Hospital, National Regional Center for Respiratory Medicine Nanchang, Jiangxi, 330000, People's Republic of China
- Quanjin Li
- Department of Thoracic Surgery, The First Affiliated Hospital of Nanchang University, Nanchang, 330006, Jiangxi, China
- Jiangxi Hospital of China-Japan Friendship Hospital, National Regional Center for Respiratory Medicine Nanchang, Jiangxi, 330000, People's Republic of China
- Zhiguo Chen
- Department of Thoracic Surgery, The First Affiliated Hospital of Nanchang University, Nanchang, 330006, Jiangxi, China
- Jiangxi Hospital of China-Japan Friendship Hospital, National Regional Center for Respiratory Medicine Nanchang, Jiangxi, 330000, People's Republic of China
- Lei Jiang
- Department of Thoracic Surgery, The First Affiliated Hospital of Nanchang University, Nanchang, 330006, Jiangxi, China
- Jiangxi Hospital of China-Japan Friendship Hospital, National Regional Center for Respiratory Medicine Nanchang, Jiangxi, 330000, People's Republic of China
- Jingtao Zhang
- Department of Thoracic Surgery, The First Affiliated Hospital of Nanchang University, Nanchang, 330006, Jiangxi, China
- Jiangxi Hospital of China-Japan Friendship Hospital, National Regional Center for Respiratory Medicine Nanchang, Jiangxi, 330000, People's Republic of China
46
Dunne R, Reguant R, Ramarao-Milne P, Szul P, Sng LM, Lundberg M, Twine NA, Bauer DC. Thresholding Gini variable importance with a single-trained random forest: An empirical Bayes approach. Comput Struct Biotechnol J 2023; 21:4354-4360. [PMID: 37711185 PMCID: PMC10497997 DOI: 10.1016/j.csbj.2023.08.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2023] [Revised: 08/30/2023] [Accepted: 08/30/2023] [Indexed: 09/16/2023]
Abstract
Random forests (RFs) are a widely used modelling tool capable of feature selection via a variable importance measure (VIM); however, a threshold is needed to control for false positives. In the absence of a good understanding of the characteristics of VIMs, many current approaches attempt to select features associated with the response by training multiple RFs to generate statistical power via a permutation null, by employing recursive feature elimination, or through a combination of both. However, for high-dimensional datasets these approaches become computationally infeasible. In this paper, we present RFlocalfdr, a statistical approach, built on the empirical Bayes argument of Efron, for thresholding mean decrease in impurity (MDI) importances. It identifies features significantly associated with the response while controlling the false positive rate. Using synthetic data and real-world health data, we demonstrate that RFlocalfdr has accuracy equivalent to currently published approaches while being orders of magnitude faster. We show that RFlocalfdr can successfully threshold a dataset of 10⁶ datapoints, establishing its usability for large-scale datasets such as genomic data. Furthermore, RFlocalfdr is compatible with any RF implementation that returns a VIM and counts, making it a versatile feature selection tool that reduces false discoveries.
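The empirical Bayes idea behind this kind of thresholding can be illustrated with Efron's local false discovery rate, lfdr(z) = π0·f0(z)/f(z), where f0 is the null density and f the density of all scores. The sketch below is a generic, standard-library illustration and not the RFlocalfdr package: it assumes the scores are already log-transformed importances, models the null f0 as a normal distribution fitted robustly from the median and MAD (RFlocalfdr fits its own null model), uses π0 = 1 as a conservative default, and estimates f with a Gaussian kernel. The function names, the simulated data, and the 0.2 cut-off are illustrative assumptions.

```python
import math
import random
import statistics
from statistics import NormalDist


def kernel_density(data, bandwidth):
    """Return f(z), a Gaussian kernel estimate of the mixture density of `data`."""
    n = len(data)
    scale = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def f(z):
        return scale * sum(math.exp(-0.5 * ((z - x) / bandwidth) ** 2) for x in data)

    return f


def local_fdr(scores, pi0=1.0):
    """Efron-style local fdr, lfdr(z) = pi0 * f0(z) / f(z), capped at 1.

    Assumption: the null density f0 is normal, centred on the median score with a
    MAD-based standard deviation, so the bulk of unimportant features defines it.
    """
    med = statistics.median(scores)
    mad = statistics.median([abs(s - med) for s in scores])
    f0 = NormalDist(mu=med, sigma=1.4826 * mad)  # 1.4826 * MAD estimates sigma for normal data
    # Silverman's rule-of-thumb bandwidth for the mixture density estimate
    bandwidth = 1.06 * statistics.stdev(scores) * len(scores) ** (-0.2)
    f = kernel_density(scores, bandwidth)
    return [min(1.0, pi0 * f0.pdf(s) / f(s)) for s in scores]


if __name__ == "__main__":
    # Hypothetical example: 300 null features plus 10 strongly associated ones
    random.seed(0)
    scores = [random.gauss(0, 1) for _ in range(300)]
    scores += [6.0 + random.gauss(0, 0.2) for _ in range(10)]
    lfdr = local_fdr(scores)
    selected = [i for i, v in enumerate(lfdr) if v < 0.2]
    print(f"{len(selected)} features pass lfdr < 0.2")
```

Features whose lfdr falls below the cut-off would be reported as associated with the response. The median/MAD null is only a convenience and is reasonable only when most features are truly null.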
Affiliation(s)
- Robert Dunne
- Data61, Commonwealth Scientific and Industrial Research Organisation, Sydney, Australia
- Roc Reguant
- Transformational Bioinformatics, Commonwealth Scientific and Industrial Research Organisation, Westmead, Australia
- Priya Ramarao-Milne
- Transformational Bioinformatics, Commonwealth Scientific and Industrial Research Organisation, Westmead, Australia
- Piotr Szul
- Data61, Commonwealth Scientific and Industrial Research Organisation, Dutton Park, Australia
- Letitia M.F. Sng
- Transformational Bioinformatics, Commonwealth Scientific and Industrial Research Organisation, Westmead, Australia
- Mischa Lundberg
- Transformational Bioinformatics, Commonwealth Scientific and Industrial Research Organisation, Westmead, Australia
- Diamantina Institute, The University of Queensland, St Lucia, Australia
- Natalie A. Twine
- Transformational Bioinformatics, Commonwealth Scientific and Industrial Research Organisation, Westmead, Australia
- Macquarie University, Applied BioSciences, Faculty of Science and Engineering, Macquarie Park, Australia
- Denis C. Bauer
- Transformational Bioinformatics, Commonwealth Scientific and Industrial Research Organisation, Westmead, Australia
- Macquarie University, Applied BioSciences, Faculty of Science and Engineering, Macquarie Park, Australia
- Macquarie University, Department of Biomedical Sciences, Faculty of Medicine and Health Science, Macquarie Park, Australia
47
Lin Y, Jing X, Chen Z, Pan X, Xu D, Yu X, Zhong F, Zhao L, Yang C, Wang B, Wang S, Ye Y, Shen Z. Histone deacetylase-mediated tumor microenvironment characteristics and synergistic immunotherapy in gastric cancer. Theranostics 2023; 13:4574-4600. [PMID: 37649598 PMCID: PMC10465215 DOI: 10.7150/thno.86928] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/08/2023] [Accepted: 08/07/2023] [Indexed: 09/01/2023]
Abstract
Background: Studies have shown that the expression of histone deacetylases (HDACs) is significantly related to the tumor microenvironment (TME) in gastric cancer. However, the expression of a single molecule or several molecules does not accurately reflect TME characteristics or guide immunotherapy in gastric cancer. Methods: We constructed an HDAC score (HDS) based on the expression levels of HDACs. Single-cell transcriptome data were used to analyze the underlying factors contributing to differences in immune infiltration between patients with high and low HDS. In vitro and in vivo experiments validated the strategy of transforming cold tumors into hot tumors to guide immunotherapy. Results: Based on the expression characteristics of HDACs, we constructed an HDS model to characterize the TME. We found that patients with a high HDS had stronger immunogenicity and could benefit more from immunotherapy than those with a low score. The AUC of the HDS combined with the combined positive score (CPS) for predicting the efficacy of immunotherapy was as high as 0.96. By single-cell and paired bulk transcriptome sequencing analysis, we found that the infiltration levels of CD4+ T cells, CD8+ T cells and NK cells were significantly decreased in the low HDS group, which may be induced by MYH11+ fibroblasts, CD234+ endothelial cells and CCL17+ pDCs via the MIF signaling pathway. Inhibition of the MIF signaling pathway was confirmed to potentially enhance immune infiltration. In addition, our analysis revealed that GPX4 inhibitors might be effective for patients with a low HDS. GPX4 knockout significantly inhibited PD-L1 expression and promoted the infiltration and activation of CD8+ T cells. Conclusion: We constructed an HDS model based on the HDAC expression characteristics of gastric cancer. This model was used to evaluate TME characteristics and predict immunotherapy efficacy.
Inhibition of the MIF signaling pathway in the TME and GPX4 expression in tumor cells may be an important strategy for cold tumor synergistic immunotherapy for gastric cancer.
Affiliation(s)
- Yilin Lin
- Department of Gastroenterological Surgery, Peking University People's Hospital, Beijing 100044, PR China
- Laboratory of Surgical Oncology, Beijing Key Laboratory of Colorectal Cancer Diagnosis and Treatment Research, Peking University People's Hospital, Beijing 100044, PR China
- Xiangxiang Jing
- Department of Gastroenterological Surgery, Peking University People's Hospital, Beijing 100044, PR China
- Laboratory of Surgical Oncology, Beijing Key Laboratory of Colorectal Cancer Diagnosis and Treatment Research, Peking University People's Hospital, Beijing 100044, PR China
- Zhihua Chen
- Department of Gastrointestinal Surgery, The First Affiliated Hospital of Fujian Medical University, Fuzhou, Fujian, 350000, PR China
- Xiaoxian Pan
- Department of Radiotherapy, The First Affiliated Hospital of Fujian Medical University, Fuzhou, Fujian, 350000, PR China
- Duo Xu
- Department of Gastroenterological Surgery, Peking University People's Hospital, Beijing 100044, PR China
- Laboratory of Surgical Oncology, Beijing Key Laboratory of Colorectal Cancer Diagnosis and Treatment Research, Peking University People's Hospital, Beijing 100044, PR China
- Xiang Yu
- Department of Gastroenterological Surgery, Peking University People's Hospital, Beijing 100044, PR China
- Laboratory of Surgical Oncology, Beijing Key Laboratory of Colorectal Cancer Diagnosis and Treatment Research, Peking University People's Hospital, Beijing 100044, PR China
- Fengyun Zhong
- Department of Gastroenterological Surgery, Peking University People's Hospital, Beijing 100044, PR China
- Laboratory of Surgical Oncology, Beijing Key Laboratory of Colorectal Cancer Diagnosis and Treatment Research, Peking University People's Hospital, Beijing 100044, PR China
- Long Zhao
- Department of Gastroenterological Surgery, Peking University People's Hospital, Beijing 100044, PR China
- Laboratory of Surgical Oncology, Beijing Key Laboratory of Colorectal Cancer Diagnosis and Treatment Research, Peking University People's Hospital, Beijing 100044, PR China
- Changjiang Yang
- Department of Gastroenterological Surgery, Peking University People's Hospital, Beijing 100044, PR China
- Laboratory of Surgical Oncology, Beijing Key Laboratory of Colorectal Cancer Diagnosis and Treatment Research, Peking University People's Hospital, Beijing 100044, PR China
- Bo Wang
- Department of Gastroenterological Surgery, Peking University People's Hospital, Beijing 100044, PR China
- Laboratory of Surgical Oncology, Beijing Key Laboratory of Colorectal Cancer Diagnosis and Treatment Research, Peking University People's Hospital, Beijing 100044, PR China
- Shan Wang
- Department of Gastroenterological Surgery, Peking University People's Hospital, Beijing 100044, PR China
- Laboratory of Surgical Oncology, Beijing Key Laboratory of Colorectal Cancer Diagnosis and Treatment Research, Peking University People's Hospital, Beijing 100044, PR China
- Yingjiang Ye
- Department of Gastroenterological Surgery, Peking University People's Hospital, Beijing 100044, PR China
- Laboratory of Surgical Oncology, Beijing Key Laboratory of Colorectal Cancer Diagnosis and Treatment Research, Peking University People's Hospital, Beijing 100044, PR China
- Zhanlong Shen
- Department of Gastroenterological Surgery, Peking University People's Hospital, Beijing 100044, PR China
- Laboratory of Surgical Oncology, Beijing Key Laboratory of Colorectal Cancer Diagnosis and Treatment Research, Peking University People's Hospital, Beijing 100044, PR China
48
Liu C, Mokashi NV, Darville T, Sun X, O’Connell CM, Hufnagel K, Waterboer T, Zheng X. A Machine Learning-Based Analytic Pipeline Applied to Clinical and Serum IgG Immunoproteome Data To Predict Chlamydia trachomatis Genital Tract Ascension and Incident Infection in Women. Microbiol Spectr 2023; 11:e0468922. [PMID: 37318345 PMCID: PMC10434056 DOI: 10.1128/spectrum.04689-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Accepted: 06/01/2023] [Indexed: 06/16/2023]
Abstract
We developed a reusable, open-source machine learning (ML) pipeline that provides an analytical framework for rigorous biomarker discovery. We applied the pipeline to determine the predictive potential of clinical and immunoproteome antibody data for outcomes associated with Chlamydia trachomatis (Ct) infection, using data collected from 222 cisgender females with high Ct exposure. We compared the predictive performance of 4 ML algorithms (naive Bayes, random forest, extreme gradient boosting with linear booster [xgbLinear], and k-nearest neighbors [KNN]), screened from 215 ML methods, in combination with two different feature selection strategies, Boruta and recursive feature elimination. Recursive feature elimination performed better than Boruta in this study. In predicting Ct ascending infection, naive Bayes yielded a slightly higher median area under the receiver operating characteristic curve (AUROC) of 0.57 (95% confidence interval [CI], 0.54 to 0.59) than the other methods and provided biological interpretability. For predicting incident infection among women uninfected at enrollment, KNN performed slightly better than the other algorithms, with a median AUROC of 0.61 (95% CI, 0.49 to 0.70). In contrast, for women infected at enrollment, xgbLinear and random forest had higher predictive performance, with median AUROCs of 0.63 (95% CI, 0.58 to 0.67) and 0.62 (95% CI, 0.58 to 0.64), respectively. Our findings suggest that clinical factors and serum anti-Ct protein IgGs are inadequate biomarkers for ascension or incident Ct infection. Nevertheless, our analysis highlights the utility of a pipeline that searches for biomarkers and evaluates prediction performance and interpretability. IMPORTANCE Biomarker discovery to aid early diagnosis and treatment using machine learning (ML) approaches is a rapidly developing area in host-microbe studies.
However, lack of reproducibility and interpretability of ML-driven biomarker analysis hinders selection of robust biomarkers that can be applied in clinical practice. We thus developed a rigorous ML analytical framework and provide recommendations for enhancing reproducibility of biomarkers. We emphasize the importance of robustness in selection of ML methods, evaluation of performance, and interpretability of biomarkers. Our ML pipeline is reusable and open-source and can be used not only to identify host-pathogen interaction biomarkers but also in microbiome studies and ecological and environmental microbiology research.
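The results above are reported as median AUROC values with 95% confidence intervals. As an illustration of the metric only, and not of the authors' pipeline (whose resampling scheme is not described in this abstract), the AUROC can be computed through its Mann-Whitney formulation, and a percentile bootstrap is one common way to attach an interval to it. The function names and defaults below are hypothetical.

```python
import random


def auroc(labels, scores):
    """AUROC as P(score of a random positive > score of a random negative); ties count 1/2.

    Mann-Whitney formulation, O(n_pos * n_neg); adequate for small cohorts.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap (1 - alpha) interval for the AUROC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample must contain both classes
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

An AUROC near 0.5, as for several of the models above, means the classifier ranks a random case above a random control barely more often than chance.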
Affiliation(s)
- Chuwen Liu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Neha Vivek Mokashi
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Department of Pediatrics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Toni Darville
- Department of Pediatrics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Xuejun Sun
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Catherine M. O’Connell
- Department of Pediatrics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Katrin Hufnagel
- Infections and Cancer Epidemiology, German Cancer Research Center (Deutsches Krebsforschungszentrum), Heidelberg, Germany
- Tim Waterboer
- Infections and Cancer Epidemiology, German Cancer Research Center (Deutsches Krebsforschungszentrum), Heidelberg, Germany
- Xiaojing Zheng
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Department of Pediatrics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
49
Young T, Laroche O, Walker SP, Miller MR, Casanovas P, Steiner K, Esmaeili N, Zhao R, Bowman JP, Wilson R, Bridle A, Carter CG, Nowak BF, Alfaro AC, Symonds JE. Prediction of Feed Efficiency and Performance-Based Traits in Fish via Integration of Multiple Omics and Clinical Covariates. BIOLOGY 2023; 12:1135. [PMID: 37627019 PMCID: PMC10452023 DOI: 10.3390/biology12081135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 08/07/2023] [Accepted: 08/08/2023] [Indexed: 08/27/2023]
Abstract
Fish aquaculture is a rapidly expanding global industry, set to support growing demand for sources of marine protein. Enhancing feed efficiency (FE) in farmed fish is required to reduce production costs and improve sector sustainability. Because organisms are complex systems whose emergent phenotypes arise from multiple interacting molecular processes, systems-based approaches are expected to deliver new biological insights into FE and growth performance. Here, we establish 14 diverse layers of multi-omics data and clinical covariates to assess their capacity to predict FE and associated performance traits in a fish model (Oncorhynchus tshawytscha) and to uncover the influential variables. Inter-omic relatedness between the different layers revealed several significant concordances, particularly between datasets originating from similar material/tissue and between blood indicators and some of the proteomic (liver), metabolomic (liver), and microbiomic layers. Single- and multi-layer random forest (RF) regression models showed that integrating all data layers provides greater FE prediction power than any single-layer model alone. Although FE was among the most challenging of the traits we attempted to predict, the mean root-mean-square error, normalized and expressed as a percentage, across 40 different FE models was 30.4%, supporting RF as a feature selection tool and approach for complex trait prediction. Major contributions to the integrated FE models were derived from the proteomic and metabolomic data layers, with substantial influence also provided by the lipid composition layer. A correlation matrix of the top 27 variables in the models highlighted FE trait associations with faecal bacteria (Serratia spp.), palmitic and nervonic acid moieties in whole-body lipids, levels of free glycerol in muscle, and N-acetylglutamic acid content in liver.
In summary, we identified subsets of molecular characteristics for the assessment of commercially relevant performance-based metrics in farmed Chinook salmon.
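The 30.4% figure above is a root-mean-square error normalized to a percentage. The abstract does not state the normalizer, so the standard-library sketch below (hypothetical function names) supports the two common choices, the observed range and the observed mean; which one the study used remains an open assumption.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between paired observations and predictions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def nrmse_percent(observed, predicted, normalizer="range"):
    """RMSE as a percentage of the observed range or mean.

    Assumption: the cited study's normalizer is unknown; both options are common.
    """
    if normalizer == "range":
        denom = max(observed) - min(observed)
    elif normalizer == "mean":
        denom = sum(observed) / len(observed)
    else:
        raise ValueError("normalizer must be 'range' or 'mean'")
    return 100.0 * rmse(observed, predicted) / denom
```

Because the normalizer changes the number, NRMSE values are comparable only across models that use the same one.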
Affiliation(s)
- Tim Young
- Aquaculture Biotechnology Research Group, Department of Environmental Science, School of Science, Private Bag 92006, Auckland 1142, New Zealand
- The Centre for Biomedical and Chemical Sciences, School of Science, Auckland University of Technology, Private Bag 92006, Auckland 1142, New Zealand
- Matthew R. Miller
- Cawthron Institute, Nelson 7010, New Zealand
- Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
- Noah Esmaeili
- Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
- Ruixiang Zhao
- Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
- John P. Bowman
- Tasmanian Institute of Agricultural Research, University of Tasmania, Hobart 7005, Australia
- Richard Wilson
- Central Science Laboratory, Research Division, University of Tasmania, Hobart 7001, Australia
- Andrew Bridle
- Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
- Chris G. Carter
- Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
- Blue Economy Cooperative Research Centre, Launceston 7250, Australia
- Barbara F. Nowak
- Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
- Andrea C. Alfaro
- Aquaculture Biotechnology Research Group, Department of Environmental Science, School of Science, Private Bag 92006, Auckland 1142, New Zealand
- Jane E. Symonds
- Cawthron Institute, Nelson 7010, New Zealand
- Institute for Marine and Antarctic Studies, University of Tasmania, Hobart Private Bag 49, Hobart 7005, Australia
50
Hamidi F, Gilani N, Arabi Belaghi R, Yaghoobi H, Babaei E, Sarbakhsh P, Malakouti J. Identifying potential circulating miRNA biomarkers for the diagnosis and prediction of ovarian cancer using machine-learning approach: application of Boruta. Front Digit Health 2023; 5:1187578. [PMID: 37621964 PMCID: PMC10445490 DOI: 10.3389/fdgth.2023.1187578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Accepted: 07/20/2023] [Indexed: 08/26/2023]
Abstract
Introduction In gynecologic oncology, ovarian cancer is a major clinical challenge. Because typical symptoms and effective biomarkers for noninvasive screening are lacking, most patients already have advanced-stage ovarian cancer at diagnosis. MicroRNAs (miRNAs) are a type of non-coding RNA molecule that has been linked to human cancers. Identifying diagnostic biomarkers that distinguish cancer from non-cancer samples is difficult. Methods Using Boruta, a random forest-based machine-learning feature selection method, we aimed to identify biomarkers associated with ovarian cancer in cancerous and non-cancer samples from the Gene Expression Omnibus (GEO) database (GSE106817). Two independent GEO data sets, GSE113486 and GSE113740, were used for external validation. We applied five state-of-the-art machine-learning algorithms for classification: logistic regression, random forest, decision trees, artificial neural networks, and XGBoost. Results Four models discovered in GSE113486 had an AUC of 100%, three in GSE113740 had an AUC of over 94%, and four in GSE113486 had an AUC of over 94%. We identified 10 miRNAs that distinguish ovarian cancer cases from normal controls: hsa-miR-1290, hsa-miR-1233-5p, hsa-miR-1914-5p, hsa-miR-1469, hsa-miR-4675, hsa-miR-1228-5p, hsa-miR-3184-5p, hsa-miR-6784-5p, hsa-miR-6800-5p, and hsa-miR-5100. Our findings suggest that miRNAs could serve as biomarkers for ovarian cancer screening and possible intervention.
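Boruta compares each real feature's random forest importance with the best importance achieved by randomly permuted "shadow" copies of the features, repeats this over many forest fits, and records a hit whenever the real feature wins. The sketch below shows only the final decision step, a two-sided binomial test against p = 0.5 with a Bonferroni correction, modelled on the procedure described by the Boruta authors; treat the exact correction and the 0.01 threshold as assumptions. The forest fitting is omitted, and the hit counts, run count, and feature names are hypothetical inputs.

```python
import math


def binom_tail_ge(k, n):
    """P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n


def binom_tail_le(k, n):
    """P(X <= k) for X ~ Binomial(n, 0.5)."""
    return sum(math.comb(n, i) for i in range(0, k + 1)) / 2 ** n


def boruta_decisions(hits, runs, p_value=0.01):
    """Classify features from their hit counts over `runs` shadow comparisons.

    A hit means the feature's importance beat the maximum shadow importance in
    one run. Under H0 (no association) a hit is assumed to occur with p = 0.5.
    P-values are Bonferroni-adjusted by the number of features tested.
    """
    m = len(hits)
    decisions = {}
    for name, h in hits.items():
        p_accept = min(1.0, m * binom_tail_ge(h, runs))  # more hits than chance
        p_reject = min(1.0, m * binom_tail_le(h, runs))  # fewer hits than chance
        if p_accept < p_value:
            decisions[name] = "Confirmed"
        elif p_reject < p_value:
            decisions[name] = "Rejected"
        else:
            decisions[name] = "Tentative"
    return decisions


if __name__ == "__main__":
    # Hypothetical hit counts after 20 runs
    example = {"feature_a": 20, "feature_b": 0, "feature_c": 10, "feature_d": 16}
    print(boruta_decisions(example, runs=20))
```

With only 20 runs, even 16 hits out of 20 is not enough to confirm a feature after correction, which is why Boruta keeps iterating while features remain tentative.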
Affiliation(s)
- Farzaneh Hamidi
- Department of Statistics and Epidemiology, Faculty of Health, Tabriz University of Medical Sciences, Tabriz, Iran
- Neda Gilani
- Department of Statistics and Epidemiology, Faculty of Health, Tabriz University of Medical Sciences, Tabriz, Iran
- Road Traffic Injury Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Reza Arabi Belaghi
- Department of Mathematics, Applied Mathematics and Statistics, Uppsala University, Uppsala, Sweden
- Department of Statistics, Faculty of Mathematical Science, University of Tabriz, Tabriz, Iran
- Department of Energy and Technology, Swedish Agricultural University, Uppsala, Sweden
- Hanif Yaghoobi
- Department of Biological Sciences, School of Natural Sciences, University of Tabriz, Tabriz, Iran
- Esmaeil Babaei
- Department of Biological Sciences, School of Natural Sciences, University of Tabriz, Tabriz, Iran
- Interfaculty Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, Tübingen, Germany
- Parvin Sarbakhsh
- Department of Statistics and Epidemiology, Faculty of Health, Tabriz University of Medical Sciences, Tabriz, Iran
- Jamileh Malakouti
- Department of Midwifery, Faculty of Nursing and Midwifery, Tabriz University of Medical Science, Tabriz, Iran