1
Mouskeftara T, Deda O, Liapikos T, Panteris E, Karagiannidis E, Papazoglou AS, Gika H. Lipidomic-Based Algorithms Can Enhance Prediction of Obstructive Coronary Artery Disease. J Proteome Res 2024; 23:3598-3611. [PMID: 39008891 DOI: 10.1021/acs.jproteome.4c00249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/17/2024]
Abstract
Lipidomics is emerging as a promising research field with the potential to help in personalized risk stratification and to improve our understanding of the functional role of individual lipid species in the metabolic perturbations occurring in coronary artery disease (CAD). This study aimed to use a machine learning approach to provide a lipid panel able to identify patients with obstructive CAD. In this post hoc analysis of the prospective CorLipid trial, we investigated the lipid profiles of 146 patients with suspected CAD, divided into two categories based on the existence of obstructive CAD. In total, 517 lipid species were identified, of which 288 were finally quantified, including glycerophospholipids, glycerolipids, and sphingolipids. Univariate and multivariate statistical analyses showed significant discrimination between the serum lipidomes of patients with and without obstructive CAD. Finally, the XGBoost algorithm identified a panel of 17 serum biomarkers (5 sphingolipids, 7 glycerophospholipids, a triacylglycerol, galectin-3, glucose, LDL, and LDH) that was fully sensitive (100% sensitivity, 62.1% specificity, 100% negative predictive value) for the prediction of obstructive CAD. Our findings shed light on the role of dysregulated lipid metabolism in CAD, validating existing evidence and suggesting promise for novel therapies and improved risk stratification.
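The three figures quoted in the abstract (sensitivity, specificity, negative predictive value) all derive from the same confusion matrix. The sketch below is only an illustration of those definitions: the function name `diagnostic_metrics` and the counts are hypothetical and are not taken from the CorLipid cohort.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity and negative predictive value (NPV)."""
    sensitivity = tp / (tp + fn)   # share of diseased patients that are flagged
    specificity = tn / (tn + fp)   # share of disease-free patients that are cleared
    npv = tn / (tn + fn)           # share of negative calls that are correct
    return sensitivity, specificity, npv


# Hypothetical counts chosen for illustration, NOT the study's data.
sens, spec, npv = diagnostic_metrics(tp=12, fn=0, tn=5, fp=3)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} NPV={npv:.3f}")
```

With zero false negatives, sensitivity and NPV both reach 100% whatever the specificity, which is why a panel can be fully sensitive while still producing many false positives.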
Affiliation(s)
- Thomai Mouskeftara
  - Laboratory of Forensic Medicine and Toxicology, School of Medicine, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
  - Biomic_AUTh, CIRI-AUTH Center for Interdisciplinary Research and Innovation Aristotle University of Thessaloniki, 57001 Thessaloniki, Greece
- Olga Deda
  - Laboratory of Forensic Medicine and Toxicology, School of Medicine, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
  - Biomic_AUTh, CIRI-AUTH Center for Interdisciplinary Research and Innovation Aristotle University of Thessaloniki, 57001 Thessaloniki, Greece
- Theodoros Liapikos
  - Biomic_AUTh, CIRI-AUTH Center for Interdisciplinary Research and Innovation Aristotle University of Thessaloniki, 57001 Thessaloniki, Greece
- Eleftherios Panteris
  - Biomic_AUTh, CIRI-AUTH Center for Interdisciplinary Research and Innovation Aristotle University of Thessaloniki, 57001 Thessaloniki, Greece
- Efstratios Karagiannidis
  - Second Department of Cardiology, General Hospital "Hippokration", Aristotle University of Thessaloniki, Konstantinoupoleos 49, 54642 Thessaloniki, Greece
- Helen Gika
  - Laboratory of Forensic Medicine and Toxicology, School of Medicine, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
  - Biomic_AUTh, CIRI-AUTH Center for Interdisciplinary Research and Innovation Aristotle University of Thessaloniki, 57001 Thessaloniki, Greece
2
Joyce T, Tasci E, Jagasia S, Shephard J, Chappidi S, Zhuge Y, Zhang L, Cooley Zgela T, Sproull M, Mackey M, Camphausen K, Krauze AV. Serum CD133-Associated Proteins Identified by Machine Learning Are Connected to Neural Development, Cancer Pathways, and 12-Month Survival in Glioblastoma. Cancers (Basel) 2024; 16:2740. [PMID: 39123468 PMCID: PMC11311306 DOI: 10.3390/cancers16152740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2024] [Revised: 07/24/2024] [Accepted: 07/26/2024] [Indexed: 08/12/2024] Open
Abstract
Glioma is the most prevalent type of primary central nervous system cancer, while glioblastoma (GBM) is its most aggressive variant, with a median survival of only 15 months when treated with maximal surgical resection followed by chemoradiation therapy (CRT). CD133 is a potentially significant GBM biomarker. However, current clinical biomarker studies rely on invasive tissue samples. These make prolonged data acquisition impossible, resulting in increased interest in the use of liquid biopsies. Our study analyzed 7289 serum proteins from 109 patients with pathology-proven GBM, obtained prior to CRT, using the aptamer-based SOMAScan® proteomic assay technology. We developed a novel methodology that identified 24 proteins linked to both serum CD133 and 12-month overall survival (OS) through a multi-step machine learning (ML) analysis. These identified proteins were subsequently subjected to survival and clustering evaluations, categorizing patients into five risk groups that accurately predicted 12-month OS based on their protein profiles. Most of these proteins are involved in brain function, neural development, and/or cancer biology signaling, highlighting their significance and potential predictive value. Identifying these proteins provides a valuable foundation for future serum investigations, as validation of clinically applicable GBM biomarkers can unlock immense potential for diagnostics and treatment monitoring.
Affiliation(s)
- Thomas Joyce
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute NIH, 9000 Rockville Pike, Bethesda, MD 20892, USA
- Erdal Tasci
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute NIH, 9000 Rockville Pike, Bethesda, MD 20892, USA
- Sarisha Jagasia
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute NIH, 9000 Rockville Pike, Bethesda, MD 20892, USA
- Jason Shephard
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute NIH, 9000 Rockville Pike, Bethesda, MD 20892, USA
- Shreya Chappidi
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute NIH, 9000 Rockville Pike, Bethesda, MD 20892, USA
  - Department of Computer Science and Technology, University of Cambridge, 15 JJ Thomson Ave, Cambridge CB3 0FD, UK
- Ying Zhuge
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute NIH, 9000 Rockville Pike, Bethesda, MD 20892, USA
- Longze Zhang
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute NIH, 9000 Rockville Pike, Bethesda, MD 20892, USA
- Theresa Cooley Zgela
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute NIH, 9000 Rockville Pike, Bethesda, MD 20892, USA
- Mary Sproull
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute NIH, 9000 Rockville Pike, Bethesda, MD 20892, USA
- Megan Mackey
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute NIH, 9000 Rockville Pike, Bethesda, MD 20892, USA
- Kevin Camphausen
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute NIH, 9000 Rockville Pike, Bethesda, MD 20892, USA
- Andra V. Krauze
  - Radiation Oncology Branch, Center for Cancer Research, National Cancer Institute NIH, 9000 Rockville Pike, Bethesda, MD 20892, USA
3
Villena OC, Arab A, Lippi CA, Ryan SJ, Johnson LR. Influence of environmental, geographic, socio-demographic, and epidemiological factors on presence of malaria at the community level in two continents. Sci Rep 2024; 14:16734. [PMID: 39030306 PMCID: PMC11271557 DOI: 10.1038/s41598-024-67452-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2024] [Accepted: 07/11/2024] [Indexed: 07/21/2024] Open
Abstract
The interactions of environmental, geographic, socio-demographic, and epidemiological factors in shaping mosquito-borne disease transmission dynamics are complex and changeable, influencing the abundance and distribution of vectors and the pathogens they transmit. In this study, 27 years of cross-sectional malaria survey data (1990-2017) were used to examine the effects of these factors on Plasmodium falciparum and Plasmodium vivax malaria presence at the community level in Africa and Asia. Monthly long-term, open-source data for each factor were compiled and analyzed using generalized linear models and classification and regression trees. Both temperature and precipitation exhibited unimodal relationships with malaria, with a positive effect up to a point after which a negative effect was observed as temperature and precipitation increased. Overall decline in malaria from 2000 to 2012 was well captured by the models, as was the resurgence after that. The models also indicated higher malaria in regions with lower economic and development indicators. Malaria is driven by a combination of environmental, geographic, socioeconomic, and epidemiological factors, and in this study, we demonstrated two approaches to capturing this complexity of drivers within models. Identifying these key drivers, and describing their associations with malaria, provides key information to inform planning and prevention strategies and interventions to reduce malaria burden.
Affiliation(s)
- Oswaldo C Villena
  - The Earth Commons Institute, Georgetown University, Washington, DC, 20057, USA
- Ali Arab
  - Department of Mathematics and Statistics, Georgetown University, Washington, DC, 20057, USA
- Catherine A Lippi
  - Department of Geography, University of Florida, Gainesville, FL, 32611, USA
  - Emerging Pathogens Institute, University of Florida, Gainesville, FL, USA
- Sadie J Ryan
  - Department of Geography, University of Florida, Gainesville, FL, 32611, USA
  - Emerging Pathogens Institute, University of Florida, Gainesville, FL, USA
  - School of Life Sciences, University of KwaZulu-Natal, Durban, South Africa
- Leah R Johnson
  - Department of Statistics, Virginia Tech, Blacksburg, VA, 24061, USA
  - Computational Modeling and Data Analytics, Virginia Tech, Blacksburg, VA, 24061, USA
  - Department of Biology, Virginia Tech, Blacksburg, VA, 24061, USA
4
Mahawan T, Luckett T, Mielgo Iza A, Pornputtapong N, Caamaño Gutiérrez E. Robust and consistent biomarker candidates identification by a machine learning approach applied to pancreatic ductal adenocarcinoma metastasis. BMC Med Inform Decis Mak 2024; 24:175. [PMID: 38902676 PMCID: PMC11191155 DOI: 10.1186/s12911-024-02578-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Accepted: 06/14/2024] [Indexed: 06/22/2024] Open
Abstract
BACKGROUND Machine Learning (ML) plays a crucial role in biomedical research. Nevertheless, it still has limitations in data integration and irreproducibility. To address these challenges, robust methods are needed. Pancreatic ductal adenocarcinoma (PDAC), a highly aggressive cancer with low early detection rates and survival rates, is used as a case study. PDAC lacks reliable diagnostic biomarkers, especially metastatic biomarkers, which remains an unmet need. In this study, we propose an ML-based approach for discovering disease biomarkers, apply it to the identification of a PDAC metastatic composite biomarker candidate, and demonstrate the advantages of harnessing data resources. METHODS We utilised primary tumour RNAseq data from five public repositories, pooling samples to maximise statistical power and integrating data by correcting for technical variance. Data were split into train and validation sets. The train dataset underwent variable selection via a 10-fold cross-validation process that combined three algorithms in 100 models per fold. Genes found in at least 80% of models and five folds were considered robust to build a consensus multivariate model. A random forest model was constructed using selected genes from the train dataset and tested in the validation set. We also assessed the goodness of prediction by recalibrating a model using only the validation data. The biological context and relevance of signals was explored through enrichment and pathway analyses using QIAGEN Ingenuity Pathway Analysis and GeneMANIA. RESULTS We developed a pipeline that can detect robust signatures to build composite biomarkers. We tested the pipeline in PDAC, exploiting transcriptomics data from different sources, proposing a composite biomarker candidate comprised of fifteen genes consistently selected that showed very promising predictive capability. 
Biological contextualisation revealed links with cancer progression and metastasis, underscoring their potential relevance. All code is available in GitHub. CONCLUSION This study establishes a robust framework for identifying composite biomarkers across various disease contexts. We demonstrate its potential by proposing a plausible composite biomarker candidate for PDAC metastasis. By reusing data from public repositories, we highlight the sustainability of our research and the wider applications of our pipeline. The preliminary findings shed light on a promising validation and application path.
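One possible reading of the selection rule in METHODS (a gene is kept if it appears in at least 80% of the models within a fold, and does so in at least five folds) can be sketched as follows. This is an assumption-laden illustration, not the authors' pipeline: the function name `robust_features`, the gene names, and the shrunken setup (10 folds with 5 models each rather than 100) are all hypothetical.

```python
from collections import Counter


def robust_features(selections, model_pct=80, min_folds=5):
    """selections[fold] is a list of sets: the features chosen by each model in that fold."""
    fold_hits = Counter()
    for models in selections:
        # How many models in this fold selected each feature.
        counts = Counter(feature for chosen in models for feature in chosen)
        for feature, n in counts.items():
            # Integer comparison avoids floating-point edge cases at exactly 80%.
            if n * 100 >= model_pct * len(models):
                fold_hits[feature] += 1
    return sorted(f for f, n in fold_hits.items() if n >= min_folds)


def toy_fold(fold):
    """Hypothetical selections for one fold (5 models)."""
    models = []
    for m in range(5):
        chosen = {"GENE_A"}                 # always selected
        if fold < 6 and m < 4:
            chosen.add("GENE_B")            # 80% of models, in 6 folds
        if m < 3:
            chosen.add("GENE_C")            # only 60% of models
        if fold < 4:
            chosen.add("GENE_D")            # all models, but only 4 folds
        models.append(chosen)
    return models


selections = [toy_fold(f) for f in range(10)]
print(robust_features(selections))
```

Only the genes that clear both thresholds survive; a gene picked by every model in too few folds, or by too few models in every fold, is dropped.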
Affiliation(s)
- Tanakamol Mahawan
  - Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok, Thailand
  - Department of Biochemistry & System Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
  - Akkhraratchakumari Veterinary College, Walailak University, Nakhon Si Thammarat, Thailand
- Teifion Luckett
  - Department of Molecular and Clinical Cancer Medicine, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Ainhoa Mielgo Iza
  - Department of Molecular and Clinical Cancer Medicine, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Natapol Pornputtapong
  - Department of Biochemistry and Microbiology, Faculty of Pharmaceutical Sciences, and Center of Excellence in Systems Biology, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Eva Caamaño Gutiérrez
  - Department of Biochemistry & System Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
  - Computational Biology Facility, LIV-SRF, Faculty of Health and Life Sciences, University of Liverpool, Liverpool, UK
5
Richardson E, Trevizani R, Greenbaum JA, Carter H, Nielsen M, Peters B. The receiver operating characteristic curve accurately assesses imbalanced datasets. Patterns (N Y) 2024; 5:100994. [PMID: 39005487 PMCID: PMC11240176 DOI: 10.1016/j.patter.2024.100994] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/01/2023] [Revised: 03/05/2024] [Accepted: 05/03/2024] [Indexed: 07/16/2024]
Abstract
Many problems in biology require looking for a "needle in a haystack," corresponding to a binary classification where there are a few positives within a much larger set of negatives, which is referred to as a class imbalance. The receiver operating characteristic (ROC) curve and the associated area under the curve (AUC) have been reported as ill-suited to evaluate prediction performance on imbalanced problems where there is more interest in performance on the positive minority class, while the precision-recall (PR) curve is preferable. We show via simulation and a real case study that this is a misinterpretation of the difference between the ROC and PR spaces, showing that the ROC curve is robust to class imbalance, while the PR curve is highly sensitive to class imbalance. Furthermore, we show that class imbalance cannot be easily disentangled from classifier performance measured via PR-AUC.
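The claim in this abstract can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes a fixed hypothetical classifier whose scores are drawn from N(1, 1) for positives and N(0, 1) for negatives, and it uses average precision as a stand-in estimate of PR-AUC. Only the class ratio changes between the two runs.

```python
import random


def roc_auc(scores, labels):
    """ROC-AUC via the rank-sum (Mann-Whitney U) identity; assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if labels[i] == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(scores, labels):
    """Mean of the precision values measured at each true positive."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for k, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / k
    return total / hits


def simulate(n_pos, n_neg, rng):
    """Same score distributions per class, whatever the class ratio."""
    scores = [rng.gauss(1.0, 1.0) for _ in range(n_pos)]
    scores += [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
    labels = [1] * n_pos + [0] * n_neg
    return scores, labels


rng = random.Random(0)
balanced = simulate(2000, 2000, rng)
imbalanced = simulate(2000, 100_000, rng)   # 1 positive per 50 negatives

auc_bal, auc_imb = roc_auc(*balanced), roc_auc(*imbalanced)
ap_bal, ap_imb = average_precision(*balanced), average_precision(*imbalanced)
print(f"ROC-AUC  balanced={auc_bal:.3f}  imbalanced={auc_imb:.3f}")
print(f"PR-AUC   balanced={ap_bal:.3f}  imbalanced={ap_imb:.3f}")
```

Because the per-class score distributions are identical in both runs, any gap between the two PR-AUC values comes from prevalence alone, which is the entanglement of class imbalance and classifier performance that the abstract describes.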
Affiliation(s)
- Eve Richardson
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- Raphael Trevizani
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
  - Fiocruz Ceará, Fundação Oswaldo Cruz, Rua São José s/n, Precabura, Eusébio/CE, Brazil
- Jason A Greenbaum
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- Hannah Carter
  - Department of Medicine, University of California, La Jolla, CA, USA
- Morten Nielsen
  - Department of Health Technology, Section for Bioinformatics, Technical University of Denmark, Lyngby, Denmark
- Bjoern Peters
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
6
Pham MP, Vu DD, Nguyen TT, Nguyen VS. Predictive ecological niche model for Cinnamomum parthenoxylon (Jack) Meisn. (Lauraceae) from Last Glacial Maximum to future in Vietnam. Biodivers Data J 2024; 12:e122325. [PMID: 38827585 PMCID: PMC11140409 DOI: 10.3897/bdj.12.e122325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Accepted: 04/26/2024] [Indexed: 06/04/2024] Open
Abstract
Cinnamomum parthenoxylon (Jack) Meisn. is a tree in the genus Cinnamomum that has been facing global threats due to forest degradation and habitat fragmentation. Many recent studies aim to describe habitats and assess population and species genetic diversity for species conservation by expanding afforestation models for this species. Understanding its current and future potential distribution plays a major role in guiding conservation efforts. Using five modern machine-learning algorithms available on Google Earth Engine helped us evaluate suitable habitats for the species. The results revealed that Random Forest (RF) had the highest accuracy for model comparison, outperforming Support Vector Machine (SVM), Classification and Regression Trees (CART), Gradient Boosting Decision Tree (GBDT) and Maximum Entropy (MaxEnt). The results also showed that the extremely suitable ecological areas for the species are mostly distributed in northern Vietnam, followed by the North Central Coast and the Central Highlands. Elevation, Temperature Annual Range and Mean Diurnal Range were the three most important parameters affecting the potential distribution of C. parthenoxylon. Evaluation of the impact of climate on its distribution under different climate scenarios in the past (Last Glacial Maximum and Mid-Holocene), in the present (Worldclim) and in the future (using four climate change scenarios: ACCESS, MIROC6, EC-Earth3-Veg and MRI-ESM2-0) revealed that the distribution of C. parthenoxylon would likely expand to the northeast, while a large area of central Vietnam will gradually lose its adaptive capacity by 2100.
Affiliation(s)
- Mai-Phuong Pham
  - Join Vietnam–Russia Tropical Science and Technology Research Center, Hanoi, Vietnam
  - Graduate University of Science and Technology (GUST), Vietnam Academy of Science and Technology, Ha Noi, Vietnam
- Duy Dinh Vu
  - Join Vietnam–Russia Tropical Science and Technology Research Center, Hanoi, Vietnam
- Thanh Tuan Nguyen
  - Vietnam National University of Forestry at Dong Nai, Dong Nai, Vietnam
- Van Sinh Nguyen
  - Institute of Ecology and Biological Resources, Vietnamese Academy of Science and Technologies, Hanoi, Vietnam
7
Kougioumoutzis K, Constantinou I, Panitsa M. Rising Temperatures, Falling Leaves: Predicting the Fate of Cyprus's Endemic Oak under Climate and Land Use Change. Plants (Basel) 2024; 13:1109. [PMID: 38674518 PMCID: PMC11053427 DOI: 10.3390/plants13081109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Revised: 04/11/2024] [Accepted: 04/14/2024] [Indexed: 04/28/2024]
Abstract
Endemic island species face heightened extinction risk from climate-driven shifts, yet standard models often underestimate threat levels for those like Quercus alnifolia, an iconic Cypriot oak with pre-adaptations to aridity. Through species distribution modelling, we investigated the potential shifts in its distribution under future climate and land-use change scenarios. Our approach uniquely combines dispersal constraints, detailed soil characteristics, hydrological factors, and anticipated soil erosion data, offering a comprehensive assessment of environmental suitability. We quantified the species' sensitivity, exposure, and vulnerability to projected changes, conducting a preliminary IUCN extinction risk assessment according to Criteria A and B. Our projections uniformly predict range reductions, with a median decrease of 67.8% by the 2070s under the most extreme scenarios. Additionally, our research indicates Quercus alnifolia's resilience to diverse erosion conditions and preference for relatively dry climates within a specific annual temperature range. The preliminary IUCN risk assessment designates Quercus alnifolia as Critically Endangered in the future, highlighting the need for focused conservation efforts. Climate and land-use changes are critical threats to the species' survival, emphasising the importance of comprehensive modelling techniques and the urgent requirement for dedicated conservation measures to safeguard this iconic species.
Affiliation(s)
- Maria Panitsa
  - Laboratory of Botany, Department of Biology, University of Patras, 26504 Patras, Greece; (K.K.); (I.C.)
8
Beyene KM, Chen DG, Kifle YG. A novel nonparametric time-dependent precision-recall curve estimator for right-censored survival data. Biom J 2024; 66:e2300135. [PMID: 38637327 DOI: 10.1002/bimj.202300135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2023] [Revised: 10/04/2023] [Accepted: 12/27/2023] [Indexed: 04/20/2024]
Abstract
To assess prognostic risk for individuals in precision health research, risk prediction models are increasingly used, in which statistical models estimate the risk of future outcomes based on clinical and nonclinical characteristics. The predictive accuracy of a risk score must be assessed before it can be used in routine clinical decision making, where receiver operating characteristic curves, precision-recall curves, and their corresponding areas under the curve are commonly used metrics to evaluate the discriminatory ability of a continuous risk score. Among these, precision-recall curves have been shown to be more informative when dealing with an unbalanced biomarker distribution between classes, which is common for rare events, although all existing methods but one were proposed for classic uncensored data. This paper therefore proposes a novel nonparametric estimation approach for the time-dependent precision-recall curve and its associated area under the curve for right-censored data. A simulation is conducted to show the better finite-sample properties of the proposed estimator over the existing method, and real-world data from a primary biliary cirrhosis trial are used to demonstrate the practical applicability of the proposed estimator.
Affiliation(s)
- Kassu Mehari Beyene
  - College of Health Solutions, Arizona State University, Phoenix, Arizona, USA
- Ding-Geng Chen
  - College of Health Solutions, Arizona State University, Phoenix, Arizona, USA
  - Department of Statistics, University of Pretoria, Pretoria, South Africa
- Yehenew Getachew Kifle
  - Department of Mathematics and Statistics, University of Maryland Baltimore County, Baltimore, Maryland, USA
9
Wang K, Zeng X, Zhou J, Liu F, Luan X, Wang X. BERT-TFBS: a novel BERT-based model for predicting transcription factor binding sites by transfer learning. Brief Bioinform 2024; 25:bbae195. [PMID: 38701417 PMCID: PMC11066948 DOI: 10.1093/bib/bbae195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 03/26/2024] [Accepted: 04/10/2024] [Indexed: 05/05/2024] Open
Abstract
Transcription factors (TFs) are proteins essential for regulating genetic transcriptions by binding to transcription factor binding sites (TFBSs) in DNA sequences. Accurate predictions of TFBSs can contribute to the design and construction of metabolic regulatory systems based on TFs. Although various deep-learning algorithms have been developed for predicting TFBSs, the prediction performance needs to be improved. This paper proposes a bidirectional encoder representations from transformers (BERT)-based model, called BERT-TFBS, to predict TFBSs solely based on DNA sequences. The model consists of a pre-trained BERT module (DNABERT-2), a convolutional neural network (CNN) module, a convolutional block attention module (CBAM) and an output module. The BERT-TFBS model utilizes the pre-trained DNABERT-2 module to acquire the complex long-term dependencies in DNA sequences through a transfer learning approach, and applies the CNN module and the CBAM to extract high-order local features. The proposed model is trained and tested based on 165 ENCODE ChIP-seq datasets. We conducted experiments with model variants, cross-cell-line validations and comparisons with other models. The experimental results demonstrate the effectiveness and generalization capability of BERT-TFBS in predicting TFBSs, and they show that the proposed model outperforms other deep-learning models. The source code for BERT-TFBS is available at https://github.com/ZX1998-12/BERT-TFBS.
Affiliation(s)
- Kai Wang
  - Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), School of Internet of Things Engineering, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Xuan Zeng
  - Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), School of Internet of Things Engineering, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Jingwen Zhou
  - Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
  - Key Laboratory of Industrial Biotechnology, Ministry of Education and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
  - Engineering Research Center of Ministry of Education on Food Synthetic Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
  - Jiangsu Province Engineering Research Center of Food Synthetic Biotechnology, Jiangnan University, Wuxi 214122, China
- Fei Liu
  - Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), School of Internet of Things Engineering, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Xiaoli Luan
  - Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), School of Internet of Things Engineering, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Xinglong Wang
  - Science Center for Future Foods, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
  - Key Laboratory of Industrial Biotechnology, Ministry of Education and School of Biotechnology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
10
Velásquez-López Y, Ruiz-Escudero A, Arrasate S, González-Díaz H. Implementation of IFPTML Computational Models in Drug Discovery Against Flaviviridae Family. J Chem Inf Model 2024; 64:1841-1852. [PMID: 38466369 PMCID: PMC10966645 DOI: 10.1021/acs.jcim.3c01796] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 02/26/2024] [Accepted: 02/27/2024] [Indexed: 03/13/2024]
Abstract
The Flaviviridae family consists of single-stranded positive-sense RNA viruses and contains the genera Flavivirus, Hepacivirus, Pegivirus, and Pestivirus. Currently, there is an outbreak of viral diseases caused by this family affecting millions of people worldwide, leading to significant morbidity and mortality rates. Advances in computational chemistry have greatly facilitated the discovery of novel drugs and treatments for diseases associated with this family. Chemoinformatic techniques, such as the perturbation theory machine learning method, have played a crucial role in developing new approaches based on ML models that can effectively aid drug discovery. The IFPTML models have shown their capability to handle, classify, and process large data sets with high specificity. The results obtained from different models indicate that this methodology is proficient in processing the data, resulting in a reduction of the false positive rate by 4.25%, along with an accuracy of 83% and reliability of 92%. These values suggest that the model can serve as a computational tool in assisting drug discovery efforts and the development of new treatments against Flaviviridae family diseases.
Affiliation(s)
- Yendrek Velásquez-López
  - Departamento de Química Orgánica e Inorgánica, Facultad de Ciencia y Tecnología, Universidad del País Vasco/Euskal Herriko Unibertsitatea UPV/EHU. Apdo. 644. 48080 Bilbao (Spain)
  - Bio-Cheminformatics Research Group, Universidad de Las Américas, Quito 170504, (Ecuador)
- Andrea Ruiz-Escudero
  - Department of Pharmacology, University of the Basque Country UPV/EHU, 48940 Leioa, (Spain)
  - IKERDATA S.L., ZITEK, University of Basque Country UPV/EHU, Rectorate Building, 48940 Leioa, Spain
- Sonia Arrasate
  - Departamento de Química Orgánica e Inorgánica, Facultad de Ciencia y Tecnología, Universidad del País Vasco/Euskal Herriko Unibertsitatea UPV/EHU. Apdo. 644. 48080 Bilbao (Spain)
- Humberto González-Díaz
  - Departamento de Química Orgánica e Inorgánica, Facultad de Ciencia y Tecnología, Universidad del País Vasco/Euskal Herriko Unibertsitatea UPV/EHU. Apdo. 644. 48080 Bilbao (Spain)
  - BIOFISIKA, Basque Center for Biophysics CSIC-UPV/EHU, 48940 Bilbao (Spain)
  - IKERBASQUE, Basque Foundation for Science, 48011 Bilbao (Spain)
11
|
Usuzaki T, Takahashi K, Inamori R. Be Careful About Metrics When Imbalanced Data Is Used for a Deep Learning Model. Chest 2024; 165:e87-e89. [PMID: 38461027 DOI: 10.1016/j.chest.2023.10.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Accepted: 10/20/2023] [Indexed: 03/11/2024] Open
Affiliation(s)
- Takuma Usuzaki
  - Department of Diagnostic Radiology, Tohoku University Hospital, Sendai, Japan.
- Kengo Takahashi
  - Department of Clinical Imaging, Graduate School of Medicine, Tohoku University, Sendai, Japan
- Ryusei Inamori
  - Department of Clinical Imaging, Graduate School of Medicine, Tohoku University, Sendai, Japan
|
12
|
Atimbire SA, Appati JK, Owusu E. Empirical exploration of whale optimisation algorithm for heart disease prediction. Sci Rep 2024; 14:4530. [PMID: 38402276 PMCID: PMC10894250 DOI: 10.1038/s41598-024-54990-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Accepted: 02/19/2024] [Indexed: 02/26/2024] Open
Abstract
Heart diseases have the highest mortality worldwide, necessitating precise predictive models for early risk assessment. Much existing research has focused on improving model accuracy with single datasets, often neglecting the need for comprehensive evaluation metrics and utilization of different datasets in the same domain (heart disease). This research introduces a heart disease risk prediction approach by harnessing the whale optimization algorithm (WOA) for feature selection and implementing a comprehensive evaluation framework. The study leverages five distinct datasets, including the combined dataset comprising the Cleveland, Long Beach VA, Switzerland, and Hungarian heart disease datasets. The others are the Z-AlizadehSani, Framingham, South African, and Cleveland heart datasets. The WOA-guided feature selection identifies optimal features, which are subsequently integrated into ten classification models. Comprehensive model evaluation reveals significant improvements across critical performance metrics, including accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic curve. These enhancements consistently outperform state-of-the-art methods using the same dataset, validating the effectiveness of our methodology. The comprehensive evaluation framework provides a robust assessment of the model's adaptability, underscoring the WOA's effectiveness in identifying optimal features in multiple datasets in the same domain.
Affiliation(s)
- Ebenezer Owusu
  - Department of Computer Science, University of Ghana, Accra, Ghana
|
13
|
Cavaiola M, Cassola F, Sacchetti D, Ferrari F, Mazzino A. Hybrid AI-enhanced lightning flash prediction in the medium-range forecast horizon. Nat Commun 2024; 15:1188. [PMID: 38331837 PMCID: PMC10853497 DOI: 10.1038/s41467-024-44697-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Accepted: 12/27/2023] [Indexed: 02/10/2024] Open
Abstract
Traditional fully-deterministic algorithms, which rely on physical equations and mathematical models, have been the backbone of many scientific disciplines for decades. These algorithms are based on well-established principles and laws of physics, enabling a systematic and predictable approach to problem-solving. On the other hand, AI-based strategies emerge as a powerful tool for handling vast amounts of data and extracting patterns and relationships that might be challenging to identify through traditional algorithms. Here, we bridge these two realms by using AI to find an optimal mapping of meteorological features, predicted two days ahead by the state-of-the-art numerical weather prediction model of the European Centre for Medium-range Weather Forecasts (ECMWF), into lightning flash occurrence. The prediction capability of the resulting AI-enhanced algorithm turns out to be significantly higher than that of the fully-deterministic algorithm employed in the ECMWF model. A remarkable Recall peak of about 95% within the 0-24 h forecast interval is obtained. This performance surpasses the 85% achieved by the ECMWF model at the same Precision of the AI algorithm.
Affiliation(s)
- Mattia Cavaiola
  - DICCA, Department of Civil, Chemical and Environmental Engineering, Via Montallegro 1, Genova, 16145, Italy.
  - INFN, Istituto Nazionale di Fisica Nucleare, Sezione di Genova, Via Dodecaneso 33, Genova, 16146, Italy.
  - CNR - National Research Council of Italy, Institute of Marine Sciences, Via S.Teresa S/N, 19032, Pozzuolo di Lerici, La Spezia, Italy.
- Federico Cassola
  - ARPAL, Regional Agency for Environmental Protection Liguria, Genova, Italy
- Davide Sacchetti
  - ARPAL, Regional Agency for Environmental Protection Liguria, Genova, Italy
- Francesco Ferrari
  - DICCA, Department of Civil, Chemical and Environmental Engineering, Via Montallegro 1, Genova, 16145, Italy
  - INFN, Istituto Nazionale di Fisica Nucleare, Sezione di Genova, Via Dodecaneso 33, Genova, 16146, Italy
- Andrea Mazzino
  - DICCA, Department of Civil, Chemical and Environmental Engineering, Via Montallegro 1, Genova, 16145, Italy.
  - INFN, Istituto Nazionale di Fisica Nucleare, Sezione di Genova, Via Dodecaneso 33, Genova, 16146, Italy.
|
14
|
Usuzaki T, Takahashi K, Inamori R. Letter to the editor on "Automated classification of fat-infiltrated axillary lymph nodes on screening mammograms". Br J Radiol 2024; 97:479-480. [PMID: 38308039 DOI: 10.1093/bjr/tqad061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Accepted: 10/23/2023] [Indexed: 02/04/2024] Open
Affiliation(s)
- Takuma Usuzaki
  - Department of Diagnostic Radiology, Tohoku University Hospital, Sendai 980-8574, Japan
- Kengo Takahashi
  - Department of Clinical Imaging, Graduate School of Medicine, Tohoku University, Sendai 980-8574, Japan
- Ryusei Inamori
  - Department of Clinical Imaging, Graduate School of Medicine, Tohoku University, Sendai 980-8574, Japan
|
15
|
Ray DD, Flagel L, Schrider DR. IntroUNET: Identifying introgressed alleles via semantic segmentation. PLoS Genet 2024; 20:e1010657. [PMID: 38377104 PMCID: PMC10906877 DOI: 10.1371/journal.pgen.1010657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 03/01/2024] [Accepted: 01/29/2024] [Indexed: 02/22/2024] Open
Abstract
A growing body of evidence suggests that gene flow between closely related species is a widespread phenomenon. Alleles that introgress from one species into a close relative are typically neutral or deleterious, but sometimes confer a significant fitness advantage. Given the potential relevance to speciation and adaptation, numerous methods have therefore been devised to identify regions of the genome that have experienced introgression. Recently, supervised machine learning approaches have been shown to be highly effective for detecting introgression. One especially promising approach is to treat population genetic inference as an image classification problem, and feed an image representation of a population genetic alignment as input to a deep neural network that distinguishes among evolutionary models (i.e. introgression or no introgression). However, if we wish to investigate the full extent and fitness effects of introgression, merely identifying genomic regions in a population genetic alignment that harbor introgressed loci is insufficient; ideally we would be able to infer precisely which individuals have introgressed material and at which positions in the genome. Here we adapt a deep learning algorithm for semantic segmentation, the task of correctly identifying the type of object to which each individual pixel in an image belongs, to the task of identifying introgressed alleles. Our trained neural network is thus able to infer, for each individual in a two-population alignment, which of that individual's alleles were introgressed from the other population. We use simulated data to show that this approach is highly accurate, and that it can be readily extended to identify alleles that are introgressed from an unsampled "ghost" population, performing comparably to a supervised learning method tailored specifically to that task.
Finally, we apply this method to data from Drosophila, showing that it is able to accurately recover introgressed haplotypes from real data. This analysis reveals that introgressed alleles are typically confined to lower frequencies within genic regions, suggestive of purifying selection, but are found at much higher frequencies in a region previously shown to be affected by adaptive introgression. Our method's success in recovering introgressed haplotypes in challenging real-world scenarios underscores the utility of deep learning approaches for making richer evolutionary inferences from genomic data.
Affiliation(s)
- Dylan D. Ray
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Lex Flagel
  - Division of Data Science, Gencove Inc., New York, New York, United States of America
  - Department of Plant and Microbial Biology, University of Minnesota, Saint Paul, Minnesota, United States of America
- Daniel R. Schrider
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
|
16
|
Acharya V, Choi D, Yener B, Beamer G. Prediction of Tuberculosis From Lung Tissue Images of Diversity Outbred Mice Using Jump Knowledge Based Cell Graph Neural Network. IEEE Access: Practical Innovations, Open Solutions 2024; 12:17164-17194. [PMID: 38515959 PMCID: PMC10956573 DOI: 10.1109/access.2024.3359989] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/23/2024]
Abstract
Tuberculosis (TB), primarily affecting the lungs, is caused by the bacterium Mycobacterium tuberculosis and poses a significant health risk. Detecting acid-fast bacilli (AFB) in stained samples is critical for TB diagnosis. Whole Slide (WS) Imaging allows for digitally examining these stained samples. However, current deep-learning approaches to analyzing large-sized whole slide images (WSIs) often employ patch-wise analysis, potentially missing the complex spatial patterns observed in the granuloma essential for accurate TB classification. To address this limitation, we propose an approach that models cell characteristics and interactions as a graph, capturing both cell-level information and the overall tissue micro-architecture. This method differs from the strategies in related cell graph-based works that rely on edge thresholds based on sparsity/density in cell graph construction, emphasizing a biologically informed threshold determination instead. We introduce a cell graph-based jumping knowledge neural network (CG-JKNN) that operates on the cell graphs where the edge thresholds are selected based on the length of the mycobacteria's cords and the activated macrophage nucleus's size to reflect the actual biological interactions observed in the tissue. The primary process involves training a Convolutional Neural Network (CNN) to segment AFBs and macrophage nuclei, followed by converting large (42831 × 41159 pixels) lung histology images into cell graphs where an activated macrophage nucleus/AFB represents each node within the graph and their interactions are denoted as edges. To enhance the interpretability of our model, we employ Integrated Gradients and Shapley Additive Explanations (SHAP). Our analysis incorporated a combination of 33 graph metrics and 20 cell morphology features.
In terms of traditional machine learning models, Extreme Gradient Boosting (XGBoost) was the best performer, achieving an F1 score of 0.9813 and an Area under the Precision-Recall Curve (AUPRC) of 0.9848 on the test set. Among graph-based models, our CG-JKNN was the top performer, attaining an F1 score of 0.9549 and an AUPRC of 0.9846 on the held-out test set. The integration of graph-based and morphological features proved highly effective, with CG-JKNN and XGBoost showing promising results in classifying instances into AFB and activated macrophage nucleus. The features identified as significant by our models closely align with the criteria used by pathologists in practice, highlighting the clinical applicability of our approach. Future work will explore knowledge distillation techniques and graph-level classification into distinct TB progression categories.
Affiliation(s)
- Diana Choi
  - Cummings School of Veterinary Medicine, Tufts University, North Grafton, MA 02155, USA
- Bülent Yener
  - Rensselaer Polytechnic Institute, Troy, NY 12180, USA
- Gillian Beamer
  - Research Pathology, Aiforia Technologies, Cambridge, MA 02142, USA
  - Texas Biomedical Research Institute, San Antonio, TX 78227, USA
|
17
|
Li D. Attention-enhanced architecture for improved pneumonia detection in chest X-ray images. BMC Med Imaging 2024; 24:6. [PMID: 38166579 PMCID: PMC10763425 DOI: 10.1186/s12880-023-01177-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 12/07/2023] [Indexed: 01/04/2024] Open
Abstract
In this paper, we propose an attention-enhanced architecture for improved pneumonia detection in chest X-ray images. A unique attention mechanism is integrated with ResNet to highlight salient features crucial for pneumonia detection. Rigorous evaluation demonstrates that our attention mechanism significantly enhances pneumonia detection accuracy, achieving a satisfactory result of 96% accuracy. To address the issue of imbalanced training samples, we integrate an enhanced focal loss into our architecture. This approach assigns higher weights to minority classes during training, effectively mitigating data imbalance. Our model's performance significantly improves, surpassing that of traditional approaches such as the pretrained ResNet-50 model. Our attention-enhanced architecture thus presents a powerful solution for pneumonia detection in chest X-ray images, achieving an accuracy of 98%. By integrating enhanced focal loss, our approach effectively addresses imbalanced training samples. Comparative analysis underscores the positive impact of our model's spatial and channel attention modules. Overall, our study advances pneumonia detection in medical imaging and underscores the potential of attention-enhanced architectures for improved diagnostic accuracy and patient outcomes. Our findings offer valuable insights into image diagnosis and pneumonia prevention, contributing to future research in medical imaging and machine learning.
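The class-weighting idea behind focal loss can be illustrated with a minimal sketch. It implements the standard binary focal loss, not the "enhanced" variant named in the abstract, whose details are not given here. The function name and the `alpha` and `gamma` defaults are illustrative assumptions.

```python
import math


def binary_focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Standard binary focal loss for a single example (illustrative sketch).

    p     -- predicted probability of the positive class, 0 < p < 1
    y     -- true label, 1 (positive) or 0 (negative)
    alpha -- weight applied to the positive class; (1 - alpha) goes to the negative class
    gamma -- focusing parameter; larger values down-weight easy examples more strongly
    """
    # Probability the model assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    # Class weight for the true class.
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Cross-entropy scaled by the class weight and the modulating factor.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary cross-entropy. Setting `alpha` above 0.5 weights the positive class more heavily, which is the usual choice when the positive class is the minority.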
Affiliation(s)
- Dikai Li
  - Shenzhen Key Laboratory of Ultraintense Laser and Advanced Material Technology, Center for Advanced Material Diagnostic Technology, and College of Engineering Physics, Shenzhen Technology University, Lantian Road, Shenzhen, Guangdong, 518118, China.
|
18
|
Zhang L, Xu R, Zhao J. Learning technology for detection and grading of cancer tissue using tumour ultrasound images. Journal of X-Ray Science and Technology 2024; 32:157-171. [PMID: 37424493 DOI: 10.3233/xst-230085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2023]
Abstract
BACKGROUND Early diagnosis of breast cancer is crucial for effective therapy. Many medical imaging modalities, including MRI, CT, and ultrasound, are used to diagnose cancer. OBJECTIVE This study aims to investigate the feasibility of applying transfer learning techniques to train convolutional neural networks (CNNs) to automatically diagnose breast cancer via ultrasound images. METHODS Transfer learning techniques helped CNNs recognise breast cancer in ultrasound images. Each model's training and validation accuracies were assessed using the ultrasound image dataset. The models were trained and tested on ultrasound images. RESULTS MobileNet had the greatest accuracy during training and DenseNet121 during validation. Transfer learning algorithms can detect breast cancer in ultrasound images. CONCLUSIONS Based on the results, transfer learning models may be useful for automated breast cancer diagnosis in ultrasound images. However, only a trained medical professional should diagnose cancer, and computational approaches should only be used to help make quick decisions.
Affiliation(s)
- Liyan Zhang
  - Department of Ultrasound, Sunshine Union Hospital, Weifang, China
- Ruiyan Xu
  - College of Health, Binzhou Polytechnical College, Binzhou, China
- Jingde Zhao
  - Department of Imaging, Qingdao Hospital of Traditional Chinese Medicine (Qingdao HaiCi Hospital), Qingdao, China
|
19
|
Zimmer SN, Holsinger KW, Dawson CA. A field-validated ensemble species distribution model of Eriogonum pelinophilum, an endangered subshrub in Colorado, USA. Ecol Evol 2023; 13:e10816. [PMID: 38107426 PMCID: PMC10721943 DOI: 10.1002/ece3.10816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 10/10/2023] [Accepted: 11/27/2023] [Indexed: 12/19/2023] Open
Abstract
Understanding the suitable habitat of endangered species is crucial for agencies such as the Bureau of Land Management to plan management and conservation. However, few species distribution models are directly validated, potentially limiting their application in management. In preparation for a Species Status Assessment of clay-loving wild buckwheat (Eriogonum pelinophilum), an endangered subshrub found in southwest Colorado, we ran a series of species distribution models to estimate the species' potential occupied habitat and validated these models in the field. A 1-meter resolution digital elevation model derived from LiDAR and a high-resolution geology mapping helped identify biologically relevant characteristics of the species' habitat. We employed a weighted ensemble model based on two Random Forest and one Boosted Regression Tree model, and discrimination performance of the ensemble model was high (AUC-PR = 0.793). We then conducted a systematic field survey of model habitat suitability predictions, during which we discovered 55 new subpopulations of the species and demonstrated that new species observations were strongly associated with model predictions (p < .0001, Cliff's delta = 0.575). We further refined our original models by incorporating the additional species occurrences collected in the field survey, a new explanatory variable, and a more diverse set of models. These iterative changes marginally improved performance of the ensemble model (AUC-PR = 0.825). Direct validation of species distribution models is extremely rare, and our field survey provides strong validation of our model results. This helps increase confidence to utilize predictions in planning. The final model predictions greatly improve the Bureau of Land Management's understanding of the species' habitat and increase our ability to consider potential habitat in planning land use activities such as road development and travel management.
Affiliation(s)
- Scott N. Zimmer
  - Uncompahgre Field Office, Bureau of Land Management, Montrose, Colorado, USA
  - Fire Sciences Laboratory, Rocky Mountain Research Station, U.S. Forest Service, Missoula, Montana, USA
- Carol A. Dawson
  - Colorado State Office, Bureau of Land Management, Lakewood, Colorado, USA
20
|
Mulhern RE, Kondash AJ, Norman E, Johnson J, Levine K, McWilliams A, Napier M, Weber F, Stella L, Wood E, Lee Pow Jackson C, Colley S, Cajka J, MacDonald Gibson J, Hoponick Redmon J. Improved Decision Making for Water Lead Testing in U.S. Child Care Facilities Using Machine-Learned Bayesian Networks. Environmental Science & Technology 2023; 57:17959-17970. [PMID: 36932953 PMCID: PMC10666530 DOI: 10.1021/acs.est.2c07477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2022] [Revised: 03/06/2023] [Accepted: 03/07/2023] [Indexed: 06/18/2023]
Abstract
Tap water lead testing programs in the U.S. need improved methods for identifying high-risk facilities to optimize limited resources. In this study, machine-learned Bayesian network (BN) models were used to predict building-wide water lead risk in over 4,000 child care facilities in North Carolina according to maximum and 90th percentile lead levels from water lead concentrations at 22,943 taps. The performance of the BN models was compared to common alternative risk factors, or heuristics, used to inform water lead testing programs among child care facilities including building age, water source, and Head Start program status. The BN models identified a range of variables associated with building-wide water lead, with facilities that serve low-income families, rely on groundwater, and have more taps exhibiting greater risk. Models predicting the probability of a single tap exceeding each target concentration performed better than models predicting facilities with clustered high-risk taps. The BN models' Fβ-scores outperformed each of the alternative heuristics by 118-213%. This represents up to a 60% increase in the number of high-risk facilities that could be identified and up to a 49% decrease in the number of samples that would need to be collected by using BN model-informed sampling compared to using simple heuristics. Overall, this study demonstrates the value of machine-learning approaches for identifying high water lead risk that could improve lead testing programs nationwide.
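The Fβ-score used to compare the BN models with the heuristics combines precision and recall, with β controlling how much more recall counts than precision. A minimal sketch of the standard definition follows. The abstract does not state the β value used, and the counts in the usage note are hypothetical.

```python
def f_beta(tp, fp, fn, beta=1.0):
    """F-beta score from confusion-matrix counts (standard definition).

    tp, fp, fn -- true positives, false positives, false negatives
    beta       -- beta > 1 favours recall, beta < 1 favours precision
    """
    b2 = beta * beta
    denom = (1.0 + b2) * tp + b2 * fn + fp
    # Return 0.0 when there are no positives at all, to avoid dividing by zero.
    return (1.0 + b2) * tp / denom if denom else 0.0
```

For hypothetical counts of 6 true positives, 2 false positives, and 6 false negatives (precision 0.75, recall 0.5), `f_beta(6, 2, 6)` gives 0.6, and raising β to 2 lowers the score because recall is the weaker of the two.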
Affiliation(s)
- Riley E. Mulhern
  - RTI International, Research Triangle Park, North Carolina 27709, United States
- AJ Kondash
  - RTI International, Research Triangle Park, North Carolina 27709, United States
- Ed Norman
  - Environmental Health Section, Division of Public Health, North Carolina Department of Health and Human Services, Raleigh, North Carolina 27609, United States
- Joseph Johnson
  - RTI International, Research Triangle Park, North Carolina 27709, United States
- Keith Levine
  - RTI International, Research Triangle Park, North Carolina 27709, United States
- Andrea McWilliams
  - RTI International, Research Triangle Park, North Carolina 27709, United States
- Melanie Napier
  - Environmental Health Section, Division of Public Health, North Carolina Department of Health and Human Services, Raleigh, North Carolina 27609, United States
- Frank Weber
  - RTI International, Research Triangle Park, North Carolina 27709, United States
- Laurie Stella
  - RTI International, Research Triangle Park, North Carolina 27709, United States
- Erica Wood
  - RTI International, Research Triangle Park, North Carolina 27709, United States
- Sarah Colley
  - RTI International, Research Triangle Park, North Carolina 27709, United States
- Jamie Cajka
  - RTI International, Research Triangle Park, North Carolina 27709, United States
- Jacqueline MacDonald Gibson
  - Department of Civil, Construction, and Environmental Engineering, North Carolina State University, Raleigh, North Carolina 27695, United States
21
|
Ghorbanali Z, Zare-Mirakabad F, Salehi N, Akbari M, Masoudi-Nejad A. DrugRep-HeSiaGraph: when heterogenous siamese neural network meets knowledge graphs for drug repurposing. BMC Bioinformatics 2023; 24:374. [PMID: 37789314 PMCID: PMC10548718 DOI: 10.1186/s12859-023-05479-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Accepted: 09/12/2023] [Indexed: 10/05/2023] Open
Abstract
BACKGROUND Drug repurposing is an approach that holds promise for identifying new therapeutic uses for existing drugs. Recently, knowledge graphs have emerged as significant tools for addressing the challenges of drug repurposing. However, there are still major issues with constructing and embedding knowledge graphs. RESULTS This study proposes a two-step method called DrugRep-HeSiaGraph to address these challenges. The method integrates the drug-disease knowledge graph with the application of a heterogeneous siamese neural network. In the first step, a drug-disease knowledge graph named DDKG-V1 is constructed by defining new relationship types, and then numerical vector representations for the nodes are created using the distributional learning method. In the second step, a heterogeneous siamese neural network called HeSiaNet is applied to enrich the embedding of drugs and diseases by bringing them closer in a new unified latent space. Then, it predicts potential drug candidates for diseases. DrugRep-HeSiaGraph achieves impressive performance metrics, including an AUC-ROC of 91.16%, an AUC-PR of 90.32%, an accuracy of 84.63%, a BS of 0.119, and an MCC of 69.31%. CONCLUSION We demonstrate the effectiveness of the proposed method in identifying potential drugs for COVID-19 as a case study. In addition, this study shows the role of dipeptidyl peptidase 4 (DPP-4) as a potential receptor for SARS-CoV-2 and the effectiveness of DPP-4 inhibitors in facing COVID-19. This highlights the practical application of the model in addressing real-world challenges in the field of drug repurposing. The code and data for DrugRep-HeSiaGraph are publicly available at https://github.com/CBRC-lab/DrugRep-HeSiaGraph .
Affiliation(s)
- Zahra Ghorbanali
  - Computational Biology Research Center (CBRC), Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran
- Fatemeh Zare-Mirakabad
  - Computational Biology Research Center (CBRC), Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran.
- Najmeh Salehi
  - School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Mohammad Akbari
  - Computational Biology Research Center (CBRC), Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran
- Ali Masoudi-Nejad
  - Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
22
|
Ng S, Masarone S, Watson D, Barnes MR. The benefits and pitfalls of machine learning for biomarker discovery. Cell Tissue Res 2023; 394:17-31. [PMID: 37498390 PMCID: PMC10558383 DOI: 10.1007/s00441-023-03816-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/30/2023] [Accepted: 07/12/2023] [Indexed: 07/28/2023]
Abstract
Prospects for the discovery of robust and reproducible biomarkers have improved considerably with the development of sensitive omics platforms that can enable measurement of biological molecules at an unprecedented scale. With technical barriers to success lowering, the challenge is now moving into the analytical domain. Genome-wide discovery presents a problem of scale and multiple testing as standard statistical methods struggle to distinguish signal from noise in increasingly complex biological systems. Machine learning and AI methods are good at finding answers in large datasets, but they have a tendency to overfit solutions. It may be possible to find a local answer or mechanism in a specific patient sample or small group of samples, but this may not generalise to wider patient populations due to the high likelihood of false discovery. The rise of explainable AI offers to improve the opportunity for true discovery by providing explanations for predictions that can be explored mechanistically before proceeding to costly and time-consuming validation studies. This review aims to introduce some of the basic concepts of machine learning and AI for biomarker discovery with a focus on post hoc explanation of predictions. To illustrate this, we consider how explainable AI has already been used successfully, and we explore a case study that applies AI to biomarker discovery in rheumatoid arthritis, demonstrating the accessibility of tools for AI and machine learning. We use this to illustrate and discuss some of the potential challenges and solutions that may enable AI to critically interrogate disease and response mechanisms.
Affiliation(s)
- Sandra Ng
  - Centre for Translational Bioinformatics, William Harvey Research Institute, Queen Mary University of London, London, UK
- Sara Masarone
  - Centre for Translational Bioinformatics, William Harvey Research Institute, Queen Mary University of London, London, UK
  - Alan Turing Institute, London, UK
- David Watson
  - Department of Informatics, King's College London, London, UK
- Michael R Barnes
  - Centre for Translational Bioinformatics, William Harvey Research Institute, Queen Mary University of London, London, UK.
  - Alan Turing Institute, London, UK.
23
|
Geleijnse J, Rutten M, de Villiers D, Bamwenda JT, Abraham E. Enhancing water access monitoring through mapping multi-source usage and disaggregated geographic inequalities with machine learning and surveys. Sci Rep 2023; 13:13433. [PMID: 37596313 PMCID: PMC10439218 DOI: 10.1038/s41598-023-39917-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2023] [Accepted: 08/02/2023] [Indexed: 08/20/2023] Open
Abstract
Monitoring safe water access in developing countries relies primarily on household health survey and census data. These surveys are often incomplete: they tend to focus on the primary water source only, are spatially coarse, and usually happen every 5-10 years, during which significant changes can happen in urbanisation and infrastructure provision, especially in sub Saharan Africa. In this work, we present a data-driven approach that utilises and compliments survey based data of water access, to provide context-specific and disaggregated monitoring. The level of access to improved water and sanitation has been shown to vary with geographical inequalities related to the availability of water resources and terrain, population density and socio-economic determinants such as income and education. We use such data and successfully predict the level of water access in areas for which data is lacking, providing spatially explicit and community level monitoring possibilities for mapping geographical inequalities in access. This is showcased by applying three machine learning models that use such geographical data to predict the number of presences of water access points of eight different access types across Uganda, with a 1km by 1km grid resolution. Two Multi-Layer-Perceptron (MLP) models and a Maximum Entropy (MaxEnt) model are developed and compared, where the former are shown to consistently outperform the latter. The best performing Neural Network model achieved a True Positive Rate of 0.89 and a False Positive Rate of 0.24, compared to 0.85 and 0.46 respectively for the MaxEnt model. The models improve on previous work on water point modeling through the use of neural networks, in addition to introducing the True Positive - and False Positive Rate as better evaluation metrics to also assess the MaxEnt model. We also present a scaling method to move from predicting only the relative probability of water point presences, to predicting the absolute number of presences. 
To challenge both the model results and the more standard health surveys, a new household-level survey is carried out in Bushenyi, a mid-sized town in the South-West of Uganda, asking specifically about the multitude of water sources. On average, Bushenyi households reported using 1.9 water sources. The survey further showed that the actual presence of a source does not always imply that it is used. Therefore, relying solely on models for water access monitoring is not an option. Household surveys remain necessary for this purpose but should be extended with questions on the multiple sources that households use.
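The two evaluation metrics named in this abstract can be illustrated with a minimal sketch. It assumes binary presence/absence labels per grid cell; the function name `tpr_fpr` and the toy data are hypothetical and are not taken from the study.

```python
def tpr_fpr(y_true, y_pred):
    """Return (true positive rate, false positive rate) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one positive and one negative cell")
    # TPR: share of true presences that were predicted as presences.
    # FPR: share of true absences that were wrongly predicted as presences.
    return tp / (tp + fn), fp / (fp + tn)


# Toy data (made up): 1 = grid cell contains a water point, 0 = it does not.
observed = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0]
tpr, fpr = tpr_fpr(observed, predicted)
print(f"TPR={tpr:.2f}, FPR={fpr:.2f}")
```

Read this way, the reported figures say that the MLP and MaxEnt models find a similar share of true presences (0.89 versus 0.85), while the MLP raises far fewer false alarms (0.24 versus 0.46).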
Affiliation(s)
- Jan Geleijnse
- Department of Water Management, Delft University of Technology, Mekelweg, 2628 CD, Delft, The Netherlands.
- UNICEF, Nairobi, Kenya.
| | - Martine Rutten
- Department of Water Management, Delft University of Technology, Mekelweg, 2628 CD, Delft, The Netherlands
| | - Didier de Villiers
- Department of Water Management, Delft University of Technology, Mekelweg, 2628 CD, Delft, The Netherlands
| | | | - Edo Abraham
- Department of Water Management, Delft University of Technology, Mekelweg, 2628 CD, Delft, The Netherlands
| |
|
24
|
Oh SS, Kuang I, Jeong H, Song JY, Ren B, Moon JY, Park EC, Kawachi I. Predicting Fetal Alcohol Spectrum Disorders Using Machine Learning Techniques: Multisite Retrospective Cohort Study. J Med Internet Res 2023; 25:e45041. [PMID: 37463016 PMCID: PMC10394506 DOI: 10.2196/45041] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/13/2022] [Revised: 05/22/2023] [Accepted: 06/18/2023] [Indexed: 07/21/2023] Open
Abstract
BACKGROUND Fetal alcohol syndrome (FAS) is a lifelong developmental disability that occurs among individuals with prenatal alcohol exposure (PAE). With improved prediction models, FAS can be diagnosed or treated early, if not completely prevented. OBJECTIVE In this study, we sought to compare different machine learning algorithms and their FAS predictive performance among women who consumed alcohol during pregnancy. We also aimed to identify which variables (eg, timing of exposure to alcohol during pregnancy and type of alcohol consumed) were most influential in generating an accurate model. METHODS Data from the collaborative initiative on fetal alcohol spectrum disorders from 2007 to 2017 were used to gather information about 595 women who consumed alcohol during pregnancy at 5 hospital sites around the United States. To obtain information about PAE, questionnaires or in-person interviews, as well as reviews of medical, legal, or social service records were used to gather information about alcohol consumption. Four different machine learning algorithms (logistic regression, XGBoost, light gradient-boosting machine, and CatBoost) were trained to predict the prevalence of FAS at birth, and model performance was measured by analyzing the area under the receiver operating characteristics curve (AUROC). Of the total cases, 80% were randomly selected for training, while 20% remained as test data sets for predicting FAS. Feature importance was also analyzed using Shapley values for the best-performing algorithm. RESULTS Overall, there were 20 cases of FAS within a total population of 595 individuals with PAE. Most of the drinking occurred in the first trimester only (n=491) or throughout all 3 trimesters (n=95); however, there were also reports of drinking in the first and second trimesters only (n=8), and 1 case of drinking in the third trimester only (n=1). 
The CatBoost method delivered the best performance in terms of AUROC (0.92) and area under the precision-recall curve (AUPRC 0.51), followed by the logistic regression method (AUROC 0.90; AUPRC 0.59), the light gradient-boosting machine (AUROC 0.89; AUPRC 0.52), and XGBoost (AUROC 0.86; AUPRC 0.45). Shapley values in the CatBoost model revealed that 12 variables were considered important in FAS prediction, with drinking throughout all 3 trimesters of pregnancy, maternal age, race, and type of alcoholic beverage consumed (eg, beer, wine, or liquor) scoring highly in overall feature importance. For most predictive measures, the best performance was obtained by the CatBoost algorithm, with an AUROC of 0.92, precision of 0.50, specificity of 0.29, F1 score of 0.29, and accuracy of 0.96. CONCLUSIONS Machine learning algorithms were able to identify FAS risk with a prediction performance higher than that of previous models among pregnant drinkers. For small training sets, which are common with FAS, boosting mechanisms like CatBoost may help alleviate certain problems associated with data imbalances and difficulties in optimization or generalization.
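Because only 20 of the 595 cases are positive, AUROC and AUPRC behave quite differently here. The sketch below is an illustration under stated assumptions, not the authors' code: it computes AUROC as the probability that a random positive outranks a random negative, and uses average precision as one common estimator of AUPRC. The function names and toy scores are hypothetical, and the average-precision helper assumes no tied scores.

```python
def auroc(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Average precision: mean of precision at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy data (made up): 2 positives among 8 cases.
labels = [1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
print(round(auroc(labels, scores), 3), round(average_precision(labels, scores), 3))

# Chance-level reference for AUPRC is roughly the prevalence of the positive class.
print(round(20 / 595, 3))
```

A random classifier scores about 0.5 on AUROC but only about the prevalence (20/595, roughly 0.034) on AUPRC, which is why AUPRC values near 0.5 can coexist with AUROC values near 0.9 in this setting.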
Affiliation(s)
- Sarah Soyeon Oh
- Department of Social and Behavioral Sciences, Harvard TH Chan School of Public Health, Boston, MA, United States
- Institute of Health Services Research, Yonsei University College of Medicine, Seoul, Republic of Korea
| | - Irene Kuang
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, United States
| | - Hyewon Jeong
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, United States
| | - Jin-Yeop Song
- Department of Physics, Massachusetts Institute of Technology, Cambridge, MA, United States
| | - Boyu Ren
- Department of Psychiatry, Harvard Medical School, Boston, MA, United States
| | - Jong Youn Moon
- Artificial Intelligence and Big-Data Convergence Center, Gil Medical Center, Gachon University College of Medicine, Incheon, Republic of Korea
| | - Eun-Cheol Park
- Institute of Health Services Research, Yonsei University College of Medicine, Seoul, Republic of Korea
| | - Ichiro Kawachi
- Department of Social and Behavioral Sciences, Harvard TH Chan School of Public Health, Boston, MA, United States
| |
|
25
|
Stillman AN, Wilkerson RL, Kaschube DR, Siegel RB, Sawyer SC, Tingley MW. Incorporating pyrodiversity into wildlife habitat assessments for rapid post-fire management: A woodpecker case study. ECOLOGICAL APPLICATIONS : A PUBLICATION OF THE ECOLOGICAL SOCIETY OF AMERICA 2023; 33:e2853. [PMID: 36995347 DOI: 10.1002/eap.2853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2022] [Revised: 03/15/2023] [Accepted: 03/16/2023] [Indexed: 06/02/2023]
Abstract
Spatial and temporal variation in fire characteristics, termed pyrodiversity, is increasingly recognized as an important factor that structures wildlife communities in fire-prone ecosystems, yet there have been few attempts to incorporate pyrodiversity or post-fire habitat dynamics into predictive models of animal distributions and abundance to support post-fire management. We use the black-backed woodpecker, a species associated with burned forests, as a case study to demonstrate a pathway for incorporating pyrodiversity into wildlife habitat assessments for adaptive management. Employing monitoring data (2009-2019) from post-fire forests in California, we developed three competing occupancy models describing different hypotheses for habitat associations: (1) a static model representing an existing management tool, (2) a temporal model accounting for years since fire, and (3) a temporal-landscape model which additionally incorporates emerging evidence from field studies about the influence of pyrodiversity. Evaluating predictive ability, we found superior support for the temporal-landscape model, which showed a positive relationship between occupancy and pyrodiversity and interactions between habitat associations and years since fire. We incorporated the new temporal-landscape model into an RShiny application to make this decision-support tool accessible to decision-makers.
Affiliation(s)
- Andrew N Stillman
- Cornell Lab of Ornithology, Cornell University, Ithaca, New York, USA
- Cornell Atkinson Center for Sustainability, Cornell University, Ithaca, New York, USA
| | | | | | - Rodney B Siegel
- The Institute for Bird Populations, Petaluma, California, USA
| | | | - Morgan W Tingley
- Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, California, USA
| |
|
26
|
Lin HD, Lee TH, Lin CH, Wu HC. Optical Imaging Deformation Inspection and Quality Level Determination of Multifocal Glasses. SENSORS (BASEL, SWITZERLAND) 2023; 23:s23094497. [PMID: 37177700 PMCID: PMC10181736 DOI: 10.3390/s23094497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Revised: 05/01/2023] [Accepted: 05/03/2023] [Indexed: 05/15/2023]
Abstract
Multifocal glasses are a new type of lens that accommodates both nearsighted and farsighted vision on the same lens. This property allows the glass to have various curvatures in distinct regions within the glass during the grinding process. However, when the curvature varies irregularly, the glass is prone to optical deformation during imaging. Most of the previous studies on imaging deformation focus on the deformation correction of optical lenses. Consequently, this research uses an automatic deformation defect detection system for multifocal glasses to replace professional assessors. To quantify the grade of deformation of curved multifocal glasses, we first digitally image a pattern of concentric circles through a test glass to obtain an image of the pattern as seen through the glass. Second, we preprocess the image to enhance the clarity of the concentric circles' appearance. A centroid-radius model is used to represent the form variation properties of every circle in the processed image. Third, the deviation of the centroid radius for detecting deformation defects is found by a slight deviation control scheme, and we obtain a difference image indicating the detected deformed regions after comparing it with the norm pattern. Fourth, based on the deformation measure and occurrence location of multifocal glasses, we build fuzzy membership functions and inference rules to quantify the deformation's severity. Finally, a mixed model incorporating a network-based fuzzy inference and a genetic algorithm is applied to determine a quality grade for the deformation severity of detected defects. Testing outcomes show that the proposed methods attain a 94% accuracy rate of the quality levels for deformation severity, an 81% recall rate of deformation defects, and an 11% false positive rate for multifocal glass detection. 
This research contributes solutions to the problems of imaging deformation inspection and provides computer-aided systems for determining quality levels that meet the demands of inspection and quality control.
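The centroid-radius idea mentioned in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not specify the "slight deviation control scheme", so a fixed tolerance around the mean radius is used here as a simple stand-in, and the function names and sample points are hypothetical.

```python
import math


def centroid_radii(points):
    """Return the centroid of a contour and the distance of each point from it."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    radii = [math.hypot(x - cx, y - cy) for x, y in points]
    return (cx, cy), radii


def deviating_indices(radii, tolerance):
    """Indices of contour points whose radius differs from the mean by more than tolerance."""
    mean_r = sum(radii) / len(radii)
    return [i for i, r in enumerate(radii) if abs(r - mean_r) > tolerance]


# Undistorted circle sampled at four points: every radius equals 1.
ideal = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
# Same circle with the first point pushed outwards (a made-up deformation).
deformed = [(1.5, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]

_, ideal_radii = centroid_radii(ideal)
_, deformed_radii = centroid_radii(deformed)
print(deviating_indices(ideal_radii, tolerance=0.2))
print(deviating_indices(deformed_radii, tolerance=0.2))
```

In a full system the flagged points would presumably be mapped back to image regions and passed on to the severity grading stage; that part is outside this sketch.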
Affiliation(s)
- Hong-Dar Lin
- Department of Industrial Engineering and Management, Chaoyang University of Technology, Taichung 413310, Taiwan
| | - Tung-Hsin Lee
- Department of Industrial Engineering and Management, Chaoyang University of Technology, Taichung 413310, Taiwan
| | - Chou-Hsien Lin
- Department of Civil, Architectural, and Environmental Engineering, The University of Texas at Austin, Austin, TX 78712-0273, USA
| | - Hsin-Chieh Wu
- Department of Industrial Engineering and Management, Chaoyang University of Technology, Taichung 413310, Taiwan
| |
|
27
|
Ghorbanali Z, Zare-Mirakabad F, Akbari M, Salehi N, Masoudi-Nejad A. DrugRep-KG: Toward Learning a Unified Latent Space for Drug Repurposing Using Knowledge Graphs. J Chem Inf Model 2023; 63:2532-2545. [PMID: 37023229 PMCID: PMC10109243 DOI: 10.1021/acs.jcim.2c01291] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/15/2022] [Indexed: 04/08/2023]
Abstract
Drug repurposing or repositioning (DR) refers to finding new therapeutic applications for existing drugs. Current computational DR methods face data representation and negative data sampling challenges. Although previous studies have used various representations, aggregating these features and bringing the associations between drugs and diseases into a unified latent space is a crucial step for accurate prediction. In addition, the number of unknown associations between drugs and diseases, which is considered negative data, is much higher than the number of known associations, or positive data, leading to an imbalanced dataset. In this regard, we propose the DrugRep-KG method, which applies a knowledge graph embedding approach for representing drugs and diseases, to address these challenges. Unlike typical DR methods, which consider all unknown drug-disease associations as negative data, we select as negative data only the subset of unknown associations in which the disease occurs because of an adverse reaction to the drug. DrugRep-KG has been evaluated under different settings and achieves an AUC-ROC (area under the receiver operating characteristic curve) of 90.83% and an AUC-PR (area under the precision-recall curve) of 90.10%, which are higher than in previous works. In addition, we checked the performance of our framework in finding potential drugs for coronavirus infection and skin-related diseases: contact dermatitis and atopic eczema. DrugRep-KG predicted beclomethasone for contact dermatitis, and fluorometholone, clocortolone, fluocinonide, and beclomethasone for atopic eczema, all of which have previously been proven to be effective in other studies. Fluorometholone for contact dermatitis is a novel suggestion by DrugRep-KG that should be validated experimentally. DrugRep-KG also predicted the associations between COVID-19 and potential treatments suggested by DrugBank, in addition to new drug candidates provided with experimental evidence. 
The data and code underlying this article are available at https://github.com/CBRC-lab/DrugRep-KG.
Affiliation(s)
- Zahra Ghorbanali
- Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran 1591634311, Iran
| | - Fatemeh Zare-Mirakabad
- Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran 1591634311, Iran
| | - Mohammad Akbari
- Department of Mathematics and Computer Science, Amirkabir University of Technology, Tehran 1591634311, Iran
| | - Najmeh Salehi
- School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Tehran 19395-5746, Iran
| | - Ali Masoudi-Nejad
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran 1417935840, Iran
| |
|
28
|
Silagyi DV, Liu D. Prediction of severity of aviation landing accidents using support vector machine models. ACCIDENT; ANALYSIS AND PREVENTION 2023; 187:107043. [PMID: 37086512 DOI: 10.1016/j.aap.2023.107043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Revised: 12/29/2022] [Accepted: 03/23/2023] [Indexed: 05/03/2023]
Abstract
The purpose of this study was to apply support vector machine (SVM) models to predict the severity of aircraft damage and the severity of personal injury during an aircraft approach and landing accident and to evaluate and rank the importance of 14 accident factors across 39 sub-categorical factors. Three new factors were introduced using the theory of inattentional blindness: the presence of visual area surface penetrations for a runway, the Federal Aviation Administration's (FAA) visual area surface penetration policy timeframe, and the type of runway approach lighting. The study comprised 1,297 aircraft approach and landing accidents at airports within the United States with at least one instrument approach procedure. Support vector machine models were developed using the linear, polynomial, radial basis function (RBF), and sigmoid kernels for the severity of aircraft damage, and additional SVM models were developed for the severity of personal injury. The SVM models using the RBF kernel produced the best machine learning models with a 96% accuracy for predicting the severity of aircraft damage (0.94 precision, 0.95 recall, and 0.95 F1-score) and a 98% accuracy for predicting the severity of personal injury (0.99 precision, 0.98 recall, and 0.99 F1-score). The top predictors across both models were the pilot's total flight hours, time of the accident, pilot's age, crosswind component, landing runway number, single-engine land certificate, and any obstacle penetration. This study demonstrates the benefit of SVM modeling using the RBF kernel for accident prediction and for datasets with categorical factors.
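For readers unfamiliar with the kernel named here, the RBF kernel measures similarity between two feature vectors as exp(-gamma * squared distance). The sketch below is a generic illustration, not the study's implementation; the feature vectors and the value of `gamma` are made up.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF (Gaussian) kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical, already-scaled feature vectors for two accident records.
record_a = [0.0, 0.0]
record_b = [1.0, 1.0]

print(rbf_kernel(record_a, record_a, gamma=0.5))  # identical records give 1.0
print(round(rbf_kernel(record_a, record_b, gamma=0.5), 4))
```

The similarity decays smoothly from 1 towards 0 as records move apart, and `gamma` controls how quickly; in practice it is usually tuned together with the SVM regularisation parameter.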
Affiliation(s)
- Dezsö V Silagyi
- Embry-Riddle Aeronautical University, Daytona Beach, FL, USA.
| | - Dahai Liu
- Embry-Riddle Aeronautical University, Daytona Beach, FL, USA.
| |
|
29
|
Chang S, Wilkho RS, Gharaibeh N, Sansom G, Meyer M, Olivera F, Zou L. Environmental, climatic, and situational factors influencing the probability of fatality or injury occurrence in flash flooding: a rare event logistic regression predictive model. NATURAL HAZARDS (DORDRECHT, NETHERLANDS) 2023; 116:3957-3978. [PMID: 37974652 PMCID: PMC10653003 DOI: 10.1007/s11069-023-05845-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2022] [Accepted: 01/30/2023] [Indexed: 11/19/2023]
Abstract
Flash flooding is considered one of the most lethal natural hazards in the USA as measured by the ratio of fatalities to people affected. However, the occurrence of injuries and fatalities during flash flooding was found to be rare (about 2% occurrence rate) based on our analysis of 6,065 flash flood events that occurred in Texas over a 15-year period (2005 to 2019). This article identifies climatic, environmental, and situational factors that affect the occurrence of fatalities and injuries in flash flood events and provides a predictive model to estimate the likelihood of these occurrences. Due to the highly imbalanced dataset, three forms of logit models were investigated to achieve unbiased estimations of the model coefficients. The rare event logistic regression (Relogit) model was found to be the most suitable model. The model considers ten independent situational, climatic, and environmental variables that could affect human safety in flash flood events. Vehicle-related activities during flash flooding exhibited the greatest effect on the probability of human harm occurrence, followed by the event's time (daytime vs. nighttime), precipitation amount, location with respect to the flash flood alley, median age of structures in the community, low water crossing density, and event duration. The application of the developed model as a simulation tool for informing flash flood mitigation planning was demonstrated in two study cases in Texas.
Affiliation(s)
- Shi Chang
- Zachry Department of Civil & Environmental Engineering, Texas A&M University, 3136 TAMU, College Station, TX 77843-3136, USA
| | - Rohan Singh Wilkho
- Zachry Department of Civil & Environmental Engineering, Texas A&M University, 3136 TAMU, College Station, TX 77843-3136, USA
| | - Nasir Gharaibeh
- Zachry Department of Civil & Environmental Engineering, Texas A&M University, 3136 TAMU, College Station, TX 77843-3136, USA
| | - Garett Sansom
- Department of Environmental and Occupational Health, Texas A&M University, College Station, TX, USA
| | - Michelle Meyer
- Department of Landscape Architecture and Urban Planning, Texas A&M University, College Station, TX, USA
| | - Francisco Olivera
- Zachry Department of Civil & Environmental Engineering, Texas A&M University, 3136 TAMU, College Station, TX 77843-3136, USA
| | - Lei Zou
- Department of Geography, Texas A&M University, College Station, TX, USA
| |
|
30
|
Murch WS, Kairouz S, Dauphinais S, Picard E, Costes JM, French M. Using machine learning to retrospectively predict self-reported gambling problems in Quebec. Addiction 2023. [PMID: 36880253 DOI: 10.1111/add.16179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2022] [Accepted: 02/15/2023] [Indexed: 03/08/2023]
Abstract
BACKGROUND AND AIMS Participating in online gambling is associated with an increased risk for experiencing gambling-related harms, driving calls for more effective, personalized harm prevention initiatives. Such initiatives depend on the development of models capable of detecting at-risk online gamblers. We aimed to determine whether machine learning algorithms can use site data to retrospectively detect at-risk online gamblers, as indicated by the Problem Gambling Severity Index (PGSI). DESIGN Exploratory comparison of six prominent supervised machine learning methods (decision trees, random forests, K-nearest neighbours, logistic regressions, artificial neural networks and support vector machines) to predict problem gambling risk levels reported on the PGSI. SETTING Lotoquebec.com (formerly espacejeux.com), an online gambling platform operated by Loto-Québec (a provincial Crown Corporation) in Quebec, Canada. PARTICIPANTS N = 9145 adults (18+) who completed the survey measure and placed at least one bet using real money on the site. MEASUREMENTS Participants completed the PGSI, a self-report questionnaire with validated cut-offs denoting moderate-to-high risk (PGSI 5+) or high risk (PGSI 8+) of experiencing past-year gambling-related problems. Participants agreed to release additional data about the preceding 12 months from their user accounts. Predictor variables (144) were derived from users' transactions, apparent betting behaviours, listed demographics and use of responsible gambling tools on the platform. FINDINGS Our best classification models (random forests) for the PGSI 5+ and 8+ outcome variables accounted for 84.33% (95% CI = 82.24-86.41) and 82.52% (95% CI = 79.96-85.08) of the total area under their receiver operating characteristic curves, respectively. The most important factors in these models included the frequency and variability of participants' betting behaviour and repeat engagement on the site. 
CONCLUSIONS Machine learning algorithms appear to be able to classify at-risk online gamblers using data generated from their use of online gambling platforms. They may enable personalized harm prevention initiatives, but are constrained by trade-offs between their sensitivity and precision.
Affiliation(s)
- W Spencer Murch
- Department of Sociology and Anthropology, Concordia University, Montreal, Quebec, Canada
| | - Sylvia Kairouz
- Department of Sociology and Anthropology, Concordia University, Montreal, Quebec, Canada
| | - Sophie Dauphinais
- Department of Sociology and Anthropology, Concordia University, Montreal, Quebec, Canada
| | - Elyse Picard
- Department of Sociology and Anthropology, Concordia University, Montreal, Quebec, Canada
| | - Jean-Michel Costes
- Department of Sociology and Anthropology, Concordia University, Montreal, Quebec, Canada
| | - Martin French
- Department of Sociology and Anthropology, Concordia University, Montreal, Quebec, Canada
| |
|
31
|
Eysenbach G, Chao HJ, Chiang YC, Chen HY. Explainable Machine Learning Techniques To Predict Amiodarone-Induced Thyroid Dysfunction Risk: Multicenter, Retrospective Study With External Validation. J Med Internet Res 2023; 25:e43734. [PMID: 36749620 PMCID: PMC9944157 DOI: 10.2196/43734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Revised: 12/25/2022] [Accepted: 01/16/2023] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Machine learning offers new solutions for predicting life-threatening, unpredictable amiodarone-induced thyroid dysfunction. Traditional regression approaches for adverse-effect prediction without time-series consideration of features have yielded suboptimal predictions. Machine learning algorithms with multiple data sets at different time points may generate better performance in predicting adverse effects. OBJECTIVE We aimed to develop and validate machine learning models for forecasting individualized amiodarone-induced thyroid dysfunction risk and to optimize a machine learning-based risk stratification scheme with a resampling method and readjustment of the clinically derived decision thresholds. METHODS This study developed machine learning models using multicenter, delinked electronic health records. It included patients receiving amiodarone from January 2013 to December 2017. The training set was composed of data from Taipei Medical University Hospital and Wan Fang Hospital, while data from Taipei Medical University Shuang Ho Hospital were used as the external test set. The study collected stationary features at baseline and dynamic features at the first, second, third, sixth, ninth, 12th, 15th, 18th, and 21st months after amiodarone initiation. We used 16 machine learning models, including extreme gradient boosting, adaptive boosting, k-nearest neighbor, and logistic regression models, along with an original resampling method and 3 other resampling methods, including oversampling with the borderline-synthesized minority oversampling technique, undersampling-edited nearest neighbor, and over- and undersampling hybrid methods. The model performance was compared based on accuracy, precision, recall, F1-score, geometric mean, area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPRC). Feature importance was determined by the best model. 
The decision threshold was readjusted to identify the best cutoff value and a Kaplan-Meier survival analysis was performed. RESULTS The training set contained 4075 patients from Taipei Medical University Hospital and Wan Fang Hospital, of whom 583 (14.3%) developed amiodarone-induced thyroid dysfunction, while the external test set included 2422 patients from Taipei Medical University Shuang Ho Hospital, of whom 275 (11.4%) developed amiodarone-induced thyroid dysfunction. The extreme gradient boosting oversampling machine learning model demonstrated the best predictive outcomes among all 16 models. The accuracy, precision, recall, F1-score, G-mean, AUPRC, and AUROC were 0.923, 0.632, 0.756, 0.688, 0.845, 0.751, and 0.934, respectively. After readjusting the cutoff, the best value was 0.627, and the F1-score reached 0.699. The best threshold was able to classify 286 of 2422 patients (11.8%) as high-risk subjects, among which 275 were true-positive patients in the testing set. A shorter treatment duration; higher levels of thyroid-stimulating hormone and high-density lipoprotein cholesterol; and lower levels of free thyroxin, alkaline phosphatase, and low-density lipoprotein were the most important features. CONCLUSIONS Machine learning models combined with resampling methods can predict amiodarone-induced thyroid dysfunction and serve as a support tool for individualized risk prediction and clinical decision support.
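The relationship between the reported precision, recall and F1-score, and the idea of readjusting a decision threshold, can be sketched as follows. This is a minimal illustration under assumptions: the abstract does not state which criterion was used to pick the cutoff of 0.627, so maximising F1 over a candidate grid is used here as a plausible stand-in, and the function names and toy scores are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, scores, candidates):
    """Return (threshold, F1) for the candidate cutoff with the highest F1."""
    best = (None, -1.0)
    for thr in candidates:
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= thr)
        fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= thr)
        fn = sum(1 for t, s in zip(y_true, scores) if t == 1 and s < thr)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = f1_score(precision, recall)
        if f1 > best[1]:
            best = (thr, f1)
    return best


# Consistency check on the figures reported in the abstract.
print(round(f1_score(0.632, 0.756), 3))

# Toy data (made up): predicted risk scores and true outcomes.
outcomes = [0, 0, 1, 0, 1, 1]
risk = [0.1, 0.4, 0.45, 0.6, 0.7, 0.9]
print(best_threshold(outcomes, risk, candidates=[0.3, 0.5, 0.65]))
```

The first check reproduces the reported F1-score of 0.688 from the reported precision (0.632) and recall (0.756); the second shows how moving the cutoff trades false positives against missed cases.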
Affiliation(s)
| | - Horng-Jiun Chao
- Department of Clinical Pharmacy, School of Pharmacy, Taipei Medical University, Taipei, Taiwan
| | - Yi-Chun Chiang
- Department of Clinical Pharmacy, School of Pharmacy, Taipei Medical University, Taipei, Taiwan; Department of Pharmacy, Wan Fang Hospital, Taipei Medical University, Taipei, Taiwan
| | - Hsiang-Yin Chen
- Department of Clinical Pharmacy, School of Pharmacy, Taipei Medical University, Taipei, Taiwan; Department of Pharmacy, Wan Fang Hospital, Taipei Medical University, Taipei, Taiwan
| |
|
32
|
National wetland mapping using remote-sensing-derived environmental variables, archive field data, and artificial intelligence. Heliyon 2023; 9:e13482. [PMID: 36816231 PMCID: PMC9929292 DOI: 10.1016/j.heliyon.2023.e13482] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/22/2022] [Revised: 01/31/2023] [Accepted: 02/01/2023] [Indexed: 02/08/2023] Open
Abstract
While wetland ecosystem services are widely recognized, the lack of fine-scale national inventories prevents successful implementation of conservation policies. Wetlands are difficult to map due to their complex fine-grained spatial pattern and fuzzy boundaries. However, the increasing amount of open high-spatial-resolution remote sensing data and accurately georeferenced field data archives, as well as progress in artificial intelligence (AI), provide opportunities for fine-scale national wetland mapping. The objective of this study was to map wetlands over mainland France (ca. 550,000 km²) by applying AI to environmental variables derived from remote sensing and archive field data. A random forest model was calibrated using spatial cross-validation according to the precision-recall area under the curve (PR-AUC) index using ca. 135,000 soil or flora plots from archive databases, as well as 5 m topographical variables derived from an airborne DTM and a geological map. The model was validated using an experimentally designed sampling strategy with ca. 3000 plots collected during a ground survey in 2021 along non-wetland/wetland transects. Map accuracy was then compared to those of nine existing wetland maps with global, European, or national coverage. The model-derived suitability map (PR-AUC 0.76) highlights the gradual boundaries and fine-grained pattern of wetlands. The binary map is significantly more accurate (F1-score 0.75, overall accuracy 0.67) than existing wetland maps. The approach and end results are valuable for spatial planning and environmental management since the high-resolution suitability and binary maps enable more targeted conservation measures to support biodiversity conservation, water resources maintenance, and carbon storage.
|
33
|
Predicting wind-driven spatial deposition through simulated color images using deep autoencoders. Sci Rep 2023; 13:1394. [PMID: 36697487 PMCID: PMC9876895 DOI: 10.1038/s41598-023-28590-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2022] [Accepted: 01/20/2023] [Indexed: 01/27/2023] Open
Abstract
For centuries, scientists have observed nature to understand the laws that govern the physical world. The traditional process of turning observations into physical understanding is slow. Imperfect models are constructed and tested to explain relationships in data. Powerful new algorithms can enable computers to learn physics by observing images and videos. Inspired by this idea, instead of training machine learning models using physical quantities, we used images, that is, pixel information. For this work, and as a proof of concept, the physics of interest are wind-driven spatial patterns. These phenomena include features in Aeolian dunes and volcanic ash deposition, wildfire smoke, and air pollution plumes. We use computer model simulations of spatial deposition patterns to approximate images from a hypothetical imaging device whose outputs are red, green, and blue (RGB) color images with channel values ranging from 0 to 255. In this paper, we explore deep convolutional neural network-based autoencoders to exploit relationships in wind-driven spatial patterns, which commonly occur in geosciences, and reduce their dimensionality. Reducing the data dimension size with an encoder enables training deep, fully connected neural network models linking geographic and meteorological scalar input quantities to the encoded space. Once this is achieved, full spatial patterns are reconstructed using the decoder. We demonstrate this approach on images of spatial deposition from a pollution source, where the encoder compresses the dimensionality to 0.02% of the original size, and the full predictive model performance on test data achieves a normalized root mean squared error of 8%, a figure of merit in space of 94% and a precision-recall area under the curve of 0.93.
|
34
|
Fiorentino MC, Villani FP, Di Cosmo M, Frontoni E, Moccia S. A review on deep-learning algorithms for fetal ultrasound-image analysis. Med Image Anal 2023; 83:102629. [PMID: 36308861 DOI: 10.1016/j.media.2022.102629] [Citation(s) in RCA: 27] [Impact Index Per Article: 27.0] [Received: 01/27/2022] [Revised: 07/12/2022] [Accepted: 09/10/2022] [Indexed: 11/07/2022]
Abstract
Deep-learning (DL) algorithms are becoming the standard for processing ultrasound (US) fetal images. A number of survey papers in the field are available today, but most of them focus on a broader area of medical-image analysis or do not cover all fetal US DL applications. This paper surveys the most recent work in the field, with a total of 153 research papers published after 2017. Papers are analyzed and commented on from both the methodology and the application perspective. We categorized the papers into (i) fetal standard-plane detection, (ii) anatomical structure analysis and (iii) biometry parameter estimation. For each category, main limitations and open issues are presented. Summary tables are included to facilitate the comparison among the different approaches. In addition, emerging applications are outlined. Publicly available datasets and performance metrics commonly used to assess algorithm performance are also summarized. This paper ends with a critical summary of the current state of the art on DL algorithms for fetal US image analysis and a discussion of current challenges that have to be tackled by researchers working in the field to translate the research methodology into actual clinical practice.
Affiliation(s)
- Mariachiara Di Cosmo
- Department of Information Engineering, Università Politecnica delle Marche, Italy
- Emanuele Frontoni
- Department of Information Engineering, Università Politecnica delle Marche, Italy; Department of Political Sciences, Communication and International Relations, Università degli Studi di Macerata, Italy
- Sara Moccia
- The BioRobotics Institute and Department of Excellence in Robotics & AI, Scuola Superiore Sant'Anna, Italy
|
35
|
Kougioumoutzis K, Trigas P, Tsakiri M, Kokkoris IP, Koumoutsou E, Dimopoulos P, Tzanoudakis D, Iatrou G, Panitsa M. Climate and Land-Cover Change Impacts and Extinction Risk Assessment of Rare and Threatened Endemic Taxa of Chelmos-Vouraikos National Park (Peloponnese, Greece). PLANTS (BASEL, SWITZERLAND) 2022; 11:3548. [PMID: 36559660 PMCID: PMC9784511 DOI: 10.3390/plants11243548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2022] [Revised: 12/04/2022] [Accepted: 12/10/2022] [Indexed: 06/17/2023]
Abstract
Chelmos-Vouraikos National Park is a floristic diversity and endemism hotspot in Greece and one of the main areas where Greek endemic taxa, preliminarily assessed as critically endangered and threatened under the IUCN Criteria A and B, are concentrated. The impacts of climate and land-cover change on rare and endemic species distributions are more prominent in regional biodiversity hotspots. The main aims of the current study were: (a) to investigate how climate and land-cover change may alter the distribution of four single mountain endemics and three very rare Peloponnesian endemic taxa of the National Park via a species distribution modelling approach, and (b) to estimate the current and future extinction risk of the aforementioned taxa based on the IUCN Criteria A and B, in order to investigate the need for designing an effective plant micro-reserve network and to support decision making on spatial planning efforts and conservation research for a sustainable, integrated management. Most of the taxa analyzed are expected to continue to be considered as critically endangered based on both Criteria A and B under all land-cover/land-use scenarios, GCM/RCP and time-period combinations, while two, namely Alchemilla aroanica and Silene conglomeratica, are projected to become extinct in most future climate change scenarios. When land-cover/land-use data were included in the analyses, these negative effects were less pronounced. However, Silene conglomeratica, the rarest mountain endemic found in the study area, is still expected to face substantial range decline. Our results highlight the urgent need for the establishment of micro-reserves for these taxa.
Affiliation(s)
- Panayiotis Trigas
- Laboratory of Systematic Botany, Department of Crop Science, Agricultural University of Athens, 11855 Athens, Greece
- Maria Tsakiri
- Laboratory of Botany, Department of Biology, University of Patras, 26504 Patras, Greece
- Ioannis P. Kokkoris
- Laboratory of Botany, Department of Biology, University of Patras, 26504 Patras, Greece
- Eleni Koumoutsou
- Laboratory of Botany, Department of Biology, University of Patras, 26504 Patras, Greece
- Panayotis Dimopoulos
- Laboratory of Botany, Department of Biology, University of Patras, 26504 Patras, Greece
- Dimitris Tzanoudakis
- Laboratory of Botany, Department of Biology, University of Patras, 26504 Patras, Greece
- Gregoris Iatrou
- Laboratory of Botany, Department of Biology, University of Patras, 26504 Patras, Greece
- Maria Panitsa
- Laboratory of Botany, Department of Biology, University of Patras, 26504 Patras, Greece
|
36
|
Zhi X, Du H, Zhang M, Long Z, Zhong L, Sun X. Mapping the habitat for the moose population in Northeast China by combining remote sensing products and random forests. Glob Ecol Conserv 2022. [DOI: 10.1016/j.gecco.2022.e02347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2022] Open
|
37
|
Langeland E, Johnsen IF, Sømme KK, Morken AM, Erevik EK, Kolberg E, Jonsson J, Mentzoni RA, Pallesen S. One size does not fit all. Should gambling loss limits be based on income? Front Psychiatry 2022; 13:1005172. [PMID: 36465287 PMCID: PMC9709812 DOI: 10.3389/fpsyt.2022.1005172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Accepted: 10/05/2022] [Indexed: 11/17/2022] Open
Abstract
Background: Previous research has suggested empirically based gambling loss limits, with the goal of preventing gambling-related harm in the population. However, there is a lack of studies relating gambling loss limits to individual factors such as income. The current study examines whether gambling loss limits should be income-specific. Materials and methods: The dataset was derived from three representative cross-sectional surveys of the Norwegian population and consisted of 14,630 gamblers. Four income groups, based on a quartile approximation, were formed. Gambling-related harm was measured with the Problem Gambling Severity Index (PGSI), and precision-recall (PR) analyses were used to identify loss limits for the different income groups at two levels of gambling severity: moderate-risk gambling and problem gambling. Results: For both levels of gambling severity, we found the lowest income group to have the lowest gambling loss limits and the highest income group to have the highest loss limits, which, compared to the loss limits for the total sample, were lower and higher, respectively. Calculating the cut-offs for moderate-risk gamblers, we found a consistently ascending pattern from the lowest to the highest income group. Calculating the cut-offs for problem gamblers, we found a similar pattern except for the two middle income groups. Conclusion: The results suggest that income moderates empirically derived gambling loss limits. Although replication is required, income-based gambling loss limits may have higher applied value for preventing gambling-related harm, compared to general loss limits aimed at the entire population.
Affiliation(s)
- Elias Langeland
- Department of Psychosocial Science, University of Bergen, Bergen, Norway
- Kaja Kastrup Sømme
- Department of Psychosocial Science, University of Bergen, Bergen, Norway
- Arne Magnus Morken
- Department of Psychosocial Science, University of Bergen, Bergen, Norway
- Norwegian Competence Center for Gambling and Gaming Research, University of Bergen, Bergen, Norway
- Eilin Kristine Erevik
- Department of Psychosocial Science, University of Bergen, Bergen, Norway
- Norwegian Competence Center for Gambling and Gaming Research, University of Bergen, Bergen, Norway
- Eirin Kolberg
- Department of Psychosocial Science, University of Bergen, Bergen, Norway
- Jakob Jonsson
- Department of Clinical Neuroscience, Centre for Psychiatry Research, Karolinska Institutet, Stockholm, Sweden
- Rune Aune Mentzoni
- Department of Psychosocial Science, University of Bergen, Bergen, Norway
- Norwegian Competence Center for Gambling and Gaming Research, University of Bergen, Bergen, Norway
- Ståle Pallesen
- Department of Psychosocial Science, University of Bergen, Bergen, Norway
- Norwegian Competence Center for Gambling and Gaming Research, University of Bergen, Bergen, Norway
|
38
|
Levy TJ, Coppa K, Cang J, Barnaby DP, Paradis MD, Cohen SL, Makhnevich A, van Klaveren D, Kent DM, Davidson KW, Hirsch JS, Zanos TP. Development and validation of self-monitoring auto-updating prognostic models of survival for hospitalized COVID-19 patients. Nat Commun 2022; 13:6812. [PMID: 36357420 PMCID: PMC9648888 DOI: 10.1038/s41467-022-34646-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/08/2022] [Accepted: 11/02/2022] [Indexed: 11/12/2022] Open
Abstract
Clinical prognostic models can assist patient care decisions. However, their performance can drift over time and location, necessitating model monitoring and updating. Despite rapid and significant changes during the pandemic, prognostic models for COVID-19 patients do not currently account for these drifts. We develop a framework for continuously monitoring and updating prognostic models and apply it to predict 28-day survival in COVID-19 patients. We use demographic, laboratory, and clinical data from electronic health records of 34912 hospitalized COVID-19 patients from March 2020 until May 2022 and compare three modeling methods. Model calibration performance drift is immediately detected with minor fluctuations in discrimination. The overall calibration on the prospective validation cohort is significantly improved when comparing the dynamically updated models against their static counterparts. Our findings suggest that, using this framework, models remain accurate and well-calibrated across various waves, variants, race and sex and yield positive net-benefits.
Affiliation(s)
- Todd J Levy
- Institute of Health System Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, 11030, USA
- Institute of Bioelectronic Medicine, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, 11030, USA
- Kevin Coppa
- Clinical Digital Solutions, Northwell Health, New Hyde Park, NY, 11042, USA
- Jinxuan Cang
- Institute of Health System Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, 11030, USA
- Institute of Bioelectronic Medicine, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, 11030, USA
- Douglas P Barnaby
- Institute of Health System Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, 11030, USA
- Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Northwell Health, Hempstead, NY, 11549, USA
- Marc D Paradis
- Northwell Holdings, Northwell Health, Manhasset, NY, 11030, USA
- Stuart L Cohen
- Institute of Health System Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, 11030, USA
- Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Northwell Health, Hempstead, NY, 11549, USA
- Alex Makhnevich
- Institute of Health System Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, 11030, USA
- Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Northwell Health, Hempstead, NY, 11549, USA
- David van Klaveren
- Department of Public Health, Erasmus MC University Medical Center, Rotterdam, Netherlands
- Predictive Analytics and Comparative Effectiveness Center, Tufts Medical Center, Boston, MA, USA
- David M Kent
- Predictive Analytics and Comparative Effectiveness Center, Tufts Medical Center, Boston, MA, USA
- Karina W Davidson
- Institute of Health System Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, 11030, USA
- Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Northwell Health, Hempstead, NY, 11549, USA
- Jamie S Hirsch
- Institute of Health System Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, 11030, USA
- Clinical Digital Solutions, Northwell Health, New Hyde Park, NY, 11042, USA
- Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Northwell Health, Hempstead, NY, 11549, USA
- Theodoros P Zanos
- Institute of Health System Science, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, 11030, USA
- Institute of Bioelectronic Medicine, Feinstein Institutes for Medical Research, Northwell Health, Manhasset, NY, 11030, USA
- Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Northwell Health, Hempstead, NY, 11549, USA
|
39
|
Lotterhos KE, Fitzpatrick MC, Blackmon H. Simulation Tests of Methods in Evolution, Ecology, and Systematics: Pitfalls, Progress, and Principles. ANNUAL REVIEW OF ECOLOGY, EVOLUTION, AND SYSTEMATICS 2022; 53:113-136. [PMID: 38107485 PMCID: PMC10723108 DOI: 10.1146/annurev-ecolsys-102320-093722] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/19/2023]
Abstract
Complex statistical methods are continuously developed across the fields of ecology, evolution, and systematics (EES). These fields, however, lack standardized principles for evaluating methods, which has led to high variability in the rigor with which methods are tested, a lack of clarity regarding their limitations, and the potential for misapplication. In this review, we illustrate the common pitfalls of method evaluations in EES, the advantages of testing methods with simulated data, and best practices for method evaluations. We highlight the difference between method evaluation and validation and review how simulations, when appropriately designed, can refine the domain in which a method can be reliably applied. We also discuss the strengths and limitations of different evaluation metrics. The potential for misapplication of methods would be greatly reduced if funding agencies, reviewers, and journals required principled method evaluation.
Affiliation(s)
- Katie E Lotterhos
- Department of Marine and Environmental Sciences, Northeastern University, Nahant, Massachusetts, USA
- Matthew C Fitzpatrick
- Appalachian Lab, University of Maryland Center for Environmental Science, Frostburg, Maryland, USA
- Heath Blackmon
- Department of Biology, Texas A&M University, College Station, Texas, USA
|
40
|
Hysen L, Nayeri D, Cushman S, Wan HY. Background sampling for multi-scale ensemble habitat selection modeling: Does the number of points matter? ECOL INFORM 2022. [DOI: 10.1016/j.ecoinf.2022.101914] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
|
41
|
Cicuttin A, Morales IR, Crespo ML, Carrato S, García LG, Molina RS, Valinoti B, Folla Kamdem J. A Simplified Correlation Index for Fast Real-Time Pulse Shape Recognition. SENSORS (BASEL, SWITZERLAND) 2022; 22:7697. [PMID: 36298048 PMCID: PMC9607046 DOI: 10.3390/s22207697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Revised: 10/01/2022] [Accepted: 10/04/2022] [Indexed: 06/16/2023]
Abstract
A simplified correlation index is proposed for use in real-time pulse shape recognition systems. This index is similar to the classic Pearson's correlation coefficient, but it can be efficiently implemented in FPGA devices with far fewer logic resources and excellent performance. Numerical simulations with synthetic data and comparisons with Pearson's correlation show the suitability of the proposed index in applications such as the discrimination and counting of pulses with a predefined shape. Superior performance is evident in scenarios with signal-to-noise ratios close to unity. FPGA implementations of Pearson's method and of the proposed correlation index have been successfully tested, and the main results are summarized.
Affiliation(s)
- Andres Cicuttin
- Multidisciplinary Laboratory, The Abdus Salam International Centre for Theoretical Physics (ICTP), 34151 Trieste, Italy
- Iván René Morales
- Multidisciplinary Laboratory, The Abdus Salam International Centre for Theoretical Physics (ICTP), 34151 Trieste, Italy
- Dipartimento di Ingegneria e Architettura, Università degli Studi di Trieste (UNITS), 34127 Trieste, Italy
- Maria Liz Crespo
- Multidisciplinary Laboratory, The Abdus Salam International Centre for Theoretical Physics (ICTP), 34151 Trieste, Italy
- Sergio Carrato
- Dipartimento di Ingegneria e Architettura, Università degli Studi di Trieste (UNITS), 34127 Trieste, Italy
- Luis Guillermo García
- Multidisciplinary Laboratory, The Abdus Salam International Centre for Theoretical Physics (ICTP), 34151 Trieste, Italy
- Romina Soledad Molina
- Multidisciplinary Laboratory, The Abdus Salam International Centre for Theoretical Physics (ICTP), 34151 Trieste, Italy
- Dipartimento di Ingegneria e Architettura, Università degli Studi di Trieste (UNITS), 34127 Trieste, Italy
- Bruno Valinoti
- Multidisciplinary Laboratory, The Abdus Salam International Centre for Theoretical Physics (ICTP), 34151 Trieste, Italy
- Dipartimento di Ingegneria e Architettura, Università degli Studi di Trieste (UNITS), 34127 Trieste, Italy
- Jerome Folla Kamdem
- Multidisciplinary Laboratory, The Abdus Salam International Centre for Theoretical Physics (ICTP), 34151 Trieste, Italy
- Department of Physics, University of Yaoundé I, P.O. Box 812, Yaoundé 222, Cameroon
|
42
|
Jarnevich CS, Sofaer HR, Belamaric P, Engelstad P. Regional models do not outperform continental models for invasive species. NEOBIOTA 2022. [DOI: 10.3897/neobiota.77.86364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
Aim: Species distribution models can guide invasive species prevention and management by characterizing invasion risk across space. However, extrapolation and transferability issues pose challenges for developing useful models for invasive species. Previous work has emphasized the importance of including all available occurrences in model estimation, but managers attuned to local processes may be skeptical of models based on a broad spatial extent if they suspect the captured responses reflect those of other regions where data are more numerous. We asked whether species distribution models for invasive plants performed better when developed at national versus regional extents.
Location: Continental United States.
Methods: We developed ensembles of species distribution models trained nationally, on sagebrush habitat, or on sagebrush habitat within three ecoregions (Great Basin, eastern sagebrush, and Great Plains) for nine invasive plants of interest for early detection and rapid response at local or regional scales. We compared the performance of national versus regional models using spatially independent withheld test data from each of the three ecoregions.
Results: We found that models trained using a national spatial extent tended to perform better than regionally trained models. Regional models did not outperform national ones even when considerable occurrence data were available for model estimation within the focal region. Information was often unavailable to fit informative regional models precisely in those areas of greatest interest for early detection and rapid response.
Main conclusions: Habitat suitability models for invasive plant species trained at a continental extent can reduce extrapolation while maximizing information on species’ responses to environmental variation. Standard modeling methods can capture spatially varying limiting factors, while regional or hierarchical models may only be advantageous when populations differ in their responses to environmental conditions, a condition expected to be relatively rare at the expanding boundaries of invasive species’ distributions.
|
43
|
Perrot B, Hardouin JB, Thiabaud E, Saillard A, Grall-Bronnec M, Challet-Bouju G. Development and validation of a prediction model for online gambling problems based on players' account data. J Behav Addict 2022; 11:874-889. [PMID: 36125924 PMCID: PMC9872531 DOI: 10.1556/2006.2022.00063] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/05/2021] [Revised: 03/03/2022] [Accepted: 08/13/2022] [Indexed: 02/03/2023] Open
Abstract
BACKGROUND AND AIMS: Gambling disorder is characterized by problematic gambling behavior that causes significant problems and distress. This study aimed to develop and validate a predictive model for screening online problem gamblers based on players' account data. METHODS: Two random samples of French online gamblers in skill-based games (poker, horse race betting and sports betting, n = 8,172) and pure chance games (scratch games and lotteries, n = 5,404) answered an online survey, and gambling tracking data were retrospectively collected for the participants. The survey included age and gender, gambling habits, and the Problem Gambling Severity Index (PGSI). We used machine learning algorithms to predict the PGSI categories from gambling tracking data. We internally validated the prediction models in a leave-out sample. RESULTS: When predicting gambling problems as a binary outcome based on each PGSI threshold (1 for low-risk gambling, 5 for moderate-risk gambling and 8 for problem gambling), the predictive performances were good for the skill-based games model (AUROCs from 0.72 to 0.82), but moderate for the pure chance games model (AUROCs from 0.63 to 0.76, with wide confidence intervals) due to the lower frequency of problem gambling in this sample. When predicting the four PGSI categories together, performances were good for identifying extreme categories (non-problem and problem gamblers) but poorer for intermediate categories (low-risk and moderate-risk gamblers), whatever the type of game. CONCLUSIONS: We developed an algorithm for screening online problem gamblers, excluding online casino gamblers, that could enable the setting of prevention measures for the most vulnerable gamblers.
Affiliation(s)
- Bastien Perrot
- Nantes Université, Univ Tours, CHU Nantes, CHU Tours, INSERM, MethodS in Patients Centered Outcomes and HEalth ResEarch, SPHERE, F-44000, Nantes, France; Nantes Université, CHU Nantes, Biostatistics and Methodology Unit, Department of Clinical Research and Innovation, F-44000, Nantes, France
- Jean-Benoit Hardouin
- Nantes Université, Univ Tours, CHU Nantes, CHU Tours, INSERM, MethodS in Patients Centered Outcomes and HEalth ResEarch, SPHERE, F-44000, Nantes, France; Nantes Université, CHU Nantes, Biostatistics and Methodology Unit, Department of Clinical Research and Innovation, F-44000, Nantes, France
- Elsa Thiabaud
- Nantes Université, CHU Nantes, UIC Psychiatrie et Santé Mentale, F-44000, Nantes, France
- Anaïs Saillard
- Nantes Université, CHU Nantes, UIC Psychiatrie et Santé Mentale, F-44000, Nantes, France
- Marie Grall-Bronnec
- Nantes Université, Univ Tours, CHU Nantes, CHU Tours, INSERM, MethodS in Patients Centered Outcomes and HEalth ResEarch, SPHERE, F-44000, Nantes, France; Nantes Université, CHU Nantes, UIC Psychiatrie et Santé Mentale, F-44000, Nantes, France
- Gaëlle Challet-Bouju
- Nantes Université, Univ Tours, CHU Nantes, CHU Tours, INSERM, MethodS in Patients Centered Outcomes and HEalth ResEarch, SPHERE, F-44000, Nantes, France; Nantes Université, CHU Nantes, UIC Psychiatrie et Santé Mentale, F-44000, Nantes, France; Corresponding author. Tel.: +33(0) 2 40 84 76 20. E-mail:
|
44
|
Saak S, Huelsmeier D, Kollmeier B, Buhl M. A flexible data-driven audiological patient stratification method for deriving auditory profiles. Front Neurol 2022; 13:959582. [PMID: 36188360 PMCID: PMC9520582 DOI: 10.3389/fneur.2022.959582] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2022] [Accepted: 08/11/2022] [Indexed: 11/13/2022] Open
Abstract
For characterizing the complexity of hearing deficits, it is important to consider different aspects of auditory functioning in addition to the audiogram. For this purpose, extensive test batteries have been developed aiming to cover all relevant aspects as defined by experts or model assumptions. However, as the assessment time of physicians is limited, such test batteries are often not used in clinical practice. Instead, fewer measures are used, which vary across clinics. This study aimed at proposing a flexible data-driven approach for characterizing distinct patient groups (patient stratification into auditory profiles) based on one prototypical database (N = 595) containing audiogram data, loudness scaling, speech tests, and anamnesis questions. To further maintain the applicability of the auditory profiles in clinical routine, we built random forest classification models based on a reduced set of audiological measures which are often available in clinics. Different parameterizations regarding binarization strategy, cross-validation procedure, and evaluation metric were compared to determine the optimum classification model. Our data-driven approach, involving model-based clustering, resulted in a set of 13 patient groups, which serve as auditory profiles. The 13 auditory profiles separate patients within certain ranges across audiological measures and are audiologically plausible. Both a normal hearing profile and profiles with varying extents of hearing impairments are defined. Further, a random forest classification model with a combination of a one-vs.-all and one-vs.-one binarization strategy, 10-fold cross-validation, and the kappa evaluation metric was determined as the optimal model. With the selected model, patients can be classified into 12 of the 13 auditory profiles with adequate precision (mean across profiles = 0.9) and sensitivity (mean across profiles = 0.84). 
The proposed approach consequently allows the generation of audiologically plausible and interpretable, data-driven clinical auditory profiles, providing an efficient way of characterizing hearing deficits while maintaining clinical applicability. The method should, by design, be applicable to all audiological data sets from clinics or research and, in addition, be flexible enough to summarize information across databases by means of profiles, as well as to extend the approach toward aided measurements, fitting parameters, and further information from databases.
|
45
|
Naing KM, Boonsang S, Chuwongin S, Kittichai V, Tongloy T, Prommongkol S, Dekumyoy P, Watthanakulpanich D. Automatic recognition of parasitic products in stool examination using object detection approach. PeerJ Comput Sci 2022; 8:e1065. [PMID: 36092001 PMCID: PMC9455271 DOI: 10.7717/peerj-cs.1065] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/25/2022] [Accepted: 07/19/2022] [Indexed: 06/15/2023]
Abstract
BACKGROUND: Object detection is a new artificial intelligence approach to the morphological recognition and labeling of parasitic pathogens. Because equipment and trained personnel are lacking, artificial intelligence innovation for searching various parasitic products in stool examination will enable patients in remote areas of undeveloped countries to access diagnostic services. Because object detection is a developing approach that has been tested for its effectiveness in detecting intestinal parasitic objects such as protozoan cysts and helminthic eggs, it is suitable for use in rural areas where many factors supporting laboratory testing are still lacking. According to the literature, YOLOv4-Tiny produces faster results and uses less memory with the support of low-end GPU devices. This study aimed to propose an automated object detection approach, specifically the YOLOv4-Tiny model, for automatic recognition of intestinal parasitic products in stools, and to compare it with the YOLOv3 and YOLOv3-Tiny models. METHODS: To identify protozoan cysts and helminthic eggs in human feces, the three YOLO approaches (YOLOv4-Tiny, YOLOv3, and YOLOv3-Tiny) were trained to recognize 34 intestinal parasitic classes using a training image dataset. Feces were processed using a modified direct smear method adapted from the simple direct smear and the modified Kato-Katz methods. The image dataset was collected from intestinal parasitic objects discovered during stool examination, and the three YOLO models were trained to recognize the image datasets. RESULTS: The non-maximum suppression technique and the threshold level were used to analyze the test dataset, yielding results of 96.25% precision and 95.08% sensitivity for YOLOv4-Tiny. Additionally, the YOLOv4-Tiny model had the best AUPRC performance of the three YOLO models, with a score of 0.963.
CONCLUSION: This study, to our knowledge, was the first to detect protozoan cysts and helminthic eggs in the 34 classes of intestinal parasitic objects in human stools.
Affiliation(s)
- Kaung Myat Naing
- Center of Industrial Robot and Automation (CiRA), College of Advanced Manufacturing Innovation, King Mongkut’s Institute of Technology Ladkrabang, Bangkok, Thailand
- Siridech Boonsang
- Department of Electrical Engineering, School of Engineering, King Mongkut’s Institute of Technology Ladkrabang, Bangkok, Thailand
- Santhad Chuwongin
- Center of Industrial Robot and Automation (CiRA), College of Advanced Manufacturing Innovation, King Mongkut’s Institute of Technology Ladkrabang, Bangkok, Thailand
- Veerayuth Kittichai
- Faculty of Medicine, King Mongkut’s Institute of Technology Ladkrabang, Bangkok, Thailand
- Teerawat Tongloy
- Center of Industrial Robot and Automation (CiRA), College of Advanced Manufacturing Innovation, King Mongkut’s Institute of Technology Ladkrabang, Bangkok, Thailand
- Samrerng Prommongkol
- Mahidol Bangkok School of Tropical Medicine, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
- Paron Dekumyoy
- Department of Helminthology, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
- Dorn Watthanakulpanich
- Department of Helminthology, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
|
46
|
Shao M, Fan J, Ma J, Wang L. Identifying the natural reserve area of Cistanche salsa under the effects of multiple host plants and climate change conditions using a maximum entropy model in Xinjiang, China. FRONTIERS IN PLANT SCIENCE 2022; 13:934959. [PMID: 36061800 PMCID: PMC9432852 DOI: 10.3389/fpls.2022.934959] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2022] [Accepted: 07/21/2022] [Indexed: 06/15/2023]
Abstract
Cistanche salsa (C. A. Mey.) G. Beck, a holoparasitic desert medicinal plant with multiple hosts, is regarded as a potential future desert economic plant. However, as a result of excessive exploitation and poaching, its wild resources have become scarce. Thus, before its desert economic value is developed, this plant has to be protected, and the identification of its natural reserve is currently the top priority. In previous nature reserve prediction studies, however, the influence of host plants has been overlooked, particularly for holoparasitic plants with multiple hosts. In this study, we sought to identify the conservation areas of wild C. salsa by considering multiple host-plant interactions and climate change conditions using the MaxEnt model. Additionally, a Principal Component Analysis (PCA) was used to reduce the autocorrelation between environmental variables. The effects of the natural distribution of the host plants were considered from the perspective of niche similarities and extrapolation detection by filtering for the most influential hosts: Krascheninnikovia ceratoides (Linnaeus) Gueldenstaedt and Nitraria sibirica Pall. Additionally, the change trends in these hosts under climate change conditions, combined with the change trends in C. salsa, were used to identify a core protection area of 126,483.5 km². In this article, we corrected and tried to avoid some of the common mistakes found in species distribution models based on the findings of previous research, and we fully considered the effects of host plants for multiple-host holoparasitic plants to provide a new perspective on the prediction of holoparasitic plants and scientific zoning for biodiversity conservation in desert ecosystems. This research will hopefully serve as a significant reference for decision-makers.
Affiliation(s)
- Minghao Shao
- National Engineering Technology Research Center for Desert-Oasis Ecological Construction, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi, China
- University of Chinese Academy of Sciences, Beijing, China
- Jinglong Fan
- National Engineering Technology Research Center for Desert-Oasis Ecological Construction, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi, China
- University of Chinese Academy of Sciences, Beijing, China
- Taklimakan Desert Research Station, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Korla, China
- Jinbiao Ma
- University of Chinese Academy of Sciences, Beijing, China
- State Key Laboratory of Desert and Oasis Ecology, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi, China
- Lei Wang
- University of Chinese Academy of Sciences, Beijing, China
- State Key Laboratory of Desert and Oasis Ecology, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi, China
|
47
|
Wade MW, Fisher M, Matich P. Comparison of two machine learning frameworks for predicting aggregatory behavior of sharks. J Appl Ecol 2022. [DOI: 10.1111/1365-2664.14273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
Affiliation(s)
- Michael W. Wade
- Data Science Institute, Vanderbilt University, Nashville, TN, USA
- Mark Fisher
- Texas Parks and Wildlife Department, Coastal Fisheries Division, Rockport Marine Science Laboratory, Rockport, TX, USA
|
48
|
Qiu L, Chen J, Fan L, Sun L, Zheng C. High-resolution mapping of wildfire drivers in California based on machine learning. THE SCIENCE OF THE TOTAL ENVIRONMENT 2022; 833:155155. [PMID: 35413339 DOI: 10.1016/j.scitotenv.2022.155155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2021] [Revised: 03/31/2022] [Accepted: 04/06/2022] [Indexed: 06/14/2023]
Abstract
Wildfires are important natural disturbances of ecosystems; however, they threaten the sustainability of ecosystems, climate and humans worldwide. It is vital to quantify and map the controlling drivers of wildfires for effective wildfire prediction and risk management. However, high-resolution mapping of wildfire drivers remains challenging. Here we established machine-learning (Random Forests) models using 23 climate and land surface variables as model inputs to reconstruct the spatial variability and seasonality of wildfire occurrence and extent in California. The importance of individual drivers was then quantified based on the Shapley value method. Thus, we provided spatially resolved maps of wildfire drivers at resolutions as fine as 0.004° × 0.004°. The results indicated that precipitation and soil moisture are the major drivers dominating 37% of the total burnt area for large and extreme wildfires in summer and 63% in autumn, while elevation plays a major role for 15-58% of burnt areas in small wildfires in all seasons. Winds are also an important contributor to summer wildfires, accounting for 41% of large and extreme burnt areas. This study enhanced our knowledge of the spatial variability of wildfire drivers across diverse landscapes through fine-scale mapping, providing valuable perspectives and case studies for other regions of the world where wildfires occur frequently.
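To make the Shapley value idea concrete, the sketch below computes exact Shapley values for a toy three-driver example by averaging each driver's marginal contribution over all orderings. The driver names echo the abstract, but the value function `toy_value` and its numbers are invented for illustration and are not taken from the study, which applied the method to Random Forest models with 23 inputs.

```python
from itertools import permutations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(players: Sequence[str],
                   value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley values: average marginal contribution over all orderings."""
    contrib = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition: FrozenSet[str] = frozenset()
        for p in order:
            with_p = coalition | {p}
            contrib[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: c / n_orders for p, c in contrib.items()}


def toy_value(coalition: FrozenSet[str]) -> float:
    """Hypothetical model output given which drivers are 'known' (made-up numbers)."""
    v = 0.0
    if "precipitation" in coalition:
        v += 2.0
    if "wind" in coalition:
        v += 1.0
    # Interaction term: precipitation and soil moisture matter more together.
    if "precipitation" in coalition and "soil_moisture" in coalition:
        v += 1.0
    return v


drivers = ["precipitation", "soil_moisture", "wind"]
phi = shapley_values(drivers, toy_value)
```

The attributions sum to the value of the full coalition, and the interaction term is split equally between the two drivers involved. Enumerating all orderings is only feasible for a handful of inputs; for tree ensembles, practical tools rely on faster exact or approximate algorithms.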
Affiliation(s)
- Linghua Qiu
- School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen, China; Department of Civil Engineering, The University of Hong Kong, Hong Kong, China
- Ji Chen
- Department of Civil Engineering, The University of Hong Kong, Hong Kong, China
- Linfeng Fan
- School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen, China
- Liqun Sun
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Chunmiao Zheng
- School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen, China; Shenzhen Institute of Sustainable Development, Southern University of Science and Technology, Shenzhen, China
|
49
|
Quantifying congestion with player tracking data in Australian football. PLoS One 2022; 17:e0272657. [PMID: 35939497 PMCID: PMC9359552 DOI: 10.1371/journal.pone.0272657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2022] [Accepted: 07/24/2022] [Indexed: 12/04/2022] Open
Abstract
With 36 players on the field, congestion in Australian football is an important consideration in identifying passing capacity, assessing fan enjoyment, and evaluating the effect of rule changes. However, no current method of objectively measuring congestion has been reported. This study developed two methods to measure congestion in Australian football. The first continuously determined the number of players situated within various regions of density at successive time intervals during a match, using density-based clustering to group players as ‘primary’, ‘secondary’, or ‘outside’. The second method aimed to classify the level of congestion a player experiences (high, nearby, or low) when disposing of the ball using the Random Forest algorithm. Both approaches were developed using data from the 2019 and 2021 Australian Football League (AFL) regular seasons, considering contextual variables such as field position and quarter. Player tracking data and match event data from professional male players were collected from 56 matches played at a single stadium. The random forest model correctly classified disposals in high congestion (0.89 precision, 0.86 recall, 0.96 AUC) and low congestion (0.98 precision, 0.86 recall, 0.96 AUC) at a higher rate than disposals nearby congestion (0.72 precision, 0.88 recall, 0.88 AUC). Overall, both approaches quantify the characteristics of congestion more efficiently, eliminating manual input from human coders and allowing future comparisons across additional contextual variables, such as seasons, rounds, and teams.
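For readers unfamiliar with density-based clustering, the sketch below labels points the way DBSCAN distinguishes core, border, and noise points, using only neighbour counts within a radius. It is a generic illustration: the abstract does not say how ‘primary’, ‘secondary’, and ‘outside’ map onto these notions, and the function name `label_by_density`, the coordinates, and the `eps` and `min_pts` values are all assumptions.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def label_by_density(points: Sequence[Point], eps: float, min_pts: int) -> List[str]:
    """Label each point 'core', 'border', or 'noise' in the DBSCAN sense."""
    # Neighbourhoods include the point itself, as in standard DBSCAN.
    neighbours = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighbours) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighbours):
        if i in core:
            labels.append("core")      # dense region
        elif any(j in core for j in nb):
            labels.append("border")    # on the edge of a dense region
        else:
            labels.append("noise")     # isolated
    return labels


# Hypothetical player positions in metres: a tight pack, one player on its
# edge, and one player far from everyone else.
positions = [(0, 0), (1, 0), (0, 1), (1, 1), (2.4, 1), (10, 10)]
labels = label_by_density(positions, eps=1.5, min_pts=4)
```

Running this per time step and counting the labels would give a time series of how many players sit in dense, adjacent, and open regions, which is the kind of summary the first method in the abstract describes.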
|
50
|
Tomal JH, Welch WJ, Zamar RH. Robust ranking by ensembling of diverse models and assessment metrics. J STAT COMPUT SIM 2022. [DOI: 10.1080/00949655.2022.2093873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Affiliation(s)
- Jabed H. Tomal
- Department of Mathematics and Statistics, Thompson Rivers University, Kamloops, British Columbia, Canada
- William J. Welch
- Department of Statistics, The University of British Columbia, Vancouver, British Columbia, Canada
- Ruben H. Zamar
- Department of Statistics, The University of British Columbia, Vancouver, British Columbia, Canada
|