1
Goh PK, A Wong AWW, Suh DE, Bodalski EA, Rother Y, Hartung CM, Lefler EK. Emotional Dysregulation in Emerging Adult ADHD: A Key Consideration in Explaining and Classifying Impairment and Co-Occurring Internalizing Problems. J Atten Disord 2024; 28:1627-1641. [PMID: 39342440 DOI: 10.1177/10870547241284829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/01/2024]
Abstract
OBJECTIVE The current study sought to clarify and harness the incremental validity of emotional dysregulation and unawareness (EDU) in emerging adulthood, beyond ADHD symptoms and with respect to concurrent classification of impairment and co-occurring problems, using machine learning techniques. METHOD Participants were 1,539 college students (Mage = 19.5, 69% female) with self-reported ADHD diagnoses from a multisite study who completed questionnaires assessing ADHD symptoms, EDU, and co-occurring problems. RESULTS Random forest analyses suggested EDU dimensions significantly improved model performance (ps < .001) in classifying participants with impairment and internalizing problems versus those without, with the resulting ADHD + EDU classification model demonstrating acceptable to excellent performance (except in classification of Work Impairment) in a distinct sample. Variable importance analyses suggested inattention sum scores and the Limited Access to Emotional Regulation Strategies EDU dimension as the most important features for facilitating model classification. CONCLUSION Results provided support for EDU as a key deficit in those with ADHD that, when present, helps explain ADHD's co-occurrence with impairment and internalizing problems. Continued application of machine learning techniques may facilitate actuarial classification of ADHD-related outcomes while also incorporating multiple measures.
Affiliation(s)
- Da Eun Suh
- University of Hawai'i at Mānoa, Honolulu, USA
2
Zhang S, Wang J, Sun S, Zhang Q, Zhai Y, Wang X, Ge P, Shi Z, Zhang D. CT Angiography Radiomics Combining Traditional Risk Factors to Predict Brain Arteriovenous Malformation Rupture: a Machine Learning, Multicenter Study. Transl Stroke Res 2024; 15:784-794. [PMID: 37311939 DOI: 10.1007/s12975-023-01166-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/04/2023] [Revised: 05/04/2023] [Accepted: 06/06/2023] [Indexed: 06/15/2023]
Abstract
This study aimed to develop a machine learning model for predicting brain arteriovenous malformation (bAVM) rupture using a combination of traditional risk factors and radiomics features. This multicenter retrospective study enrolled 586 patients with unruptured bAVMs from 2010 to 2020. All patients were grouped into the hemorrhage (n = 368) and non-hemorrhage (n = 218) groups. The bAVM nidi were segmented on CT angiography images using Slicer software, and radiomic features were extracted using Pyradiomics. The dataset included a training set and an independent testing set. The machine learning model was developed on the training set and validated on the testing set by merging numerous base estimators and a final estimator based on the stacking method. The area under the receiver operating characteristic (ROC) curve, precision, and the F1 score were evaluated to determine the performance of the model. A total of 1790 radiomics features and 8 traditional risk factors were contained in the original dataset, and 241 features remained for model training after L1 regularization filtering. The base estimator of the ensemble model was Logistic Regression, whereas the final estimator was Random Forest. The area under the ROC curve of the model was 0.982 (0.967-0.996) in the training set and 0.893 (0.826-0.960) in the testing set. This study indicated that radiomics features are a valuable addition to traditional risk factors for predicting bAVM rupture. In addition, ensemble learning can effectively improve the performance of a prediction model.
Affiliation(s)
- Shaosen Zhang
- Department of Neurosurgery, Beijing Hospital, National Center of Gerontology; Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Junjie Wang
- Department of Neurosurgery, Beijing Hospital, National Center of Gerontology; Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Shengjun Sun
- Department of Radiology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Qian Zhang
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Yuanren Zhai
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Xiaochen Wang
- Department of Radiology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Peicong Ge
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Zhiyong Shi
- Department of Neurosurgery, Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing, China
- Dong Zhang
- Department of Neurosurgery, Beijing Hospital, National Center of Gerontology; Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China.
3
Yang Y, Zhu DZ, Loewen MR, Ahmed SS, Zhang W, Yan H, van Duin B, Mahmood K. Evaluation of pollutant removal efficiency of urban stormwater wet ponds and the application of machine learning algorithms. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 905:167119. [PMID: 37717762 DOI: 10.1016/j.scitotenv.2023.167119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2023] [Revised: 09/12/2023] [Accepted: 09/14/2023] [Indexed: 09/19/2023]
Abstract
Wet ponds have been extensively used for controlling stormwater pollutants, such as sediment and nutrients, in urban watersheds. The removal of pollutants relies on a combination of physical, chemical, and biological processes. It is crucial to assess the performance of wet ponds in terms of removal efficiency and develop an effective modeling scheme for removal efficiency prediction to optimize water quality management. To achieve this, a two-year field program was conducted at two wet ponds in Calgary, Alberta, Canada to evaluate the wet ponds' performance. Additionally, machine learning (ML) algorithms have been shown to provide promising predictions in datasets with intricate interactions between variables. In this study, the generalized linear model (GLM), partial least squares (PLS) regression, support vector machine (SVM), random forest (RF), and K-nearest neighbors (KNN) were applied to predict the outflow concentrations of three key pollutants: total suspended solids (TSS), total nitrogen (TN), and total phosphorus (TP). Generally, the concentrations of inflow pollutants in the two study ponds are highly variable, and a wide range of removal efficiencies are observed. The results indicate that the concentrations of TSS, TN, and TP decrease significantly from the inlet to outlet of the ponds. Meanwhile, inflow concentration, rainfall characteristics, and wind are important indicators of pond removal efficiency. In addition, ML algorithms can be an effective approach for predicting outflow water quality: PLS, GLM, and SVM have shown strong potential to capture the dynamic interactions in wet ponds and predict the outflow concentration. This study highlights the complexity of pollutant removal dynamics in wet ponds and demonstrates the potential of data-driven outflow water quality prediction.
Affiliation(s)
- Yang Yang
- Department of Civil and Environmental Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada
- David Z Zhu
- Department of Civil and Environmental Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada; School of Civil and Environmental Engineering, Ningbo University, Zhejiang 315211, China.
- Mark R Loewen
- Department of Civil and Environmental Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada
- Sherif S Ahmed
- Department of Civil and Environmental Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada
- Wenming Zhang
- Department of Civil and Environmental Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada
- Haibin Yan
- Department of Civil and Environmental Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada
- Bert van Duin
- Department of Civil and Environmental Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada; City & Regional Planning, City of Calgary, Calgary, AB T2P 2M5, Canada
- Khizar Mahmood
- Climate & Environment, City of Calgary, Calgary, AB T2P 2M5, Canada
4
Doyle JM, Hill RA, Leibowitz SG, Ebersole JL. Random Forest models to estimate bankfull and low flow channel widths and depths across the conterminous United States. JOURNAL OF THE AMERICAN WATER RESOURCES ASSOCIATION 2023; 59:1099-1114. [PMID: 37941964 PMCID: PMC10631553 DOI: 10.1111/1752-1688.13116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2022] [Accepted: 02/18/2023] [Indexed: 11/10/2023]
Abstract
Channel dimensions (width and depth) at varying flows influence a host of instream ecological processes, as well as habitat and biotic features; they are a major consideration in stream habitat restoration and instream flow assessments. Models of widths and depths are often used to assess climate change vulnerability, develop endangered species recovery plans, and model water quality. However, development and application of such models require specific skillsets and resources. To facilitate acquisition of such estimates, we created a dataset of modeled channel dimensions for perennial stream segments across the conterminous U.S. We used random forest models to predict wetted width, thalweg depth, bankfull width, and bankfull depth from several thousand field measurements of the National Rivers and Streams Assessment. Observed channel widths varied from <5 m to >2000 m and depths varied from <2 m to >125 m. Metrics of watershed area, runoff, slope, land use, and more were used as model predictors. The models had high pseudo R-squared values (0.70 to 0.91) and median absolute errors within ±6% to ±21% of the interquartile range of measured values across ten stream orders. Predicted channel dimensions can be joined to 1.1 million stream segments of the 1:100K resolution National Hydrography Dataset Plus (version 2.1). These predictions, combined with a rapidly growing body of nationally available data, will further enhance our ability to study and protect aquatic resources.
Affiliation(s)
- Jessie M Doyle
- Oak Ridge Institute for Science and Education c/o Center for Public Health and Environmental Assessment, Pacific Ecological Systems Division (Doyle), U.S. Environmental Protection Agency, Corvallis, Oregon USA; Center for Public Health and Environmental Assessment, Pacific Ecological Systems Division (Hill, Leibowitz, Ebersole), U.S. Environmental Protection Agency, Corvallis, Oregon, USA
- Ryan A Hill
- Oak Ridge Institute for Science and Education c/o Center for Public Health and Environmental Assessment, Pacific Ecological Systems Division (Doyle), U.S. Environmental Protection Agency, Corvallis, Oregon USA; Center for Public Health and Environmental Assessment, Pacific Ecological Systems Division (Hill, Leibowitz, Ebersole), U.S. Environmental Protection Agency, Corvallis, Oregon, USA
- Scott G Leibowitz
- Oak Ridge Institute for Science and Education c/o Center for Public Health and Environmental Assessment, Pacific Ecological Systems Division (Doyle), U.S. Environmental Protection Agency, Corvallis, Oregon USA; Center for Public Health and Environmental Assessment, Pacific Ecological Systems Division (Hill, Leibowitz, Ebersole), U.S. Environmental Protection Agency, Corvallis, Oregon, USA
- Joseph L Ebersole
- Oak Ridge Institute for Science and Education c/o Center for Public Health and Environmental Assessment, Pacific Ecological Systems Division (Doyle), U.S. Environmental Protection Agency, Corvallis, Oregon USA; Center for Public Health and Environmental Assessment, Pacific Ecological Systems Division (Hill, Leibowitz, Ebersole), U.S. Environmental Protection Agency, Corvallis, Oregon, USA
5
Langenbucher A, Szentmáry N, Cayless A, Wendelstein J, Hoffmann P. Preconditioning of clinical data for intraocular lens formula constant optimisation using Random Forest Quantile Regression Trees. Z Med Phys 2023:S0939-3889(22)00129-5. [PMID: 36813595 DOI: 10.1016/j.zemedi.2022.11.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2022] [Revised: 10/31/2022] [Accepted: 11/21/2022] [Indexed: 02/22/2023]
Abstract
PURPOSE To implement a fully data driven strategy for identifying outliers in clinical datasets used for formula constant optimisation, in order to achieve proper formula predicted refraction after cataract surgery, and to assess the capabilities of this outlier detection method. METHODS 2 clinical datasets (DS1/DS2: N = 888/403) of eyes treated with a monofocal aspherical intraocular lens (Hoya XY1/Johnson&Johnson Vision Z9003) containing preoperative biometric data, power of the lens implant and postoperative spherical equivalent (SEQ) were transferred to us for formula constant optimisation. Original datasets were used to generate baseline formula constants. A random forest quantile regression algorithm was set up using bootstrap resampling with replacement. Quantile regression trees were grown and the 25% and 75% quantile, and the interquartile range were extracted from SEQ and formula predicted refraction REF for the SRKT, Haigis and Castrop formulae. Fences were defined from the quantiles and data points outside the fences were marked and removed as outliers before recalculating the formula constants. RESULTS NB = 1000 bootstrap samples were derived from both datasets, and random forest quantile regression trees were grown to model SEQ versus REF and to estimate the median and 25% and 75% quantiles. The fence boundaries were defined as being from 25% quantile - 1.5·IQR to 75% quantile + 1.5·IQR, with data points outside the fence being marked as outliers. In total, for DS1 and DS2, 25/27/32 and 4/5/4 data points were identified as outliers for the SRKT/Haigis/Castrop formulae respectively. The respective root mean squared formula prediction errors for the three formulae were slightly reduced from: 0.4370 dpt;0.4449 dpt/0.3625 dpt;0.4056 dpt/and 0.3376 dpt;0.3532 dpt to: 0.4271 dpt;0.4348 dpt/0.3528 dpt;0.3952 dpt/0.3277 dpt;0.3432 dpt for DS1;DS2. 
CONCLUSION We were able to prove that with random forest quantile regression trees a fully data driven outlier identification strategy acting in the response space is achievable. In a real life scenario this strategy has to be complemented by an outlier identification method acting in the parameter space for a proper qualification of datasets prior to formula constant optimisation.
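The fence rule described in this abstract (25% quantile - 1.5·IQR to 75% quantile + 1.5·IQR) can be sketched in a few lines. This is a minimal illustration only: it assumes plain empirical quartiles of a one-dimensional list in place of the conditional quantiles that the authors estimate with random forest quantile regression trees. The function names `tukey_fences` and `split_outliers` and the sample values are hypothetical and do not come from the paper.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # Empirical quartiles; the paper instead uses forest-estimated quantiles.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(values, k=1.5):
    """Split values into those inside the fences and those outside."""
    lower, upper = tukey_fences(values, k)
    kept = [v for v in values if lower <= v <= upper]
    outliers = [v for v in values if v < lower or v > upper]
    return kept, outliers


# Hypothetical prediction errors in dioptres (illustrative values only).
errors = [-0.4, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 2.5]
kept, outliers = split_outliers(errors)
print("kept:", kept)
print("outliers:", outliers)
```

In the workflow the abstract describes, the points flagged in this way would be removed before the formula constants are recalculated.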
Affiliation(s)
- Achim Langenbucher
- Department of Experimental Ophthalmology, Saarland University, Homburg/Saar, Germany.
- Nóra Szentmáry
- Dr. Rolf M. Schwiete Center for Limbal Stem Cell and Aniridia Research, Saarland University, Homburg/Saar, Germany; Department of Ophthalmology, Semmelweis-University, Budapest, Hungary
- Alan Cayless
- School of Physical Sciences, The Open University, Milton Keynes, United Kingdom
- Jascha Wendelstein
- Department of Experimental Ophthalmology, Saarland University, Homburg/Saar, Germany; Department of Ophthalmology, Johannes Kepler University Linz, Austria
- Peter Hoffmann
- Augen- und Laserklinik Castrop-Rauxel, Castrop-Rauxel, Germany
6
Mallya G, Hantush MM, Govindaraju RS. A Machine Learning Approach to Predict Watershed Health Indices for Sediments and Nutrients at Ungauged Basins. WATER 2023; 15:1-23. [PMID: 37309416 PMCID: PMC10259765 DOI: 10.3390/w15030586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Effective water quality management and reliable environmental modeling depend on the availability, size, and quality of water quality (WQ) data. Observed stream water quality data are usually sparse in both time and space. Reconstruction of water quality time series using surrogate variables such as streamflow has been used to evaluate risk metrics such as reliability, resilience, vulnerability, and watershed health (WH) but only at gauged locations. Estimating these indices for ungauged watersheds has not been attempted because of the high-dimensional nature of the potential predictor space. In this study, machine learning (ML) models, namely random forest regression, AdaBoost, gradient boosting machines, and Bayesian ridge regression (along with an ensemble model), were evaluated to predict watershed health and other risk metrics at ungauged hydrologic unit code 10 (HUC-10) basins using watershed attributes, long-term climate data, soil data, land use and land cover data, fertilizer sales data, and geographic information as predictor variables. These ML models were tested over the Upper Mississippi River Basin, the Ohio River Basin, and the Maumee River Basin for water quality constituents such as suspended sediment concentration, nitrogen, and phosphorus. Random forest, AdaBoost, and gradient boosting regressors typically showed a coefficient of determination R² > 0.8 for suspended sediment concentration and nitrogen during the testing stage, while the ensemble model exhibited R² > 0.95. Watershed health values with respect to suspended sediments and nitrogen predicted by all ML models including the ensemble model were lower for areas with larger agricultural land use, moderate for areas with predominant urban land use, and higher for forested areas; the trained ML models adequately predicted WH in ungauged basins.
However, low WH values (with respect to phosphorus) were predicted at some basins in the Upper Mississippi River Basin that had dominant forest land use. Results suggest that the proposed ML models provide robust estimates at ungauged locations when sufficient training data are available for a WQ constituent. ML models may be used as quick screening tools by decision makers and water quality monitoring agencies for identifying critical source areas or hotspots with respect to different water quality constituents, even for ungauged watersheds.
Affiliation(s)
- Ganeshchandra Mallya
- Lyles School of Civil Engineering, Purdue University, West Lafayette, IN 47907, USA
- Mohamed M Hantush
- U.S. EPA Center for Environmental Solutions and Emergency Response, 26 West Martin Luther King Dr., Cincinnati, OH 45268, USA
- Rao S Govindaraju
- Lyles School of Civil Engineering, Purdue University, West Lafayette, IN 47907, USA
7
Zhao C, Yang J, Shi H, Chen T. Transforming approach for assessing the performance and applicability of rice arsenic contamination forecasting models based on regression and probability methods. JOURNAL OF HAZARDOUS MATERIALS 2022; 424:127375. [PMID: 34634707 DOI: 10.1016/j.jhazmat.2021.127375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2021] [Revised: 09/14/2021] [Accepted: 09/26/2021] [Indexed: 06/13/2023]
Abstract
Probability models have recently been preferred over regression models in contamination evaluation, but a proper performance comparison between the two model types is lacking. Linear regression, logistic regression, XGBoost-based regression, and probability models were built considering soil arsenic and certain soil physicochemical properties of 287 samples to predict arsenic in rice grains. The outputs of all models were uniformly converted to binary classifications for comparison. The complex algorithm-based models, XGBoost-based regression (R² = 0.046 ± 0.036) and probability models (cross-entropy = 0.697 ± 0.020), did not surpass the simple linear regression (R² = 0.046 ± 0.031) and logistic regression models (cross-entropy = 0.694 ± 0.021). Accuracy, sensitivity, specificity, precision, and F1 score showed that the probability models exhibit no advantage over regression models, although the indicators above did not serve as proper scoring rules for the probability model. When discretizing the contaminant concentration in grains for probabilistic modeling, the limit concentration, rather than the structure of the datasets, was used as the splitting point, which would reduce the inherent advantage of the probability model. When predicting the contamination of crops, the probability model cannot replace the regression model, and simple but robust algorithm-based models are preferred when the quality and quantity of the dataset are poor.
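For orientation on the cross-entropy values quoted in this abstract: assuming the usual natural-log binary cross-entropy (the abstract does not state the exact definition used), a model that predicts a probability of 0.5 for every sample scores ln 2 ≈ 0.693 regardless of class balance. The sketch below shows that reference value; the helper `binary_cross_entropy` and the labels are hypothetical and are not the authors' data or code.

```python
import math


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean natural-log binary cross-entropy of predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        # Clip to avoid log(0) for predictions of exactly 0 or 1.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


# Hypothetical exceedance labels (1 = above the limit concentration).
labels = [1, 0, 0, 1, 0, 1, 1, 0]

# An uninformative model that always predicts 0.5.
baseline = binary_cross_entropy(labels, [0.5] * len(labels))
print(round(baseline, 3), round(math.log(2), 3))
```

Under that assumption, the reported values of 0.694 and 0.697 sit very close to this uninformative baseline.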
Affiliation(s)
- Chen Zhao
- Center for Environmental Remediation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, 11 A Datun Road, Beijing 100101, China.
- Jun Yang
- Center for Environmental Remediation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, 11 A Datun Road, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
- Huading Shi
- Technical Centre for Soil, Agricultural and Rural Ecology and Environment, Ministry of Ecology and Environment, Beijing 100012, China.
- Tongbin Chen
- Center for Environmental Remediation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, 11 A Datun Road, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
8
Lin J, Compton JE, Hill RA, Herlihy A, Sabo RD, Brooks JR, Weber M, Pickard B, Paulsen S, Stoddard JL. Context is Everything: Interacting Inputs and Landscape Characteristics Control Stream Nitrogen. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2021; 55:7890-7899. [PMID: 34060819 PMCID: PMC8673309 DOI: 10.1021/acs.est.0c07102] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 05/31/2023]
Abstract
To understand the environmental and anthropogenic drivers of stream nitrogen (N) concentrations across the conterminous US, we combined summer low-flow data from 4997 streams with watershed information across three survey periods (2000-2014) of the US EPA's National Rivers and Streams Assessment. Watershed N inputs explained 51% of the variation in log-transformed stream total N (TN) concentrations. Both N source and input rates influenced stream NO3/TN ratios and N concentrations. Streams dominated by oxidized N forms (NO3/TN ratio > 0.50) were more strongly responsive to the N input rate compared to streams dominated by other N forms. NO3 proportional contribution increased with N inputs, supporting N saturation-enhanced NO3 export to aquatic ecosystems. By combining information about N inputs with climatic and landscape factors, random forest models of stream N concentrations explained 70, 58, and 60% of the spatial variation in stream concentrations of TN, dissolved inorganic N, and total organic N, respectively. The strength and direction of relationships between watershed drivers and stream N concentrations and forms varied with N input intensity. Model results for high N input watersheds not only indicated potential contributions from contaminated groundwater to high stream N concentrations but also the mitigating role of wetlands.
Affiliation(s)
- Jiajia Lin
- US EPA, Office of Research and Development, Center for Public Health and Environmental Assessment, Pacific Ecological Systems Division, Corvallis, OR 97333
- Oak Ridge Institute for Science and Education, Corvallis, OR 97333
- Jana E. Compton
- US EPA, Office of Research and Development, Center for Public Health and Environmental Assessment, Pacific Ecological Systems Division, Corvallis, OR 97333
- Ryan A. Hill
- US EPA, Office of Research and Development, Center for Public Health and Environmental Assessment, Pacific Ecological Systems Division, Corvallis, OR 97333
- Alan Herlihy
- US EPA, Office of Research and Development, Center for Public Health and Environmental Assessment, Pacific Ecological Systems Division, Corvallis, OR 97333
- Oregon State University, Department of Fisheries and Wildlife, Corvallis, OR 97333
- Robert D. Sabo
- US EPA, Office of Research and Development, Center for Public Health and Environmental Assessment, HEEAD, Washington, DC 20004
- J. Renée Brooks
- US EPA, Office of Research and Development, Center for Public Health and Environmental Assessment, Pacific Ecological Systems Division, Corvallis, OR 97333
- Marc Weber
- US EPA, Office of Research and Development, Center for Public Health and Environmental Assessment, Pacific Ecological Systems Division, Corvallis, OR 97333
- Steve Paulsen
- US EPA, Office of Research and Development, Center for Public Health and Environmental Assessment, Pacific Ecological Systems Division, Corvallis, OR 97333
- John L. Stoddard
- US EPA, Office of Research and Development, Center for Public Health and Environmental Assessment, Pacific Ecological Systems Division, Corvallis, OR 97333
9
Liu X, Su T, Hsu YMS, Yu H, Yang HS, Jiang L, Zhao Z. Rapid identification and discrimination of methicillin-resistant Staphylococcus aureus strains via matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. RAPID COMMUNICATIONS IN MASS SPECTROMETRY : RCM 2021; 35:e8972. [PMID: 33053243 DOI: 10.1002/rcm.8972] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/21/2020] [Revised: 08/30/2020] [Accepted: 10/08/2020] [Indexed: 06/11/2023]
Abstract
RATIONALE Methicillin-resistant Staphylococcus aureus (MRSA) is one of major clinical pathogens responsible for both hospital- and community-acquired infections worldwide. A delay in targeted antibiotic treatment contributes to longer hospitalization stay, higher costs, and increasing in-hospital mortality. Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) has been integrated into the routine workflow for microbial identification over the past decade, and it has also shown promising functions in the detection of bacterial resistance. Therefore, we describe a rapid MALDI-TOF MS-based methodology for MRSA screening with machine-learning algorithms. METHODS A total of 452 clinical S. aureus isolates were included in this study, of which 194 were MRSA and 258 were methicillin-sensitive S. aureus (MSSA). The mass-to-charge ratio (m/z) features from MRSA and MSSA strains were binned and selected through Lasso regression. These features were then used to train a non-linear support vector machine (SVM) with radial basis function (RBF) kernels to evaluate the discrimination performance. The classifiers' accuracy, sensitivity, specificity, and the area under the receiver operating characteristic (ROC) curve (AUC) were evaluated and compared with those from the random forest (RF) model. RESULTS A total of 2601 unique spectral peaks of all isolates were identified and 38 m/z features were selected for the classifying model. The AUCs of the non-linear RBF-SVM model and the RF model were 0.89 and 0.87, respectively, and the accuracy ranged between 0.86 (RBF-SVM) and 0.82 (RF). CONCLUSIONS Our study demonstrates that MALDI-TOF MS coupled with machine-learning algorithms could be used to develop a rapid and easy-to-use method to discriminate MRSA from MSSA. 
Considering that this method is easy to implement in routine microbiology laboratories, it suggests a cost-effective and time-efficient alternative to conventional resistance detection in the future to improve clinical treatment.
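The accuracy, sensitivity, and specificity named in this abstract all follow from the four cells of a binary confusion matrix. The sketch below shows the standard definitions, treating MRSA as the positive class; the function `classification_metrics` and the ten labels are hypothetical illustrations and are not the study's predictions or results.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of positives detected
        "specificity": tn / (tn + fp),  # share of negatives detected
    }


# Hypothetical isolates: 1 = MRSA, 0 = MSSA.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters when the classes are unequal in size, as they are in the study (194 MRSA versus 258 MSSA isolates).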
Affiliation(s)
- Xin Liu
- Department of Laboratory Medicine, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu, Sichuan 610072, China
- Taojunfeng Su
- Proteomics & Metabolomics Core Facility, Weill Cornell Medicine, New York, NY 10065, USA
- Yen-Michael S Hsu
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10065, USA
- Hua Yu
- Department of Laboratory Medicine, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu, Sichuan 610072, China
- He Sarina Yang
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10065, USA
- Li Jiang
- Department of Laboratory Medicine, Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu, Sichuan 610072, China
- Zhen Zhao
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10065, USA
10
Liu H, Hitchcock DB, Samadi SZ. Spatio-temporal analysis of flood data from South Carolina. JOURNAL OF STATISTICAL DISTRIBUTIONS AND APPLICATIONS 2020. [DOI: 10.1186/s40488-020-00112-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
To investigate the relationship between flood gage height and precipitation in South Carolina from 2012 to 2016, we built a conditional autoregressive (CAR) model using a Bayesian hierarchical framework. This approach allows the modelling of the main spatio-temporal properties of water height dynamics over multiple locations, accounting for the effect of river network, geomorphology, and forcing rainfall. In this respect, a proximity matrix based on watershed information was used to capture the spatial structure of gage height measurements in and around South Carolina. The temporal structure was handled by a first-order autoregressive term in the model. Several covariates, including the elevation of the sites and effects of seasonality, were examined, along with daily rainfall amount. A non-normal error structure was used to account for the heavy-tailed distribution of maximum gage heights. The proposed model captured some key features of the flood process such as seasonality and a stronger association between precipitation and flooding during the summer season. The model is able to forecast short-term flood gage height, which is crucial for informed emergency decisions. As a byproduct, we also developed a Python library to retrieve and handle environmental data provided by some main agencies in the United States. This library can be of general usefulness for studies requiring rainfall, flow, and geomorphological information over specific areas of the conterminous US.
11
Yang J, Zhao C, Yang J, Wang J, Li Z, Wan X, Guo G, Lei M, Chen T. Discriminative algorithm approach to forecast Cd threshold exceedance probability for rice grain based on soil characteristics. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2020; 261:114211. [PMID: 32113108 DOI: 10.1016/j.envpol.2020.114211] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/13/2019] [Revised: 01/30/2020] [Accepted: 02/16/2020] [Indexed: 06/10/2023]
Abstract
The relationship between cadmium (Cd) concentration in rice grains and the soil that they are cultivated in is highly uncertain due to the influence of soil properties, rice varieties, and other undetermined factors. In this study, we introduce the probability of exceeding the threshold to characterize this uncertainty and then build a probabilistic forewarning model. Additionally, a number of associated factors have been used as parameters to improve model performance. Because the physicochemical properties and Cd concentration in the soil (Cdsoil) do not follow a normal distribution and are not independent of each other, a discriminative algorithm, represented by logistic regression (LR), performed better than generative algorithms, such as the naive Bayes and quadratic discriminant analysis models. The performance of the LR-based model was found to be 0.5% better in the case of the univariate model (Cdsoil) and 4.1% better with a multivariate model (soil properties used as additional factors) (p < 0.01). The probabilities predicted by the LR-based model were positively correlated with the true exceedance rate (R² = 0.949, p < 0.01) across an exceedance threshold range of 0.1-0.4 mg kg⁻¹, with a mean deviation of 5.75%. A sensitivity analysis showed that the effect of soil properties on the exceedance probability weakens with an increase in Cd concentration in rice grains. When the threshold is below 0.15 mg kg⁻¹, soil pH strongly influences the exceedance probability. As the threshold increases, the influence of pH on the exceedance probability is gradually superseded. By quantifying the uncertainty regarding the relationship between Cd concentration in rice grains and soil, the discriminative algorithm-based probabilistic forecasting model offers a new way to assess Cd pollution in rice grown in contaminated paddy fields.
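As a rough illustration of how a logistic regression turns soil measurements into an exceedance probability, the sketch below evaluates a logistic function of soil Cd and pH. Every coefficient, the reference pH, and the function name are hypothetical and are not taken from the study; the negative pH coefficient reflects only the general assumption that more acidic soil makes Cd more available to the plant.

```python
import math


def exceedance_probability(cd_soil, ph, b0=-1.0, b_cd=2.0, b_ph=-0.5, ph_ref=6.5):
    """P(grain Cd exceeds the threshold) under a logistic model.

    All coefficients are hypothetical placeholders; a fitted model would
    estimate them from paired soil and grain measurements.
    """
    z = b0 + b_cd * cd_soil + b_ph * (ph - ph_ref)
    return 1.0 / (1.0 + math.exp(-z))


# Same soil Cd (mg/kg), two hypothetical pH values.
p_acid = exceedance_probability(cd_soil=0.6, ph=5.5)
p_neutral = exceedance_probability(cd_soil=0.6, ph=7.0)
print(round(p_acid, 3), round(p_neutral, 3))
```

Because the model outputs a probability rather than a point estimate of grain Cd, a forewarning rule can be expressed as a cutoff on that probability.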
Affiliation(s)
- Jun Yang
  - Center for Environmental Remediation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, 100101, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Chen Zhao
  - Center for Environmental Remediation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, 100101, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Junxing Yang
  - Center for Environmental Remediation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, 100101, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Jingyun Wang
  - Center for Environmental Remediation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, 100101, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Zhitao Li
  - Technical Centre for Soil, Agriculture and Rural Ecology and Environment, Ministry of Ecology and Environment, Beijing, 100012, China
- Xiaoming Wan
  - Center for Environmental Remediation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, 100101, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Guanghui Guo
  - Center for Environmental Remediation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, 100101, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Mei Lei
  - Center for Environmental Remediation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, 100101, China; University of Chinese Academy of Sciences, Beijing, 100049, China
- Tongbin Chen
  - Center for Environmental Remediation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, 100101, China; University of Chinese Academy of Sciences, Beijing, 100049, China
|
12
|
Mapping specific groundwater vulnerability to nitrate using random forest: case of Sais basin, Morocco. Modeling Earth Systems and Environment 2020. [DOI: 10.1007/s40808-020-00761-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 11/25/2022]
|
13
|
Meyer H, Reudenbach C, Wöllauer S, Nauss T. Importance of spatial predictor variable selection in machine learning applications – Moving from data reproduction to spatial prediction. Ecol Modell 2019. [DOI: 10.1016/j.ecolmodel.2019.108815] [Citation(s) in RCA: 109] [Impact Index Per Article: 21.8] [Indexed: 11/16/2022]
|
14
|
Tang M, Hu P, Wang CF, Yu CQ, Sheng J, Ma SJ. Prediction Model of Cardiac Risk for Dental Extraction in Elderly Patients with Cardiovascular Diseases. Gerontology 2019; 65:591-598. [PMID: 31048587 DOI: 10.1159/000497424] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/11/2018] [Accepted: 02/03/2019] [Indexed: 11/19/2022]
Abstract
BACKGROUND With the rapidly increasing population of elderly people, dental extraction in elderly individuals with cardiovascular diseases (CVDs) has become quite common. The issue of how to assure the safety of elderly patients with CVDs undergoing dental extraction has perplexed dentists and internists for many years. It is important to derive an appropriate risk prediction tool for this population. OBJECTIVES The aim of this retrospective, observational study was to establish and validate a prediction model based on the random forest (RF) algorithm for the risk of cardiac complications of dental extraction in elderly patients with CVDs. METHODS Between August 2017 and May 2018, a total of 603 patients who fulfilled the inclusion criteria were used to create a training set. An independent test set contained 230 patients seen between June 2018 and July 2018. Data regarding clinical parameters, laboratory tests, clinical examinations before dental extraction, and 1-week follow-up were retrieved. Predictors were identified by using logistic regression (LR) with penalized LASSO (least absolute shrinkage and selection operator) variable selection. Then, a prediction model was constructed based on the RF algorithm by using a 5-fold cross-validation method. RESULTS The training set comprised 603 participants (282 men and 321 women) with a mean age of 72.38 ± 8.31 years. Using feature selection methods, 11 predictors for risk of cardiac complications were selected. When the RF model was constructed, its overall classification accuracy was 0.82 at the optimal cutoff value of 18.5%. In comparison to the LR model, the RF model showed a superior predictive performance. The AUROC (area under the receiver operating characteristic curve) scores of the RF and LR models were 0.83 and 0.80, respectively, in the independent test set. The AUPRC (area under the precision-recall curve) scores of the RF and LR models were 0.56 and 0.35, respectively, in the independent test set. CONCLUSION The RF-based prediction model is expected to be applicable for preoperative clinical assessment for preventing cardiac complications in elderly patients with CVDs undergoing dental extraction. The findings may aid physicians and dentists in making more informed recommendations to prevent cardiac complications in this patient population.
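The two metrics reported in this abstract can be computed from labels and predicted scores as in the following minimal sketch. It uses the rank-based (Mann-Whitney) form of AUROC and average precision as one common estimator of AUPRC; the function names and toy data are illustrative assumptions and do not reproduce the study's evaluation code. The average-precision helper assumes scores contain no ties.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive case (no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy data: 2 events among 6 hypothetical patients.
y_true = [1, 0, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
roc = auroc(y_true, y_score)
ap = average_precision(y_true, y_score)
print(roc, ap)
```

AUROC is insensitive to the event rate, whereas precision-based summaries depend on it, which is one reason the two metrics can separate models differently when events are uncommon.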
Affiliation(s)
- Min Tang
  - Department of Geriatrics, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Ping Hu
  - Department of Geriatrics, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Cao-Feng Wang
  - Department of Geriatrics, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Chuang-Qi Yu
  - Department of Oral Surgery, Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Jing Sheng
  - Department of Geriatrics, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Shao-Jun Ma
  - Department of Geriatrics, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
|
15
|
Mohammed H, Hameed IA, Seidu R. Comparative predictive modelling of the occurrence of faecal indicator bacteria in a drinking water source in Norway. THE SCIENCE OF THE TOTAL ENVIRONMENT 2018; 628-629:1178-1190. [PMID: 30045540 DOI: 10.1016/j.scitotenv.2018.02.140] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 10/13/2017] [Revised: 02/08/2018] [Accepted: 02/12/2018] [Indexed: 06/08/2023]
Abstract
Presently, concentrations of faecal indicator bacteria (FIB) in raw water sources are not known before water undergoes treatment, since analysis takes approximately 24 h to produce results. Using data on water quality and environmental variables, models can be used to predict real-time concentrations of FIB in raw water. This study evaluates the potential of zero-inflated regression models (ZI), a Random Forest regression model (RF), and an adaptive neuro-fuzzy inference system (ANFIS) to predict the concentration of FIB in the raw water source of a water treatment plant in Norway. The ZI, RF, and ANFIS predictive models were built using physico-chemical (pH, temperature, electrical conductivity, turbidity, color, and alkalinity) and catchment precipitation data from 2009 to 2015. The study revealed that pH, temperature, turbidity, and electrical conductivity in the raw water were the most significant factors associated with the concentration of FIB in the raw water source. Compared to the other models, the ANFIS model was superior (Mean Square Error = 39.49, 0.35, 0.09, and 0.23 CFU/100 ml, respectively, for coliform bacteria, E. coli, intestinal enterococci, and Clostridium perfringens) in predicting the variations of FIB in the raw water during model testing. However, the model was not capable of predicting low counts of FIB during either the training or the testing stage. The ZI and RF models were more consistent when applied to testing data, and they predicted FIB concentrations that characterized the observed FIB concentrations. While these models might need further improvement, the results of this study indicate that ZI and RF regression models have strong prospects as tools for the real-time prediction of FIB in raw water sources for proactive microbial risk management in water treatment plants.
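To make the idea of a zero-inflated count model concrete, the sketch below implements the probability mass function of a zero-inflated Poisson distribution, one common member of that family. The abstract does not say which zero-inflated variant was fitted, so the distribution choice, function name, and parameter values are assumptions for illustration.

```python
import math


def zip_pmf(k, lam, pi):
    """P(count = k) when a fraction pi of samples are structural zeros
    and the remainder follow a Poisson distribution with mean lam."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi * (k == 0) + (1.0 - pi) * poisson


# Hypothetical parameters: 30% structural zeros, Poisson mean of 2 CFU per sample.
lam, pi = 2.0, 0.3
probs = [zip_pmf(k, lam, pi) for k in range(60)]
mean_count = sum(k * p for k, p in enumerate(probs))
print(round(probs[0], 4), round(mean_count, 4))
```

The extra mass at zero is what lets such a model represent the many samples with no detected bacteria while still describing the positive counts.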
Affiliation(s)
- Hadi Mohammed
  - Water and Environmental Engineering Group, Institute of Marine Operations and Civil Engineering, Norway
- Ibrahim A Hameed
  - Dept. of ICT and Natural Sciences, Faculty of Information Technology and Electrical Engineering, Norwegian University of Science and Technology (NTNU) in Ålesund, Larsgårdsvegen 2, 6009 Ålesund, Norway
- Razak Seidu
  - Water and Environmental Engineering Group, Institute of Marine Operations and Civil Engineering, Norway
|