1
Bonaccorso A, Ortis A, Musumeci T, Carbone C, Hussain M, Di Salvatore V, Battiato S, Pappalardo F, Pignatello R. Nose-to-Brain Drug Delivery and Physico-Chemical Properties of Nanosystems: Analysis and Correlation Studies of Data from Scientific Literature. Int J Nanomedicine 2024; 19:5619-5636. [PMID: 38882536 PMCID: PMC11179666 DOI: 10.2147/ijn.s452316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 03/12/2024] [Indexed: 06/18/2024] Open
Abstract
Background: In the last few decades, nose-to-brain delivery has been investigated as an alternative route to deliver molecules to the Central Nervous System (CNS), bypassing the Blood-Brain Barrier. The use of nanotechnological carriers to promote drug transfer via this route has been widely explored. The exact mechanisms of transport remain unclear because different pathways (systemic or axonal) may be involved. Despite the large number of studies in this field, various aspects still need to be addressed. For example, what physicochemical properties should a suitable carrier possess in order to achieve this goal? To determine the correlation between carrier features (eg, particle size and surface charge) and drug targeting efficiency percentage (DTE%) and direct transport percentage (DTP%), correlation studies were performed using machine learning. Methods: A detailed analysis of the literature from 2010 to 2021 was performed on PubMed to build the "NANOSE" database. Regression analyses were then applied using machine-learning techniques. Results: A total of 64 research articles (102 formulations) were considered for building the NANOSE database. Particle-based formulations were characterized by an average size of 150-200 nm and a negative zeta potential (ZP) from -10 to -25 mV. The most general-purpose model for the regression of DTP/DTE values was Decision Tree regression, followed by the K-Nearest Neighbors Regressor (KNeighbor regression). Conclusion: The literature review revealed that nose-to-brain delivery has been widely investigated in neurodegenerative diseases. Correlation studies between the physicochemical properties of nanosystems (mean size and ZP) and DTE/DTP parameters suggest that ZP may be more significant than particle size for DTP/DTE predictability.
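The abstract does not state how DTE% and DTP% are calculated. The sketch below assumes the definitions commonly used in the nose-to-brain literature, based on area-under-the-curve (AUC) values in brain and blood after intranasal (IN) and intravenous (IV) dosing. Function and argument names are illustrative and do not come from the paper.

```python
def dte_percent(auc_brain_in, auc_blood_in, auc_brain_iv, auc_blood_iv):
    """Drug targeting efficiency: brain/blood ratio after IN dosing
    relative to the same ratio after IV dosing, as a percentage."""
    ratio_in = auc_brain_in / auc_blood_in
    ratio_iv = auc_brain_iv / auc_blood_iv
    return ratio_in / ratio_iv * 100.0


def dtp_percent(auc_brain_in, auc_blood_in, auc_brain_iv, auc_blood_iv):
    """Direct transport percentage: share of the IN brain exposure that is
    not explained by drug reaching the brain via systemic circulation."""
    # Brain AUC expected from the systemic route alone after IN dosing
    # (assumed definition: IV brain/blood ratio scaled by IN blood AUC).
    systemic_brain_auc = (auc_brain_iv / auc_blood_iv) * auc_blood_in
    return (auc_brain_in - systemic_brain_auc) / auc_brain_in * 100.0
```

Under these assumed definitions, a DTE% above 100 indicates better brain targeting by the intranasal route than by the intravenous route, and a positive DTP% indicates a contribution from direct nose-to-brain transport.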
Affiliation(s)
- Angela Bonaccorso
  - Department of Drug and Health Sciences, University of Catania, Catania, Italy
  - NANOMED–Research Centre for Nanomedicine and Pharmaceutical Nanotechnology, University of Catania, Catania, 95125, Italy
- Alessandro Ortis
  - Department of Mathematics and Computer Science, University of Catania, Catania, Italy
- Teresa Musumeci
  - Department of Drug and Health Sciences, University of Catania, Catania, Italy
  - NANOMED–Research Centre for Nanomedicine and Pharmaceutical Nanotechnology, University of Catania, Catania, 95125, Italy
- Claudia Carbone
  - Department of Drug and Health Sciences, University of Catania, Catania, Italy
  - NANOMED–Research Centre for Nanomedicine and Pharmaceutical Nanotechnology, University of Catania, Catania, 95125, Italy
- Mazhar Hussain
  - Department of Mathematics and Computer Science, University of Catania, Catania, Italy
- Sebastiano Battiato
  - Department of Mathematics and Computer Science, University of Catania, Catania, Italy
- Francesco Pappalardo
  - Department of Drug and Health Sciences, University of Catania, Catania, Italy
  - NANOMED–Research Centre for Nanomedicine and Pharmaceutical Nanotechnology, University of Catania, Catania, 95125, Italy
- Rosario Pignatello
  - Department of Drug and Health Sciences, University of Catania, Catania, Italy
  - NANOMED–Research Centre for Nanomedicine and Pharmaceutical Nanotechnology, University of Catania, Catania, 95125, Italy
2
Khadem H, Nemat H, Elliott J, Benaissa M. In Vitro Glucose Measurement from NIR and MIR Spectroscopy: Comprehensive Benchmark of Machine Learning and Filtering Chemometrics. Heliyon 2024; 10:e30981. [PMID: 38778952 PMCID: PMC11108977 DOI: 10.1016/j.heliyon.2024.e30981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2024] [Revised: 05/08/2024] [Accepted: 05/08/2024] [Indexed: 05/25/2024] Open
Abstract
The quantitative analysis of glucose using spectroscopy is a topic of great significance and interest in science and industry. One conundrum in this area is deploying appropriate preprocessing and regression tools. To help address this challenge, we conducted a comprehensive and novel comparative analysis of various machine learning and preprocessing filtering techniques applied to near-infrared, mid-infrared, and a combination of near-infrared and mid-infrared spectroscopy for glucose assay. Our objective was to evaluate the effectiveness of these techniques in accurately predicting glucose levels and to determine which approach performed best. Our investigation involved the acquisition of spectral data from samples of glucose solutions using the three aforementioned spectroscopy techniques. The data were subjected to several preprocessing filtering methods, including convolutional moving average, Savitzky-Golay, multiplicative scatter correction, and normalisation. We then applied representative machine learning algorithms from three categories: linear modelling, traditional nonlinear modelling, and artificial neural networks. The evaluation results revealed that linear models exhibited higher predictive accuracy than nonlinear models, whereas artificial neural network models demonstrated comparable performance. Additionally, the comparative analysis of various filtering methods demonstrated that the convolutional moving average and Savitzky-Golay filters yielded the most precise outcomes overall. In conclusion, our study provides valuable insights into the efficacy of different machine learning techniques for glucose measurement and highlights the importance of applying appropriate filtering methods in enhancing predictive accuracy. These findings have important implications for the development of new and improved glucose quantification technologies.
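As a rough illustration of the two filters the abstract reports as most effective, the sketch below implements a moving-average smoother and a Savitzky-Golay smoother in plain Python. The window sizes, the edge handling (edge points left unsmoothed), and the function names are assumptions made for illustration; the abstract does not give the authors' settings. The Savitzky-Golay version uses the standard 5-point quadratic smoothing coefficients (-3, 12, 17, 12, -3)/35.

```python
def moving_average(signal, window=5):
    """Smooth with a centred moving average; edge points are left unchanged."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    out = [float(v) for v in signal]
    for i in range(half, len(signal) - half):
        out[i] = sum(signal[i - half:i + half + 1]) / window
    return out


# Standard 5-point quadratic/cubic Savitzky-Golay smoothing weights (sum = 35).
SG_5_POINT = (-3, 12, 17, 12, -3)


def savitzky_golay_5pt(signal):
    """Smooth with a 5-point Savitzky-Golay filter; edge points are left unchanged."""
    out = [float(v) for v in signal]
    for i in range(2, len(signal) - 2):
        window = signal[i - 2:i + 3]
        out[i] = sum(c * v for c, v in zip(SG_5_POINT, window)) / 35.0
    return out
```

The moving average flattens narrow peaks, while the Savitzky-Golay filter fits a local polynomial and therefore reproduces locally quadratic features exactly, which is one reason it is popular for spectral data.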
Affiliation(s)
- Heydar Khadem
  - Department of Electronic and Electrical Engineering, University of Sheffield, UK
  - Department of Computer Science, University of Manchester, Manchester, UK
  - Artificial Intelligence & Machine Learning Team, KultraLab, London, UK
- Hoda Nemat
  - Department of Electronic and Electrical Engineering, University of Sheffield, UK
- Jackie Elliott
  - Department of Oncology and Metabolism, University of Sheffield, UK
  - Sheffield Teaching Hospitals, Diabetes and Endocrine Centre, Northern General Hospital, Sheffield, UK
- Mohammed Benaissa
  - Department of Electronic and Electrical Engineering, University of Sheffield, UK
3
Pasokh Z, Seif M, Ghaem H, Rezaianzadeh A, Ghoddusi Johari M. Age at natural menopause and its determinants in female population of Kharameh cohort study: Comparison of regression, conditional tree and forests. PLoS One 2024; 19:e0300448. [PMID: 38625988 PMCID: PMC11020934 DOI: 10.1371/journal.pone.0300448] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 02/28/2024] [Indexed: 04/18/2024] Open
Abstract
BACKGROUND Natural menopause is defined as the permanent cessation of menstruation that occurs after 12 consecutive months of amenorrhea without any obvious pathological or physiological cause. The age at which it occurs has been reported to be associated with several health outcomes. OBJECTIVES This study aimed to estimate the Age at Natural Menopause (ANM) and to identify reproductive and demographic factors affecting ANM. METHODS This cross-sectional, population-based study was conducted on 2517 post-menopausal women aged 40-70 years participating in the first phase of the PERSIAN cohort study of Kharameh, Iran, during 2014-2017. To detect the determinants of ANM more accurately, we applied multiple linear regression alongside several machine learning algorithms, including conditional tree, conditional forest, and random forest. The fit of these methods was then compared using Mean Squared Error (MSE) and the Pearson correlation coefficient. RESULTS The mean±SD of ANM was 48.95±6.13 years. Both forest methods provided more accurate results and identified more predictors. However, according to the final comparison, the conditional forest was the most accurate method; it identified more pregnancies, longer breastfeeding, Fars ethnicity, and urbanization as having the greatest impact on later ANM. CONCLUSIONS This study found a wide range of reproductive and demographic factors affecting ANM. Considering our findings in decision-making can reduce the complications related to this phenomenon and, consequently, improve the quality of life of post-menopausal women.
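The two fit measures named in the methods, MSE and the Pearson correlation between observed and predicted values, can be computed as in the following minimal sketch. The function names are illustrative and the code is not taken from the study.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)
```

When comparing models on the same data, a lower MSE and a higher correlation between predictions and observations both indicate a better fit.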
Affiliation(s)
- Zahra Pasokh
  - Student Research Committee, Shiraz University of Medical Sciences, Shiraz, Iran
- Mozhgan Seif
  - Non-Communicable Diseases Research Center, Department of Epidemiology, School of Health, Shiraz University of Medical Sciences, Shiraz, Iran
- Haleh Ghaem
  - Non-Communicable Diseases Research Center, Department of Epidemiology, School of Health, Shiraz University of Medical Sciences, Shiraz, Iran
- Abbas Rezaianzadeh
  - Colorectal Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
4
Vera Cruz G, Aboujaoude E, Rochat L, Bianchi-Demicheli F, Khazaal Y. Online dating: predictors of problematic tinder use. BMC Psychol 2024; 12:106. [PMID: 38424651 PMCID: PMC10905798 DOI: 10.1186/s40359-024-01566-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Accepted: 01/30/2024] [Indexed: 03/02/2024] Open
Abstract
BACKGROUND Geolocation apps have radically transformed dating practices around the world, with profound sociocultural implications. Few studies, however, have explored their addictive potential or factors that are associated with their misuse. OBJECTIVE The present study aimed to assess the level of problematic Tinder use (PTU) in an adult sample, using a machine learning algorithm to determine, among 29 relevant variables, the most important predictors of PTU. METHODS 1,387 users of Tinder (18-74 years old; male = 50.3%; female = 49.1%) completed an online questionnaire, and a machine learning tool was used to analyze their responses. RESULTS On a 5-point scale, participants' mean PTU score was 1.91 (SD = 0.70), indicating a relatively low overall level of problematic app use. Among the most important predictors of problematic use were the use of Tinder for enhancement (reducing boredom and increasing positive emotions), coping with psychological problems, and increasing social connectedness. The number of "matches" (when two users show mutual interest), the number of online contacts on Tinder, and the number of resulting offline dates were also among the top predictors of PTU. Depressive mood and loneliness were among the middle-ranked predictors of PTU. CONCLUSION In accordance with the Interaction of Person-Affect-Cognition-Execution model of problematic internet use, the results suggest that PTU relates to how individual experience on the app interacts with dispositional and situational characteristics. However, variables that seemed to relate to PTU, including lack of self-esteem, negative mood states and loneliness, are not problems that online dating services as currently designed can be expected to resolve. This argues for increased digital services to identify and address potential problems helping drive the popularity of dating apps.
Affiliation(s)
- Germano Vera Cruz
  - Department of Psychology, CRP-CPO, University of Picardie Jules Verne, Amiens, UR, 7273, France.
- Elias Aboujaoude
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, USA
  - Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Lucien Rochat
  - Addiction Division, Department of Psychiatry, University Hospitals of Geneva, Geneva, Switzerland
- Francesco Bianchi-Demicheli
  - Department of Obstetrics and Gynecology, University Hospitals of Lausanne, Lausanne, Switzerland
  - Center for Preventive & Integrative Medicine, Clinique des Grangettes and Center for Internal Medicine and its Specialties, Clinique La Colline, Hirslanden Group, Geneva, Switzerland
- Yasser Khazaal
  - Addiction Medicine, Department of Psychiatry, Lausanne University Hospital, Lausanne, Switzerland.
  - Research Centre, University Institute of Mental Health at Montreal and Department of Psychiatry and Addiction Montreal University, Montreal, Canada.
5
Sidorov P, Tsuji N. A Primer on 2D Descriptors in Selectivity Modeling for Asymmetric Catalysis. Chemistry 2024; 30:e202302837. [PMID: 38010242 DOI: 10.1002/chem.202302837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 11/21/2023] [Accepted: 11/23/2023] [Indexed: 11/29/2023]
Abstract
Machine learning has permeated all fields of research, including chemistry, and is now an integral part of the design of novel compounds with desired properties. In the field of asymmetric catalysis, the preference still lies with models based on a physical understanding of the catalysis phenomenon and the electronic and steric properties of catalysts. However, such models require quantum chemical calculations and are thus limited by their computational cost. Here, we highlight the recent advances in modeling catalyst selectivity by using the 2D structures of catalysts and substrates. While these have a less explicit mechanistic connection to the modeled property, 2D descriptors, such as topological indices, molecular fingerprints, and fragments, offer the tremendous advantages of low cost and high speed of calculations. This makes them optimal for the in-silico screening of large amounts of data. We provide an overview of common quantitative structure-property relationship workflow, model building and validation techniques, applications of these methodologies in asymmetric catalysis design, and an outlook on improving the understanding of 2D-based models.
Affiliation(s)
- Pavel Sidorov
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, 001-0021, Japan
- Nobuya Tsuji
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Sapporo, 001-0021, Japan
6
Manessa MDM, Ummam MAF, Efriana AF, Semedi JM, Ayu F. Assessing Derawan Island's Coral Reefs over Two Decades: A Machine Learning Classification Perspective. Sensors (Basel) 2024; 24:466. [PMID: 38257559 PMCID: PMC10818429 DOI: 10.3390/s24020466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Revised: 12/23/2023] [Accepted: 01/09/2024] [Indexed: 01/24/2024]
Abstract
This study aims to understand the dynamic changes in the coral reef habitats of Derawan Island over two decades (2003, 2011, and 2021) using advanced machine learning classification techniques. The motivation stems from the urgent need for accurate, detailed environmental monitoring to inform conservation strategies, particularly in ecologically sensitive areas like coral reefs. We employed non-parametric machine learning algorithms, including Random Forest (RF), Support Vector Machine (SVM), and Classification and Regression Tree (CART), to assess spatial and temporal changes in coral habitats. Our analysis utilized high-resolution data from Landsat 9, Landsat 7, Sentinel-2, and Multispectral Aerial Photos. The RF algorithm proved to be the most accurate, achieving an accuracy of 71.43% with Landsat 9, 73.68% with Sentinel-2, and 78.28% with Multispectral Aerial Photos. Our findings indicate that the classification accuracy is significantly influenced by the geographic resolution and the quality of the field and satellite/aerial image data. Over the two decades, there was a notable decrease in the coral reef area from 2003 to 2011, with a reduction to 16 hectares, followed by a slight increase in area but with more heterogeneous densities between 2011 and 2021. The study underscores the dynamic nature of coral reef habitats and the efficacy of machine learning in environmental monitoring. The insights gained highlight the importance of advanced analytical methods in guiding conservation efforts and understanding ecological changes over time.
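The accuracy figures quoted above are overall classification accuracies. As a minimal sketch of how such a figure is typically derived, the code below builds a confusion matrix from reference and predicted labels and computes the fraction of correctly classified samples. The class names in the usage note are hypothetical and are not taken from the study.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are reference classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, prediction in zip(y_true, y_pred):
        matrix[index[truth]][index[prediction]] += 1
    return matrix


def overall_accuracy(matrix):
    """Share of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total
```

For example, with hypothetical habitat labels `["coral", "sand", "seagrass"]`, calling `overall_accuracy(confusion_matrix(reference, predicted, labels))` returns the proportion of field validation points whose predicted class matches the reference class.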
Affiliation(s)
- Masita Dwi Mandini Manessa
  - Department of Geography, Faculty of Mathematics and Natural Sciences, University of Indonesia, Depok 16424, Indonesia; (M.A.F.U.); (A.F.E.); (J.M.S.); (F.A.)
7
Yousefmarzi F, Haratian A, Mahdavi Kalatehno J, Keihani Kamal M. Machine learning approaches for estimating interfacial tension between oil/gas and oil/water systems: a performance analysis. Sci Rep 2024; 14:858. [PMID: 38195685 PMCID: PMC10776576 DOI: 10.1038/s41598-024-51597-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Accepted: 01/07/2024] [Indexed: 01/11/2024] Open
Abstract
Interfacial tension (IFT) is a key physical property that affects various processes in the oil and gas industry, such as enhanced oil recovery, multiphase flow, and emulsion stability. Accurate prediction of IFT is essential for optimizing these processes and increasing their efficiency. This article compares the performance of six machine learning models, namely Support Vector Regression (SVR), Random Forests (RF), Decision Tree (DT), Gradient Boosting (GB), Catboosting (CB), and XGBoosting (XGB), in predicting IFT between oil/gas and oil/water systems. The models are trained and tested on a dataset that contains various input parameters that influence IFT, such as gas-oil ratio, gas formation volume factor, oil density, etc. The results show that SVR and Catboost models achieve the highest accuracy for oil/gas IFT prediction, with an R-squared value of 0.99, while SVR outperforms Catboost for Oil/Water IFT prediction, with an R-squared value of 0.99. The study demonstrates the potential of machine learning models as a reliable and resilient tool for predicting IFT in the oil and gas industry. The findings of this study can help improve the understanding and optimization of IFT forecasting and facilitate the development of more efficient reservoir management strategies.
Affiliation(s)
- Fatemeh Yousefmarzi
  - Department of Petroleum Engineering, Amirkabir University of Technology, Tehran, Iran
- Ali Haratian
  - Department of Petroleum Engineering, Amirkabir University of Technology, Tehran, Iran
- Mostafa Keihani Kamal
  - Department of Petroleum Engineering, Amirkabir University of Technology, Tehran, Iran
8
Xu S, Yang X, Zhang S, Zheng X, Zheng F, Liu Y, Zhang H, Ye Q, Li L. Machine learning models for orthokeratology lens fitting and axial length prediction. Ophthalmic Physiol Opt 2023; 43:1462-1468. [PMID: 37574762 DOI: 10.1111/opo.13212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 07/25/2023] [Accepted: 07/26/2023] [Indexed: 08/15/2023]
Abstract
PURPOSE To improve the efficiency of orthokeratology (OK) lens fitting and to predict the axial length after 1 year of OK lens wear, machine learning models were proposed. METHODS Clinical data from 1302 myopic subjects were collected retrospectively, and two machine learning models were implemented. Demographic and corneal topographic data were collected as input variables. The output variables were the parameters of the OK lens and the axial length after 1 year. Eighty percent of input variables was used as the training set and the remaining 20% was used as the validation set. The first alignment curve (AC1) values of the OK lenses, deduced using machine learning models and using formula calculation, were compared. Multiple regression models (support vector machine, Gaussian process, decision tree and random forest) were used to predict the axial length after 1 year. In addition, we classified data based on lens brand, and carried out more detailed parameter fitting and analysis for spherical and toric OK lenses. RESULTS The OK lens fitting model showed a higher R2 (0.93) and lower errors (mean absolute error [MAE] = 0.19, mean square error [MSE] = 0.09) when predicting AC1, compared with the formula calculation (R2 = 0.66, MAE = 0.44, MSE = 0.25). The machine learning model still had high R2 values, ranging from 0.91 to 0.96, when considering the brand and design of the OK lenses. Further, the R2 value for the axial length prediction model was 0.94, which indicated that the machine learning model had high accuracy and good robustness. CONCLUSION The OK lens fitting model and the axial length prediction model played an important role in guiding OK lens fitting, with high accuracy and robustness in prediction performance.
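The evaluation protocol described here (an 80/20 split, then R2 and MAE on the held-out portion) can be sketched as follows. The shuffling, the fixed seed, and the function names are assumptions for illustration; the abstract does not say how the split was drawn.

```python
import random


def train_validation_split(rows, validation_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and hold out a fraction for validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_true = sum(y_true) / len(y_true)
    ss_residual = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_total = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_residual / ss_total
```

An R2 of 1 means the predictions match the observations exactly, while an R2 of 0 means the model does no better than always predicting the mean of the observed values.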
Affiliation(s)
- Shuai Xu
  - Key Laboratory of Weak-Light Nonlinear Photonics, Ministry of Education, School of Physics and TEDA Applied Physics, Nankai University, Tianjin, China
- Xiaoyan Yang
  - Tianjin Eye Hospital Optometric Center, Tianjin, China
  - Tianjin Eye Hospital, Tianjin, China
  - Nankai University Affiliated Eye Hospital, Nankai University, Tianjin, China
- Shuxian Zhang
  - Tianjin Eye Hospital Optometric Center, Tianjin, China
  - Tianjin Eye Hospital, Tianjin, China
  - Nankai University Affiliated Eye Hospital, Nankai University, Tianjin, China
- Xuan Zheng
  - Key Laboratory of Weak-Light Nonlinear Photonics, Ministry of Education, School of Physics and TEDA Applied Physics, Nankai University, Tianjin, China
- Fang Zheng
  - Key Laboratory of Weak-Light Nonlinear Photonics, Ministry of Education, School of Physics and TEDA Applied Physics, Nankai University, Tianjin, China
- Yin Liu
  - School of Medicine, Nankai University, Tianjin, China
- Hanyu Zhang
  - School of Medicine, Nankai University, Tianjin, China
- Qing Ye
  - Key Laboratory of Weak-Light Nonlinear Photonics, Ministry of Education, School of Physics and TEDA Applied Physics, Nankai University, Tianjin, China
- Lihua Li
  - Tianjin Eye Hospital Optometric Center, Tianjin, China
  - Tianjin Eye Hospital, Tianjin, China
  - Nankai University Affiliated Eye Hospital, Nankai University, Tianjin, China
9
Dang T, Fermin ASR, Machizawa MG. oFVSD: a Python package of optimized forward variable selection decoder for high-dimensional neuroimaging data. Front Neuroinform 2023; 17:1266713. [PMID: 37829329 PMCID: PMC10566623 DOI: 10.3389/fninf.2023.1266713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 09/08/2023] [Indexed: 10/14/2023] Open
Abstract
The complexity and high dimensionality of neuroimaging data pose problems for decoding information with machine learning (ML) models because the number of features is often much larger than the number of observations. Feature selection is one of the crucial steps for determining meaningful target features in decoding; however, optimizing the feature selection from such high-dimensional neuroimaging data has been challenging using conventional ML models. Here, we introduce an efficient and high-performance decoding package incorporating a forward variable selection (FVS) algorithm and hyper-parameter optimization that automatically identifies the best feature pairs for both classification and regression models, where a total of 18 ML models are implemented by default. First, the FVS algorithm evaluates the goodness-of-fit across different models using a k-fold cross-validation step that identifies the best subset of features based on a predefined criterion for each model. Next, the hyperparameters of each ML model are optimized at each forward iteration. Final outputs highlight an optimized number of selected features (brain regions of interest) for each model with its accuracy. Furthermore, the toolbox can be executed in a parallel environment for efficient computation on a typical personal computer. With the optimized forward variable selection decoder (oFVSD) pipeline, we verified the effectiveness of decoding sex classification and age range regression on 1,113 structural magnetic resonance imaging (MRI) datasets. Compared with ML models without the FVS algorithm and with models using the Boruta algorithm as an alternative variable selection method, oFVSD performed significantly better across all ML models: relative to models without FVS, the correlation coefficient, r, increased by approximately 0.20 for regression models and classification performance improved by 8% on average; relative to the Boruta variable selection algorithm, the improvements were approximately 0.07 for regression models and 4% for classification models. Furthermore, we confirmed that the use of parallel computation considerably reduced the computational burden for the high-dimensional MRI data. Altogether, the oFVSD toolbox efficiently and effectively improves the performance of both classification and regression ML models, providing a use case example on MRI datasets. With its flexibility, oFVSD has the potential for many other modalities in neuroimaging. This open-source and freely available Python package makes it a valuable toolbox for research communities seeking improved decoding accuracy.
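The forward variable selection idea can be illustrated with a generic greedy loop: at each iteration, add the single feature that most improves a cross-validated score, and stop when no candidate improves it. This is a simplified sketch and not the oFVSD API; `forward_select`, `score_fn`, and `min_gain` are hypothetical names, and the per-iteration hyperparameter optimization described in the abstract is omitted.

```python
def forward_select(candidates, score_fn, min_gain=0.0):
    """Greedy forward selection.

    candidates: iterable of feature names.
    score_fn:   callable taking a list of feature names and returning a
                score where higher is better (for example, a mean k-fold
                cross-validation score of a model fitted on those features).
    min_gain:   minimum improvement required to keep adding features.
    """
    selected = []
    best_score = float("-inf")
    remaining = list(candidates)
    while remaining:
        # Score every one-feature extension of the current subset.
        trials = [(score_fn(selected + [feature]), feature) for feature in remaining]
        top_score, top_feature = max(trials, key=lambda pair: pair[0])
        if top_score - best_score <= min_gain:
            break  # no candidate improves the score enough
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


def toy_score(features):
    """Hypothetical stand-in for a cross-validated score: rewards two
    informative features and penalises subset size."""
    informative = {"a", "c"}
    return len(informative.intersection(features)) - 0.1 * len(features)
```

In practice `score_fn` would wrap model fitting and k-fold cross-validation, which is the expensive step; because the candidate evaluations within one iteration are independent, they are a natural target for the parallel execution the abstract mentions.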
Affiliation(s)
- Tung Dang
  - Center for Brain, Mind, and KANSEI Sciences Research, Hiroshima University, Hiroshima, Japan
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- Alan S. R. Fermin
  - Center for Brain, Mind, and KANSEI Sciences Research, Hiroshima University, Hiroshima, Japan
- Maro G. Machizawa
  - Center for Brain, Mind, and KANSEI Sciences Research, Hiroshima University, Hiroshima, Japan
10
Berni M, Veronesi F, Fini M, Giavaresi G, Marchiori G. Relations between Structure/Composition and Mechanics in Osteoarthritic Regenerated Articular Tissue: A Machine Learning Approach. Int J Mol Sci 2023; 24:13374. [PMID: 37686179 PMCID: PMC10487849 DOI: 10.3390/ijms241713374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 08/22/2023] [Accepted: 08/25/2023] [Indexed: 09/10/2023] Open
Abstract
In the context of a large animal model of early osteoarthritis (OA) treated by orthobiologics, the purpose of this study was to reveal relations between articular tissue structure/composition and cartilage viscoelasticity. Twenty-four sheep with induced knee OA were treated by mesenchymal stem cells in various preparations, namely adipose-derived mesenchymal stem cells (ADSCs), stromal vascular fraction (SVF), and amniotic endothelial cells (AECs), and euthanized at 3 or 6 months to evaluate the (i) biochemistry of synovial fluid; (ii) histology, immunohistochemistry, and histomorphometry of articular cartilage; and (iii) viscoelasticity of articular cartilage. After an initial analysis to evaluate the correlation and multicollinearity between the investigated variables, this study used machine learning (ML) models, namely Variable Selection Using Random Forests (VSURF) and Extreme Gradient Boosting (XGB), to rank variables according to their importance and employ them for interpretation and prediction. The experimental setup revealed a potential relation between cartilage elastic modulus and cartilage thickness (CT), synovial fluid interleukin 6 (IL6), and prostaglandin E2 (PGE2), and between cartilage relaxation time and CT and PGE2. SVF was the only treatment that limited the deleterious effect of OA on cartilage viscoelastic properties. This work provides indications for future studies aiming to highlight these and other relationships and focusing on advanced regeneration targets.
Affiliation(s)
- Matteo Berni
  - Medical Technology Laboratory, IRCCS Istituto Ortopedico Rizzoli, Via Di Barbiano 1/10, 40136 Bologna, Italy
- Francesca Veronesi
  - Surgical Sciences and Technologies, IRCCS Istituto Ortopedico Rizzoli, Via Di Barbiano 1/10, 40136 Bologna, Italy; (G.G.); (G.M.)
- Milena Fini
  - Scientific Direction, IRCCS Istituto Ortopedico Rizzoli, Via Di Barbiano 1/10, 40136 Bologna, Italy
- Gianluca Giavaresi
  - Surgical Sciences and Technologies, IRCCS Istituto Ortopedico Rizzoli, Via Di Barbiano 1/10, 40136 Bologna, Italy; (G.G.); (G.M.)
- Gregorio Marchiori
  - Surgical Sciences and Technologies, IRCCS Istituto Ortopedico Rizzoli, Via Di Barbiano 1/10, 40136 Bologna, Italy; (G.G.); (G.M.)
11
Shi Y, Du Z, Zhang J, Han F, Chen F, Wang D, Liu M, Zhang H, Dong C, Sui S. Construction and evaluation of hourly average indoor PM2.5 concentration prediction models based on multiple types of places. Front Public Health 2023; 11:1213453. [PMID: 37637795 PMCID: PMC10447970 DOI: 10.3389/fpubh.2023.1213453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 07/28/2023] [Indexed: 08/29/2023] Open
Abstract
Background: People usually spend most of their time indoors, so indoor fine particulate matter (PM2.5) concentrations are crucial for refining individual PM2.5 exposure evaluation. The development of indoor PM2.5 concentration prediction models is essential for the health risk assessment of PM2.5 in epidemiological studies involving large populations. Methods: In this study, based on monitoring data from multiple types of places, the classical multiple linear regression (MLR) method and the random forest regression (RFR) machine learning algorithm were used to develop hourly average indoor PM2.5 concentration prediction models. Indoor PM2.5 concentration data, comprising 11,712 records from five types of places, were obtained by on-site monitoring. The potential predictor variable data were derived from outdoor monitoring stations and meteorological databases. Ten-fold cross-validation was conducted to examine the performance of all proposed models. Results: The final predictor variables incorporated in the MLR model were outdoor PM2.5 concentration, type of place, season, wind direction, surface wind speed, hour, precipitation, air pressure, and relative humidity. The ten-fold cross-validation results indicated that both models had good predictive performance, with determination coefficients (R2) of 72.20% for RFR and 60.35% for MLR. Overall, the RFR model had better predictive performance than the MLR model (an RFR model developed using the same predictor variables as the MLR model gave R2 = 71.86%). The variable importance results for both types of models suggested that outdoor PM2.5 concentration, type of place, season, hour, wind direction, and surface wind speed were the most important predictor variables. Conclusion: In this research, hourly average indoor PM2.5 concentration prediction models based on multiple types of places were developed for the first time. Both the MLR and RFR models, based on easily accessible indicators, displayed promising predictive performance, with the machine learning RFR model outperforming the classical MLR model; this result suggests the potential application of RFR algorithms for indoor air pollutant concentration prediction.
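The ten-fold cross-validation described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a single hypothetical predictor (outdoor PM2.5), synthetic data, and a one-variable least-squares fit in place of the full MLR or RFR models, and it pools the held-out predictions to compute one R2. All names (`fit_line`, `k_fold_r2`, `outdoor`, `indoor`) are illustrative.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_r2(xs, ys, k=10, seed=0):
    """Fit on k-1 folds, predict the held-out fold, pool predictions, score R2."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    y_true, y_pred = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            y_true.append(ys[i])
            y_pred.append(a + b * xs[i])
    return r_squared(y_true, y_pred)


# Synthetic stand-in data (assumption): indoor level depends linearly on outdoor level.
rng = random.Random(42)
outdoor = [rng.uniform(5, 150) for _ in range(200)]
indoor = [0.6 * x + 5 + rng.gauss(0, 8) for x in outdoor]

cv_r2 = k_fold_r2(outdoor, indoor, k=10)
print(f"10-fold cross-validated R2: {cv_r2:.3f}")
```

Scoring only held-out predictions is what makes the reported R2 an estimate of out-of-sample performance rather than of fit to the training data.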
Affiliation(s)
- Yewen Shi
- Shanghai Municipal Center for Disease Control and Prevention, Shanghai, China
| | - Zhiyuan Du
- Department of Environmental Health, Key Laboratory of the Public Health Safety, Ministry of Education, School of Public Health, Fudan University, Shanghai, China
| | - Jianghua Zhang
- Shanghai Municipal Center for Disease Control and Prevention, Shanghai, China
| | - Fengchan Han
- Shanghai Municipal Center for Disease Control and Prevention, Shanghai, China
| | - Feier Chen
- Shanghai Municipal Center for Disease Control and Prevention, Shanghai, China
| | - Duo Wang
- Shanghai Municipal Center for Disease Control and Prevention, Shanghai, China
| | - Mengshuang Liu
- Shanghai Municipal Center for Disease Control and Prevention, Shanghai, China
| | - Hao Zhang
- Department of Environmental Health, Key Laboratory of the Public Health Safety, Ministry of Education, School of Public Health, Fudan University, Shanghai, China
| | - Chunyang Dong
- Shanghai Municipal Center for Disease Control and Prevention, Shanghai, China
| | - Shaofeng Sui
- Shanghai Municipal Center for Disease Control and Prevention, Shanghai, China
|
12
|
Dargi M, Khamehchi E, Mahdavi Kalatehno J. Optimizing acidizing design and effectiveness assessment with machine learning for predicting post-acidizing permeability. Sci Rep 2023; 13:11851. [PMID: 37481625 PMCID: PMC10363159 DOI: 10.1038/s41598-023-39156-9] [Received: 04/15/2023] [Accepted: 07/20/2023] [Indexed: 07/24/2023]
Abstract
Formation damage poses a widespread challenge in the oil and gas industry, leading to diminished permeability, flow rates, and overall well productivity. Acidizing is a commonly employed technique aimed at mitigating damage and enhancing permeability. In this study, three machine learning models, namely artificial neural networks, random forest, and XGBoost, along with genetic programming (GP), were used to predict permeability after acidizing in oil and gas reservoirs. Training of the models involved a dataset comprising 218 acidizing operations conducted in diverse reservoirs across Iran. The input parameters, namely permeability, porosity, skin factor, calcite mineral fraction, acid injection rate, and injected acid volume, were optimized through the use of a genetic algorithm. Statistical and graphical analysis of the results demonstrates that genetic programming outperformed the other machine learning techniques, with R-squared and RMSE values of 0.82 and 17.65, respectively. Nevertheless, the other models also performed well, each surpassing an R-squared value of 0.73. Post-acidizing permeability data obtained from core flooding experiments conducted on carbonate and sandstone cores were used to validate the models. The genetic programming model demonstrates an average error of 21.1%. The evaluation of post-acidizing permeability using genetic programming, in comparison with the results obtained from the core-flood test, revealed errors of 22.95% and 32.4% for carbonate and sandstone cores, respectively. Furthermore, a comparison between the calculated post-acidizing permeability derived from the GP model and previous studies indicated errors within the range of 8.6-26.59%. The findings highlight the potential of genetic programming and machine learning algorithms in accurately predicting post-acidizing permeability, thereby aiding in acidizing design, effectiveness assessment, and ultimately enhancing oil and gas production rates.
Affiliation(s)
- Matin Dargi
- Department of Petroleum Engineering, Amirkabir University of Technology, Tehran, Iran
| | - Ehsan Khamehchi
- Department of Petroleum Engineering, Amirkabir University of Technology, Tehran, Iran.
|
13
|
Vera Cruz G, Aboujaoude E, Rochat L, Bianchi-Demichelli F, Khazaal Y. Finding Intimacy Online: A Machine Learning Analysis of Predictors of Success. Cyberpsychology, Behavior and Social Networking 2023. [PMID: 37352415 DOI: 10.1089/cyber.2022.0367] [Indexed: 06/25/2023]
Abstract
While an extensive scientific literature now exists on the use of online dating services, there are very few studies on user satisfaction with dating apps and with the resulting offline dates. This study aimed to assess the level of satisfaction with Tinder use (STU) and the level of satisfaction with Tinder offline dates (STOD) in a sample of adult users of the app. The study also aimed to examine, among 28 variables, those that are the most important in predicting STU and STOD. Overall, 1,387 Tinder users completed an online questionnaire. A machine learning model was used to rank order predictors from most to least important. On a 4-point scale, participants' mean STU score was 2.39, and, on a 5-point scale, mean STOD score was 3.05. The results indicate that satisfaction with dating apps and with resulting offline dates is strongly predicted by participants' age and by their motives for using Tinder (enhancement, emotional coping, socialization, finding "true love," or casual sexual partners), whereas the variables negatively associated with satisfaction were those related to psychopathology. Interestingly, 65.3 percent of app users were married or "in a relationship," and only 50.3 percent of app users were using it to meet someone offline. Generally, participants who engage with the app to cope with personal difficulties seem more likely to report higher levels of dissatisfaction, suggesting that dating apps are a poor coping mechanism and highlighting the need to address underlying problems or pathologies that may be driving their use.
Affiliation(s)
- Germano Vera Cruz
- Department of Psychology, University of Picardie Jules Verne, Amiens, France
| | - Elias Aboujaoude
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California, USA
| | - Lucien Rochat
- Addiction Division, Department of Psychiatry, University Hospitals of Geneva, Geneva, Switzerland
| | - Francesco Bianchi-Demichelli
- Sexual Medicine Consultation, Department of Obstetrics and Gynecology, University Hospitals of Lausanne, Lausanne, Switzerland
- Center for Preventive & Integrative Medicine, Clinique des Grangettes and Center for Internal Medicine and Its Specialties, Clinique La Colline, Hirslanden Group, Geneva, Switzerland
| | - Yasser Khazaal
- Addiction Medicine, Department of Psychiatry, Lausanne University Hospital, Lausanne, Switzerland
- Research Centre, University Institute of Mental Health at Montreal and Department of Psychiatry and Addiction Montreal University, Montreal, Canada
|
14
|
Amankulova K, Farmonov N, Akramova P, Tursunov I, Mucsi L. Comparison of PlanetScope, Sentinel-2, and Landsat 8 data in soybean yield estimation within-field variability with random forest regression. Heliyon 2023; 9:e17432. [PMID: 37408926 PMCID: PMC10319221 DOI: 10.1016/j.heliyon.2023.e17432] [Received: 12/25/2022] [Revised: 06/12/2023] [Accepted: 06/16/2023] [Indexed: 07/07/2023]
Abstract
Accurate, timely, early-season crop yield estimation that captures within-field variability is important for precision farming and sustainable management applications. Therefore, the ability to estimate the within-field variability of grain yield is crucial for ensuring food security worldwide, especially under climate change. Several Earth observation systems have thus been developed to monitor crops and predict yields. Despite this, new research is required to combine multiplatform data integration, advancements in satellite technologies, data processing, and the application of this discipline to agricultural practices. This study provides further developments in soybean yield estimation by comparing multisource satellite data from PlanetScope (PS), Sentinel-2 (S2), and Landsat 8 (L8) and introducing topographic and meteorological variables. Herein, a new method of combining soybean yield, global positioning systems, harvester data, climate, topographic variables, and remote sensing images has been demonstrated. Soybean yield shape points were obtained from a combine-harvester-installed GPS and yield monitoring system from seven fields over the 2021 season. The yield estimation models were trained and validated using random forest, and four vegetation indices were tested. The results showed that soybean yield can be accurately predicted at 3-, 10-, and 30-m resolutions, with mean absolute error (MAE) values of 0.091 t/ha for PS, 0.118 t/ha for S2, and 0.120 t/ha for L8 data (root mean square error (RMSE) of 0.111, 0.076). Combining the environmental data with the original bands provided further improvements and an accurate model of within-field soybean yield variability, with MAE of 0.082 t/ha for PS, 0.097 t/ha for S2, and 0.109 t/ha for L8 (RMSE of 0.094, 0.069, and 0.108 t/ha). The results showed that the optimal date to predict soybean yield at the field scale was approximately 60 or 70 days before harvest, during the beginning bloom stage. The developed model can be applied to other crops and locations when suitable training yield data, which are critical for precision farming, are available.
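For readers comparing the two error measures quoted in this abstract, the following minimal sketch shows how MAE and RMSE are typically computed. The yield values are hypothetical and are not taken from the study; the function names are illustrative.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: penalizes large residuals more than MAE does."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical observed and predicted yields in t/ha (assumption, for illustration only).
observed = [3.10, 2.85, 3.40, 2.95]
predicted = [3.00, 2.95, 3.30, 3.05]

print(f"MAE  = {mae(observed, predicted):.3f} t/ha")
print(f"RMSE = {rmse(observed, predicted):.3f} t/ha")
```

On any dataset RMSE is at least as large as MAE, and the two coincide only when every residual has the same magnitude, as in this toy example.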
Affiliation(s)
- Khilola Amankulova
- Department of Geoinformatics, Physical and Environmental Geography, University of Szeged, Egyetem Utca 2, Szeged 6722, Hungary
| | - Nizom Farmonov
- Department of Geoinformatics, Physical and Environmental Geography, University of Szeged, Egyetem Utca 2, Szeged 6722, Hungary
| | - Parvina Akramova
- Department of Hydrology and Ecology, “TIIAME” NRU Bukhara Institute of Natural Resources Management, Gazli Avenue 32, Bukhara, Uzbekistan
| | - Ikrom Tursunov
- Department of Hydrology and Ecology, “TIIAME” NRU Bukhara Institute of Natural Resources Management, Gazli Avenue 32, Bukhara, Uzbekistan
| | - László Mucsi
- Department of Geoinformatics, Physical and Environmental Geography, University of Szeged, Egyetem Utca 2, Szeged 6722, Hungary
|
15
|
Gnyawali K, Dahal K, Talchabhadel R, Nirandjan S. Framework for rainfall-triggered landslide-prone critical infrastructure zonation. The Science of the Total Environment 2023; 872:162242. [PMID: 36804983 DOI: 10.1016/j.scitotenv.2023.162242] [Received: 10/07/2022] [Revised: 02/09/2023] [Accepted: 02/10/2023] [Indexed: 06/18/2023]
Abstract
Rainfall-induced landslides cause frequent disruptions to critical infrastructure in mountainous countries. Climate change is altering rainfall patterns and localizing extreme rainfall events, increasing the occurrence of landslides. For planning climate-resilient critical infrastructure in landslide-prone regions, it is urgent to understand the changing landslide susceptibility in relation to changing rainfall extremes and spatially overlay them with critical infrastructure to determine risk zones. As such, areas requiring financial reinforcements can be prioritized. In this paper, we develop a framework linking changing rainfall extremes to landslide susceptibility and intensity of critical infrastructure, exemplified on a national scale using Nepal as a case study. First, we define a set of 21 unique rainfall indices that describe extreme and localized rainfall. Second, we prepare a new annual (2016-2020) inventory of 107,900 landslides in Nepal mapped on PlanetScope satellite imagery. Third, we prepare a landslide susceptibility map by training a random forest model using the collected extreme rainfall indices and landslide locations in combination with spatial data on topography. Fourth, we construct a gridded critical infrastructure spatial density map that quantifies the intensity of infrastructure (i.e., transportation, energy, telecommunication, waste, water, health, and education) at each grid location using OpenStreetMap. The landslide susceptibility map classified Nepal's topography into low (36 %), medium (33 %), and high (32 %) rainfall-triggered landslide susceptibility zones. The landslide susceptibility map had an average area under the receiver operating characteristic curve value of 0.94. Finally, we overlay the landslide susceptibility map with the critical infrastructure intensity to identify areas needing financial reinforcement. Our framework reasonably mapped critical infrastructure hotspots in Nepal prone to landslides on a 1 km grid. The hotspots are mainly concentrated along major national highways and in provinces 4, 3, and 1, highlighting the need for improved land management practices. These hotspots need spatial prioritization regarding climate-resilient critical infrastructure financing and slope conservation policies. The research data, output maps, and code are publicly released via an ArcGIS WebApp and GitHub repository. The framework is scalable and can be used for developing infrastructure financing strategies for landslide-prone mountain regions and countries.
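The area under the receiver operating characteristic curve reported above can be read as the probability that a randomly chosen landslide location receives a higher susceptibility score than a randomly chosen stable location. A minimal sketch of that pairwise definition follows; the scores are hypothetical and the function name `roc_auc` is illustrative, not taken from the authors' released code.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as P(positive score > negative score), counting ties as 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical susceptibility scores (assumption, for illustration only).
landslide_cells = [0.9, 0.8, 0.4]
stable_cells = [0.7, 0.3, 0.2]

auc = roc_auc(landslide_cells, stable_cells)
print(round(auc, 3))  # 0.889: 8 of the 9 positive/negative pairs are ranked correctly
```

The pairwise loop is quadratic in the number of samples, so for an inventory of the size described here a rank-based implementation would be the practical choice.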
Affiliation(s)
- Kaushal Gnyawali
- School of Engineering, University of British Columbia, Kelowna, BC, V1V 1V7, Canada; Natural Hazards Section, Himalayan Risk Research Institute, Bhaktapur, Nepal.
| | - Kshitij Dahal
- Natural Hazards Section, Himalayan Risk Research Institute, Bhaktapur, Nepal
| | - Rocky Talchabhadel
- Texas A&M AgriLife Research, Texas A&M University, El Paso, TX 79927, USA
| | - Sadhana Nirandjan
- Institute for Environmental Studies (IVM), Vrije Universiteit Amsterdam, 1081HV Amsterdam, the Netherlands
|
16
|
Lerebourg L, Saboul D, Clémençon M, Coquart JB. Prediction of Marathon Performance using Artificial Intelligence. Int J Sports Med 2023; 44:352-360. [PMID: 36473492 DOI: 10.1055/a-1993-2371] [Indexed: 12/12/2022]
Abstract
Although studies have used machine learning algorithms to predict performances in sports activities, none, to the best of our knowledge, have used and validated two artificial intelligence techniques, artificial neural network (ANN) and k-nearest neighbor (KNN), in the running discipline of marathon and compared the accuracy or precision of the predicted performances. Official French rankings for the 10-km road and marathon events in 2019 were scrutinized over a dataset of 820 athletes (aged 21, having run 10 km and a marathon in the same year that was run slower, etc.). For the KNN and ANN, the same inputs (10-km race time, body mass index, age and sex) were used to solve a linear regression problem to estimate the marathon race time. No difference was found between the actual and predicted marathon performances for either method (p > 0.05). All predicted performances were significantly correlated with the actual ones, with very high correlation coefficients (r > 0.90; p < 0.001). KNN outperformed ANN with a mean absolute error of 2.4 vs 5.6%. The study confirms the validity of both algorithms, with better accuracy for KNN in predicting marathon performance. Consequently, the predictions from these artificial intelligence methods may be used in training programs and competitions.
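As a rough illustration of the k-nearest-neighbor approach named in this abstract, the sketch below predicts a marathon time by averaging the marathon times of the k training athletes closest in feature space, then scores the prediction as a percentage error. It is a simplification under stated assumptions: a single hypothetical feature (10-km time in minutes), made-up training data, and an unweighted average. The study's actual inputs, scaling, and choice of k are not given here.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Average the targets of the k training points nearest to the query."""
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mean_abs_pct_error(y_true, y_pred):
    """Mean absolute error expressed as a percentage of the true value."""
    return 100.0 * sum(abs(t - p) / t for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical athletes (assumption): 10-km time -> marathon time, both in minutes.
train_X = [(40.0,), (45.0,), (50.0,), (55.0,), (60.0,)]
train_y = [190.0, 215.0, 240.0, 265.0, 290.0]

predicted = knn_predict(train_X, train_y, (47.0,), k=2)
print(predicted)  # 227.5: mean of the two nearest athletes (45 and 50 min over 10 km)
```

With several inputs on different scales (time, body mass index, age), the features would normally be standardized first so that no single one dominates the distance.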
Affiliation(s)
- Lucie Lerebourg
- Centre d'Etudes des Transformations des Activités Physiques et Sportives Normandie Univ, UNIROUEN, CETAPS, 76000 Rouen, France
| | - Damien Saboul
- Research and Innovation, Be-ys-research, Argonay, France
| | - Michel Clémençon
- Centre d'Etudes des Transformations des Activités Physiques et Sportives Normandie Univ, UNIROUEN, CETAPS, 76000 Rouen, France
| | - Jérémy Bernard Coquart
- Centre d'Etudes des Transformations des Activités Physiques et Sportives Normandie Univ, UNIROUEN, CETAPS, 76000 Rouen, France; Unité de Recherche Pluridisciplinaire Sport, Santé, Société Eurasport, 413 avenue Eugène Avinée, 59 120 Loos, France
|
17
|
Abstract
Over the past decade, advances in plant genotyping have been critical in enabling the identification of genetic diversity, in understanding evolution, and in dissecting important traits in both crops and native plants. The widespread popularity of single-nucleotide polymorphisms (SNPs) has prompted significant improvements to SNP-based genotyping, including SNP arrays, genotyping by sequencing, and whole-genome resequencing. More recent approaches, including genotyping structural variants, utilizing pangenomes to capture species-wide genetic diversity and exploiting machine learning to analyze genotypic data sets, are pushing the boundaries of what plant genotyping can offer. In this chapter, we highlight these innovations and discuss how they will accelerate and advance future genotyping efforts.
|
18
|
Vera Cruz G, Aboujaoude E, Khan R, Rochat L, Ben Brahim F, Courtois R, Khazaal Y. Smartphone apps for mental health and wellbeing: A usage survey and machine learning analysis of psychological and behavioral predictors. Digit Health 2023; 9:20552076231152164. [PMID: 36714544 PMCID: PMC9880571 DOI: 10.1177/20552076231152164] [Received: 06/22/2022] [Accepted: 01/03/2023] [Indexed: 01/24/2023]
Abstract
Objective: Despite the availability of thousands of mental health applications, the extent to which they are used and the factors associated with their use remain largely unknown. The present study aims to (a) assess in a representative US-based population sample the use of smartphone apps for mental health and wellbeing (SAMHW), (b) determine the variables predicting the use of SAMHW, and (c) explore how a set of variables related to mental health, smartphone use, and smartphone "addiction" may be associated with the use of SAMHW. Methods: Data were collected via an online questionnaire from 1,989 adults. The data gathered included information on smartphone use behavior, mental health, and the use of SAMHW. Latent class analysis was used to categorize participants. Machine learning and logistic regression analyses were used to determine the most important predictors of SAMHW use and associations between predictors and outcome variables. Results: While two-thirds of participants had a statistically high probability of using SAMHW, nearly twice more had a high probability of using them to improve wellbeing compared to using them to address mental health problems (43% vs. 18%). In both groups, these participants were more likely to be female and in the younger adult age bracket than male and in the adult or older adult age bracket. According to the machine learning model, the most important predictors for using the relevant smartphone apps were variables associated with smartphone problematic use, COVID-19 impact, and mental health problems. Conclusion: Findings from the present study confirm that the use of SAMHW is growing, particularly among younger adult and female individuals who are negatively impacted by problematic smartphone use, COVID-19, and mental health problems. These individuals tend to bypass traditional care via psychotherapy or psychopharmacology, relying instead on smartphones to address mental health conditions or improve wellbeing. Advising users of these apps to also seek professional help and promoting efforts to prove the efficacy and safety of SAMHW would seem necessary.
Affiliation(s)
- Germano Vera Cruz
- Department of Psychology, University of Picardie Jules Verne, Amiens, France; Yasser Khazaal, CHUV, Département de Psychiatrie, Service de médecine des addictions, Rue du Bugnon 23, 1011 Lausanne, Switzerland.
| | - Elias Aboujaoude
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA, USA
| | - Riaz Khan
- Addiction Psychiatry, Foederatio Medicorum Helveticorum, Geneva, Switzerland
| | - Lucien Rochat
- Addiction Division, Department of Psychiatry, University Hospitals of Geneva, Geneva, Switzerland
| | | | - Robert Courtois
- Department of Psychology, University of Tours, Tours, France
| | - Yasser Khazaal
- Addiction Medicine, Lausanne University Hospital, Lausanne, Switzerland; Department of Psychiatry, Lausanne University, Lausanne, Switzerland; Department of Psychiatry and Addictology, Montreal University, Montreal, QC, Canada
|
19
|
Kaya H, Guler E, Kırmacı V. Prediction of temperature separation of a nitrogen-driven vortex tube with linear, kNN, SVM, and RF regression models. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-08030-6] [Indexed: 11/19/2022]
|
20
|
Li X, Tang X, Cheng Q. Predicting the clinical citation count of biomedical papers using multilayer perceptron neural network. J Informetr 2022. [DOI: 10.1016/j.joi.2022.101333] [Indexed: 11/29/2022]
|
21
|
Couckuyt A, Seurinck R, Emmaneel A, Quintelier K, Novak D, Van Gassen S, Saeys Y. Challenges in translational machine learning. Hum Genet 2022; 141:1451-1466. [PMID: 35246744 PMCID: PMC8896412 DOI: 10.1007/s00439-022-02439-8] [Received: 05/09/2021] [Accepted: 02/08/2022] [Indexed: 11/25/2022]
Abstract
Machine learning (ML) algorithms are increasingly being used to help implement clinical decision support systems. In this new field, which we define as "translational machine learning", joint efforts and strong communication between data scientists and clinicians help to span the gap between ML and its adoption in the clinic. These collaborations also improve interpretability and trust in translational ML methods and ultimately aim to result in generalizable and reproducible models. To help clinicians and bioinformaticians refine their translational ML pipelines, we review the steps from model building to the use of ML in the clinic. We discuss experimental setup, computational analysis, interpretability and reproducibility, and emphasize the challenges involved. We strongly advise collaboration and data sharing between consortia and institutes to build multi-centric cohorts that facilitate ML methodologies that generalize across centers. In the end, we hope that this review provides a way to streamline translational ML and helps to tackle the challenges that come with it.
Affiliation(s)
- Artuur Couckuyt
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
| | - Ruth Seurinck
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
| | - Annelies Emmaneel
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
| | - Katrien Quintelier
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
- Department of Pulmonary Diseases, Erasmus MC, Rotterdam, The Netherlands
| | - David Novak
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
| | - Sofie Van Gassen
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium
| | - Yvan Saeys
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium.
- Data Mining and Modeling for Biomedicine, VIB-UGent Center for Inflammation Research, Gent, Belgium.
|
22
|
Chung PY, Liao CT. Selection of parental lines for plant breeding via genomic prediction. Frontiers in Plant Science 2022; 13:934767. [PMID: 35968112 PMCID: PMC9363737 DOI: 10.3389/fpls.2022.934767] [Received: 05/03/2022] [Accepted: 07/01/2022] [Indexed: 06/15/2023]
Abstract
A set of superior parental lines is imperative for the development of high-performing inbred lines in any biparental crossing program for crops. The main objectives of this study are to (a) develop a genomic prediction approach to identify superior parental lines for multi-trait selection, and (b) generate a software package for users to execute the proposed approach before conducting field experiments. According to different breeding goals of the target traits, a novel selection index integrating information from genomic-estimated breeding values (GEBVs) of candidate accessions was proposed to evaluate the composite performance of simulated progeny populations. Two rice (Oryza sativa L.) genome datasets were analyzed to illustrate the potential applications of the proposed approach. One dataset was applied to parental selection for producing inbred lines with satisfactory performance in primary and secondary traits simultaneously. The other was used to demonstrate the production of inbred lines with high adaptability to different environments. Overall, the results showed that incorporating GEBV and genomic diversity into a selection strategy based on the proposed selection index could assist in selecting superior parents to meet the desired breeding goals and in increasing long-term genetic gain. An R package, called IPLGP, was generated to facilitate the widespread application of the approach.
Affiliation(s)
- Ping-Yuan Chung
- Department of Agronomy, National Taiwan University, Taipei, Taiwan
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
| | - Chen-Tuo Liao
- Department of Agronomy, National Taiwan University, Taipei, Taiwan
|
23
|
Chen X, Zheng H, Wang H, Yan T. Can machine learning algorithms perform better than multiple linear regression in predicting nitrogen excretion from lactating dairy cows. Sci Rep 2022; 12:12478. [PMID: 35864287 PMCID: PMC9304409 DOI: 10.1038/s41598-022-16490-y] [Received: 11/05/2021] [Accepted: 07/11/2022] [Indexed: 11/09/2022]
Abstract
This study aims to compare the performance of multiple linear regression (MLR) and machine learning algorithms for predicting manure nitrogen (MN) excretion in lactating dairy cows, and to develop new machine learning prediction models for MN excretion. The dataset used was collated from 43 total diet digestibility studies with 951 lactating dairy cows. Prediction models for MN were developed and evaluated using the MLR technique and three machine learning algorithms: artificial neural networks (ANN), random forest regression (RFR) and support vector regression (SVR). The ANN model produced a lower RMSE and a higher CCC than the MLR, RFR and SVR models in the tenfold cross validation. Meanwhile, a hybrid knowledge-based and data-driven approach was developed and implemented to select features in this study. Results showed that the performance of the ANN models was greatly improved by the tuning process of selection of features and learning algorithms. The proposed new ANN models for prediction of MN were developed using nitrogen intake as the primary predictor. Alternative models were also developed based on live weight and milk yield for use where nitrogen intake data are not available (e.g., in some commercial farms). These new models provide benchmark information for prediction and mitigation of nitrogen excretion under typical dairy production conditions managed within grassland-based dairy systems.
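The abstract compares models by RMSE and CCC. Assuming CCC here denotes Lin's concordance correlation coefficient, which is its usual meaning in model evaluation but is not spelled out in the abstract, a minimal sketch of the calculation is given below. The values are hypothetical and the function name is illustrative.

```python
def concordance_cc(y_true, y_pred):
    """Lin's concordance correlation coefficient (population moments).

    CCC = 2*cov / (var_true + var_pred + (mean_true - mean_pred)**2).
    It equals 1 only when predictions match observations exactly, so it
    penalizes both scatter and systematic bias.
    """
    n = len(y_true)
    mt = sum(y_true) / n
    mp = sum(y_pred) / n
    var_t = sum((t - mt) ** 2 for t in y_true) / n
    var_p = sum((p - mp) ** 2 for p in y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred)) / n
    return 2.0 * cov / (var_t + var_p + (mt - mp) ** 2)


# Hypothetical values (assumption): predictions perfectly correlated but biased by +1.
observed = [1.0, 2.0, 3.0]
biased = [2.0, 3.0, 4.0]

print(concordance_cc(observed, observed))  # 1.0, perfect agreement
print(concordance_cc(observed, biased))    # 4/7, correlation is 1 but the bias lowers CCC
```

The second case shows why CCC complements a plain correlation coefficient: a constant offset leaves Pearson's r at 1 while CCC drops.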
Affiliation(s)
- Xianjiang Chen
- Livestock Production Science Branch, Agri-Food and Biosciences Institute, Hillsborough, County Down, BT26 6DR, UK; School of Computing, University of Ulster, Belfast, BT15 1ED, UK
| | - Huiru Zheng
- School of Computing, University of Ulster, Belfast, BT15 1ED, UK.
| | - Haiying Wang
- School of Computing, University of Ulster, Belfast, BT15 1ED, UK.
| | - Tianhai Yan
- Livestock Production Science Branch, Agri-Food and Biosciences Institute, Hillsborough, County Down, BT26 6DR, UK.
|
24
|
Santhanam P, Nath T, Lindquist MA, Cooper DS. Relationship Between TSH Levels and Cognition in the Young Adult: An Analysis of the Human Connectome Project Data. J Clin Endocrinol Metab 2022; 107:1897-1905. [PMID: 35389477 DOI: 10.1210/clinem/dgac189] [Received: 10/15/2021] [Indexed: 11/19/2022]
Abstract
CONTEXT: The nature of the relationship between serum thyrotropin (TSH) levels and higher cognitive abilities is unclear, especially within the normal reference range and in the younger population. OBJECTIVE: To assess the relationship between serum TSH levels and mental health and sleep quality parameters (fluid intelligence [Gf], Mini-Mental State Examination [MMSE] scores, depression scores, and, finally, Pittsburgh Sleep Quality Index [PSQI] scores [working memory, processing speed, and executive function]) in young adults. METHODS: This was a retrospective analysis of the data from the Human Connectome Project (HCP). The HCP consortium is seeking to map human brain circuits systematically and identify their relationship to behavior in healthy adults. Included were 391 female and 412 male healthy participants aged 22-35 years at the time of the screening interview. We excluded persons with serum TSH levels outside the reference range (0.4-4.5 mU/L). TSH was transformed logarithmically (log TSH). All the key variables were normalized and then linear regression analysis was performed to assess the relationship between log TSH as a cofactor and Gf as the dependent variable. Finally, a machine learning method, random forest regression, predicted Gf from the dependent variables (including alcohol and tobacco use). The main outcome was normalized Gf (nGf) and Gf scores. RESULTS: Log TSH was a significant co-predictor of nGf in females (β = 0.31 (±0.1), P < .01) but not in males. Random forest analysis showed that the model(s) had a better predictive value for females (r = 0.39, mean absolute error [MAE] = 0.81) than males (r = 0.24, MAE = 0.77). CONCLUSION: Higher serum TSH levels might be associated with higher Gf scores in young women.
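The preprocessing described in the METHODS (log-transform TSH, normalize the variables, then regress Gf on log TSH) can be sketched as follows. This is an illustration under assumptions, not the authors' analysis: it uses made-up values, z-score normalization (the abstract does not say which normalization was used), and a single-predictor regression without the other covariates.

```python
import math
import statistics


def zscore(values):
    """Standardize to mean 0 and (sample) standard deviation 1."""
    m = statistics.fmean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical values (assumption): TSH in mU/L within the 0.4-4.5 reference range.
tsh = [0.5, 1.0, 2.0, 4.0]
gf = [14.0, 16.0, 18.0, 20.0]

log_tsh = [math.log(v) for v in tsh]
beta = ols_slope(zscore(log_tsh), zscore(gf))
print(f"standardized beta = {beta:.2f}")
```

With both variables standardized and a single predictor, the slope equals the Pearson correlation, which is why the toy data (exactly linear in log TSH) give a beta of 1. With additional covariates in the model, as in the study, the coefficient would no longer have that simple reading.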
Affiliation(s)
- Prasanna Santhanam
  - Division of Endocrinology, Diabetes, & Metabolism, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Tanmay Nath
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Martin A Lindquist
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- David S Cooper
  - Division of Endocrinology, Diabetes, & Metabolism, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
|
25
|
A Geographically Weighted Random Forest Approach to Predict Corn Yield in the US Corn Belt. Remote Sensing 2022. [DOI: 10.3390/rs14122843] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]
Abstract
Crop yield prediction before the harvest is crucial for food security, grain trade, and policy making. Previously, several machine learning methods have been applied to predict crop yield using different types of variables. In this study, we propose using the Geographically Weighted Random Forest Regression (GWRFR) approach to improve crop yield prediction at the county level in the US Corn Belt. We trained the GWRFR and five other popular machine learning algorithms (Multiple Linear Regression (MLR), Partial Least Square Regression (PLSR), Support Vector Regression (SVR), Decision Tree Regression (DTR), and Random Forest Regression (RFR)) with the following different sets of features: (1) full length features; (2) vegetation indices; (3) gross primary production (GPP); (4) climate data; and (5) soil data. We compared the results of the GWRFR with those of the other five models. The results show that the GWRFR with full length features (R2 = 0.90 and RMSE = 0.764 MT/ha) outperforms other machine learning algorithms. For individual categories of features such as GPP, vegetation indices, climate, and soil features, the GWRFR also outperforms other models. The Moran’s I value of the residuals generated by GWRFR is smaller than that of other models, which shows that GWRFR can better address the spatial non-stationarity issue. The proposed method in this article can also be potentially used to improve yield prediction for other types of crops in other regions.
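Moran's I, which the abstract uses to compare spatial autocorrelation in model residuals, can be computed directly from its definition. The sketch below assumes a binary adjacency weight matrix and a toy chain of four counties; both the weights and the residual values are hypothetical and are not taken from the study.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` under a spatial weight matrix.

    I = (n / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2,
    where m is the mean of the values and W is the sum of all weights.
    """
    n = len(values)
    m = sum(values) / n
    dev = [v - m for v in values]
    w_total = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_total) * num / den


# Hypothetical layout: four counties in a row, each adjacent to its neighbours.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered_residuals = [1.0, 1.0, -1.0, -1.0]    # similar errors sit together
alternating_residuals = [1.0, -1.0, 1.0, -1.0]  # neighbours always differ

i_clustered = morans_i(clustered_residuals, adjacency)
i_alternating = morans_i(alternating_residuals, adjacency)
```

Positive values indicate that neighbouring residuals resemble each other, and negative values indicate that they alternate. A value closer to zero means less spatial structure is left unexplained in the residuals.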
|
26
|
The advanced design of bioleaching process for metal recovery: A machine learning approach. Sep Purif Technol 2022. [DOI: 10.1016/j.seppur.2022.120919] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]
|
27
|
Patiyal S, Dhall A, Raghava GPS. Prediction of risk-associated genes and high-risk liver cancer patients from their mutation profile: Benchmarking of mutation calling techniques. Biol Methods Protoc 2022; 7:bpac012. [PMID: 35734767 PMCID: PMC9204470 DOI: 10.1093/biomethods/bpac012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2021] [Revised: 05/20/2022] [Accepted: 05/20/2022] [Indexed: 11/12/2022] Open
Abstract
Identification of somatic mutations with high precision is one of the major challenges in the prediction of high-risk liver-cancer patients. In the past, a number of mutation-calling techniques have been developed, including MuTect2, MuSE, Varscan2, and SomaticSniper. In this study, an attempt has been made to benchmark the potential of these techniques in predicting the prognostic biomarkers for liver cancer. Initially, we extracted somatic mutations in liver cancer patients using Variant Call Format (VCF) and Mutation Annotation Format (MAF) files from The Cancer Genome Atlas. In terms of size, the MAF files are 42 times smaller than the VCF files and contain only high-quality somatic mutations. Further, machine learning based models have been developed for predicting high-risk cancer patients using mutations obtained from different techniques. The performance of the different techniques and data files has been compared based on their potential to discriminate high- and low-risk liver-cancer patients. Based on correlation analysis, we selected 80 genes having a significant negative correlation with the overall survival of liver cancer patients. The univariate survival analysis revealed the prognostic role of highly mutated genes. Single-gene based analysis showed that the MuTect2-based MAF file achieved the maximum hazard ratio (HR for LAMC3) of 9.25 with p-value 1.78E-06. Further, we developed various prediction models using the top-10 risk-associated genes for each technique. Our results indicate that MuTect2-based VCF files outperform all other methods, with a maximum Area Under the Receiver-Operating Characteristic (AUROC) curve of 0.765 and HR 4.50 (p-value 3.83E-15). Overall, VCF files generated using the MuTect2 technique perform better than the other mutation-calling techniques for the prediction of high-risk liver cancer patients.
We hope that our findings will provide a useful and comprehensive comparison of various mutation calling techniques for the prognostic analysis of cancer patients. In order to serve the scientific community, we have provided a Python-based pipeline to develop the prediction models using mutation profiles (VCF/MAF) of cancer patients. It is available on GitHub at https://github.com/raghavagps/mutation_bench.
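The AUROC figure quoted in this abstract measures how well a risk score separates high-risk from low-risk patients. A minimal sketch of the metric follows, using its pairwise (rank-based) definition; the scores and labels are hypothetical and the function is not part of the authors' pipeline.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive (label 1) is
    scored above a randomly chosen negative (label 0); ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical model outputs: 1 = high-risk patient, 0 = low-risk patient.
risk_scores = [0.90, 0.80, 0.40, 0.35, 0.10]
is_high_risk = [1, 0, 1, 0, 0]

auc = auroc(risk_scores, is_high_risk)  # 5 of 6 positive/negative pairs ordered correctly
```

An AUROC of 0.5 corresponds to random ordering and 1.0 to perfect separation, so the reported 0.765 sits between the two.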
Affiliation(s)
- Sumeet Patiyal
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Anjali Dhall
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
- Gajendra P S Raghava
  - Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi-110020, India
|
28
|
Quantifying the reproducibility of graph neural networks using multigraph data representation. Neural Netw 2022; 148:254-265. [DOI: 10.1016/j.neunet.2022.01.018] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/15/2021] [Revised: 01/10/2022] [Accepted: 01/26/2022] [Indexed: 11/20/2022]
|
29
|
A Meta-Model to Predict the Drag Coefficient of a Particle Translating in Viscoelastic Fluids: A Machine Learning Approach. Polymers (Basel) 2022; 14:polym14030430. [PMID: 35160419 PMCID: PMC8838701 DOI: 10.3390/polym14030430] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/22/2021] [Revised: 01/14/2022] [Accepted: 01/18/2022] [Indexed: 02/06/2023] Open
Abstract
This study presents a framework based on Machine Learning (ML) models to predict the drag coefficient of a spherical particle translating in viscoelastic fluids. For the purpose of training and testing the ML models, two datasets were generated using direct numerical simulations (DNSs) for the viscoelastic unbounded flow of Oldroyd-B (OB-set containing 12,120 data points) and Giesekus (GI-set containing 4950 data points) fluids past a spherical particle. The kinematic input features were selected to be Reynolds number, 0<Re≤50, Weissenberg number, 0≤Wi≤10, polymeric retardation ratio, 0<ζ<1, and shear thinning mobility parameter, 0<α<1. The ML models, specifically Random Forest (RF), Deep Neural Network (DNN) and Extreme Gradient Boosting (XGBoost), were all trained, validated, and tested, and their best architecture was obtained using a 10-Fold cross-validation method. All the ML models presented remarkable accuracy on these datasets; however the XGBoost model resulted in the highest R2 and the lowest root mean square error (RMSE) and mean absolute percentage error (MAPE) measures. Additionally, a blind dataset was generated using DNSs, where the input feature coverage was outside the scope of the training set or interpolated within the training sets. The ML models were tested against this blind dataset, to further assess their generalization capability. The DNN model achieved the highest R2 and the lowest RMSE and MAPE measures when inferred on this blind dataset. Finally, we developed a meta-model using stacking technique to ensemble RF, XGBoost and DNN models and output a prediction based on the individual learner's predictions and a DNN meta-regressor. The meta-model consistently outperformed the individual models on all datasets.
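The stacking idea in this abstract (base learners whose predictions feed a meta-regressor) can be illustrated at small scale. The sketch below is deliberately simplified and swaps in different components: the base learners are a straight-line fit and a 1-nearest-neighbour regressor instead of RF, XGBoost and DNN, and the meta-regressor is a two-weight least-squares blend instead of a DNN. The data and all names are hypothetical.

```python
from statistics import mean


def fit_linear(xs, ys):
    """Base learner 1: least-squares line, returned as a predict function."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def fit_nearest(xs, ys):
    """Base learner 2: 1-nearest-neighbour regressor."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def out_of_fold(fit, xs, ys, k):
    """Predict every point with a model trained on the other k-1 folds."""
    preds = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), k):
            preds[i] = model(xs[i])
    return preds


def fit_meta(p1, p2, ys):
    """Meta-regressor: weights minimising sum (y - w1*p1 - w2*p2)^2."""
    a11 = sum(a * a for a in p1)
    a12 = sum(a * b for a, b in zip(p1, p2))
    a22 = sum(b * b for b in p2)
    b1 = sum(a * y for a, y in zip(p1, ys))
    b2 = sum(b * y for b, y in zip(p2, ys))
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det


def sse(ys, preds):
    """Sum of squared errors."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds))


# Hypothetical one-feature data with a mild nonlinearity.
xs = [0.5 * i for i in range(20)]
ys = [x * x for x in xs]

lin_oof = out_of_fold(fit_linear, xs, ys, k=5)
nn_oof = out_of_fold(fit_nearest, xs, ys, k=5)
w1, w2 = fit_meta(lin_oof, nn_oof, ys)
stacked_oof = [w1 * a + w2 * b for a, b in zip(lin_oof, nn_oof)]

# Final ensemble: base learners refit on all data, blended with the meta weights.
lin_full, nn_full = fit_linear(xs, ys), fit_nearest(xs, ys)


def stacked_predict(x):
    return w1 * lin_full(x) + w2 * nn_full(x)
```

Training the meta-regressor on out-of-fold predictions keeps it from rewarding a base learner that has merely memorized its training points, which is the usual motivation for combining stacking with k-fold cross-validation.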
|
30
|
Merrigan JJ, Stone JD, Wagle JP, Hornsby WG, Ramadan J, Joseph M, Galster SM, Hagen JA. Using Random Forest Regression to Determine Influential Force-Time Metrics for Countermovement Jump Height: A Technical Report. J Strength Cond Res 2022; 36:277-283. [PMID: 34941613 DOI: 10.1519/jsc.0000000000004154] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/08/2022]
Abstract
The purpose of this study was to identify the most influential force-time metrics on countermovement jump (CMJ) height using multiple statistical procedures. Eighty-two National Collegiate Athletic Association Division I American football players performed 2 maximal-effort, no arm-swing CMJs on force plates. The average absolute and relative (i.e., power/body mass) metrics were included as predictor variables, whereas jump height was the dependent variable within regression models (p < 0.05). Best subsets regression (8 metrics, R2 = 0.95) included fewer metrics than stepwise regression (18 metrics, R2 = 0.96), while explaining similar overall variance in jump height (p = 0.083). Random forest regression (RFR) models included 8 metrics, explained ∼93% of jump height variance, and were not significantly different from best subsets regression models (p > 0.05). Players achieved higher CMJs by attaining a deeper, faster, and more forceful countermovement with lower eccentric-to-concentric force ratios. An additional RFR was conducted on metrics scaled to body mass and revealed relative mean and peak concentric power to be the most influential. For exploratory purposes, additional RFRs were run for each positional group and suggested that the most influential variables may differ across positions. Thus, developing power output capabilities and providing coaching to improve technique during the countermovement may maximize jump height capabilities. Scientists and practitioners may use best subsets or RFR analyses to help identify which force-time metrics are of interest and so reduce the number of multicollinear force-time metrics to monitor.
These results may inform their training programs to maximize individual performance capabilities.
Affiliation(s)
- Justin J Merrigan
  - Human Performance Innovation Center, Rockefeller Neuroscience Institute, West Virginia University, Morgantown, West Virginia
- Jason D Stone
  - Human Performance Innovation Center, Rockefeller Neuroscience Institute, West Virginia University, Morgantown, West Virginia
  - College of Physical Activity and Sport Sciences, West Virginia University, Morgantown, West Virginia
- W G Hornsby
  - Human Performance Innovation Center, Rockefeller Neuroscience Institute, West Virginia University, Morgantown, West Virginia
  - College of Physical Activity and Sport Sciences, West Virginia University, Morgantown, West Virginia
- Jad Ramadan
  - Human Performance Innovation Center, Rockefeller Neuroscience Institute, West Virginia University, Morgantown, West Virginia
- Michael Joseph
  - Athletic Department, West Virginia University, Morgantown, West Virginia
- Scott M Galster
  - Human Performance Innovation Center, Rockefeller Neuroscience Institute, West Virginia University, Morgantown, West Virginia
- Joshua A Hagen
  - Human Performance Innovation Center, Rockefeller Neuroscience Institute, West Virginia University, Morgantown, West Virginia
|
31
|
Hearing loss versus vestibular loss as contributors to cognitive dysfunction. J Neurol 2022; 269:87-99. [PMID: 33387012 DOI: 10.1007/s00415-020-10343-2] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 10/17/2020] [Revised: 11/23/2020] [Accepted: 12/04/2020] [Indexed: 02/02/2023]
Abstract
In the last 5 years, there has been a surge in evidence that hearing loss (HL) may be a risk factor for cognitive dysfunction, including dementia. At the same time, there has been an increase in the number of studies implicating vestibular loss in cognitive dysfunction. Due to the fact that vestibular disorders often present with HL and other auditory disorders such as tinnitus, it has been suggested that, in many cases, what appears to be vestibular-related cognitive dysfunction may be due to HL (e.g., Dobbels et al. Front Neurol 11:710, 2020). This review analyses the studies of vestibular-related cognitive dysfunction which have controlled HL. It is suggested that despite the fact that many studies in the area have not controlled HL, many other studies have (~ 19/44 studies or 43%). Therefore, although there is certainly a need for further studies controlling HL, there is evidence to suggest that vestibular loss is associated with cognitive dysfunction, especially related to spatial memory. This is consistent with the overwhelming evidence from animal studies that the vestibular system transmits specific types of information about self-motion to structures such as the hippocampus.
|
32
|
Papaioannou A, Kalantzi E, Papageorgiou CC, Korombili K, Bokou A, Pehlivanidis A, Papageorgiou CC, Papaioannou G. Differences in Performance of ASD and ADHD Subjects Facing Cognitive Loads in an Innovative Reasoning Experiment. Brain Sci 2021; 11:1531. [PMID: 34827530 PMCID: PMC8615740 DOI: 10.3390/brainsci11111531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2021] [Revised: 11/09/2021] [Accepted: 11/09/2021] [Indexed: 11/17/2022] Open
Abstract
We aim to investigate whether EEG dynamics differ in adults with ASD (Autism Spectrum Disorders) and ADHD (attention-deficit/hyperactivity disorder) compared with healthy subjects during the performance of an innovative cognitive task, Aristotle's valid and invalid syllogisms, and how these differences correlate with brain regions and behavioral data for each subject. We recorded EEGs from 14 scalp electrodes (channels) in 21 adults with ADHD, 21 with ASD, and 21 healthy, normal subjects. The subjects were exposed to a set of innovative cognitive tasks (inducing varying cognitive loads), namely Aristotle's two types of syllogism mentioned above. A set of 39 questions related to valid-invalid syllogisms was given to participants, together with a separate set of questionnaires, in order to collect demographic and behavioral data, with the aim of detecting shared information with values of a feature extracted from the EEG, the multiscale entropy (MSE), in the 14 channels ('brain regions'). MSE, a nonlinear information-theoretic measure of complexity, was computed to extract a feature that quantifies the complexity of the EEG. Behavior Partial Least Squares Correlation (PLSC) is the method used to detect the correlation between two sets of data, brain and behavioral measures. Seed-PLSC, a variant of PLSC, was applied to build a functional connectivity of the brain regions involved in the reasoning tasks. Graph-theoretic measures were used to quantify the complexity of the functional networks. Based on the results of the analysis described in this work, a mixed 14 × 2 × 3 ANOVA showed significant main effects of the group factor and the brain region × syllogism factor, as well as a significant brain region × group interaction.
There are significant differences between the means of MSE (complexity) values at the 14 channels of the members of the 'pathological' groups of participants, i.e., between ASD and ADHD, while the difference in means of MSE between both ASD and ADHD and that of the control group is not significant. In conclusion, the valid-invalid type of syllogism generates significantly different complexity values, MSE, between ASD and ADHD. The complexity of activated brain regions of ASD participants increased significantly when switching from a valid to an invalid syllogism, indicating the need for more resources to 'face' the escalating difficulty of the task in ASD subjects. This increase is less evident in the ADHD and control groups. Statistically significant differences were also found in the behavioral responses of ASD and ADHD subjects, compared with those of control subjects, based on the principal brain and behavior saliences extracted by PLSC. Specifically, two behavioral measures, the emotional state and the degree of confidence of participants in answering questions in Aristotle's valid-invalid syllogisms, and one demographic variable, age, significantly discriminate the three groups. The seed-PLSC-generated functional connectivity networks for ASD, ADHD, and control were 'projected' on the regions of the Default Mode Network (DMN), the 'reference' connectivity, of which the structural changes were found significant in distinguishing the three groups. The contribution of this work lies in the examination of the relationship between brain activity and behavioral responses of healthy and 'pathological' participants in the case of cognitive reasoning of the type of Aristotle's valid and invalid syllogisms, using PLSC, a machine learning approach, combined with MSE, a nonlinear method of extracting a feature based on EEGs that captures a broad spectrum of the EEG's linear and nonlinear characteristics.
The results seem promising for adopting this type of reasoning in the future, after further enhancements and experimental tests, as a supplementary instrument for examining the differences in brain activity and behavioral responses of ASD and ADHD patients. The combination of these two methods, after further elaboration and testing as new and complementary to the existing ones, may be considered as an analysis tool to help detect such types of disorders more effectively.
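Multiscale entropy, the EEG feature used in this abstract, is commonly computed by coarse-graining a signal at several scales and taking the sample entropy of each coarse-grained series. The sketch below follows that common recipe under stated assumptions: embedding dimension m = 2, tolerance r set to 0.2 times the standard deviation of the original signal, matches counted with distance less than or equal to r, and a synthetic noise signal standing in for an EEG channel. The study's own parameters are not given in the abstract.

```python
import math
import random
from statistics import pstdev


def coarse_grain(series, scale):
    """Average consecutive, non-overlapping windows of length `scale`."""
    n = len(series) // scale
    return [sum(series[i * scale:(i + 1) * scale]) / scale for i in range(n)]


def sample_entropy(series, m, r):
    """SampEn = -ln(A / B), where B counts template pairs of length m and
    A counts template pairs of length m + 1 within Chebyshev distance r."""
    def count_matches(length):
        # Use the same number of templates (N - m) for both lengths.
        templates = [series[i:i + length] for i in range(len(series) - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                dist = max(abs(a - b)
                           for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined when no matches are found
    return -math.log(a / b)


def multiscale_entropy(series, max_scale, m=2, r_fraction=0.2):
    """Sample entropy of the coarse-grained series at scales 1..max_scale."""
    r = r_fraction * pstdev(series)  # tolerance fixed from the original signal
    return [sample_entropy(coarse_grain(series, s), m, r)
            for s in range(1, max_scale + 1)]


# Synthetic stand-in for one EEG channel (white noise, fixed seed).
rng = random.Random(0)
signal = [rng.gauss(0.0, 1.0) for _ in range(200)]
mse_curve = multiscale_entropy(signal, max_scale=3)
```

A perfectly periodic signal yields a sample entropy of zero, while irregular signals yield larger values. The shape of the entropy curve across scales is what is usually read as "complexity".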
Affiliation(s)
- Anastasia Papaioannou
  - 1st Department of Psychiatry, Eginition Hospital, Medical School, National University of Athens, 11528 Athens, Greece
  - Neurosciences and Precision Medicine Research Institute “COSTAS STEFANIS” (UMHRI), University Mental Health, Papagou, 15601 Athens, Greece
- Eva Kalantzi
  - 1st Department of Psychiatry, Eginition Hospital, Medical School, National University of Athens, 11528 Athens, Greece
- Kalliopi Korombili
  - 1st Department of Psychiatry, Eginition Hospital, Medical School, National University of Athens, 11528 Athens, Greece
- Anastasia Bokou
  - 1st Department of Psychiatry, Eginition Hospital, Medical School, National University of Athens, 11528 Athens, Greece
- Artemios Pehlivanidis
  - 1st Department of Psychiatry, Eginition Hospital, Medical School, National University of Athens, 11528 Athens, Greece
- Charalabos C. Papageorgiou
  - 1st Department of Psychiatry, Eginition Hospital, Medical School, National University of Athens, 11528 Athens, Greece
  - Neurosciences and Precision Medicine Research Institute “COSTAS STEFANIS” (UMHRI), University Mental Health, Papagou, 15601 Athens, Greece
- George Papaioannou
  - Center for Research of Nonlinear Systems (CRANS), Department of Mathematics, University of Patras, 26500 Patra, Greece
|
33
|
Sandhu K, Patil SS, Pumphrey M, Carter A. Multitrait machine- and deep-learning models for genomic selection using spectral information in a wheat breeding program. The Plant Genome 2021; 14:e20119. [PMID: 34482627 DOI: 10.1002/tpg2.20119] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 03/05/2021] [Accepted: 05/18/2021] [Indexed: 06/13/2023]
Abstract
Prediction of breeding values is central to plant breeding and has been revolutionized by the adoption of genomic selection (GS). Use of machine- and deep-learning algorithms applied to complex traits in plants can improve prediction accuracies. Because of the tremendous increase in collected data in breeding programs and the slow rate of genetic gain increase, it is required to explore the potential of artificial intelligence in analyzing the data. The main objectives of this study include optimization of multitrait (MT) machine- and deep-learning models for predicting grain yield and grain protein content in wheat (Triticum aestivum L.) using spectral information. This study compares the performance of four machine- and deep-learning-based unitrait (UT) and MT models with traditional genomic best linear unbiased predictor (GBLUP) and Bayesian models. The dataset consisted of 650 recombinant inbred lines (RILs) from a spring wheat breeding program grown for three years (2014-2016), and spectral data were collected at heading and grain filling stages. The MT-GS models performed 0-28.5 and -0.04 to 15% superior to the UT-GS models. Random forest and multilayer perceptron were the best performing machine- and deep-learning models to predict both traits. Four explored Bayesian models gave similar accuracies, which were less than machine- and deep-learning-based models and required increased computational time. Green normalized difference vegetation index (GNDVI) best predicted grain protein content in seven out of the nine MT-GS models. Overall, this study concluded that machine- and deep-learning-based MT-GS models increased prediction accuracy and should be employed in large-scale breeding programs.
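For readers unfamiliar with the spectral index named in this abstract, GNDVI is the normalized difference between near-infrared and green reflectance. The short sketch below shows the standard formula; the reflectance values are hypothetical and the function name is introduced here for illustration.

```python
def gndvi(nir, green):
    """Green normalized difference vegetation index.

    GNDVI = (NIR - Green) / (NIR + Green), with both inputs given as
    reflectance values for the same pixel or plot.
    """
    return (nir - green) / (nir + green)


# Hypothetical canopy reflectances for one plot.
plot_gndvi = gndvi(nir=0.5, green=0.1)
```

The index ranges from -1 to 1, and dense green vegetation typically gives higher values because it reflects strongly in the near-infrared.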
Affiliation(s)
- Karansher Sandhu
  - Department of Crop and Soil Sciences, Washington State University, Pullman, WA, 99164, USA
- Shruti Sunil Patil
  - School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, 99164, USA
- Michael Pumphrey
  - Department of Crop and Soil Sciences, Washington State University, Pullman, WA, 99164, USA
- Arron Carter
  - Department of Crop and Soil Sciences, Washington State University, Pullman, WA, 99164, USA
|
34
|
Aboveground Biomass Estimation in Short Rotation Forest Plantations in Northern Greece Using ESA’s Sentinel Medium-High Resolution Multispectral and Radar Imaging Missions. Forests 2021. [DOI: 10.3390/f12070902] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/15/2023]
Abstract
Plantations of fast-growing forest species such as black locust (Robinia pseudoacacia) can contribute to energy transformation, mitigate industrial pollution, and restore degraded, marginal land. In this study, the synergistic use of Sentinel-2 and Sentinel-1 time series data is explored for modeling aboveground biomass (AGB) in black locust short-rotation plantations in northeastern Greece. Optimal modeling dates and EO sensor data are also identified through the analysis. Random forest (RF) models were first developed using monthly Sentinel-2 spectral indices, and monthly Sentinel-1 bands were then progressively incorporated into the statistical analysis. The highest accuracy was observed for the models generated using Sentinel-2 August composites (R2 = 0.52). The inclusion of Sentinel-1 bands in the spectral-index models had a negligible effect on modeling accuracy during the leaf-on period. The pairwise correlation of the spectral indices with AGB, and their comparative performance, varied among the phenophases of the forest plantations. Overall, the field-measured AGB in the forest plantation plots presented a higher correlation with the optical Sentinel-2 images. The synergy of Sentinel-1 and Sentinel-2 data proved to be an ineffective approach for improving forest biomass RF models throughout the year within the geographical and environmental context of our study.
|
35
|
Mapping 30 m Fractional Forest Cover over China’s Three-North Region from Landsat-8 Data Using Ensemble Machine Learning Methods. Remote Sensing 2021. [DOI: 10.3390/rs13132592] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/11/2022]
Abstract
The accurate monitoring of forest cover and its changes is essential for environmental change research, but current satellite products for forest coverage carry many uncertainties. This study used 30-m Landsat-8 data and aggregated 1-m GaoFen-2 (GF-2) satellite images to construct the training samples, and used multiple machine learning algorithms (MLAs) to estimate the fractional forest cover (FFC) in China’s Three North Region (TNR). Multiple MLAs were merged to construct stacked generalization (SG) models, and the performances of the MLAs in the FFC estimation were evaluated. The results of the 10-fold cross-validation showed that all non-linear algorithms performed well, with an R2 value greater than 0.8 and a root-mean-square error (RMSE) of less than 0.05. In the bagging ensemble, the random forest (RF) model (R2 = 0.993, RMSE = 0.020) performed the best, and in the boosting ensemble, the light gradient boosted machine (LGBM) (R2 = 0.992, RMSE = 0.022) performed the best. Although the evaluation index of the RF is slightly better than that of the LGBM, the independent validation results show that the two models have similar performances. The model evaluation results of the independent datasets showed that, in the SG model, the performance of the SG(LGBM) (R2 = 0.991, RMSE = 0.034) was better than that of the single or non-ensemble model. Comparing the FFC estimates of our model with those of existing datasets showed that our model exhibited more forest spatial distribution details and higher accuracy in complex landscapes. Overall, in this study, the method of using high-resolution remote sensing (RS) images to extract samples for FFC estimation is feasible. Our results demonstrate the potential of the ensemble MLAs to map the FFC. The research results also show that among many MLAs, the RF algorithm is the most suitable algorithm for estimating FFC, which provides a reference for future research.
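One way to read "aggregated 1-m images to construct the training samples" is that each coarse cell's label is the fraction of fine-resolution forest pixels it contains. The sketch below illustrates that reading only; the abstract does not describe the exact aggregation procedure, and the tiny mask, block size, and function name are hypothetical (a real 1-m to 30-m aggregation would use a block size of 30).

```python
def fractional_cover(mask, block):
    """Aggregate a binary forest mask (1 = forest, 0 = other) into
    block x block cells, returning the forest fraction of each cell.
    Rows and columns that do not fill a whole block are dropped."""
    rows, cols = len(mask) // block, len(mask[0]) // block
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            forest_pixels = sum(mask[r * block + i][c * block + j]
                                for i in range(block) for j in range(block))
            row.append(forest_pixels / (block * block))
        out.append(row)
    return out


# Hypothetical 4 x 4 fine-resolution classification, aggregated 2 x 2.
fine_mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 1],
]
ffc = fractional_cover(fine_mask, block=2)
```

The resulting fractions lie between 0 and 1, which matches the scale on which the abstract reports RMSE values for FFC.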
|
36
|
Smith PF, Zheng Y. Applications of Multivariate Statistical and Data Mining Analyses to the Search for Biomarkers of Sensorineural Hearing Loss, Tinnitus, and Vestibular Dysfunction. Front Neurol 2021; 12:627294. [PMID: 33746881 PMCID: PMC7966509 DOI: 10.3389/fneur.2021.627294] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/09/2020] [Accepted: 02/01/2021] [Indexed: 11/24/2022] Open
Abstract
Disorders of sensory systems, as with most disorders of the nervous system, usually involve the interaction of multiple variables to cause some change, and yet often basic sensory neuroscience data are analyzed using univariate statistical analyses only. The exclusive use of univariate statistical procedures, analyzing one variable at a time, may limit the potential of studies to determine how interactions between variables may, as a network, determine a particular result. The use of multivariate statistical and data mining methods provides the opportunity to analyse many variables together, in order to appreciate how they may function as a system of interacting variables, and how this system or network may change as a result of sensory disorders such as sensorineural hearing loss, tinnitus or different types of vestibular dysfunction. Here we provide an overview of the potential applications of multivariate statistical and data mining techniques, such as principal component and factor analysis, cluster analysis, multiple linear regression, random forest regression, linear discriminant analysis, support vector machines, random forest classification, Bayesian classification, and orthogonal partial least squares discriminant analysis, to the study of auditory and vestibular dysfunction, with an emphasis on classification analytic methods that may be used in the search for biomarkers of disease.
Affiliation(s)
- Paul F. Smith
  - Department of Pharmacology and Toxicology, Brain Health Research Centre, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
  - Brain Research New Zealand Centre of Research Excellence, University of Auckland, Auckland, New Zealand
  - The Eisdell Moore Centre for Hearing and Balance Research, University of Auckland, Auckland, New Zealand
- Yiwen Zheng
  - Department of Pharmacology and Toxicology, Brain Health Research Centre, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
  - Brain Research New Zealand Centre of Research Excellence, University of Auckland, Auckland, New Zealand
  - The Eisdell Moore Centre for Hearing and Balance Research, University of Auckland, Auckland, New Zealand
|
37
|
Sep MSC, Joëls M, Geuze E. Individual differences in the encoding of contextual details following acute stress: An explorative study. Eur J Neurosci 2020; 55:2714-2738. [PMID: 33249674 PMCID: PMC9291333 DOI: 10.1111/ejn.15067] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/04/2020] [Revised: 11/05/2020] [Accepted: 11/21/2020] [Indexed: 12/19/2022]
Abstract
Information processing under stressful circumstances depends on many experimental conditions, like the information valence or the point in time at which brain function is probed. This also holds true for memorizing contextual details (or ‘memory contextualization’). Moreover, large interindividual differences appear to exist in (context‐dependent) memory formation after stress, but it is mostly unknown which individual characteristics are essential. Various characteristics were explored from a theory‐driven and data‐driven perspective, in 120 healthy men. In the theory‐driven model, we postulated that life adversity and trait anxiety shape the stress response, which impacts memory contextualization following acute stress. This was indeed largely supported by linear regression analyses, showing significant interactions depending on valence and time point after stress. Thus, during the acute phase of the stress response, reduced neutral memory contextualization was related to salivary cortisol level; moreover, certain individual characteristics correlated with memory contextualization of negatively valenced material: (a) life adversity, (b) α‐amylase reactivity in those with low life adversity and (c) cortisol reactivity in those with low trait anxiety. Better neutral memory contextualization during the recovery phase of the stress response was associated with (a) cortisol in individuals with low life adversity and (b) α‐amylase in individuals with high life adversity. The data‐driven Random Forest‐based variable selection also pointed to (early) life adversity, during the acute phase, and (moderate) α‐amylase reactivity, during the recovery phase, as individual characteristics related to better memory contextualization. Newly identified characteristics sparked novel hypotheses about non‐anxious personality traits, age, mood and states during retrieval of context‐related information.
Affiliation(s)
- Milou S C Sep: Brain Research and Innovation Centre, Ministry of Defence, Utrecht, The Netherlands; Department of Translational Neuroscience, UMC Utrecht Brain Center, Utrecht University, Utrecht, The Netherlands
- Marian Joëls: Department of Translational Neuroscience, UMC Utrecht Brain Center, Utrecht University, Utrecht, The Netherlands; University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Elbert Geuze: Brain Research and Innovation Centre, Ministry of Defence, Utrecht, The Netherlands; Department of Psychiatry, UMC Utrecht Brain Center, Utrecht University, Utrecht, The Netherlands
|
38
|
Tian Y, He YL, Zhu QX. Soft Sensor Development Using Improved Whale Optimization and Regularization-Based Functional Link Neural Network. Ind Eng Chem Res 2020. [DOI: 10.1021/acs.iecr.0c03839] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Affiliation(s)
- Ye Tian: College of Information Science & Technology, Beijing University of Chemical Technology, Beijing 100029, China; Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing 100029, China
- Yan-Lin He: College of Information Science & Technology, Beijing University of Chemical Technology, Beijing 100029, China; Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing 100029, China
- Qun-Xiong Zhu: College of Information Science & Technology, Beijing University of Chemical Technology, Beijing 100029, China; Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing 100029, China
|
39
|
Hisham S, Rasheed SA, Dsouza B. Application of Predictive Modelling to Improve the Discharge Process in Hospitals. Healthc Inform Res 2020; 26:166-174. [PMID: 32819034 PMCID: PMC7438692 DOI: 10.4258/hir.2020.26.3.166] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/12/2020] [Accepted: 07/21/2001] [Indexed: 11/23/2022] Open
Abstract
Objective To find out the factors influencing discharge process turnaround time (TAT) and to accurately predict the discharge process TAT. Methods The discharge process of cardiology department inpatients in a tertiary care hospital was mapped over a month. The likely factors influencing discharge TAT were tested for significance by ANOVA. Multiple linear regression (MLR) was used to predict the TAT. The sample was divided into testing and training sets for regression. A model was generated using the training set and compared with the testing set for accuracy. Results After a process map was plotted, the significant factors influencing the TAT were identified to be the treating doctor and pending evaluations on the day of discharge. The MLR model was developed with Python libraries based on the two factors identified. The model predicted the discharge TAT with a 69% R2 value and 32.4 minutes (standard error) on the testing set and a 77.3% R2 value and 26.7 minutes (standard error) on the overall sample. Conclusion This study was an initiation to find out factors influencing discharge TAT and how those factors can be used to predict discharge in the hospital of interest. The study was validated and predicted the TAT with 77% accuracy after the significant factors that affect the discharge process were identified.
Affiliation(s)
- Sayed Hisham: Healthcare Analytics, Baby Memorial Hospital, Kozhikode, India
- Shahina Abdul Rasheed: Prasanna School of Public Health, Manipal Academy of Higher Education, Manipal, India
- Brayal Dsouza: Prasanna School of Public Health, Manipal Academy of Higher Education, Manipal, India
|
40
|
A Random Forest Modelling Procedure for a Multi-Sensor Assessment of Tree Species Diversity. REMOTE SENSING 2020. [DOI: 10.3390/rs12071210] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/15/2022]
Abstract
Earth observation data can provide important information for tree species diversity mapping and monitoring. The relatively recent advances in remote sensing data characteristics and processing systems elevate the potential of satellite imagery for providing accurate, timely, consistent, and robust spatially explicit estimates of tree species diversity over forest ecosystems. This study was conducted in Northern Pindos National Park, the largest terrestrial park in Greece, and aimed to assess the potential of four satellite sensors with different instrumental characteristics for the estimation of tree diversity. Through field measurements, we originally quantified two diversity indices, namely the Shannon diversity index (H’) and Simpson’s diversity (D1). Random forest regression models were developed for associating remotely sensed spectral signal with tree species diversity within the area. The models generated from the use of the WorldView-2 image were the most accurate, with a coefficient of determination of up to 0.44 for H’ and 0.37 for D1. The Sentinel-2-based models of tree species diversity performed slightly worse, but were better than the Landsat-8 and RapidEye models. The coefficient of variation quantifying internal variability of spectral values within each plot was of little or no use in improving the modelling accuracy. Our results suggest that very-high-spatial-resolution imagery provides the most important information for the assessment of tree species diversity in heterogeneous Mediterranean ecosystems.
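As a rough illustration of the two indices named in this abstract, the sketch below computes a Shannon index and a Simpson-type diversity from per-species stem counts in a plot. It assumes the textbook definitions H' = -Σ p_i ln p_i and D1 = 1 - Σ p_i^2; the abstract does not state which Simpson variant the authors used, and the function names and example counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over species with non-zero counts."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def simpson_diversity(counts):
    """Simpson-type diversity, assumed here to be 1 - sum(p_i ** 2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical plot with three species (stem counts per species).
plot_counts = [12, 7, 1]
print(shannon_index(plot_counts), simpson_diversity(plot_counts))
```

Both indices rise as individuals are spread more evenly across more species, which is why a plot with a single species scores zero on each.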
|
41
|
Zhang S, Tan Z, Liu J, Xu Z, Du Z. Determination of the food dye indigotine in cream by near-infrared spectroscopy technology combined with random forest model. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2020; 227:117551. [PMID: 31677907 DOI: 10.1016/j.saa.2019.117551] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 08/10/2019] [Revised: 09/09/2019] [Accepted: 09/17/2019] [Indexed: 06/10/2023]
Abstract
Artificial pigment is a common food additive in cream products. If added in excess, it can harm the human body. At present, there is no research on the detection of cream pigment by Near Infrared (NIR) spectroscopy. In this paper, a method based on random forest was applied to determine the indigotine in cream. Weighing in the experiments was accomplished using analytical balances with precision as low as 0.0001 g. The NIR spectra of cream with different concentrations of indigotine were recorded. The original spectra were pretreated by SG smoothing, mean centering and second derivative. Random forest was applied to establish a quantitative analysis model for cream pigment content, and multiple evaluation criteria were selected to comprehensively evaluate the model. The R2 was 0.9402, RMSEP was 0.2509 and RPD was 4.0893. Consequently, NIR spectroscopy, combined with data pretreatments and random forest model, was confirmed to be an interesting tool for non-destructive evaluation of pigment content in cream.
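The three evaluation criteria quoted in this abstract are related. The sketch below assumes the common chemometric definitions (RMSEP as root mean squared error on the prediction set, RPD as the standard deviation of the reference values divided by RMSEP); the abstract itself does not define them, and the function names are hypothetical. Under those definitions, and with the population standard deviation, RPD equals 1/sqrt(1 - R2), which is consistent with the reported pair of values: 1/sqrt(1 - 0.9402) is approximately 4.089.

```python
import math
import statistics


def rmsep(y_true, y_pred):
    """Root mean squared error of prediction."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def rpd(y_true, y_pred):
    """Ratio of performance to deviation: SD of reference values / RMSEP.

    The population SD is used here (an assumption) so that the identity
    RPD = 1 / sqrt(1 - R2) holds exactly.
    """
    return statistics.pstdev(y_true) / rmsep(y_true, y_pred)


def rpd_from_r2(r2):
    """RPD implied by a coefficient of determination under the assumptions above."""
    return 1.0 / math.sqrt(1.0 - r2)


print(round(rpd_from_r2(0.9402), 3))
```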
Affiliation(s)
- Supei Zhang: School of Computer Science & Engineering, Wuhan Institute of Technology, Wuhan, 430205, China
- Zhenglin Tan: Department of Cuisine and Nutrition, Hubei University of Economics, Wuhan, 430205, China
- Jun Liu: Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan, 430205, China; School of Computer Science & Engineering, Wuhan Institute of Technology, Wuhan, 430205, China
- Zihan Xu: School of Computer Science & Engineering, Wuhan Institute of Technology, Wuhan, 430205, China
- Zhuang Du: School of Computer Science & Engineering, Wuhan Institute of Technology, Wuhan, 430205, China
|
42
|
Yuchi W, Gombojav E, Boldbaatar B, Galsuren J, Enkhmaa S, Beejin B, Naidan G, Ochir C, Legtseg B, Byambaa T, Barn P, Henderson SB, Janes CR, Lanphear BP, McCandless LC, Takaro TK, Venners SA, Webster GM, Allen RW. Evaluation of random forest regression and multiple linear regression for predicting indoor fine particulate matter concentrations in a highly polluted city. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2019; 245:746-753. [PMID: 30500754 DOI: 10.1016/j.envpol.2018.11.034] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 05/22/2018] [Revised: 11/01/2018] [Accepted: 11/11/2018] [Indexed: 05/14/2023]
Abstract
BACKGROUND Indoor and outdoor fine particulate matter (PM2.5) are both leading risk factors for death and disease, but making indoor measurements is often infeasible for large study populations. METHODS We developed models to predict indoor PM2.5 concentrations for pregnant women who were part of a randomized controlled trial of portable air cleaners in Ulaanbaatar, Mongolia. We used multiple linear regression (MLR) and random forest regression (RFR) to model indoor PM2.5 concentrations with 447 independent 7-day PM2.5 measurements and 87 potential predictor variables obtained from outdoor monitoring data, questionnaires, home assessments, and geographic data sets. We also developed blended models that combined the MLR and RFR approaches. All models were evaluated in a 10-fold cross-validation. RESULTS The predictors in the MLR model were season, outdoor PM2.5 concentration, the number of air cleaners deployed, and the density of gers (traditional felt-lined yurts) surrounding the apartments. MLR and RFR had similar performance in cross-validation (R2 = 50.2%, R2 = 48.9% respectively). The blended MLR model that included RFR predictions had the best performance (cross validation R2 = 81.5%). Intervention status alone explained only 6.0% of the variation in indoor PM2.5 concentrations. CONCLUSIONS We predicted a moderate amount of variation in indoor PM2.5 concentrations using easily obtained predictor variables and the models explained substantially more variation than intervention status alone. While RFR shows promise for modelling indoor concentrations, our results highlight the importance of out-of-sample validation when evaluating model performance. We also demonstrate the improved performance of blended MLR/RFR models in predicting indoor air pollution.
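The out-of-sample evaluation this abstract stresses can be sketched as a k-fold cross-validation loop in which every observation is predicted by a model that never saw it. The example below is only an illustration under stated assumptions: a one-predictor least-squares line stands in for the study's MLR and RFR models, and the function names, fold assignment, and synthetic data are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for a single predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_validated_r2(xs, ys, k=10, seed=0):
    """R2 computed from predictions made on held-out folds only."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    predictions = [0.0] * len(xs)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            predictions[i] = intercept + slope * xs[i]
    mean_y = sum(ys) / len(ys)
    sse = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - sse / sst


# Synthetic data: a noisy linear relationship (hypothetical values).
rng = random.Random(1)
xs = [float(i) for i in range(100)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 5.0) for x in xs]
print(cross_validated_r2(xs, ys))
```

Because the held-out predictions come from models fitted without those points, this R2 is typically lower than the in-sample value, and it can be negative for a model that generalizes worse than the mean.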
Affiliation(s)
- Weiran Yuchi: Faculty of Health Sciences, Simon Fraser University, 8888 University Drive, Burnaby, BC, V5A 1S6, Canada
- Enkhjargal Gombojav: School of Public Health, Mongolian National University of Medical Sciences, Zorig Street, Ulaanbaatar, 14210, Mongolia
- Buyantushig Boldbaatar: School of Public Health, Mongolian National University of Medical Sciences, Zorig Street, Ulaanbaatar, 14210, Mongolia
- Jargalsaikhan Galsuren: School of Public Health, Mongolian National University of Medical Sciences, Zorig Street, Ulaanbaatar, 14210, Mongolia
- Sarangerel Enkhmaa: Institute of Meteorology and Environmental Monitoring, Ministry of Environment of Mongolia, Mongolia
- Bolor Beejin: Mongolian National Center for Public Health, Olympic Street 2, Ulaanbaatar, Mongolia
- Gerel Naidan: School of Public Health, Mongolian National University of Medical Sciences, Zorig Street, Ulaanbaatar, 14210, Mongolia
- Chimedsuren Ochir: School of Public Health, Mongolian National University of Medical Sciences, Zorig Street, Ulaanbaatar, 14210, Mongolia
- Bayarkhuu Legtseg: Sukhbaatar District Health Center, 11 Horoo, Tsagdaagiin Gudamj, Sukhbaatar District, Ulaanbaatar, Mongolia
- Tsogtbaatar Byambaa: Ministry of Health of Mongolia, Olympic Street-2, Government Building VIII, Sukhbaatar District, Ulaanbaatar, Mongolia
- Prabjit Barn: Faculty of Health Sciences, Simon Fraser University, 8888 University Drive, Burnaby, BC, V5A 1S6, Canada
- Sarah B Henderson: Environmental Health Services, British Columbia Centre for Disease Control, 655 W. 12th Ave, Vancouver, BC, V5T 4R4, Canada
- Craig R Janes: School of Public Health and Health Systems, University of Waterloo, 200 University Avenue West, Waterloo, ON, N2L 3G1, Canada
- Bruce P Lanphear: Faculty of Health Sciences, Simon Fraser University, 8888 University Drive, Burnaby, BC, V5A 1S6, Canada
- Lawrence C McCandless: Faculty of Health Sciences, Simon Fraser University, 8888 University Drive, Burnaby, BC, V5A 1S6, Canada
- Tim K Takaro: Faculty of Health Sciences, Simon Fraser University, 8888 University Drive, Burnaby, BC, V5A 1S6, Canada
- Scott A Venners: Faculty of Health Sciences, Simon Fraser University, 8888 University Drive, Burnaby, BC, V5A 1S6, Canada
- Glenys M Webster: Faculty of Health Sciences, Simon Fraser University, 8888 University Drive, Burnaby, BC, V5A 1S6, Canada
- Ryan W Allen: Faculty of Health Sciences, Simon Fraser University, 8888 University Drive, Burnaby, BC, V5A 1S6, Canada
|
43
|
GIS-Based Random Forest Weight for Rainfall-Induced Landslide Susceptibility Assessment at a Humid Region in Southern China. WATER 2018. [DOI: 10.3390/w10081019] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 01/24/2023]
Abstract
Landslide susceptibility assessment is presently considered an effective tool for landslide warning and forecasting. Under the assessment procedure, a credible index weight can greatly increase the rationality of the assessment result. Using the Beijiang River Basin, China, as a case study, this paper proposes a new weight-determining method based on random forest (RF) and uses the weighted linear combination (WLC) to evaluate the landslide susceptibility. The RF weight and eight indices were used to construct the assessment model. As a comparison, the entropy weight (EW) and the weight determined by analytic hierarchy process (AHP) were also used to demonstrate the rationality of the proposed weight-determining method. The results show that: (1) the average error rates of training and testing based on RF are 18.12% and 15.83%, respectively, suggesting that the RF model can be considered rational and credible; (2) RF ranks the indices elevation (EL), slope (SL), maximum one-day precipitation (M1DP) and distance to fault (DF) as the four most important of the eight indices, accounting for 73.24% of the total importance, while the indices runoff coefficient (RC), normalized difference vegetation index (NDVI), shear resistance capacity (SRC) and available water capacity (AWC) are less consequential, accounting for only 26.76% of the total; and (3) the verification of landslide susceptibility indicates that the accuracy rate based on the RF weight reaches 75.41%, but is only 59.02% and 72.13% for the other two weights (EW and AHP), respectively. This paper shows the potential to provide a new weight-determining method for landslide susceptibility assessment. Evaluation results are expected to provide a reference for landslide management, prevention and reduction in the studied basin.
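The weighted linear combination step mentioned in this abstract can be sketched as follows, assuming the usual form in which normalized index weights multiply standardized index values and the products are summed. The function names, index scores and importance values are hypothetical placeholders, not figures from the paper.

```python
def normalize_weights(importances):
    """Scale raw importance scores so that the resulting weights sum to 1."""
    total = sum(importances.values())
    return {name: value / total for name, value in importances.items()}


def weighted_linear_combination(index_values, weights):
    """Susceptibility score = sum over indices of weight * standardized value."""
    return sum(weights[name] * index_values[name] for name in weights)


# Hypothetical importance scores for four of the indices named in the abstract.
raw_importance = {"EL": 4.0, "SL": 3.0, "M1DP": 2.0, "DF": 1.0}
weights = normalize_weights(raw_importance)

# Hypothetical standardized index values (0 to 1) for one map cell.
cell = {"EL": 0.8, "SL": 0.5, "M1DP": 0.9, "DF": 0.2}
print(weighted_linear_combination(cell, weights))
```

Swapping in entropy or AHP weights changes only the `weights` dictionary, which is what makes the three weighting schemes directly comparable.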
|
44
|
Zhao D, Wu Q. An approach to predict the height of fractured water-conducting zone of coal roof strata using random forest regression. Sci Rep 2018; 8:10986. [PMID: 30030501 PMCID: PMC6054685 DOI: 10.1038/s41598-018-29418-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 01/24/2018] [Accepted: 06/08/2018] [Indexed: 12/03/2022] Open
Abstract
Water inrushes from coal-roof strata account for a great proportion of coal mine accidents, and the height of fractured water-conducting zone (FWCZ) is of significant importance for the safe production of coal mines. A novel and promising model for predicting the height of FWCZ was proposed based on random forest regression (RFR), which is a powerful intelligent machine learning algorithm. RFR has high prediction accuracy and is robust in dealing with the complicated and non-linear problems. Also, it can evaluate the importance of the variables. In this study, the proposed model was applied to Hongliu Coal Mine in Northwest China. 85 field measured samples were collected in total, with 60 samples (70%) used for training and 20 (30%) used for validation. For comparison, a support vector machine (SVM) model was also constructed for the prediction. The results show that the two models are in accordance with the field measured data, and RFR shows a better performance on good tolerance to outliers and noises and efficiently on high-dimensional data sets. It is demonstrated that RFR is more practicable and accurate to predict the height of FWCZ. The achievements will be helpful in preventing and controlling the water inrushes from coal-roof strata, and also can be extended to various engineering applications.
Affiliation(s)
- Dekang Zhao: College of Geoscience and Surveying Engineering, China University of Mining & Technology (Beijing), Beijing, 100083, China; National Engineering Research Center of Coal Mine Water Hazard Controlling, Beijing, 100083, China
- Qiang Wu: College of Geoscience and Surveying Engineering, China University of Mining & Technology (Beijing), Beijing, 100083, China; National Engineering Research Center of Coal Mine Water Hazard Controlling, Beijing, 100083, China
|
45
|
Liu B, Stevenson RJ. Improving assessment accuracy for lake biological condition by classifying lakes with diatom typology, varying metrics and modeling multimetric indices. THE SCIENCE OF THE TOTAL ENVIRONMENT 2017; 609:263-271. [PMID: 28750229 DOI: 10.1016/j.scitotenv.2017.07.152] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 05/18/2017] [Revised: 07/16/2017] [Accepted: 07/17/2017] [Indexed: 06/07/2023]
Abstract
Site grouping by regions or typologies, site-specific modeling and varying metrics among site groups are four approaches that account for natural variation, which can be a major source of error in ecological assessments. Using a data set from the 2007 National Lakes Assessment project of the USEPA, we compared performances of multimetric indices (MMI) of biological condition that were developed: (1) with different lake grouping methods, ecoregions or diatom typologies; (2) by varying or not varying metrics among site groups; and (3) with different statistical techniques for modeling diatom metric values expected for minimally disturbed condition for each lake. Hierarchical modeling of MMIs, i.e. grouping sites by ecoregions or typologies and then modeling natural variability in metrics among lakes within groups, substantially improved MMI performance compared to using either ecoregions or site-specific modeling alone. Compared with MMIs based on ecoregion site groups, MMI precision and sensitivity to human disturbance were better when sites were grouped by diatom typologies and assessing performance nationwide. However, when MMI performance was evaluated at site group levels, as some government agencies often do, there was little difference in MMI performance between the two site grouping methods. Low numbers of reference and highly impacted sites in some typology groups likely limited MMI performance at the group level of analysis. Varying metrics among site groups did not improve MMI performance. Random forest models for site-specific expected metric values performed better than classification and regression tree and multiple linear regression, except when numbers of reference sites were small in site groups. Then classification and regression tree models were most precise. Based on our results, we recommend hierarchical modeling in future large scale lake assessments where lakes are grouped by ecoregions or diatom typologies and site-specific metric models are used to establish expected metric values.
Affiliation(s)
- Bo Liu: Department of Integrative Biology, Michigan State University, East Lansing, MI 48824, USA
- R Jan Stevenson: Department of Integrative Biology, Michigan State University, East Lansing, MI 48824, USA; Center for Water Sciences, Department of Integrative Biology, Michigan State University, East Lansing, MI 48824, USA
|
46
|
Dimitriadis SI, Liparas D, Tsolaki MN. Random forest feature selection, fusion and ensemble strategy: Combining multiple morphological MRI measures to discriminate among healhy elderly, MCI, cMCI and alzheimer's disease patients: From the alzheimer's disease neuroimaging initiative (ADNI) database. J Neurosci Methods 2017; 302:14-23. [PMID: 29269320 DOI: 10.1016/j.jneumeth.2017.12.010] [Citation(s) in RCA: 73] [Impact Index Per Article: 10.4] [Received: 07/30/2017] [Revised: 12/14/2017] [Accepted: 12/17/2017] [Indexed: 02/06/2023]
Abstract
BACKGROUND In the era of computer-assisted diagnostic tools for various brain diseases, Alzheimer's disease (AD) covers a large percentage of neuroimaging research, with the main scope being its use in daily practice. However, there has been no study attempting to simultaneously discriminate among Healthy Controls (HC), early mild cognitive impairment (MCI), late MCI (cMCI) and stable AD, using features derived from a single modality, namely MRI. NEW METHOD Based on preprocessed MRI images from the organizers of a neuroimaging challenge, we attempted to quantify the prediction accuracy of multiple morphological MRI features to simultaneously discriminate among HC, MCI, cMCI and AD. We explored the efficacy of a novel scheme that includes multiple feature selections via Random Forest from subsets of the whole set of features (e.g. whole set, left/right hemisphere etc.), Random Forest classification using a fusion approach and ensemble classification via majority voting. From the ADNI database, 60 HC, 60 MCI, 60 cMCI and 60 AD were used as a training set with known labels. An extra dataset of 160 subjects (HC: 40, MCI: 40, cMCI: 40 and AD: 40) was used as an external blind validation dataset to evaluate the proposed machine learning scheme. RESULTS In the second blind dataset, we succeeded in a four-class classification of 61.9% by combining MRI-based features with a Random Forest-based Ensemble Strategy. We achieved the best classification accuracy of all teams that participated in this neuroimaging competition. COMPARISON WITH EXISTING METHOD(S) The results demonstrate the effectiveness of the proposed scheme to simultaneously discriminate among four groups using morphological MRI features for the very first time in the literature. CONCLUSIONS Hence, the proposed machine learning scheme can be used to define single and multi-modal biomarkers for AD.
Affiliation(s)
- S I Dimitriadis: Neuroscience and Mental Health Research Institute, Cardiff University, Cardiff, UK; Cardiff University Brain Research Imaging Centre (CUBRIC), School of Psychology, Cardiff University, Cardiff, UK; MRC Centre for Neuropsychiatric Genetics and Genomics, Institute of Psychological Medicine and Clinical Neurosciences, Cardiff School of Medicine, Cardiff University, Cardiff, UK; Neuroinformatics Group, (CUBRIC), School of Psychology, Cardiff University, Cardiff, UK; School of Psychology, Cardiff University, Cardiff, UK; 3rd Department of Neurology, Medical School, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Dimitris Liparas: High Performance Computing Center Stuttgart (HLRS), University of Stuttgart, Stuttgart, Germany; Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Magda N Tsolaki: School of Psychology, Cardiff University, Cardiff, UK; 3rd Department of Neurology, Medical School, Aristotle University of Thessaloniki, Thessaloniki, Greece
|
47
|
Evaluating Site-Specific and Generic Spatial Models of Aboveground Forest Biomass Based on Landsat Time-Series and LiDAR Strip Samples in the Eastern USA. REMOTE SENSING 2017. [DOI: 10.3390/rs9060598] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Indexed: 11/17/2022]
|
48
|
Swanson A, Willette AA. Neuronal Pentraxin 2 predicts medial temporal atrophy and memory decline across the Alzheimer's disease spectrum. Brain Behav Immun 2016; 58:201-208. [PMID: 27444967 PMCID: PMC5349324 DOI: 10.1016/j.bbi.2016.07.148] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Received: 04/14/2016] [Revised: 06/29/2016] [Accepted: 07/16/2016] [Indexed: 12/21/2022] Open
Abstract
Chronic neuroinflammation is thought to potentiate medial temporal lobe (MTL) atrophy and memory decline in Alzheimer's disease (AD). It has become increasingly important to find novel immunological biomarkers of neuroinflammation or other processes that can track AD development and progression. Our study explored which pro- or anti-inflammatory cerebrospinal fluid (CSF) biomarkers best predicted AD neuropathology over 24 months. Using Alzheimer's Disease Neuroimaging Initiative data (N=285), CSF inflammatory biomarkers from mass spectrometry and multiplex panels were screened using stepwise regression, followed up with 50%/50% model retests for validation. Neuronal Pentraxin 2 (NPTX2) and Chitinase-3-like-protein-1 (C3LP1), biomarkers of glutamatergic synaptic plasticity and microglial activation respectively, were the only consistently significant biomarkers selected. Once these biomarkers were selected, linear mixed models were used to analyze their baseline and longitudinal associations with bilateral MTL volume, memory decline, global cognition, and established AD biomarkers including CSF amyloid and tau. Higher baseline NPTX2 levels corresponded to less MTL atrophy [R2=0.287, p<0.001] and substantially less memory decline [R2=0.560, p<0.001] by month 24. Conversely, higher C3LP1 modestly predicted more MTL atrophy [R2=0.083, p<0.001], yet did not significantly track memory decline over time. In conclusion, NPTX2 is a novel pro-inflammatory cytokine that predicts AD-related outcomes better than any immunological biomarker to date, substantially accounting for brain atrophy and especially memory decline. C3LP1 as the microglial biomarker, by contrast, performed modestly and did not predict longitudinal memory decline. This research may advance the current understanding of AD etiopathogenesis, while expanding early diagnostic techniques through the use of novel pro-inflammatory biomarkers, such as NPTX2. Future studies should also see if NPTX2 causally affects MTL morphometry and memory performance.
Affiliation(s)
- Ashley Swanson: Department of Food Science and Human Nutrition, Iowa State University, Ames, IA, United States
- A A Willette: Department of Food Science and Human Nutrition, Iowa State University, Ames, IA, United States; Department of Psychology, Iowa State University, Ames, IA, United States; Aging Mind and Brain Institute, University of Iowa, Iowa City, IA, United States
|
49
|
Smith PF. Age-Related Neurochemical Changes in the Vestibular Nuclei. Front Neurol 2016; 7:20. [PMID: 26973593 PMCID: PMC4776078 DOI: 10.3389/fneur.2016.00020] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 11/02/2015] [Accepted: 02/09/2016] [Indexed: 12/18/2022] Open
Abstract
There is evidence that the normal aging process is associated with impaired vestibulo-ocular reflexes (VOR) and vestibulo-spinal reflexes, causing reduced visual acuity and postural instability. Nonetheless, the available evidence is not entirely consistent, especially with respect to the VOR. Some recent studies have reported that VOR gain can be intact even above 80 years of age. Similarly, although there is evidence for age-related hair cell loss and neuronal loss in Scarpa's ganglion and the vestibular nucleus complex (VNC), it is not entirely consistent. Whatever structural and functional changes occur in the VNC as a result of aging, either to cause vestibular impairment or to compensate for it, neurochemical changes must underlie them. However, the neurochemical changes that occur in the VNC with aging are poorly understood because the available literature is very limited. This review summarizes and critically evaluates the available evidence relating to the noradrenaline, serotonin, dopamine, glutamate, GABA, glycine, and nitric oxide neurotransmitter systems in the aging VNC. It is concluded that, at present, it is difficult, if not impossible, to relate the neurochemical changes observed to the function of specific VNC neurons and whether the observed changes are the cause of a functional deficit in the VNC or an effect of it. A better understanding of the neurochemical changes that occur during aging may be important for the development of potential drug treatments for age-related vestibular disorders. However, this will require the use of more sophisticated methodology such as in vivo microdialysis with single neuron recording and perhaps new technologies such as optogenetics.
Affiliation(s)
- Paul F Smith: Department of Pharmacology and Toxicology, School of Medical Sciences and Brain Health Research Centre, University of Otago, Dunedin, New Zealand
|
50
|
Tetschke F, Schneider U, Schleussner E, Witte OW, Hoyer D. Assessment of fetal maturation age by heart rate variability measures using random forest methodology. Comput Biol Med 2016; 70:157-162. [PMID: 26848727 DOI: 10.1016/j.compbiomed.2016.01.020] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 12/09/2015] [Revised: 01/14/2016] [Accepted: 01/16/2016] [Indexed: 11/17/2022]
Abstract
Fetal maturation age assessment based on heart rate variability (HRV) is a well-suited tool in prenatal diagnosis. To date, almost linear maturation characteristic curves are used in univariate and multivariate models. Models using complex multivariate maturation characteristic curves are pending. To address this problem, we use Random Forest (RF) to assess fetal maturation age and compare RF with linear, multivariate age regression. We include previously developed HRV indices such as traditional time and frequency domain indices and complexity indices of multiple scales. We found that fetal maturation was best assessed by complexity indices of short scales and skewness in state-dependent datasets (quiet sleep, active sleep) as well as in state-independent recordings. Additionally, increasing fluctuation amplitude contributed to the model in the active sleep state. None of the traditional linear HRV parameters contributed to the RF models. The mean prediction of gestational age (GA) is more accurate with RF than with linear, multivariate regression (quiet state: R2 = 0.617 vs. R2 = 0.461; active state: R2 = 0.521 vs. R2 = 0.436; state independent: R2 = 0.583 vs. R2 = 0.548). We conclude that classification and regression tree models such as RF methodology are appropriate for the evaluation of fetal maturation age. The decisive role of adjustments between different time scales of complexity may essentially extend previous analysis concepts mainly based on rhythms and univariate complexity indices. Those system characteristics may have implications for better understanding and accessibility of the maturating complex autonomic control and its disturbance.
Affiliation(s)
- F Tetschke: Biomagnetic Center, Hans Berger Department of Neurology, Jena University Hospital, Friedrich Schiller University, Jena, Germany
- U Schneider: Department of Obstetrics, Jena University Hospital, Friedrich Schiller University, Jena, Germany
- E Schleussner: Department of Obstetrics, Jena University Hospital, Friedrich Schiller University, Jena, Germany
- O W Witte: Biomagnetic Center, Hans Berger Department of Neurology, Jena University Hospital, Friedrich Schiller University, Jena, Germany
- D Hoyer: Biomagnetic Center, Hans Berger Department of Neurology, Jena University Hospital, Friedrich Schiller University, Jena, Germany
|