1
Majd E, Xing L, Zhang X. Segmentation of patients with small cell lung cancer into responders and non-responders using the optimal cross-validation technique. BMC Med Res Methodol 2024; 24:83. [PMID: 38589775 PMCID: PMC11000309 DOI: 10.1186/s12874-024-02185-7] [Received: 12/04/2022] [Accepted: 02/20/2024] [Indexed: 04/10/2024] Open
Abstract
BACKGROUND The timing of treatment is an essential factor in the efficacy of cancer therapy, so patients who will not respond to their current therapy should receive a different treatment as early as possible. Machine learning models can be built to classify responders and non-responders. Such classification models predict the probability of a patient being a responder. Most methods use a probability threshold of 0.5 to convert the probabilities into binary group membership. However, the cutoff of 0.5 is not always the optimal choice. METHODS In this study, we propose a novel data-driven approach to select a better cutoff value based on the optimal cross-validation technique. To illustrate our novel method, we applied it to three clinical trial datasets of small-cell lung cancer patients. We used two different datasets to build a scoring system to segment patients. Then the models were applied to segment patients in the test data. RESULTS We found that, in test data, the predicted responders and non-responders had significantly different long-term survival outcomes. Our proposed novel method segments patients better than the standard approach using a cutoff of 0.5. Comparing clinical outcomes of responders versus non-responders, our novel method had a p-value of 0.009 with a hazard ratio of 0.668 for grouping patients using the Cox proportional hazards model and a p-value of 0.011 using the accelerated failure time model, which confirmed a significant difference between responders and non-responders. In contrast, the standard approach had a p-value of 0.194 with a hazard ratio of 0.823 using the Cox proportional hazards model and a p-value of 0.240 using the accelerated failure time model, indicating that the responders and non-responders do not differ significantly in survival. CONCLUSION In summary, our novel prediction method can successfully segment new patients into responders and non-responders. Clinicians can use our prediction to decide if a patient should receive a different treatment or stay with the current treatment.
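The abstract does not spell out how the cutoff is chosen, so the following is only a generic sketch of a data-driven alternative to the fixed 0.5 threshold, not the authors' procedure. It assumes Youden's J (sensitivity + specificity - 1) as the selection criterion; the function names, candidate grid, and toy data are all hypothetical.

```python
import random

def youden_j(probs, labels, cutoff):
    """Sensitivity + specificity - 1 when 'responder' means prob >= cutoff."""
    tp = sum(p >= cutoff and y == 1 for p, y in zip(probs, labels))
    fn = sum(p < cutoff and y == 1 for p, y in zip(probs, labels))
    tn = sum(p < cutoff and y == 0 for p, y in zip(probs, labels))
    fp = sum(p >= cutoff and y == 0 for p, y in zip(probs, labels))
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens + spec - 1.0

def best_cutoff(probs, labels, candidates):
    """Candidate cutoff with the highest Youden's J (first one wins ties)."""
    return max(candidates, key=lambda c: youden_j(probs, labels, c))

def cross_validated_cutoff(probs, labels, candidates, k=5, seed=0):
    """Tune the cutoff on k-1 folds, score it on the held-out fold, repeat.

    Returns the cutoff tuned on all data and the mean held-out Youden's J,
    which estimates how well a cutoff chosen this way generalises.
    """
    idx = list(range(len(probs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    held_out = []
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        cut = best_cutoff([probs[i] for i in train],
                          [labels[i] for i in train], candidates)
        held_out.append(youden_j([probs[i] for i in folds[f]],
                                 [labels[i] for i in folds[f]], cut))
    return best_cutoff(probs, labels, candidates), sum(held_out) / k

# Hypothetical predicted probabilities: responders (label 1) score low overall,
# so a 0.5 cutoff would label every patient a non-responder.
probs = [0.30, 0.35, 0.40, 0.45, 0.32, 0.38, 0.42, 0.48, 0.33, 0.44,
         0.05, 0.10, 0.15, 0.20, 0.25, 0.08, 0.12, 0.18, 0.22, 0.27]
labels = [1] * 10 + [0] * 10
cutoff, cv_score = cross_validated_cutoff(probs, labels, [0.1, 0.2, 0.3, 0.4, 0.5])
```

On this toy data the fixed 0.5 threshold separates nothing, while the tuned cutoff falls well below it. The held-out score is reported alongside the cutoff because a threshold tuned and evaluated on the same patients would look better than it is.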
Affiliation(s)
- Elham Majd
- Department of Mathematics and Statistics, University of Victoria, Victoria, BC, Canada
- Li Xing
- Department of Mathematics and Statistics, University of Saskatchewan, Saskatoon, SK, Canada
- Xuekui Zhang
- Department of Mathematics and Statistics, University of Victoria, Victoria, BC, Canada.
2
Sennett MA, Theobald DL. Extant Sequence Reconstruction: The Accuracy of Ancestral Sequence Reconstructions Evaluated by Extant Sequence Cross-Validation. J Mol Evol 2024; 92:181-206. [PMID: 38502220 PMCID: PMC10978691 DOI: 10.1007/s00239-024-10162-3] [Received: 07/12/2023] [Accepted: 02/20/2024] [Indexed: 03/21/2024]
Abstract
Ancestral sequence reconstruction (ASR) is a phylogenetic method widely used to analyze the properties of ancient biomolecules and to elucidate mechanisms of molecular evolution. Despite its increasingly widespread application, the accuracy of ASR is currently unknown, as it is generally impossible to compare resurrected proteins to the true ancestors. Which evolutionary models are best for ASR? How accurate are the resulting inferences? Here we answer these questions using a cross-validation method to reconstruct each extant sequence in an alignment with ASR methodology, a method we term "extant sequence reconstruction" (ESR). We thus can evaluate the accuracy of ASR methodology by comparing ESR reconstructions to the corresponding known true sequences. We find that a common measure of the quality of a reconstructed sequence, the average probability, is indeed a good estimate of the fraction of correct amino acids when the evolutionary model is accurate or overparameterized. However, the average probability is a poor measure for comparing reconstructions from different models, because, surprisingly, a more accurate phylogenetic model often results in reconstructions with lower probability. While better (more predictive) models may produce reconstructions with lower sequence identity to the true sequences, better models nevertheless produce reconstructions that are more biophysically similar to true ancestors. In addition, we find that a large fraction of sequences sampled from the reconstruction distribution may have fewer errors than the single most probable (SMP) sequence reconstruction, despite the fact that the SMP has the lowest expected error of all possible sequences. Our results emphasize the importance of model selection for ASR and the usefulness of sampling sequence reconstructions for analyzing ancestral protein properties. 
ESR is a powerful method for validating the evolutionary models used for ASR and can be applied in practice to any phylogenetic analysis of real biological sequences. Most significantly, ESR uses ASR methodology to provide a general method by which the biophysical properties of resurrected proteins can be compared to the properties of the true protein.
Affiliation(s)
- Michael A Sennett
- Department of Biochemistry, Brandeis University, Waltham, MA, 02453, USA
- Douglas L Theobald
- Department of Biochemistry, Brandeis University, Waltham, MA, 02453, USA.
3
Salzer EB, Meireles JFF, Kirk E, Preston CEJ, Vasconcelos E Sá D, Neves CM. Body understanding measure for pregnancy scale (BUMPs): Cross-cultural adaptation and psychometric properties among Brazilian pregnant women. Body Image 2024; 49:101689. [PMID: 38522365 DOI: 10.1016/j.bodyim.2024.101689] [Received: 07/27/2023] [Revised: 02/15/2024] [Accepted: 02/26/2024] [Indexed: 03/26/2024]
Abstract
The Body Understanding Measure for Pregnancy Scale (BUMPs) is a scale developed and validated for British pregnant women to assess body satisfaction during pregnancy. The aim of this study was to perform a cross-cultural adaptation and verify the psychometric properties of BUMPs for Brazilian adult pregnant women. The cross-cultural adaptation was performed using translation, back-translation, expert committee, expert analysis, and pre-testing, which showed easy comprehension by pregnant women. Psychometric properties were evaluated in a sample of 618 pregnant women (31.08 ± 4.94 years old). Exploratory and confirmatory factor analyses resulted in 19 items and three factors, with satisfactory fit indices. BUMPs showed measurement invariance across white vs. nonwhite women and across the three gestational trimesters. BUMPs showed good indicators of convergent validity, internal consistency, and test-retest reproducibility. It was concluded that the Brazilian version of BUMPs has adequate psychometric properties for Brazilian pregnant women, being an excellent instrument for analyzing body satisfaction in this population and facilitating additional investigations into these constructs.
Affiliation(s)
- Eduardo Borba Salzer
- Federal University of Juiz de Fora, Faculty of Physical Education and Sports, Juiz de Fora, Brazil
- Clara Mockdece Neves
- Federal University of Juiz de Fora, Faculty of Physical Education and Sports, Juiz de Fora, Brazil.
4
Wang S, McGibbon J, Zhang Y. Predicting high-resolution air quality using machine learning: Integration of large eddy simulation and urban morphology data. Environ Pollut 2024; 344:123371. [PMID: 38266694 DOI: 10.1016/j.envpol.2024.123371] [Received: 11/13/2023] [Revised: 01/15/2024] [Accepted: 01/15/2024] [Indexed: 01/26/2024]
Abstract
Accurately predicting air pollutants, especially in urban areas with well-defined spatial structures, is crucial. Over the past decade, machine learning techniques have been widely used to forecast urban air quality. However, traditional machine learning approaches have limitations in accuracy and interpretability for predicting pollutants. In this study, we propose a convolutional neural network (CNN) model to predict the spatial distribution of CO concentration in the Nanjing urban area at 10 m resolution. Our model incorporates various factors as input, such as building height, topography, and emissions, and is trained against the outputs simulated by the parallelized large-eddy simulation model (PALM). The PALM model has 48 different scenarios that vary in emissions, wind speeds, and wind directions. The results display a strong consistency between the two models. Furthermore, we evaluate the performance of our model using 10-fold cross-validation and an out-of-sample cross-validation approach. This yields a robust correlation (R² > 0.8 in both cases) and a low RMSE between the CO predicted by the PALM and CNN models, which demonstrates the generalization capability of our CNN model. The CNN can extract crucial features from the resulting weight contribution map. This map indicates that the CO concentration at a location is more influenced by nearby buildings and emissions than by distant ones. The interpretable patterns uncovered by our model are related to neighborhood effects, wind speeds, directions, and the impact of orientation on urban CO distribution. The model also shows high prediction accuracy (R > 0.8) when applied to another city. Overall, the integration of our CNN framework with the PALM model enhances the accuracy of air quality predictions while enabling interpretation in terms of fluid dynamic laws, providing effective tools for air quality management.
Affiliation(s)
- Shibao Wang
- School of Atmospheric Sciences, Nanjing University, Nanjing, Jiangsu, China
- Yanxu Zhang
- School of Atmospheric Sciences, Nanjing University, Nanjing, Jiangsu, China.
5
Chhillar I, Singh A. A feature engineering-based machine learning technique to detect and classify lung and colon cancer from histopathological images. Med Biol Eng Comput 2024; 62:913-924. [PMID: 38091162 DOI: 10.1007/s11517-023-02984-y] [Received: 07/25/2023] [Accepted: 11/29/2023] [Indexed: 02/22/2024]
Abstract
Globally, lung and colon cancers are among the most prevalent and lethal tumors. Early cancer identification is essential to increase the likelihood of survival. Histopathological images are considered an appropriate tool for diagnosing cancer, but diagnosis is tedious and error-prone if done manually. Recently, machine learning methods based on feature engineering have gained prominence in automatic histopathological image classification. Furthermore, these methods are more interpretable than deep learning, which operates in a "black box" manner. In the medical profession, the interpretability of a technique is critical to gaining the trust of end users to adopt it. In view of the above, this work aims to create an accurate and interpretable machine-learning technique for the automated classification of lung and colon cancers from histopathology images. In the proposed approach, following the preprocessing steps, texture and color features are extracted using the Haralick and Color histogram feature extraction algorithms, respectively. The obtained features are concatenated to form a single feature set. The three feature sets (texture, color, and combined features) are passed into the Light Gradient Boosting Machine (LightGBM) classifier for classification, and their performance is evaluated on the LC25000 dataset using hold-out and stratified 10-fold cross-validation (Stratified 10-FCV) techniques. With a test/hold-out set, the LightGBM with texture, color, and combined features classifies the lung and colon cancer images with 97.72%, 99.92%, and 100% accuracy, respectively. In addition, the stratified 10-fold cross-validation method also revealed that LightGBM with combined or color features performed well, with an excellent mean auc_mu score and a low mean multi_logloss value. Thus, the proposed technique can help histologists detect and classify lung and colon histopathology images more efficiently, effectively, and economically, improving productivity.
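Stratified 10-fold cross-validation, as named in the abstract, splits the data so that every fold keeps roughly the same class proportions as the full dataset. The sketch below shows only that splitting step in plain Python; it is not the authors' pipeline, the function name is hypothetical, and the class labels and counts are made up for illustration.

```python
from collections import defaultdict

def stratified_k_fold(labels, k=10):
    """Yield (train_idx, test_idx) pairs with per-class proportions preserved.

    Samples of each class are dealt round-robin into the k folds, so every
    fold receives an (almost) equal share of every class. No shuffling is
    done here; shuffle the data beforehand if its order is meaningful.
    """
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

# Hypothetical imbalanced label vector: 50 / 30 / 20 samples per class.
labels = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
splits = list(stratified_k_fold(labels, k=10))
first_test_counts = {c: sum(labels[i] == c for i in splits[0][1]) for c in "abc"}
```

Each sample lands in exactly one test fold, so averaging a metric over the ten folds uses every sample for evaluation once.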
Affiliation(s)
- Indu Chhillar
- Department of Computer Science and Engineering, Deenbandhu Chhotu Ram University of Science and Technology, Murthal, Haryana, India.
- Ajmer Singh
- Department of Computer Science and Engineering, Deenbandhu Chhotu Ram University of Science and Technology, Murthal, Haryana, India
6
Liu A, Qu C, Zhang J, Sun W, Shi C, Lima A, De Vivo B, Huang H, Palmisano M, Guarino A, Qi S, Albanese S. Screening and optimization of interpolation methods for mapping soil-borne polychlorinated biphenyls. Sci Total Environ 2024; 913:169498. [PMID: 38154632 DOI: 10.1016/j.scitotenv.2023.169498] [Received: 08/09/2023] [Revised: 11/28/2023] [Accepted: 12/17/2023] [Indexed: 12/30/2023]
Abstract
There is as yet no scientific consensus on how to choose the optimal interpolation method and its parameters for mapping soil-borne organic pollutants. Taking polychlorinated biphenyls (PCBs) as an example, we compare several classic interpolation methods using a high-resolution soil monitoring database. The results showed that empirical Bayesian kriging (EBK) has the highest accuracy for predicting the total PCB concentration, while the root mean squared error (RMSE) of inverse distance weighting (IDW) is among the highest of these interpolation methods. The logarithmic transformation of non-normally distributed data considerably improved the semivariogram used for modeling in kriging interpolation. Enlarging the search neighborhood reduced IDW's RMSE but had only a slight effect in ordinary kriging (OK), while in both methods it resulted in over-smoothing of the prediction map. The existence of outliers sharply increased the difference between two points, thereby weakening spatial autocorrelation and decreasing accuracy. As the prediction error increased continuously, the prediction accuracy of the different interpolation methods gradually converged. The attempted assisted interpolation algorithm did not significantly improve the prediction accuracy of the IDW method. This study constructed a standardized workflow for interpolation, which could reduce human error and achieve higher interpolation accuracy for mapping soil-borne PCBs.
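To make the IDW comparison concrete, here is a minimal sketch of inverse distance weighting with a leave-one-out RMSE, where the parameter `k` plays the role of the search neighborhood size. It illustrates the general technique only; the function names, the power of 2, and the sample points are assumptions and do not come from the study.

```python
import math

def idw_predict(x, y, points, power=2.0, k=None):
    """Inverse distance weighted estimate at (x, y).

    points is a list of (px, py, value); k limits the estimate to the
    k nearest samples (all samples when k is None).
    """
    dists = sorted((math.hypot(x - px, y - py), v) for px, py, v in points)
    if dists[0][0] == 0.0:          # exact hit: return the sampled value
        return dists[0][1]
    if k is not None:
        dists = dists[:k]
    weights = [1.0 / d ** power for d, _ in dists]
    return sum(w * v for w, (_, v) in zip(weights, dists)) / sum(weights)

def loo_rmse(points, power=2.0, k=None):
    """Leave-one-out RMSE: predict each sample from all the others."""
    sq_errs = []
    for i, (px, py, v) in enumerate(points):
        rest = points[:i] + points[i + 1:]
        sq_errs.append((idw_predict(px, py, rest, power, k) - v) ** 2)
    return math.sqrt(sum(sq_errs) / len(sq_errs))

# Hypothetical concentrations on a small grid, with one outlier at (2, 2).
samples = [(0, 0, 1.0), (0, 2, 1.2), (2, 0, 0.9), (2, 2, 9.0), (1, 1, 1.1)]
rmse_all = loo_rmse(samples)
rmse_k2 = loo_rmse(samples, k=2)
midpoint = idw_predict(0.5, 0.0, [(0, 0, 2.0), (1, 0, 4.0)])
```

Comparing `loo_rmse` across values of `k` or `power` is one simple way to screen IDW settings before mapping, in the spirit of the parameter comparison described above.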
Affiliation(s)
- Ao Liu
- State Key Laboratory of Biogeology and Environmental Geology, China University of Geosciences, Wuhan 430074, China
- Chengkai Qu
- State Key Laboratory of Biogeology and Environmental Geology, China University of Geosciences, Wuhan 430074, China.
- Jiaquan Zhang
- Hubei Key Laboratory of Mine Environmental Pollution Control and Remediation, School of Environmental Science and Engineering, Hubei Polytechnic University, Huangshi 435003, China
- Wen Sun
- Hubei Key Laboratory of Mine Environmental Pollution Control and Remediation, School of Environmental Science and Engineering, Hubei Polytechnic University, Huangshi 435003, China
- Changhe Shi
- State Key Laboratory of Biogeology and Environmental Geology, China University of Geosciences, Wuhan 430074, China
- Annamaria Lima
- Department of Earth Sciences, Environment and Resources, University of Naples Federico II, Naples 80125, Italy
- Benedetto De Vivo
- Hubei Key Laboratory of Mine Environmental Pollution Control and Remediation, School of Environmental Science and Engineering, Hubei Polytechnic University, Huangshi 435003, China; Pegaso On-Line University, Naples 80132, Italy
- Huanfang Huang
- State Environmental Protection Key Laboratory of Water Environmental Simulation and Pollution Control, South China Institute of Environmental Sciences, Ministry of Ecology and Environment, Guangzhou 510535, China
- Maurizio Palmisano
- Experimental Research Center, National Research Council, Benevento 82100, Italy
- Annalise Guarino
- Department of Earth Sciences, Environment and Resources, University of Naples Federico II, Naples 80125, Italy
- Shihua Qi
- State Key Laboratory of Biogeology and Environmental Geology, China University of Geosciences, Wuhan 430074, China
- Stefano Albanese
- Department of Earth Sciences, Environment and Resources, University of Naples Federico II, Naples 80125, Italy
7
Amenaghawon AN, Igemhokhai S, Eshiemogie SA, Ugbodu F, Evbarunegbe NI. Data-driven intelligent modeling, optimization, and global sensitivity analysis of a xanthan gum biosynthesis process. Heliyon 2024; 10:e25432. [PMID: 38322872 PMCID: PMC10845917 DOI: 10.1016/j.heliyon.2024.e25432] [Received: 06/16/2023] [Revised: 01/19/2024] [Accepted: 01/26/2024] [Indexed: 02/08/2024] Open
Abstract
In this study, the focus was to produce xanthan gum from pineapple waste using Xanthomonas campestris. Six machine learning models were employed to optimize fermentation time and key metabolic stimulants (KH2PO4 and NH4NO3). The production of xanthan gum was optimized using two evolutionary optimization algorithms, particle swarm optimization, and genetic algorithm while the importance of input features was ranked using global sensitivity analysis. KH2PO4 was the most important input and was found to be beneficial for xanthan gum production, while a limited amount of nitrogen was needed. The extreme learning machine model was the most adequate for modeling xanthan gum production, predicting a maximum xanthan yield of 10.34 g/l (an 11.9 % increase over the control) at a fermentation time of 3 days, KH2PO4 of 15 g/l, and NH4NO3 of 2 g/l. This study has provided important insights into the intelligent modeling of a biostimulated process for valorizing pineapple waste.
Affiliation(s)
- Andrew Nosakhare Amenaghawon
- Bioresources Valorization Laboratory, Department of Chemical Engineering, University of Benin, Benin City, Edo State, Nigeria
- Shedrach Igemhokhai
- Bioresources Valorization Laboratory, Department of Chemical Engineering, University of Benin, Benin City, Edo State, Nigeria
- Department of Petroleum Engineering, University of Benin, Benin City, Edo State, Nigeria
- Stanley Aimhanesi Eshiemogie
- Bioresources Valorization Laboratory, Department of Chemical Engineering, University of Benin, Benin City, Edo State, Nigeria
- Favour Ugbodu
- Bioresources Valorization Laboratory, Department of Chemical Engineering, University of Benin, Benin City, Edo State, Nigeria
- Nelson Iyore Evbarunegbe
- Department of Chemical Engineering, University of Massachusetts Amherst, Amherst, MA, 01003, USA
8
Deng Y, Gao X, Tu T. Enhancing skeletal age estimation accuracy using support vector regression models. Leg Med (Tokyo) 2024; 66:102362. [PMID: 38041906 DOI: 10.1016/j.legalmed.2023.102362] [Received: 09/02/2023] [Revised: 11/05/2023] [Accepted: 11/22/2023] [Indexed: 12/04/2023]
Abstract
OBJECTIVE The objective of the study was to determine if support vector regression (SVR) models could enhance the accuracy of skeletal age estimation compared to original metrics. METHOD The study used a dataset of 5,018 individuals from Wuhan, spanning ages 1 to 17. Optimal model parameters were found using cross-validation and grid search techniques. The study compared SVR-based bone age assessment metrics with original metrics and evaluated the performance of the SVR model across different sample sizes. RESULTS The findings unequivocally demonstrated SVR's superior reliability over original metrics in assessing bone age among children in central China. Regardless of the training set size, constructing SVR models based on TW3, CHN05, or a combination of TW3, CHN05, and GP consistently results in top-tier predictive accuracy. CONCLUSION This research highlights SVR's potential for accuracy improvement and robustness with limited datasets.
Affiliation(s)
- Ying Deng
- Hubei University of Technology, National "111" Center for Cellular Regulation and Molecular Pharmaceutics, Key Laboratory of Fermentation Engineering (Ministry of Education), No.28, Nanli Road, Hongshan District, Wuhan, Hubei Province 430068, China.
- Xiaoyan Gao
- Hubei University of Technology, National "111" Center for Cellular Regulation and Molecular Pharmaceutics, Key Laboratory of Fermentation Engineering (Ministry of Education), No.28, Nanli Road, Hongshan District, Wuhan, Hubei Province 430068, China.
- Taotao Tu
- College of Economics and Management, Huazhong Agricultural University, No.1 Shizishan Street, Hongshan District, Wuhan, Hubei Province 430070, China.
9
San Martin G, Hautier L, Mingeot D, Dubois B. How reliable is metabarcoding for pollen identification? An evaluation of different taxonomic assignment strategies by cross-validation. PeerJ 2024; 12:e16567. [PMID: 38313030 PMCID: PMC10838070 DOI: 10.7717/peerj.16567] [Received: 07/24/2023] [Accepted: 11/12/2023] [Indexed: 02/06/2024] Open
Abstract
Metabarcoding is a powerful tool, increasingly used in many disciplines of environmental sciences. However, to assign a taxon to a DNA sequence, bioinformaticians need to choose between different strategies or parameter values and these choices sometimes seem rather arbitrary. In this work, we present a case study on ITS2 and rbcL databases used to identify pollen collected by bees in Belgium. We blasted a random sample of sequences from the reference database against the remainder of the database using different strategies and compared the known taxonomy with the predicted one. This in silico cross-validation (CV) approach proved to be an easy yet powerful way to (1) assess the relative accuracy of taxonomic predictions, (2) define rules to discard dubious taxonomic assignments and (3) provide a more objective basis to choose the best strategy. We obtained the best results with the best blast hit (best bit score) rather than by selecting the majority taxon from the top 10 hits. The predictions were further improved by favouring the most frequent taxon among those with tied best bit scores. We obtained better results with databases containing the full sequences available on NCBI rather than restricting the sequences to the region amplified by the primers chosen in our study. Leaked CV showed that when the true sequence is present in the database, blast might still struggle to match the right taxon at the species level, particularly with rbcL. Classical 10-fold CV, where the true sequence is removed from the database, offers a different yet more realistic view of the true error rates. Taxonomic predictions with this approach worked well up to the genus level, particularly for ITS2 (5-7% of errors). Using a database containing only the local flora of Belgium did not improve the predictions up to the genus level for local species and made them worse for foreign species. 
At the species level, using a database containing exclusively local species improved the predictions for local species by ∼12% but the error rate remained rather high: 25% for ITS2 and 42% for rbcL. Foreign species performed worse even when using a world database (59-79% of errors). We used classification trees and GLMs to model the % of errors vs. identity and consensus scores and determine appropriate thresholds below which the taxonomic assignment should be discarded. This resulted in a significant reduction in prediction errors, but at the cost of a much higher proportion of unassigned sequences. Despite this stringent filtering, at least 1/5 sequences deemed suitable for species-level identification ultimately proved to be misidentified. An examination of the variability in prediction accuracy between plant families showed that rbcL outperformed ITS2 for only two of the 27 families examined, and that the % correct species-level assignments were much better for some families (e.g. 95% for Sapindaceae) than for others (e.g. 35% for Salicaceae).
Affiliation(s)
- Gilles San Martin
- Life Sciences Department, Plant and Forest Health Unit, Walloon Agricultural Research Centre, Gembloux, Belgium
- Louis Hautier
- Life Sciences Department, Plant and Forest Health Unit, Walloon Agricultural Research Centre, Gembloux, Belgium
- Dominique Mingeot
- Life Sciences Department, Bioengineering Unit, Walloon Agricultural Research Centre, Gembloux, Belgium
- Benjamin Dubois
- Life Sciences Department, Bioengineering Unit, Walloon Agricultural Research Centre, Gembloux, Belgium
10
Wu H, Guo B, Guo T, Pei L, Jing P, Wang Y, Ma X, Bai H, Wang Z, Xie T, Chen M. A study on identifying synergistic prevention and control regions for PM2.5 and O3 and exploring their spatiotemporal dynamic in China. Environ Pollut 2024; 341:122880. [PMID: 37944886 DOI: 10.1016/j.envpol.2023.122880] [Received: 08/30/2023] [Revised: 10/18/2023] [Accepted: 11/04/2023] [Indexed: 11/12/2023]
Abstract
Air pollutants, notably ozone (O3) and fine particulate matter (PM2.5), give rise to evident adverse impacts on public health and the ecotope, prompting extensive global concern. Though PM2.5 has been effectively mitigated in China, O3 has been emerging as a primary pollutant, especially in summer. Currently, alleviating PM2.5 and O3 synergistically faces huge challenges. The synergistic prevention and control (SPC) regions of PM2.5 and O3 and their spatiotemporal patterns remain unclear. To address the above issues, this study utilized ground monitoring station data, meteorological data, and auxiliary data to predict the China High-Resolution O3 Dataset (CHROD) via a two-stage model. Furthermore, SPC regions were identified based on a spatial overlay analysis using a Geographic Information System (GIS). The standard deviation ellipse was employed to investigate the spatiotemporal dynamic characteristics of SPC regions. The main findings are as follows. The two-stage model significantly improved the accuracy of O3 concentration prediction with an acceptable R² of 0.86, and our CHROD presented higher spatiotemporal resolution compared with existing products. SPC regions exhibited significant spatiotemporal variations during the Blue Sky Protection Campaign (BSPC) in China. SPC regions were dominant in spring and autumn, and O3-controlled and PM2.5-dominated zones were detected in summer and winter, respectively. SPC regions were primarily located in the northwest, north, east, and central regions of China, specifically in the Beijing-Tianjin-Hebei urban agglomeration (BTH), Shanxi, Shaanxi, Shandong, Henan, Jiangsu, Xinjiang, and Anhui provinces. The gravity center of SPC regions was distributed in the BTH in winter, and in Xinjiang during spring, summer, and autumn. This study can supply scientific references for the collaborative management of PM2.5 and O3.
Affiliation(s)
- Haojie Wu
- College of Geomatics, Xi'an University of Science and Technology, Xi'an, Shaanxi, 710054, China; Shaanxi Key Laboratory of Environmental Monitoring and Forewarning of Trace Pollutants, Xi'an, Shaanxi, 710043, China
- Bin Guo
- College of Geomatics, Xi'an University of Science and Technology, Xi'an, Shaanxi, 710054, China.
- Tengyue Guo
- Department of Geological Engineering, Qinghai University, Xining, Qinghai, 810016, China
- Lin Pei
- School of Exercise and Health Sciences, Xi'an Physical Education University, Xi'an, Shaanxi, 710068, China
- Peiqing Jing
- State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, Hubei, 430072, China
- Yan Wang
- School of Geography and Tourism, Shaanxi Normal University, Xi'an, Shaanxi, 710119, China
- Xuying Ma
- College of Geomatics, Xi'an University of Science and Technology, Xi'an, Shaanxi, 710054, China
- Haorui Bai
- College of Geomatics, Xi'an University of Science and Technology, Xi'an, Shaanxi, 710054, China
- Zheng Wang
- College of Geomatics, Xi'an University of Science and Technology, Xi'an, Shaanxi, 710054, China
- Tingting Xie
- College of Geomatics, Xi'an University of Science and Technology, Xi'an, Shaanxi, 710054, China
- Miaoyi Chen
- College of Geomatics, Xi'an University of Science and Technology, Xi'an, Shaanxi, 710054, China
11
Stoyanov D, Paunova R, Dichev J, Kandilarova S, Khorev V, Kurkin S. Functional magnetic resonance imaging study of group independent components underpinning item responses to paranoid-depressive scale. World J Clin Cases 2023; 11:8458-8474. [PMID: 38188204 PMCID: PMC10768520 DOI: 10.12998/wjcc.v11.i36.8458] [Received: 09/26/2023] [Revised: 11/10/2023] [Accepted: 12/05/2023] [Indexed: 12/22/2023] Open
Abstract
BACKGROUND Our study expands upon a large body of evidence in the field of neuropsychiatric imaging with cognitive, affective and behavioral tasks, adapted for the functional magnetic resonance imaging (fMRI) experimental environment. There is sufficient evidence that common networks underpin activations in task-based fMRI across different mental disorders. AIM To investigate whether there exist specific neural circuits which underpin differential item responses to depressive, paranoid and neutral items (DN) in patients with schizophrenia (SCZ) and major depressive disorder (MDD), respectively. METHODS Sixty patients with SCZ and MDD were recruited. All patients were scanned on a 3T magnetic resonance tomography platform with a functional MRI paradigm comprising a block design, including blocks with diagnostic paranoid (DP) items, depression specific (DS) items, and DN items from a general interest scale. We performed a two-sample t-test between the two groups, SCZ patients and depressive patients. Our purpose was to observe the different brain networks activated during each specific condition of the task (DS, DP, and DN). RESULTS Several significant results are demonstrated in the comparison between SCZ and depressive groups while performing this task. We identified one component that is task-related and independent of condition (shared between all three conditions), composed of regions within the temporal (right superior and middle temporal gyri), frontal (left middle and inferior frontal gyri) and limbic/salience system (right anterior insula). Another component is related to both diagnostic specific conditions (DS and DP), i.e., it is shared between DEP and SCZ, and includes frontal motor/language and parietal areas. One specific component is modulated preferentially by the DP condition, and is related mainly to prefrontal regions, whereas two other components are significantly modulated by the DS condition and include clusters within the default mode network such as posterior cingulate and precuneus, several occipital areas, including lingual and fusiform gyrus, as well as parahippocampal gyrus. Finally, component 12 appeared to be unique for the neutral condition. In addition, circuits across components were identified that are either common or distinct in the preferential processing of the sub-scales of the task. CONCLUSION This study delivers further evidence in support of the model of trans-disciplinary cross-validation in psychiatry.
Affiliation(s)
- Drozdstoy Stoyanov
- Department of Psychiatry, Medical University Plovdiv, Plovdiv 4000, Bulgaria
| | - Rositsa Paunova
- Research Institute, Medical University, Plovdiv 4002, Bulgaria
| | - Julian Dichev
- Faculty of Medicine, Medical University, Plovdiv 4002, Bulgaria
| | - Sevdalina Kandilarova
- Department of Psychiatry and Medical Psychology, Medical University, Plovdiv 4002, Bulgaria
| | - Vladimir Khorev
- Baltic Center for Artificial Intelligence and Neurotechnology, Immanuel Kant Baltic Federal University, Kaliningrad 236041, Russia
| | - Semen Kurkin
- Baltic Center for Artificial Intelligence and Neurotechnology, Immanuel Kant Baltic Federal University, Kaliningrad 236041, Russia
| |
|
12
|
Arciniegas-Alarcón S, García-Peña M, Krzanowski WJ, Rengifo C. Missing value imputation in a data matrix using the regularised singular value decomposition. MethodsX 2023; 11:102289. [PMID: 37560402 PMCID: PMC10407287 DOI: 10.1016/j.mex.2023.102289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Accepted: 07/16/2023] [Indexed: 08/11/2023] Open
Abstract
Some statistical analysis techniques may require complete data matrices, but a frequent problem in the construction of databases is the incomplete collection of information for different reasons. One option to tackle the problem is to estimate and impute the missing data. This paper describes a form of imputation that mixes regression with lower rank approximations. To improve the quality of the imputations, a generalisation is proposed that replaces the singular value decomposition (SVD) of the matrix with a regularised SVD in which the regularisation parameter is estimated by cross-validation. To evaluate the performance of the proposal, ten sets of real data from multienvironment trials were used. Missing values were created in each set at four percentages of missing not at random, and three criteria were then considered to investigate the effectiveness of the proposal. The results show that the regularised method proves very competitive when compared to the original method, beating it in several of the considered scenarios. As it is a very general system, its application can be extended to all multivariate data matrices.
• The imputation method is modified through the inclusion of a stable and efficient computational algorithm that replaces the classical SVD least squares criterion by a penalised criterion. This penalty produces smoothed eigenvectors and eigenvalues that avoid overfitting problems, improving the performance of the method when the penalty is necessary. The size of the penalty can be determined by minimising one of the following criteria: the prediction errors, the Procrustes similarity statistic or the critical angles between subspaces of principal components.
Affiliation(s)
| | - Marisol García-Peña
- Pontificia Universidad Javeriana, Departamento de Matemáticas, Bogotá, Colombia
| | - Wojtek J. Krzanowski
- University of Exeter, College of Engineering, Mathematics and Physical Sciences, Exeter, UK
| | - Camilo Rengifo
- Universidad de La Sabana, Facultad de Ingeniería, Chía, Colombia
| |
|
13
|
Mertten D, Baldwin S, Cheng CH, McCallum J, Thomson S, Ashton DT, McKenzie CM, Lenhard M, Datson PM. Implementation of different relationship estimate methodologies in breeding value prediction in kiwiberry (Actinidia arguta). Mol Breed 2023; 43:75. [PMID: 37868140 PMCID: PMC10584781 DOI: 10.1007/s11032-023-01419-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Accepted: 10/02/2023] [Indexed: 10/24/2023]
Abstract
In dioecious crops such as Actinidia arguta (kiwiberries), some of the main challenges when breeding for fruit characteristics are the selection of potential male parents and the long juvenile period. Currently, breeding values of male parents are estimated through progeny tests, which makes the breeding of new kiwiberry cultivars time-consuming and costly. The application of best linear unbiased prediction (BLUP) would allow direct estimation of sex-related traits and speed up kiwiberry breeding. In this study, we used a linear mixed model approach to estimate narrow sense heritability for one vine-related trait and five fruit-related traits for two incomplete factorial crossing designs. We obtained BLUPs for all genotypes, taking into consideration whether the relationship was pedigree-based or marker-based. Owing to the high cost of genome sequencing, it is important to understand the effects of different sources of relationship matrices on estimating breeding values across a breeding population. Because of the increasing implementation of genomic selection in crop breeding, we compared the effects of incorporating different sources of information in building relationship matrices and ploidy levels on the accuracy of BLUPs' heritability and predictive ability. As kiwiberries are autotetraploids, multivalent chromosome formation and occasionally double reduction can occur during meiosis, and this can affect the accuracy of prediction. This study innovates the breeding programme of autotetraploid kiwiberries. We demonstrate that the accuracy of BLUPs of male siblings, without phenotypic observations, strongly improved when a tetraploid marker-based relationship matrix was used rather than parental BLUPs and female siblings with phenotypic observations. Supplementary Information The online version contains supplementary material available at 10.1007/s11032-023-01419-8.
Affiliation(s)
- Daniel Mertten
- The New Zealand Institute for Plant and Food Research Ltd (PFR), Auckland, 1142 New Zealand
- Institute for Biochemistry and Biology, University of Potsdam, 14476 Potsdam-Golm, Germany
| | | | | | | | | | | | | | - Michael Lenhard
- Institute for Biochemistry and Biology, University of Potsdam, 14476 Potsdam-Golm, Germany
| | | |
|
14
|
Qu Y, Wang P, Yao H, Wang D, Song C, Yang H, Zhang Z, Chen P, Kang X, Du K, Fan L, Zhou B, Han T, Yu C, Zhang X, Zuo N, Jiang T, Zhou Y, Liu B, Han Y, Lu J, Liu Y. Reproducible Abnormalities and Diagnostic Generalizability of White Matter in Alzheimer's Disease. Neurosci Bull 2023; 39:1533-1543. [PMID: 37014553 PMCID: PMC10533766 DOI: 10.1007/s12264-023-01041-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/29/2022] [Accepted: 11/29/2022] [Indexed: 04/05/2023] Open
Abstract
Alzheimer's disease (AD) is associated with the impairment of white matter (WM) tracts. The current study aimed to verify the utility of WM as the neuroimaging marker of AD with multisite diffusion tensor imaging datasets [321 patients with AD, 265 patients with mild cognitive impairment (MCI), 279 normal controls (NC)], a unified pipeline, and independent site cross-validation. Automated fiber quantification was used to extract diffusion profiles along tracts. Random-effects meta-analyses showed a reproducible degeneration pattern in which fractional anisotropy significantly decreased in the AD and MCI groups compared with NC. Machine learning models using tract-based features showed good generalizability among independent site cross-validation. The diffusion metrics of the altered regions and the AD probability predicted by the models were highly correlated with cognitive ability in the AD and MCI groups. We highlighted the reproducibility and generalizability of the degeneration pattern of WM tracts in AD.
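The independent-site cross-validation mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the authors' implementation; the function name `leave_one_site_out` and the site labels below are hypothetical. The idea is that every acquisition site is held out once as the test set while the model is trained on all remaining sites.

```python
from collections import defaultdict


def leave_one_site_out(sites):
    """Yield (held_out_site, train_idx, test_idx), holding each site out once.

    `sites` is a sequence giving the acquisition site of every sample.
    """
    by_site = defaultdict(list)
    for idx, site in enumerate(sites):
        by_site[site].append(idx)

    for held_out in sorted(by_site):
        test_idx = by_site[held_out]
        # Training indices come from every site except the held-out one.
        train_idx = sorted(
            i for site, idxs in by_site.items() if site != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


# Hypothetical site labels for six scans acquired at three centres.
sites = ["A", "A", "B", "C", "B", "C"]
folds = list(leave_one_site_out(sites))
```

Because no sample from the held-out site appears in training, the resulting score estimates generalisation to an unseen scanner or centre instead of to unseen subjects from familiar sites.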
Affiliation(s)
- Yida Qu
- Brainnetome Center and National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Pan Wang
- Department of Neurology, Tianjin Huanhu Hospital, Tianjin University, Tianjin, 300222, China
| | - Hongxiang Yao
- Department of Neurology, Tianjin Huanhu Hospital, Tianjin University, Tianjin, 300222, China
| | - Dawei Wang
- Department of Radiology, Department of Epidemiology and Health Statistics, School of Public Health, Qilu Hospital of Shandong University, Ji'nan, 250063, China
| | - Chengyuan Song
- Department of Neurology, Qilu Hospital of Shandong University, Ji'nan, 250063, China
| | - Hongwei Yang
- Department of Radiology, Xuanwu Hospital of Capital Medical University, Beijing, 100053, China
| | - Zengqiang Zhang
- Branch of Chinese, PLA General Hospital, Sanya, 572022, China
| | - Pindong Chen
- Brainnetome Center and National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Xiaopeng Kang
- Brainnetome Center and National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Kai Du
- Brainnetome Center and National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Lingzhong Fan
- Brainnetome Center and National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Bo Zhou
- Department of Neurology, The Second Medical Centre, National Clinical Research Centre for Geriatric Diseases, Chinese PLA General Hospital, Beijing, 100089, China
| | - Tong Han
- Department of Radiology, Tianjin Huanhu Hospital, Tianjin, 300222, China
| | - Chunshui Yu
- Department of Radiology, Tianjin Medical University General Hospital, Tianjin, 300052, China
| | - Xi Zhang
- Department of Neurology, The Second Medical Centre, National Clinical Research Centre for Geriatric Diseases, Chinese PLA General Hospital, Beijing, 100089, China
| | - Nianming Zuo
- Brainnetome Center and National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Tianzi Jiang
- Brainnetome Center and National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Yuying Zhou
- Department of Neurology, Tianjin Huanhu Hospital, Tianjin University, Tianjin, 300222, China
| | - Bing Liu
- Brainnetome Center and National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
- School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China
- State Key Lab of Cognition Neuroscience & Learning, Beijing Normal University, Beijing, 100091, China
| | - Ying Han
- Department of Neurology, Xuanwu Hospital of Capital Medical University, Beijing, 100053, China
- Beijing Institute of Geriatrics, Beijing, 100053, China
- National Clinical Research Center for Geriatric Disorders, Beijing, 100053, China
- Center of Alzheimer's Disease, Beijing Institute for Brain Disorders, Beijing, 100053, China
| | - Jie Lu
- Department of Radiology, Xuanwu Hospital of Capital Medical University, Beijing, 100053, China.
| | - Yong Liu
- Brainnetome Center and National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China.
- School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, 100876, China.
| |
|
15
|
Ma Y, Hendrickson T, Ramsay I, Shen A, Sponheim SR, MacDonald AW. Resting-State Functional Connectivity Explained Psychotic-like Experiences in the General Population and Partially Generalized to Patients and Relatives. Biol Psychiatry Glob Open Sci 2023; 3:1094-1103. [PMID: 37881569 PMCID: PMC10593874 DOI: 10.1016/j.bpsgos.2022.08.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2021] [Revised: 08/11/2022] [Accepted: 08/31/2022] [Indexed: 11/15/2022] Open
Abstract
Background Psychotic-like experiences (PLEs) are considered the subclinical portion of the psychosis continuum. Research suggests that there are resting-state functional connectivity (rsFC) substrates of PLEs, yet it is unclear if the same substrates underlie more severe psychosis. Here, to our knowledge, we report the first study to build a cross-validated rsFC model of PLEs in a large community sample and directly test its ability to explain psychosis in an independent sample of patients with psychosis and their relatives. Methods Resting-state FC of 855 healthy young adults from the WU-Minn Human Connectome Project (HCP) was used to predict PLEs with elastic net. An rsFC composite score based on the resulting model was correlated with psychotic traits and symptoms in 118 patients with psychosis, 71 nonpsychotic first-degree relatives, and 45 healthy control subjects from the psychosis HCP. Results In the HCP, the cross-validated model explained 3.3% of variance in PLEs. Predictive connections spread primarily across the default, frontoparietal, cingulo-opercular, and dorsal attention networks. The model partially generalized to a younger, but not older, subsample in the psychosis HCP, explaining two measures of positive/disorganized psychotic traits (the Structured Interview for Schizotypy: β = 0.25, one-tailed p = .027; the Schizotypy Personality Questionnaire positive factor: β = 0.14, one-tailed p = .041). However, it did not differentiate patients from relatives and control subjects or explain psychotic symptoms in patients. Conclusions Some rsFC substrates of PLEs are shared across the psychosis continuum. However, explanatory power was modest, and generalization was partial. It is equally important to understand shared versus distinct rsFC variances across the psychosis continuum.
Affiliation(s)
- Yizhou Ma
- Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland
| | | | - Ian Ramsay
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota
- Department of Psychiatry, University of Minnesota, Minneapolis, Minnesota
| | - Amanda Shen
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota
| | - Scott R. Sponheim
- Department of Psychiatry, University of Minnesota, Minneapolis, Minnesota
- Minneapolis Veterans Affairs Health Care System, Minneapolis, Minnesota
| | - Angus W. MacDonald
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota
- Department of Psychiatry, University of Minnesota, Minneapolis, Minnesota
| |
|
16
|
Lopez E, Etxebarria-Elezgarai J, Amigo JM, Seifert A. The importance of choosing a proper validation strategy in predictive models. A tutorial with real examples. Anal Chim Acta 2023; 1275:341532. [PMID: 37524478 DOI: 10.1016/j.aca.2023.341532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 06/13/2023] [Accepted: 06/14/2023] [Indexed: 08/02/2023]
Abstract
Machine learning is the art of combining a set of measurement data and predictive variables to forecast future events. Every day, new model approaches (with high levels of sophistication) can be found in the literature. However, less importance is given to the crucial stage of validation. Validation is the assessment that the model reliably links the measurements and the predictive variables. Nevertheless, although there are many ways in which a model can be validated and cross-validated reliably, the result may still be a model that wrongly reflects the real nature of the data and cannot be used to predict external samples. This manuscript shows in a didactical manner how important the data structure is when a model is constructed and how easy it is to obtain models that look promising with wrongly designed cross-validation and external validation strategies. A comprehensive overview of the main validation strategies is shown, exemplified by three different scenarios, all of them focused on classification.
Affiliation(s)
- Eneko Lopez
- CIC NanoGUNE BRTA, Tolosa Hiribidea 76, San Sebastián, 20018, Spain; Department of Physics, University of the Basque Country (UPV/EHU), San Sebastián, 20018, Spain
| | | | - Jose Manuel Amigo
- IKERBASQUE, Basque Foundation for Science, Plaza Euskadi, 5, Bilbao, 48009, Spain; Department of Analytical Chemistry, University of the Basque Country, Barrio Sarriena S/N, Leioa, 48940, Spain.
| | - Andreas Seifert
- CIC NanoGUNE BRTA, Tolosa Hiribidea 76, San Sebastián, 20018, Spain; IKERBASQUE, Basque Foundation for Science, Plaza Euskadi, 5, Bilbao, 48009, Spain.
| |
|
17
|
Rodríguez-Testal JF, Trinidad-Montero JM, Rosales Becerra Á, Faija C, Senín-Calderón C. Psychometric properties of the Pride in Eating Pathology Scale in a Spanish population. J Eat Disord 2023; 11:124. [PMID: 37507784 PMCID: PMC10386289 DOI: 10.1186/s40337-023-00847-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 07/18/2023] [Indexed: 07/30/2023] Open
Abstract
BACKGROUND In its relation to eating disorders, pride is one of the self-conscious emotions least analyzed, and requires valid and reliable instruments for its measurement. This study aimed to examine the factor structure and the psychometric properties of the Pride in Eating Pathology Scale (PEP-S), in the Spanish general population, as well as between-sex differences in PEP-S scores. METHODS Of the 1483 participants aged 18 to 34 (M = 21.99; SD = 3.09), 954 were women (65.2%) and the majority were university students (78.8%). Psychometric properties of the scale were tested in a cross-sectional design using cross-validation, i.e., exploratory and confirmatory factor analysis, and estimation of invariance (sex). RESULTS The four-factor structure found was similar to the original scale with invariance across sex and internal consistency (ordinal alpha .99) and stability (.85). Evidence of convergent validity and differences between sexes were found. Specifically, women scored higher on all the factors, including the healthier sense of pride. CONCLUSIONS The PEP-S scale is an instrument with evidence of validity and reliability in the Spanish population. Although it still has to be tested in a clinical population, it constitutes a promising instrument for the evaluation of the self-conscious emotion, pride.
Affiliation(s)
| | | | - Ángela Rosales Becerra
- Personality, Evaluation and Psychological Treatment Department, University of Seville, Seville, Spain
| | - Cintia Faija
- Department of Primary Care & Mental Health, University of Liverpool, Liverpool, UK
| | - Cristina Senín-Calderón
- Department of Psychology, University of Cádiz, Avda. República Árabe Saharaui S/N. Puerto Real, Cádiz, Spain.
| |
|
18
|
Meaney C, Stukel TA, Austin PC, Moineddin R, Greiver M, Escobar M. Quality indices for topic model selection and evaluation: a literature review and case study. BMC Med Inform Decis Mak 2023; 23:132. [PMID: 37481523 PMCID: PMC10362613 DOI: 10.1186/s12911-023-02216-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Accepted: 06/22/2023] [Indexed: 07/24/2023] Open
Abstract
BACKGROUND Topic models are a class of unsupervised machine learning models, which facilitate summarization, browsing and retrieval from large unstructured document collections. This study reviews several methods for assessing the quality of unsupervised topic models estimated using non-negative matrix factorization. Techniques for topic model validation have been developed across disparate fields. We synthesize this literature, discuss the advantages and disadvantages of different techniques for topic model validation, and illustrate their usefulness for guiding model selection on a large clinical text corpus. DESIGN, SETTING AND DATA Using a retrospective cohort design, we curated a text corpus containing 382,666 clinical notes collected between 01/01/2017 and 12/31/2020 from primary care electronic medical records in Toronto, Canada. METHODS Several topic model quality metrics have been proposed to assess different aspects of model fit. We explored the following metrics: reconstruction error, topic coherence, rank biased overlap, Kendall's weighted tau, partition coefficient, partition entropy and the Xie-Beni statistic. Depending on context, cross-validation and/or bootstrap stability analysis were used to estimate these metrics on our corpus. RESULTS Cross-validated reconstruction error favored large topic models (K ≥ 100 topics) on our corpus. Stability analysis using topic coherence and the Xie-Beni statistic also favored large models (K = 100 topics). Rank biased overlap and Kendall's weighted tau favored small models (K = 5 topics). Few model evaluation metrics suggested mid-sized topic models (25 ≤ K ≤ 75) as being optimal. However, human judgement suggested that mid-sized topic models produced expressive low-dimensional summarizations of the corpus. CONCLUSIONS Topic model quality indices are transparent quantitative tools for guiding model selection and evaluation.
Our empirical illustration demonstrated that different topic model quality indices favor models of different complexity and may not select models aligning with human judgment. This suggests that different metrics capture different aspects of model goodness of fit. A combination of topic model quality indices, coupled with human validation, may be useful in appraising unsupervised topic models.
Affiliation(s)
- Christopher Meaney
- Department of Family and Community Medicine, University of Toronto, 500 University Ave, Toronto, ON, M5G1V7, Canada.
| | - Therese A Stukel
- Institute of Health Policy, Management and Evaluation, ICES, University of Toronto, Toronto, Canada
| | - Peter C Austin
- Institute of Health Policy, Management and Evaluation, ICES, University of Toronto, Toronto, Canada
| | - Rahim Moineddin
- Department of Family and Community Medicine, University of Toronto, 500 University Ave, Toronto, ON, M5G1V7, Canada
| | - Michelle Greiver
- Department of Family and Community Medicine, University of Toronto, 500 University Ave, Toronto, ON, M5G1V7, Canada
| | - Michael Escobar
- Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
| |
|
19
|
Kocur A, Rubik J, Czarnowski P, Czajkowska A, Marszałek D, Sierakowski M, Górska M, Pawiński T. Therapeutic drug monitoring of mycophenolic acid (MPA) using volumetric absorptive microsampling (VAMS) in pediatric renal transplant recipients: ultra-high-performance liquid chromatography-tandem mass spectrometry analytical method development, cross-validation, and clinical application. Pharmacol Rep 2023:10.1007/s43440-023-00509-w. [PMID: 37452967 PMCID: PMC10374821 DOI: 10.1007/s43440-023-00509-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 07/03/2023] [Accepted: 07/04/2023] [Indexed: 07/18/2023]
Abstract
BACKGROUND Mycophenolic acid (MPA) is widely used in posttransplant pharmacotherapy for pediatric patients after renal transplantation. Volumetric absorptive microsampling (VAMS) is a recent approach for sample collection, particularly during therapeutic drug monitoring (TDM). The recommended matrix for MPA determination is plasma (PL), and conversion between capillary-blood VAMS samples and PL concentrations is required for the appropriate interpretation of the results. METHODS This study aimed to develop and validate a UHPLC-MS/MS method for MPA quantification in whole blood (WB), PL, and VAMS samples, with cross-validation and clinical validation based on regression calculations. Methods were validated in the 0.10-15 µg/mL range for trough MPA concentration measurement according to the European Medicines Agency (EMA) guidelines. Fifty pediatric patients treated with MPA after renal transplantation were included in this study. PL and WB samples were obtained via venipuncture, whereas VAMS samples were collected by fingerstick. The conversion from VAMS_MPA to PL_MPA concentration was performed using formulas based on hematocrit values and a regression model. RESULTS LC-MS/MS methods were successfully developed and validated according to EMA guidelines. The cross-correlation between the methods was evaluated using Passing-Bablok regression, Bland-Altman bias plots, and predictive performance calculations. Clinical validation of the developed method was successfully performed, and the regression-based formula for converting VAMS_MPA to PL_MPA concentration was successfully validated and confirmed on an independent group of samples. CONCLUSIONS This study presents the first triple matrix-based LC-MS/MS method for MPA determination in the pediatric population after renal transplantation. For the first time, the developed methods were cross-validated against a routinely used HPLC-DAD protocol.
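The Bland-Altman comparison named in this abstract reduces to the mean of the paired differences (the bias) and its 95% limits of agreement. The sketch below is a generic illustration and not the authors' analysis; the function name `bland_altman` and the paired concentrations are made up for the example, and the limits assume approximately normally distributed differences.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    Bias is the mean of the paired differences; the limits of agreement
    are bias +/- 1.96 sample standard deviations of those differences.
    """
    if len(method_a) != len(method_b):
        raise ValueError("paired measurements must have equal length")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Made-up paired concentrations in µg/mL (not data from the study).
vams = [1.0, 2.0, 3.0, 4.0]
plasma = [1.1, 1.9, 3.2, 3.8]
bias, lower, upper = bland_altman(vams, plasma)
```

A bias near zero with narrow limits suggests the two matrices can be used interchangeably after conversion; a systematic offset would show up as a bias clearly away from zero.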
Affiliation(s)
- Arkadiusz Kocur
- Department of Drug Chemistry, Medical University of Warsaw, 1 Banacha St, 02-091, Warsaw, Poland.
- Pharmacokinetics Laboratory, Department of Biochemistry, Radioimmunology, and Experimental Medicine, The Children's Memorial Health Institute, Dzieci Polskich 20, 04-730, Warsaw, Poland.
| | - Jacek Rubik
- Department of Nephrology, Kidney Transplantation, and Arterial Hypertension, The Children's Memorial Health Institute, Dzieci Polskich 20, 04-730, Warsaw, Poland
| | - Paweł Czarnowski
- Department of Genetics, Maria Sklodowska-Curie National Research Institute of Oncology, Roentgena 5, 02-781, Warsaw, Poland
| | - Agnieszka Czajkowska
- Pharmacokinetics Laboratory, Department of Biochemistry, Radioimmunology, and Experimental Medicine, The Children's Memorial Health Institute, Dzieci Polskich 20, 04-730, Warsaw, Poland
| | - Dorota Marszałek
- Department of Drug Chemistry, Medical University of Warsaw, 1 Banacha St, 02-091, Warsaw, Poland
| | - Maciej Sierakowski
- Institute of Biological Sciences, Cardinal Stefan Wyszynski University, 1/3 Kazimierza Wóycickiego St, 01-938, Warsaw, Poland
| | - Marta Górska
- Pharmacokinetics Laboratory, Department of Biochemistry, Radioimmunology, and Experimental Medicine, The Children's Memorial Health Institute, Dzieci Polskich 20, 04-730, Warsaw, Poland
| | - Tomasz Pawiński
- Department of Drug Chemistry, Medical University of Warsaw, 1 Banacha St, 02-091, Warsaw, Poland
| |
|
20
|
Suleman MT, Khan YD. PseU-pred: An ensemble model for accurate identification of pseudouridine sites. Anal Biochem 2023:115247. [PMID: 37437648 DOI: 10.1016/j.ab.2023.115247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Revised: 06/25/2023] [Accepted: 07/08/2023] [Indexed: 07/14/2023]
Abstract
Pseudouridine (ψ) is reported to occur frequently in all types of RNA. This uridine modification has been shown to be essential for processes such as RNA stability and stress response. Also, it is linked to a few human diseases, such as prostate cancer, anemia, etc. A few laboratory techniques, such as Pseudo-seq and N3-CMC-enriched Pseudouridine sequencing (CeU-Seq) are used for detecting ψ sites. However, these are laborious and drawn-out methods. The convenience of sequencing data has enabled the development of computationally intelligent models for improving ψ site identification methods. The proposed work provides a prediction model for the identification of ψ sites through popular ensemble methods such as stacking, bagging, and boosting. Features were obtained through a novel feature extraction mechanism with the assimilation of statistical moments, which were used to train ensemble models. The cross-validation test and independent set test were used to evaluate the precision of the trained models. The proposed model outperformed the preexisting predictors and revealed 87% accuracy, 0.90 specificity, 0.85 sensitivity, and a 0.75 Matthews correlation coefficient. A web server has been built and is available publicly for the researchers at https://taseersuleman-y-test-pseu-pred-c2wmtj.streamlit.app/.
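The four performance figures quoted in this abstract are all functions of a binary confusion matrix. As a minimal sketch, the helper below (a hypothetical name, `confusion_metrics`) computes them from raw counts. The counts in the example are invented so that sensitivity and specificity match the reported 0.85 and 0.90; the abstract does not state the actual confusion matrix or class balance.

```python
from math import sqrt


def confusion_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity and MCC from confusion counts.

    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    total = tp + fp + tn + fn
    # MCC denominator; it is zero when any row or column of the matrix is empty.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Invented counts for a balanced test set of 100 positives and 100 negatives.
metrics = confusion_metrics(tp=85, fp=10, tn=90, fn=15)
```

With these balanced, illustrative counts the Matthews correlation coefficient comes out close to 0.75. Unlike accuracy, it uses all four cells of the matrix, which is why it is often reported alongside accuracy for site-prediction tasks.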
Affiliation(s)
- Muhammad Taseer Suleman
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, 54770, Pakistan.
| | - Yaser Daanial Khan
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, 54770, Pakistan.
| |
|
21
|
Li P, Wei X, Wang M, Liu D, Liu J, Pei Z, Shi F, Wang S, Zuo X, Li D, Yu H, Zhang N, Yu Q, Luo Y. Simulation of anaerobic co-digestion of steam explosion pulping wastewater with cattle manure: Focusing on degradation and inhibition of furfural. Bioresour Technol 2023; 380:129086. [PMID: 37100292 DOI: 10.1016/j.biortech.2023.129086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2022] [Revised: 04/08/2023] [Accepted: 04/20/2023] [Indexed: 05/14/2023]
Abstract
In this study, an extended Anaerobic Digestion Model No.1, which considered the degradation and inhibition properties of furfural, was established and implemented to simulate the anaerobic co-digestion of steam explosion pulping wastewater and cattle manure in batch and semi-continuous modes. Batch and semi-continuous experimental data helped calibrate the new model and recalibrate the parameters related to furfural degradation, respectively. The cross-validation results showed the batch-stage calibration model accurately predicted the methanogenic behavior of all experimental treatments (R² ≥ 0.959). Meanwhile, the recalibrated model satisfactorily matched the methane production results in the stable and high furfural loading stages in the semi-continuous experiment. In addition, recalibration results revealed the semi-continuous system tolerated furfural better than the batch system. These results provide insights into the anaerobic treatments and mathematical simulations of furfural-rich substrates.
Affiliation(s)
- Pengfei Li
- Heilongjiang Academy of Agricultural Sciences Postdoctoral Workstation, Harbin 150086, PR China; Heilongjiang Academy of Black Soil Conservation and Utilization, Key Laboratory Combining Farming & Animal Husbandry, Key Laboratory of Straw Energy Utilization, Harbin 150086, PR China
- Xinyu Wei
- Rural Energy and Environment Agency, Ministry of Agriculture and Rural Affairs, Beijing 100125, PR China
- Ming Wang
- Department of Agriculture Biological Environment and Energy Engineering, School of Engineering, Northeast Agriculture University, Harbin 150030, PR China
- Di Liu
- Heilongjiang Academy of Agricultural Sciences Postdoctoral Workstation, Harbin 150086, PR China
- Jie Liu
- Heilongjiang Academy of Agricultural Sciences Postdoctoral Workstation, Harbin 150086, PR China; Heilongjiang Academy of Black Soil Conservation and Utilization, Key Laboratory Combining Farming & Animal Husbandry, Key Laboratory of Straw Energy Utilization, Harbin 150086, PR China.
- Zhanjiang Pei
- Heilongjiang Academy of Black Soil Conservation and Utilization, Key Laboratory Combining Farming & Animal Husbandry, Key Laboratory of Straw Energy Utilization, Harbin 150086, PR China
- Fengmei Shi
- Heilongjiang Academy of Black Soil Conservation and Utilization, Key Laboratory Combining Farming & Animal Husbandry, Key Laboratory of Straw Energy Utilization, Harbin 150086, PR China
- Su Wang
- Heilongjiang Academy of Black Soil Conservation and Utilization, Key Laboratory Combining Farming & Animal Husbandry, Key Laboratory of Straw Energy Utilization, Harbin 150086, PR China
- Xin Zuo
- Heilongjiang Academy of Black Soil Conservation and Utilization, Key Laboratory Combining Farming & Animal Husbandry, Key Laboratory of Straw Energy Utilization, Harbin 150086, PR China
- Dan Li
- Heilongjiang Academy of Black Soil Conservation and Utilization, Key Laboratory Combining Farming & Animal Husbandry, Key Laboratory of Straw Energy Utilization, Harbin 150086, PR China
- Hongjiu Yu
- Heilongjiang Academy of Black Soil Conservation and Utilization, Key Laboratory Combining Farming & Animal Husbandry, Key Laboratory of Straw Energy Utilization, Harbin 150086, PR China
- Nan Zhang
- Heilongjiang Academy of Black Soil Conservation and Utilization, Key Laboratory Combining Farming & Animal Husbandry, Key Laboratory of Straw Energy Utilization, Harbin 150086, PR China
- Qiuyue Yu
- Heilongjiang Academy of Black Soil Conservation and Utilization, Key Laboratory Combining Farming & Animal Husbandry, Key Laboratory of Straw Energy Utilization, Harbin 150086, PR China
- Yifei Luo
- Heilongjiang Academy of Black Soil Conservation and Utilization, Key Laboratory Combining Farming & Animal Husbandry, Key Laboratory of Straw Energy Utilization, Harbin 150086, PR China
|
22
|
Zaidi A. Predicting wildfires in Algerian forests using machine learning models. Heliyon 2023; 9:e18064. [PMID: 37519679 PMCID: PMC10372657 DOI: 10.1016/j.heliyon.2023.e18064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Revised: 07/02/2023] [Accepted: 07/05/2023] [Indexed: 08/01/2023] Open
Abstract
Algeria is one of the Maghreb countries most affected by wildfires. The economic, environmental, and societal consequences of these fires can last several years after the wildfire. Often, it is possible to avoid such disasters if the outbreak of fire is detected quickly and reliably. The lack of datasets has limited the methods used to predict wildfires in Algeria to the mapping of risk areas, which is updated annually. This study was made possible by a recent dataset recording the history of forest fires in the cities of Bejaia and Sidi Bel-Abbes during the year 2012. Because the dataset is small, we used principal component analysis to reduce the number of variables to 6, while retaining 96.65% of the total variance. Moreover, we developed an artificial neural network (ANN) with two hidden layers to predict wildfires in these cities. Next, we trained our classifier and compared its performance with that of the Logistic Regression, K Nearest Neighbors, Support Vector Machine, and Random Forest classifiers, using 10-fold stratified cross-validation. The experiment shows a slight superiority of the ANN classifier over the others in terms of accuracy, precision, and recall. Our classifier achieves an accuracy of 0.967±0.026 and an F1-score of 0.971±0.023. The SHAP technique revealed the importance of the features (RH, DC, ISI) in the predictions of the ANN model.
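The stratified 10-fold split mentioned in this abstract can be illustrated with a minimal sketch. It is not the author's code: the function names, the random seed, and the class counts (60 fire, 40 no-fire) are made up for illustration, and only the Python standard library is used.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=10, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin, continuing where the previous class
        # stopped, so that overall fold sizes stay balanced.
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)
        offset += len(indices)
    return folds

def train_test_splits(folds):
    """Yield (train, test) index lists, using each fold once as the test set."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Hypothetical labels (1 = fire, 0 = no fire); counts are illustrative only.
labels = [1] * 60 + [0] * 40
folds = stratified_kfold(labels, k=10)
splits = list(train_test_splits(folds))
```

Each classifier would then be trained on `train` and scored on `test` for every split, and the mean and standard deviation of the ten scores reported, which is the form in which the accuracy and F1-score above are given.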
|
23
|
Héberger K. Selection of optimal validation methods for quantitative structure-activity relationships and applicability domain. SAR QSAR Environ Res 2023:1-20. [PMID: 37227317 DOI: 10.1080/1062936x.2023.2214871] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023]
Abstract
This brief literature survey groups the (numerical) validation methods and emphasizes the contradictions and confusion concerning bias, variance and predictive performance. A multicriteria decision-making analysis has been made using the sum of absolute ranking differences (SRD), illustrated with five case studies (seven examples). SRD was applied to compare external and cross-validation techniques and indicators of predictive performance, and to select optimal methods to determine the applicability domain (AD). The ordering of model validation methods agreed with the statements of the original authors, but these statements contradict one another, suggesting that any variant of cross-validation can be superior or inferior to other variants depending on the algorithm, data structure and circumstances applied. A simple fivefold cross-validation proved to be superior to the Bayesian Information Criterion in the vast majority of situations. It is simply not sufficient to test a numerical validation method in one situation only, even if it is a well-defined one. SRD as a preferable multicriteria decision-making algorithm is suitable for tailoring the techniques for validation, and for the optimal determination of the applicability domain according to the dataset in question.
Affiliation(s)
- K Héberger
- Plasma Chemistry Research Group, Institute of Materials and Environmental Chemistry, Research Centre for Natural Sciences, Institute of Excellence of the Hungarian Academy of Sciences, Budapest, Hungary
|
24
|
Chen Y, Sheng Z, Xiao H, Liang Q, Li W, Gan Y. Effects of connection-based physician-patient relationships on perceptions of outcome: A vignette experiment. Patient Educ Couns 2023; 114:107802. [PMID: 37224748 DOI: 10.1016/j.pec.2023.107802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 05/15/2023] [Accepted: 05/17/2023] [Indexed: 05/26/2023]
Abstract
OBJECTIVES To investigate the effects of media reports of medical outcomes and connection-based medicine on trust in physicians. In "connection-based medicine," people use personal connections to obtain better medical resources. METHODS Vignette experiments were used to investigate attitudes toward physicians among 230 cancer patients and their families (Sample 1) and a cross-validated sample of 280 employees from various industries (Sample 2). RESULTS For both samples, negative media reports were associated with lower trust in physicians; when the reports were positive, the participants generally perceived physicians as more competent and trustworthy. However, with negative reports, patients and families perceived connection-based physicians as less right and professional than non-connection-based physicians; the public (represented by the employee sample) perceived connection-based physicians as less right than non-connection-based physicians and negative outcomes to be caused more by connection-based physicians than non-connection-based physicians. CONCLUSIONS Medical reports can influence the perception of a physician's traits, which are important for trust. Positive reports promote evaluation of Rightness, Attribution, and Professionalism, whereas negative results may elicit the opposite effect, especially for connection-based physicians. PRACTICAL IMPLICATIONS Positive media images of physicians can help facilitate trust. Connection-based medical treatment should be reduced to improve access to medical resources in China.
Affiliation(s)
- Yidi Chen
- Peking University, School of Psychological Cognitive Sciences, and Beijing Key Laboratory of Behavior and Mental Health, Beijing, People's Republic of China
- Zhengyu Sheng
- The Australian National University, ANU School of Medicine and Psychology, Canberra, Australia
- Han Xiao
- Peking University, School of Psychological Cognitive Sciences, and Beijing Key Laboratory of Behavior and Mental Health, Beijing, People's Republic of China
- Qi Liang
- King's College London, Institute of Psychiatry, Psychology & Neuroscience, London, UK
- Wenju Li
- Beijing Hospital, Department of Oncology, National Center of Gerontology, Beijing, People's Republic of China
- Yiqun Gan
- Peking University, School of Psychological Cognitive Sciences, and Beijing Key Laboratory of Behavior and Mental Health, Beijing, People's Republic of China.
|
25
|
Pelizzola M, Laursen R, Hobolth A. Model selection and robust inference of mutational signatures using Negative Binomial non-negative matrix factorization. BMC Bioinformatics 2023; 24:187. [PMID: 37158829 PMCID: PMC10165836 DOI: 10.1186/s12859-023-05304-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Accepted: 04/25/2023] [Indexed: 05/10/2023] Open
Abstract
BACKGROUND The spectrum of mutations in a collection of cancer genomes can be described by a mixture of a few mutational signatures. The mutational signatures can be found using non-negative matrix factorization (NMF). To extract the mutational signatures we have to assume a distribution for the observed mutational counts and a number of mutational signatures. In most applications, the mutational counts are assumed to be Poisson distributed, and the rank is chosen by comparing the fit of several models with the same underlying distribution and different values for the rank using classical model selection procedures. However, the counts are often overdispersed, and thus the Negative Binomial distribution is more appropriate. RESULTS We propose a Negative Binomial NMF with a patient specific dispersion parameter to capture the variation across patients and derive the corresponding update rules for parameter estimation. We also introduce a novel model selection procedure inspired by cross-validation to determine the number of signatures. Using simulations, we study the influence of the distributional assumption on our method together with other classical model selection procedures. We also present a simulation study with a method comparison where we show that state-of-the-art methods are highly overestimating the number of signatures when overdispersion is present. We apply our proposed analysis on a wide range of simulated data and on two real data sets from breast and prostate cancer patients. On the real data we describe a residual analysis to investigate and validate the model choice. CONCLUSIONS With our results on simulated and real data we show that our model selection procedure is more robust at determining the correct number of signatures under model misspecification. We also show that our model selection procedure is more accurate than the available methods in the literature for finding the true number of signatures. 
Lastly, the residual analysis clearly emphasizes the overdispersion in the mutational count data. The code for our model selection procedure and Negative Binomial NMF is available in the R package SigMoS and can be found at https://github.com/MartaPelizzola/SigMoS .
Affiliation(s)
- Marta Pelizzola
- Department of Mathematics, Aarhus University, Aarhus, Denmark.
- Asger Hobolth
- Department of Mathematics, Aarhus University, Aarhus, Denmark
|
26
|
Pacheco VL, Bragagnolo L, Dalla Rosa F, Thomé A. Optimization of biocementation responses by artificial neural network and random forest in comparison to response surface methodology. Environ Sci Pollut Res Int 2023; 30:61863-61887. [PMID: 36934187 DOI: 10.1007/s11356-023-26362-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2022] [Accepted: 03/05/2023] [Indexed: 05/10/2023]
Abstract
In this article, the optimization of the specific urease activity (SUA) and the calcium carbonate (CaCO3) using microbially induced calcite precipitation (MICP) was compared to optimization using three algorithms based on machine learning: random forest regressor, artificial neural networks (ANNs), and multivariate linear regression. This study applied the techniques to two existing response surface method (RSM) experiments involving the MICP technique. Random forest-based models and artificial neural network-based models had their hyperparameters optimized via cross-validation and grid search to select the best model. In this study, the random forest-based algorithm achieved the best performance, with r2 of 0.9381 and 0.9463, compared with the original r2 of 0.9021 and 0.8530, respectively. This study is aimed at exploring the capability of using machine learning-based models in small datasets for the purpose of optimization of experimental variables in the MICP technique, and the meaningfulness of the models by their specificities in the small experimental datasets applied to experimental designs. The use of these techniques can create prerogatives to scale and mitigate costs in future experiments associated with the field.
Affiliation(s)
- Vinicius Luiz Pacheco
- Graduate Program in Civil and Environmental Engineering, University of Passo Fundo (UPF), Campus I, Km 171, BR 285, Passo Fundo, Rio Grande Do Sul, CEP: 99001-970, Brazil.
- Lucimara Bragagnolo
- Graduate Program in Civil and Environmental Engineering, University of Passo Fundo (UPF), Campus I, Km 171, BR 285, Passo Fundo, Rio Grande Do Sul, CEP: 99001-970, Brazil
- Francisco Dalla Rosa
- Graduate Program in Civil and Environmental Engineering, University of Passo Fundo (UPF), Campus I, Km 171, BR 285, Passo Fundo, Rio Grande Do Sul, CEP: 99001-970, Brazil
- Antonio Thomé
- Graduate Program in Civil and Environmental Engineering, University of Passo Fundo (UPF), Campus I, Km 171, BR 285, Passo Fundo, Rio Grande Do Sul, CEP: 99001-970, Brazil
|
27
|
Gu C, Li X. Prediction of disease-related miRNAs by voting with multiple classifiers. BMC Bioinformatics 2023; 24:177. [PMID: 37122001 PMCID: PMC10150488 DOI: 10.1186/s12859-023-05308-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Accepted: 04/26/2023] [Indexed: 05/02/2023] Open
Abstract
There is strong evidence to support that mutations and dysregulation of miRNAs are associated with a variety of diseases, including cancer. However, the experimental methods used to identify disease-related miRNAs are expensive and time-consuming. Effective computational approaches to identify disease-related miRNAs are in high demand and would aid in the detection of lncRNA biomarkers for disease diagnosis, treatment, and prevention. In this study, we develop an ensemble learning framework to reveal the potential associations between miRNAs and diseases (ELMDA). The ELMDA framework does not rely on the known associations when calculating miRNA and disease similarities and uses multi-classifiers voting to predict disease-related miRNAs. As a result, the average AUC of the ELMDA framework was 0.9229 for the HMDD v2.0 database in a fivefold cross-validation. All potential associations in the HMDD V2.0 database were predicted, and 90% of the top 50 results were verified with the updated HMDD V3.2 database. The ELMDA framework was implemented to investigate gastric neoplasms, prostate neoplasms and colon neoplasms, and 100%, 94%, and 90%, respectively, of the top 50 potential miRNAs were validated by the HMDD V3.2 database. Moreover, the ELMDA framework can predict isolated disease-related miRNAs. In conclusion, ELMDA appears to be a reliable method to uncover disease-associated miRNAs.
Affiliation(s)
- Changlong Gu
- College of Information Science and Engineering, Hunan University, Changsha, 410082, Hunan, China.
- Xiaoying Li
- College of Information Science and Engineering, Hunan University, Changsha, 410082, Hunan, China.
|
28
|
Dutschmann TM, Kinzel L, Ter Laak A, Baumann K. Large-scale evaluation of k-fold cross-validation ensembles for uncertainty estimation. J Cheminform 2023; 15:49. [PMID: 37118768 PMCID: PMC10142532 DOI: 10.1186/s13321-023-00709-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/03/2022] [Accepted: 03/10/2023] [Indexed: 04/30/2023] Open
Abstract
It is insightful to report an estimator that describes how certain a model is in a prediction, in addition to the prediction itself. For regression tasks, most approaches implement a variation of the ensemble method, apart from a few exceptions. Instead of a single estimator, a group of estimators yields several predictions for an input. The uncertainty can then be quantified by measuring the disagreement between the predictions, for example by the standard deviation. In theory, ensembles should not only provide uncertainties but also boost the predictive performance by reducing errors arising from variance. Despite the development of novel methods, they are still considered the "gold standard" to quantify the uncertainty of regression models. Subsampling-based methods to obtain ensembles can be applied to all models, regardless of whether they are related to deep learning or traditional machine learning. However, little attention has been given to the question of whether the ensemble method is applicable to virtually all scenarios occurring in the field of cheminformatics. In a widespread and diversified attempt, ensembles are evaluated for 32 datasets of different sizes and modeling difficulty, ranging from physicochemical properties to biological activities. For increasing ensemble sizes with up to 200 members, the predictive performance as well as the applicability as uncertainty estimator are shown for all combinations of five modeling techniques and four molecular featurizations. Useful recommendations were derived for practitioners regarding the success and minimum size of ensembles, depending on whether predictive performance or uncertainty quantification is of more importance for the task at hand.
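The idea of a k-fold cross-validation ensemble, with the standard deviation of member predictions as the uncertainty estimate, can be sketched as follows. This is an illustrative toy, not the authors' pipeline: the one-variable least-squares model, the function names, and the synthetic data are assumptions chosen to keep the example within the Python standard library.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def kfold_ensemble(xs, ys, k=5, seed=0):
    """Train one model per fold, each on the data outside that fold."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    models = []
    for held_out in folds:
        skip = set(held_out)
        train = [i for i in idx if i not in skip]
        models.append(fit_line([xs[i] for i in train], [ys[i] for i in train]))
    return models

def predict_with_uncertainty(models, x):
    """Ensemble mean as the prediction, member disagreement (std) as uncertainty."""
    preds = [a + b * x for a, b in models]
    return statistics.fmean(preds), statistics.stdev(preds)

# Synthetic data (made up): y = 2x + 1 plus Gaussian noise, x in [0, 4.9].
rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.3) for x in xs]

models = kfold_ensemble(xs, ys, k=5)
mean_in, std_in = predict_with_uncertainty(models, 2.5)     # inside the training range
mean_out, std_out = predict_with_uncertainty(models, 50.0)  # far outside it
```

In this toy setting the members disagree more for the query far outside the training range than for the one inside it, which is the behavior an uncertainty estimator is expected to show.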
Affiliation(s)
- Thomas-Martin Dutschmann
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, Beethovenstrasse 55, 38106, Brunswick, Germany
- Lennart Kinzel
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, Beethovenstrasse 55, 38106, Brunswick, Germany
- Antonius Ter Laak
- Bayer AG, Research & Development, Pharmaceuticals, Muellerstrasse 178, 13353, Berlin, Germany
- Knut Baumann
- Institute of Medicinal and Pharmaceutical Chemistry, University of Technology Braunschweig, Beethovenstrasse 55, 38106, Brunswick, Germany.
|
29
|
Tao J, Yin X, Yao X, Cheng Z, Yan B, Chen G. Prediction of NH3 and HCN yield from biomass fast pyrolysis: Machine learning modeling and evaluation. Sci Total Environ 2023; 885:163743. [PMID: 37116814 DOI: 10.1016/j.scitotenv.2023.163743] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Revised: 04/11/2023] [Accepted: 04/21/2023] [Indexed: 05/12/2023]
Abstract
Rapid pyrolysis is a promising technique to convert biomass into fuel oil, where NOX emission remains a substantial environmental risk. NH3 and HCN are top precursors for NOX emission. In order to clarify their migration path and provide appropriate strategies for their control, six up-to-date machine learning (ML) models were established to predict the NH3 and HCN yield during rapid pyrolysis of 26 biomass feedstocks. Cross-validation and grid search methods were used to determine the optimal hyperparameters for these ML models. The support vector regression (SVR) model achieved the best accuracy among them. The best root mean square error (%), mean absolute error (%), and R2 on the test set for NH3/HCN yield were 1.2901/1.1531, 1.0501/0.84712, and 0.98253/0.96152, respectively. In addition, based on the results of Pearson correlation analysis, the input variables with a weak linear correlation with the target product were eliminated, which improved the prediction accuracy of almost all ML models except SVR. Even after input variable elimination, the SVR model still showed the best NH3 and HCN yield prediction accuracy. This reflects SVR's great significance and potential for predicting the yield of NOX precursors during rapid biomass pyrolysis.
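Hyperparameter selection by grid search with cross-validation, as described in this abstract, can be sketched in a few lines. The sketch below swaps the paper's models for a one-parameter ridge regression so that it runs with the standard library only; the grid values, function names, and synthetic data are assumptions for illustration.

```python
import random

def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate of b in y = b*x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def cv_mse(xs, ys, lam, k=5, seed=0):
    """Mean squared prediction error of ridge with penalty lam under k-fold CV."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)  # same seed, so every lam sees the same folds
    folds = [idx[i::k] for i in range(k)]
    total = 0.0
    for test in folds:
        held = set(test)
        train = [i for i in idx if i not in held]
        b = ridge_slope([xs[i] for i in train], [ys[i] for i in train], lam)
        total += sum((ys[i] - b * xs[i]) ** 2 for i in test)
    return total / len(xs)

def grid_search(xs, ys, grid, k=5):
    """Evaluate every candidate in the grid and return the one with lowest CV error."""
    scores = {lam: cv_mse(xs, ys, lam, k) for lam in grid}
    best = min(scores, key=scores.get)
    return best, scores

# Synthetic data (made up): y = 3x plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 2) for _ in range(30)]
ys = [3.0 * x + rng.gauss(0, 0.2) for x in xs]

grid = [0.0, 0.1, 1.0, 10.0, 1000.0]
best_lam, scores = grid_search(xs, ys, grid)
```

The selected hyperparameter would then be used to refit the model on all training data before evaluation on a separate test set, which is where figures such as the test-set R2 above come from.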
Affiliation(s)
- Junyu Tao
- School of Mechanical Engineering, Tianjin University of Commerce, Tianjin 300134, China
- Xiaoxiao Yin
- School of Environmental Science and Engineering, Tianjin University, Tianjin 300350, China
- Xilei Yao
- School of Mechanical Engineering, Tianjin University of Commerce, Tianjin 300134, China
- Zhanjun Cheng
- School of Environmental Science and Engineering, Tianjin University, Tianjin 300350, China; Tianjin Engineering Research Center for Organic Wastes Safe Disposal and Energy Utilization/Key Laboratory of Efficient Utilization of Low and Medium Energy of Ministry of Education/Tianjin Key Lab of Biomass/Wastes Utilization, Tianjin, 300072, China.
- Beibei Yan
- School of Environmental Science and Engineering, Tianjin University, Tianjin 300350, China; Tianjin Engineering Research Center for Organic Wastes Safe Disposal and Energy Utilization/Key Laboratory of Efficient Utilization of Low and Medium Energy of Ministry of Education/Tianjin Key Lab of Biomass/Wastes Utilization, Tianjin, 300072, China
- Guanyi Chen
- School of Mechanical Engineering, Tianjin University of Commerce, Tianjin 300134, China; Tianjin Engineering Research Center for Organic Wastes Safe Disposal and Energy Utilization/Key Laboratory of Efficient Utilization of Low and Medium Energy of Ministry of Education/Tianjin Key Lab of Biomass/Wastes Utilization, Tianjin, 300072, China
|
30
|
Rahardiantoro S, Sakamoto W. Spatio-temporal clustering analysis using generalized lasso with an application to reveal the spread of Covid-19 cases in Japan. Comput Stat 2023:1-25. [PMID: 37360994 PMCID: PMC10089565 DOI: 10.1007/s00180-023-01331-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Accepted: 01/27/2023] [Indexed: 06/28/2023]
Abstract
This study addressed the issue of determining multiple potential clusters with regularization approaches for the purpose of spatio-temporal clustering. The generalized lasso framework has the flexibility to incorporate adjacencies between objects in the penalty matrix and to detect multiple clusters. A generalized lasso model with two L1 penalties is proposed, which can be separated into two generalized lasso models: trend filtering of the temporal effect and fused lasso of the spatial effect for each time point. To select the tuning parameters, the approximate leave-one-out cross-validation (ALOCV) and generalized cross-validation (GCV) are considered. A simulation study is conducted to evaluate the proposed method compared to other approaches in different problems and structures of multiple clusters. The generalized lasso with ALOCV and GCV provided smaller MSE in estimating the temporal and spatial effects compared to the unpenalized method, ridge, lasso, and generalized ridge. In temporal effects detection, the generalized lasso with ALOCV and GCV provided relatively smaller and more stable MSE than other methods, for different structures of true risk values. In spatial effects detection, the generalized lasso with ALOCV provided a higher index of edge detection accuracy. The simulation also suggested using a common tuning parameter over all time points in spatial clustering. Finally, the proposed method was applied to the weekly Covid-19 data in Japan from March 21, 2020, to September 11, 2021, along with the interpretation of the dynamic behavior of multiple clusters.
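For orientation, the two tuning criteria named in this abstract have a standard textbook form for a linear smoother, that is, a fit that can be written as a matrix S_λ applied to the observations. The notation below (S_λ, n, y_i) is assumed here and does not come from the abstract; the generalized lasso is not itself a linear smoother, and the exact approximations the authors use are not stated in the abstract.

```latex
% Textbook forms for a linear smoother \hat{y}_\lambda = S_\lambda y (assumed notation).
\begin{align}
  \mathrm{LOOCV}(\lambda) &= \frac{1}{n} \sum_{i=1}^{n}
      \left( \frac{y_i - \hat{y}_{\lambda,i}}{1 - [S_\lambda]_{ii}} \right)^{2}, \\
  \mathrm{GCV}(\lambda) &= \frac{1}{n} \sum_{i=1}^{n}
      \left( \frac{y_i - \hat{y}_{\lambda,i}}{1 - \operatorname{tr}(S_\lambda)/n} \right)^{2}.
\end{align}
```

GCV replaces each diagonal element of S_λ with the average tr(S_λ)/n, so only the effective degrees of freedom are needed. For penalized methods such as the generalized lasso, an estimate of the degrees of freedom typically takes the place of the trace.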
Affiliation(s)
- Septian Rahardiantoro
- Department of Human Ecology, Graduate School of Environmental and Life Science, Okayama University, Okayama, 700-8350 Japan
- Department of Statistics, Faculty of Mathematics and Natural Science, IPB University, Bogor, 16680 Indonesia
- Wataru Sakamoto
- Department of Human Ecology, Graduate School of Environmental and Life Science, Okayama University, Okayama, 700-8350 Japan
|
31
|
Mahanty B, Lhamo P, Sahoo NK. Inconsistency of PCA-based water quality index - Does it reflect the quality? Sci Total Environ 2023; 866:161353. [PMID: 36603615 DOI: 10.1016/j.scitotenv.2022.161353] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/13/2022] [Revised: 12/29/2022] [Accepted: 12/29/2022] [Indexed: 06/17/2023]
Abstract
The formalization of a stable water quality index (WQI) from measured hydrogeochemical parameters is essential for the identification and classification of water resources. In the principal component analysis (PCA) based WQI approach, the parameter weight is derived using either PC loading or rotated factor loading from a large number of samples pooled for WQI measurement. The PCA-based approach is paradoxical, as the calculated WQI rating of a sample would rather be dependent on the size, and composition of the population. Though this issue is well anticipated, no attempt has been made to regularize or measure the extent of WQI disagreement. In the present study, the WQI of 106 groundwater samples analyzed for 12 different hydrochemical parameters were modelled using PC loading or rotated factor loading (referred to as PCQ-1, PCQ-2, respectively) approach. Analysis reveals PCQ-1 to be positively biased in 78 % of samples and rating disagreements were evident in 9.43 % of samples. WQI of the data set was estimated using repeated (1000) random non-overlapping 2 to 5-fold data partitioning (containing 21 to 83 samples in each fold) adopting either an in-sample (test set) or out-sample (train set) modelling approach. The mean of WQI deviations in repeated resampling from the reference (i.e., using the entire dataset) has been positive in most of the samples using the PCQ-1 model, irrespective of the fold partition size. The median root mean square deviation values of the data set increased with the number of fold partitioning for in-sample calibration for both PCQ-1 and PCQ-2 approaches. The exclusion of a single water quality parameter from the PCA model can cause up to a 60 % deviation of the WQI score in some water samples. The cross-validation and Monte Carlo resampling approach can serve as a framework to test the stability of PCA-based WQI.
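The resampling check described in this abstract can be sketched as follows: derive index weights from a PCA on the full dataset, recompute them on random non-overlapping partitions, and measure how far each sample's score moves. This is a simplified illustration, not the authors' PCQ-1 or PCQ-2 procedure: the weights here are normalized absolute loadings of the first principal component, the score is a plain weighted sum, and the data are synthetic.

```python
import math
import random

def pc1_weights(rows):
    """Normalized absolute PC1 loadings via power iteration on the correlation matrix."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    corr = [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(p)] for a in range(p)]
    v = [1.0] * p
    for _ in range(200):  # fixed iteration count, adequate for a small matrix
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    total = sum(abs(x) for x in v)
    return [abs(x) / total for x in v]

def index_scores(rows, weights):
    """Weighted-sum index for each sample (a stand-in for a WQI formula)."""
    return [sum(w * x for w, x in zip(weights, r)) for r in rows]

def partition_rmsd(rows, k=2, repeats=100, seed=0):
    """RMS deviation of partition-based scores from the full-data reference scores."""
    rng = random.Random(seed)
    reference = index_scores(rows, pc1_weights(rows))
    idx = list(range(len(rows)))
    sq_dev, count = 0.0, 0
    for _ in range(repeats):
        rng.shuffle(idx)
        for f in range(k):
            fold = idx[f::k]  # non-overlapping partition of the shuffled indices
            subset = [rows[i] for i in fold]
            scores = index_scores(subset, pc1_weights(subset))
            sq_dev += sum((s - reference[i]) ** 2 for s, i in zip(scores, fold))
            count += len(fold)
    return math.sqrt(sq_dev / count)

# Synthetic data (made up): 40 samples, 3 parameters, the first two correlated.
rng = random.Random(0)
rows = []
for _ in range(40):
    base = rng.gauss(0, 1)
    rows.append([base + rng.gauss(0, 0.3), 0.5 * base + rng.gauss(0, 0.3), rng.gauss(0, 1)])

weights = pc1_weights(rows)
rmsd = partition_rmsd(rows, k=2, repeats=20)
```

A nonzero deviation shows that a sample's score depends on which other samples were pooled with it, which is the inconsistency the abstract sets out to quantify.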
Affiliation(s)
- Biswanath Mahanty
- Department of Biotechnology, Karunya Institute of Technology and Sciences, Coimbatore 641114, India.
- Pema Lhamo
- Department of Biotechnology, Karunya Institute of Technology and Sciences, Coimbatore 641114, India
- Naresh K Sahoo
- Department of Chemistry, Environmental Science Program, Siksha 'O' Anusandhan (Deemed to University), Bhubaneswar, Odisha, India
|
32
|
Yang X, Zhang X, Zhang P, Bidegain G, Dong J, Hu C, Li M, Zhang Z, Guo H. Ensemble habitat suitability modeling for predicting optimal sites for eelgrass (Zostera marina) in the tidal lagoon ecosystem: Implications for restoration and conservation. J Environ Manage 2023; 330:117108. [PMID: 36584472 DOI: 10.1016/j.jenvman.2022.117108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2022] [Revised: 12/18/2022] [Accepted: 12/20/2022] [Indexed: 06/17/2023]
Abstract
Seagrass systems are in decline, mainly due to anthropogenic pressures and ongoing climate change. Implementing seagrass protection and restoration measures requires accurate assessment of suitable habitats. Commonly, such assessments have been performed using single-algorithm habitat suitability models, nearly always based on low environmental resolution information and short-term species data series. Here we address eelgrass (Zostera marina) meadows' large-scale decline (>80%) in Shandong province (Yellow Sea, China) by developing an ensemble habitat model (EHM) to inform eelgrass conservation and restoration strategies in the Swan Lake (SL). For this, we applied a weighted EHM derived from ten single-algorithm models including profile, regression, classification, and machine learning methods to generate a high-resolution habitat suitability map. The EHM was constructed based on the predictive performances of each model, by combining a series of presence-absence eelgrass datasets from recent years coupled with oceanographic and sediment data. The model was cross-validated with independent historical datasets, and a final habitat suitability map for conservation and restoration was generated. Our EHM scheme outperformed all single models in terms of habitat suitability, scoring ∼0.95 for both true skill statistic (TSS) and area under the curve (AUC) performance criteria. Machine learning methods outperformed profile, regression and classification methods. Regarding model explanatory variables, overall, topographic characteristics such as depth (DEP) and seafloor slope (SSL) are the most significant factors determining the distribution of eelgrass. The EHM predicted that the overlapping area was almost 90% of the current eelgrass habitat. Using results from our EHM, a LOESS regression model for the relationship of the habitat suitability to both the biomass and density of Z. marina outperformed the classic Ordinary Least Squares regression model. 
The EHM is a promising tool for supporting eelgrass protection and restoration areas in temperate lagoons as data availability improves.
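The weighting idea in this abstract can be illustrated with a minimal Python sketch. It assumes that each single-algorithm model is weighted by a validation score such as TSS and that the ensemble is a score-weighted mean of per-site suitabilities; the abstract does not give the exact weighting rule, so the function names and the averaging scheme below are illustrative assumptions, not the authors' implementation.

```python
def true_skill_statistic(y_true, y_pred):
    """TSS = sensitivity + specificity - 1 for binary presence/absence labels.

    Assumes both classes occur in y_true (otherwise a denominator is zero).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


def weighted_ensemble(suitability_by_model, score_by_model):
    """Score-weighted mean of per-site suitability across models (hypothetical rule)."""
    total = sum(score_by_model[m] for m in suitability_by_model)
    n_sites = len(next(iter(suitability_by_model.values())))
    return [
        sum(score_by_model[m] * suitability_by_model[m][i] for m in suitability_by_model) / total
        for i in range(n_sites)
    ]
```

A model with a higher validation score pulls the ensemble suitability toward its own prediction, which is the general behaviour a performance-weighted ensemble is meant to have.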
Affiliation(s)
- Xiaolong Yang
  - Fishery College, Zhejiang Ocean University, Zhoushan, 316022, China; State Environmental Protection Key Laboratory of Coastal Ecosystem, National Marine Environmental Monitoring Center, Dalian, 116023, China
- Xiumei Zhang
  - Fishery College, Zhejiang Ocean University, Zhoushan, 316022, China
- Peidong Zhang
  - The Key Laboratory of Mariculture, Ministry of Education, Ocean University of China, Qingdao, China
- Gorka Bidegain
  - Department of Applied Mathematics, Engineering School of Bilbao, University of the Basque Country (UPV/EHU), Ingeniero Torres Quevedo s/n, 48013, Bilbao, Spain; Research Center for Experimental Marine Biology and Biotechnology, Plentzia Marine Station, University of the Basque Country (PiE-UPV/EHU), Areatza Pasealekua, 48620, Plentzia, Spain
- Jianyu Dong
  - The Key Laboratory of Mariculture, Ministry of Education, Ocean University of China, Qingdao, China
- Chengye Hu
  - Fishery College, Zhejiang Ocean University, Zhoushan, 316022, China
- Min Li
  - The Institute for Advanced Study of Coastal Ecology, Ludong University, Yantai, 264025, China
- Zhixin Zhang
  - CAS Key Laboratory of Tropical Marine Bio-resources and Ecology, South China Sea Institute of Oceanology, Chinese Academy of Sciences, Guangzhou, 510301, China
- Hao Guo
  - State Environmental Protection Key Laboratory of Coastal Ecosystem, National Marine Environmental Monitoring Center, Dalian, 116023, China
33
Meng X, Wang F, Gao X, Wang B, Xu X, Wang Y, Wang W, Zeng Q. Association of IgG N-glycomics with prevalent and incident type 2 diabetes mellitus from the paradigm of predictive, preventive, and personalized medicine standpoint. EPMA J 2023; 14:1-20. [PMID: 36866157 PMCID: PMC9971369 DOI: 10.1007/s13167-022-00311-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Accepted: 12/12/2022] [Indexed: 12/25/2022]
Abstract
Objectives Type 2 diabetes mellitus (T2DM), a major metabolic disorder, is rising rapidly in prevalence worldwide and has emerged as one of the most common chronic diseases. Suboptimal health status (SHS) is considered a reversible intermediate state between health and diagnosable disease. We hypothesized that the time frame between the onset of SHS and the clinical manifestation of T2DM is the operational area for the application of reliable risk assessment tools, such as immunoglobulin G (IgG) N-glycans. From the viewpoint of predictive, preventive, and personalized medicine (PPPM/3PM), the early detection of SHS and dynamic monitoring by glycan biomarkers could provide a window of opportunity for targeted prevention and personalized treatment of T2DM. Methods Case-control and nested case-control studies were performed and consisted of 138 and 308 participants, respectively. The IgG N-glycan profiles of all plasma samples were detected by an ultra-performance liquid chromatography instrument. Results After adjustment for confounders, 22, five, and three IgG N-glycan traits were significantly associated with T2DM in the case-control setting, baseline SHS, and baseline optimal health participants from the nested case-control setting, respectively. When the IgG N-glycans were added to the clinical trait models, the average areas under the receiver operating characteristic curve (AUCs) of the combined models for differentiating T2DM from healthy individuals, based on fivefold cross-validation repeated 400 times, were 0.807 in the case-control setting and 0.563, 0.645, and 0.604 in the pooled samples, baseline SHS, and baseline optimal health samples of the nested case-control setting, respectively. These values indicate moderate discriminative ability and were generally better than those of models with either glycans or clinical features alone.
Conclusions This study comprehensively illustrated that the observed altered IgG N-glycosylation, i.e., decreased galactosylation and fucosylation/sialylation without bisecting GlcNAc, as well as increased galactosylation and fucosylation/sialylation with bisecting GlcNAc, reflects a pro-inflammatory state of T2DM. SHS is an important window period of early intervention for individuals at risk for T2DM; glycomic biosignatures as dynamic biomarkers have the ability to identify populations at risk for T2DM early, and the combination of evidence could provide suggestive ideas and valuable insight for the PPPM of T2DM. Supplementary information The online version contains supplementary material available at 10.1007/s13167-022-00311-3.
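The evaluation scheme mentioned in the Results (fivefold cross-validation repeated 400 times, averaging the AUC) can be sketched in plain Python as follows. The single-feature scoring model `fit_mean_direction` is a hypothetical stand-in, since the abstract does not describe the models in code-level detail; folds that contain only one class are skipped because the AUC is undefined there.

```python
import random


def auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_mean_direction(x_train, y_train):
    """Toy model (assumption): score by the feature, signed by the class-mean difference."""
    pos = [v for v, t in zip(x_train, y_train) if t == 1]
    neg = [v for v, t in zip(x_train, y_train) if t == 0]
    sign = 1.0 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1.0
    return lambda v: sign * v


def repeated_kfold_auc(x, y, fit, n_splits=5, n_repeats=400, seed=0):
    """Average held-out AUC over n_repeats reshuffled k-fold partitions."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    aucs = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        folds = [idx[k::n_splits] for k in range(n_splits)]
        for test in folds:
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            y_test = [y[i] for i in test]
            if len(set(y_test)) < 2:
                continue  # AUC needs both classes in the held-out fold
            score = fit([x[i] for i in train], [y[i] for i in train])
            aucs.append(auc(y_test, [score(x[i]) for i in test]))
    return sum(aucs) / len(aucs)
```

Repeating the partitioning reduces the dependence of the averaged AUC on any single random split, which matters most with the modest sample sizes reported here.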
Affiliation(s)
- Xiaoni Meng
  - Beijing Key Laboratory of Clinical Epidemiology, School of Public Health, Capital Medical University, 10 Youanmen, Fengtai District, Beijing, 100069 China
- Fei Wang
  - Health Management Institute, Second Medical Center & National Clinical Research Center for Geriatric Diseases, Chinese People’s Liberation Army General Hospital, 28 Fuxing Road, Haidian District, Beijing, 100853 China
- Xiangyang Gao
  - Health Management Institute, Second Medical Center & National Clinical Research Center for Geriatric Diseases, Chinese People’s Liberation Army General Hospital, 28 Fuxing Road, Haidian District, Beijing, 100853 China
- Biyan Wang
  - Beijing Key Laboratory of Clinical Epidemiology, School of Public Health, Capital Medical University, 10 Youanmen, Fengtai District, Beijing, 100069 China
- Xizhu Xu
  - School of Public Health, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, 250117 China
- Youxin Wang
  - Beijing Key Laboratory of Clinical Epidemiology, School of Public Health, Capital Medical University, 10 Youanmen, Fengtai District, Beijing, 100069 China
- Wei Wang
  - Beijing Key Laboratory of Clinical Epidemiology, School of Public Health, Capital Medical University, 10 Youanmen, Fengtai District, Beijing, 100069 China
  - School of Public Health, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, 250117 China
  - Centre for Precision Health, Edith Cowan University, 270 Joondalup Drive, Joondalup, Perth, WA 6027 Australia
- Qiang Zeng
  - Health Management Institute, Second Medical Center & National Clinical Research Center for Geriatric Diseases, Chinese People’s Liberation Army General Hospital, 28 Fuxing Road, Haidian District, Beijing, 100853 China
34
Warschburger P, Behrend N. Further evaluation of the psychometric properties of the German version of the Body Appreciation Scale-2 (BAS-2): Cross-validation, measurement invariance, and population-based norms. Body Image 2023; 45:105-16. [PMID: 36867965 DOI: 10.1016/j.bodyim.2023.02.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2022] [Revised: 02/13/2023] [Accepted: 02/13/2023] [Indexed: 03/05/2023]
Abstract
Using a representative sample of the German general population (N = 2509, 16-74 years), this work aimed to cross-validate the modified one-factor model recently reported for the German Body Appreciation Scale 2 (BAS-2). We also examined measurement invariance across gender, tested differential item functioning across age and BMI, systematically evaluated subgroup differences, and provided norms according to subgroups. Overall, the BAS-2 demonstrates good internal consistency. Cross-validation supported the generalizability of the modified one-factor model. Multi-group confirmatory factor analyses supported full scalar invariance across gender; comparisons revealed higher scores among men compared to women with a small effect size. Age (only women) and BMI (both genders) significantly predicted latent BAS-2 scores. Of note, differential item functioning for age and BMI was observed. Concerning manifest group differences, we found a significant main effect of weight status: individuals with obesity reported the lowest body appreciation levels, while individuals with underweight/normal weight reported the highest levels. Our findings suggest that the German BAS-2 has good psychometric properties and is suitable for examining body appreciation across gender among German women and men. Moreover, norm values enable future usage of the scale in health and clinical research by providing reference data for interpretation.
35
Coley RY, Liao Q, Simon N, Shortreed SM. Empirical evaluation of internal validation methods for prediction in large-scale clinical data with rare-event outcomes: a case study in suicide risk prediction. BMC Med Res Methodol 2023; 23:33. [PMID: 36721082 PMCID: PMC9890785 DOI: 10.1186/s12874-023-01844-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2022] [Accepted: 01/17/2023] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND There is increasing interest in clinical prediction models for rare outcomes such as suicide, psychiatric hospitalizations, and opioid overdose. Accurate model validation is needed to guide model selection and decisions about whether and how prediction models should be used. Split-sample estimation and validation of clinical prediction models, in which data are divided into training and testing sets, may reduce predictive accuracy and precision of validation. Using all data for estimation and validation increases sample size for both procedures, but validation must account for overfitting, or optimism. Our study compared split-sample and entire-sample methods for estimating and validating a suicide prediction model. METHODS We compared performance of random forest models estimated in a sample of 9,610,318 mental health visits ("entire-sample") and in a 50% subset ("split-sample") as evaluated in a prospective validation sample of 3,754,137 visits. We assessed optimism of three internal validation approaches: for the split-sample prediction model, validation in the held-out testing set and, for the entire-sample model, cross-validation and bootstrap optimism correction. RESULTS The split-sample and entire-sample prediction models showed similar prospective performance; the area under the curve, AUC, and 95% confidence interval was 0.81 (0.77-0.85) for both. Performance estimates evaluated in the testing set for the split-sample model (AUC = 0.85 [0.82-0.87]) and via cross-validation for the entire-sample model (AUC = 0.83 [0.81-0.85]) accurately reflected prospective performance. Validation of the entire-sample model with bootstrap optimism correction overestimated prospective performance (AUC = 0.88 [0.86-0.89]). 
Measures of classification accuracy, including sensitivity and positive predictive value at the 99th, 95th, 90th, and 75th percentiles of the risk score distribution, indicated similar conclusions: bootstrap optimism correction overestimated classification accuracy in the prospective validation set. CONCLUSIONS While previous literature demonstrated the validity of bootstrap optimism correction for parametric models in small samples, this approach did not accurately validate performance of a rare-event prediction model estimated with random forests in a large clinical dataset. Cross-validation of prediction models estimated with all available data provides accurate independent validation while maximizing sample size.
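For readers unfamiliar with bootstrap optimism correction, the generic procedure can be sketched as below. This is the commonly described form (apparent performance minus the average gap between bootstrap-sample and original-sample performance); the abstract does not spell out the study's exact procedure, and `fit_mean` and `neg_mse` are hypothetical toy stand-ins for a real model and metric.

```python
import random


def optimism_corrected(data, fit, metric, n_boot=200, seed=0):
    """Apparent performance minus the mean bootstrap optimism.

    fit(sample) returns a model; metric(model, sample) returns a score
    where higher is better.
    """
    rng = random.Random(seed)
    apparent = metric(fit(data), data)
    optimism = 0.0
    for _ in range(n_boot):
        boot = [rng.choice(data) for _ in data]  # resample with replacement
        model = fit(boot)
        # Gap between performance on the data the model saw and the original data
        optimism += metric(model, boot) - metric(model, data)
    return apparent - optimism / n_boot


def fit_mean(sample):
    """Toy model (assumption): predict the sample mean."""
    return sum(sample) / len(sample)


def neg_mse(model, sample):
    """Toy metric (assumption): negative mean squared error of the constant prediction."""
    return -sum((v - model) ** 2 for v in sample) / len(sample)
```

Because every bootstrap sample overlaps heavily with the original data, a flexible learner can score well on both, which is one plausible reading of why the correction was too optimistic for random forests in this study.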
Affiliation(s)
- R Yates Coley
  - Kaiser Permanente Washington Health Research Institute, 1730 Minor Ave. #1600, Seattle, WA, 98101, USA; Department of Biostatistics, University of Washington, Seattle, WA, USA
- Qinqing Liao
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- Noah Simon
  - Kaiser Permanente Washington Health Research Institute, 1730 Minor Ave. #1600, Seattle, WA, 98101, USA; Department of Biostatistics, University of Washington, Seattle, WA, USA
- Susan M Shortreed
  - Kaiser Permanente Washington Health Research Institute, 1730 Minor Ave. #1600, Seattle, WA, 98101, USA; Department of Biostatistics, University of Washington, Seattle, WA, USA
36
Koh J. Gradient boosting with extreme-value theory for wildfire prediction. Extremes (Boston) 2023; 26:273-299. [PMID: 37091211 PMCID: PMC10115709 DOI: 10.1007/s10687-022-00454-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2022] [Revised: 10/28/2022] [Accepted: 10/31/2022] [Indexed: 05/03/2023]
Abstract
This paper details the approach of the team Kohrrelation in the 2021 Extreme Value Analysis data challenge, dealing with the prediction of wildfire counts and sizes over the contiguous US. Our approach uses ideas from extreme-value theory in a machine learning context with theoretically justified loss functions for gradient boosting. We devise a spatial cross-validation scheme and show that in our setting it provides a better proxy for test set performance than naive cross-validation. The predictions are benchmarked against boosting approaches with different loss functions, and perform competitively in terms of the score criterion, finally placing second in the competition ranking.
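The abstract does not describe how the spatial cross-validation scheme is built. As a rough illustration only, one common form of spatial CV assigns whole spatial blocks to folds so that nearby observations never straddle the train/test boundary. The block size, the round-robin assignment, and the function name below are assumptions, not the author's scheme.

```python
def spatial_block_folds(coords, block_size, n_folds):
    """Assign each (x, y) point to a fold via the square block it falls in.

    All points in the same block share a fold, so spatially close
    observations are held out together.
    """
    def block_of(point):
        x, y = point
        return (int(x // block_size), int(y // block_size))

    block_ids = sorted({block_of(p) for p in coords})
    # Round-robin assignment of blocks to folds (illustrative choice)
    fold_of_block = {b: k % n_folds for k, b in enumerate(block_ids)}
    return [fold_of_block[block_of(p)] for p in coords]
```

Naive CV, by contrast, shuffles individual points, so a held-out point often has near-duplicates in the training set and the error estimate looks better than it would on a genuinely new region.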
Affiliation(s)
- Jonathan Koh
  - Institute of Mathematics, EPFL, Lausanne, Switzerland
  - Institute of Mathematical Statistics and Actuarial Science, Oeschger Centre for Climate Change Research, University of Bern, Bern, Switzerland
37
Guo B, Wu H, Pei L, Zhu X, Zhang D, Wang Y, Luo P. Study on the spatiotemporal dynamic of ground-level ozone concentrations on multiple scales across China during the blue sky protection campaign. Environ Int 2022; 170:107606. [PMID: 36335896 DOI: 10.1016/j.envint.2022.107606] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/30/2022] [Revised: 10/25/2022] [Accepted: 10/26/2022] [Indexed: 06/16/2023]
Abstract
Surface ozone (O3), a harmful air pollutant, has significant negative effects on human health and plants. Existing O3 datasets with coarse spatiotemporal resolution and limited coverage, together with uncertainties about the factors influencing O3, seriously restrain related epidemiology and air pollution studies. To tackle these issues, we proposed a novel scheme to estimate daily O3 concentrations on a fine grid scale (1 km × 1 km) from 2018 to 2020 across China based on machine learning methods using hourly observed ground-level pollutant concentration data, meteorological data, satellite data, and auxiliary data including digital elevation model (DEM), land use data (LUD), normalized difference vegetation index (NDVI), population (POP), and nighttime light images (NTL), and to identify how the factors influencing O3 differ across urbanization and topography conditions. The main findings were as follows. The correlation coefficients (R²) between O3 concentrations and surface net solar radiation (SNSR), boundary layer height (BLH), 2 m temperature (T2M), 10 m v-component (MVW), and NDVI were 0.80, 0.40, 0.35, 0.30, and 0.20, respectively. The random forest (RF) demonstrated the highest validation R² (0.86) and lowest validation RMSE (13.74 μg/m³) in estimating O3 concentrations, followed by support vector machine (SVM) (R² = 0.75, RMSE = 18.39 μg/m³), backpropagation neural network (BP) (R² = 0.74, RMSE = 19.26 μg/m³), and multiple linear regression (MLR) (R² = 0.52, RMSE = 25.99 μg/m³). Our China High-Resolution O3 Dataset (CHROD) exhibited acceptable accuracy at different spatiotemporal scales. Additionally, O3 concentrations showed a decreasing trend and exhibited obvious spatiotemporal heterogeneity across China from 2018 to 2020. Overall, O3 was mainly affected by human activities in regions with higher urbanization, while it was mainly controlled by meteorological factors, vegetation coverage, and elevation in regions with lower urbanization.
The scheme of this study is useful and valuable in understanding the mechanism of O3 formation and improving the quality of the O3 dataset.
Affiliation(s)
- Bin Guo
  - College of Geomatics, Xi'an University of Science and Technology, Xi'an, Shaanxi 710054, China
- Haojie Wu
  - College of Geomatics, Xi'an University of Science and Technology, Xi'an, Shaanxi 710054, China
- Lin Pei
  - School of Exercise and Health Sciences, Xi'an Physical Education University, Xi'an, Shaanxi 710068, China; School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710043, China
- Xiaowei Zhu
  - Department of Mechanical and Materials Engineering, Portland State University, Portland, OR 97207, USA
- Dingming Zhang
  - College of Geomatics, Xi'an University of Science and Technology, Xi'an, Shaanxi 710054, China
- Yan Wang
  - School of Geography and Tourism, Shaanxi Normal University, Xi'an, Shaanxi 710119, China
- Pingping Luo
  - School of Water and Environment, Chang'an University, Xi'an, Shaanxi 710054, China
38
Parekh P, Vivek Bhalerao G, John JP, Venkatasubramanian G. Sample size requirement for achieving multisite harmonization using structural brain MRI features. Neuroimage 2022; 264:119768. [PMID: 36435343 PMCID: PMC7615107 DOI: 10.1016/j.neuroimage.2022.119768] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/22/2022] [Accepted: 11/22/2022] [Indexed: 11/25/2022] Open
Abstract
When data is pooled across multiple sites, the extracted features are confounded by site effects. Harmonization methods attempt to correct these site effects while preserving the biological variability within the features. However, little is known about the sample size requirement for effectively learning the harmonization parameters and their relationship with the increasing number of sites. In this study, we performed experiments to find the minimum sample size required to achieve multisite harmonization (using neuroHarmonize) using volumetric and surface features by leveraging the concept of learning curves. Our first two experiments show that site-effects are effectively removed in a univariate and multivariate manner; however, it is essential to regress the effect of covariates from the harmonized data additionally. Our following two experiments with actual and simulated data showed that the minimum sample size required for achieving harmonization grows with the increasing average Mahalanobis distances between the sites and their reference distribution. We conclude by positing a general framework to understand the site effects using the Mahalanobis distance. Further, we provide insights on the various factors in a cross-validation design to achieve optimal inter-site harmonization.
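The Mahalanobis distance that this abstract relies on can be written out for the two-feature case, where the covariance matrix has a closed-form inverse. The restriction to two features and the function name are simplifying assumptions for illustration; the study works with many volumetric and surface features.

```python
def mahalanobis_2d(point, mean, cov):
    """Mahalanobis distance of a 2-feature point from a reference distribution.

    cov is a 2x2 covariance matrix given as ((a, b), (c, d)) and must be
    invertible (non-zero determinant).
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Quadratic form (dx, dy) @ inv(cov) @ (dx, dy)^T,
    # using inv(cov) = (1 / det) * [[d, -b], [-c, a]]
    q = (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det
    return q ** 0.5
```

With an identity covariance the result reduces to the Euclidean distance; a feature with larger variance in the reference distribution contributes less to the distance for the same raw offset.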
Affiliation(s)
- Pravesh Parekh
  - NORMENT, Division of Mental Health and Addiction, Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway; ADBS Neuroimaging Centre, National Institute of Mental Health and Neurosciences (NIMHANS), Bangalore, India; Department of Psychiatry, National Institute of Mental Health and Neurosciences (NIMHANS), Bangalore, India
- Gaurav Vivek Bhalerao
  - Translational Psychiatry Lab, National Institute of Mental Health and Neurosciences (NIMHANS), Bangalore, India; ADBS Neuroimaging Centre, National Institute of Mental Health and Neurosciences (NIMHANS), Bangalore, India; Department of Psychiatry, National Institute of Mental Health and Neurosciences (NIMHANS), Bangalore, India; Department of Psychiatry, University of Oxford, United Kingdom
- John P John
  - NORMENT, Division of Mental Health and Addiction, Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway; ADBS Neuroimaging Centre, National Institute of Mental Health and Neurosciences (NIMHANS), Bangalore, India; Department of Psychiatry, National Institute of Mental Health and Neurosciences (NIMHANS), Bangalore, India
- G Venkatasubramanian
  - Translational Psychiatry Lab, National Institute of Mental Health and Neurosciences (NIMHANS), Bangalore, India; ADBS Neuroimaging Centre, National Institute of Mental Health and Neurosciences (NIMHANS), Bangalore, India; Department of Psychiatry, National Institute of Mental Health and Neurosciences (NIMHANS), Bangalore, India
39
Wu L, Yang S. Transfer learning of individualized treatment rules from experimental to real-world data. J Comput Graph Stat 2022; 32:1036-1045. [PMID: 37997592 PMCID: PMC10664843 DOI: 10.1080/10618600.2022.2141752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2021] [Accepted: 10/04/2022] [Indexed: 11/06/2022]
Abstract
Individualized treatment effect lies at the heart of precision medicine. Interpretable individualized treatment rules (ITRs) are desirable for clinicians or policymakers due to their intuitive appeal and transparency. The gold-standard approach to estimating the ITRs is randomized experiments, where subjects are randomized to different treatment groups and the confounding bias is minimized to the extent possible. However, experimental studies are limited in external validity because of their selection restrictions, and therefore the underlying study population is not representative of the target real-world population. Conventional learning methods of optimal interpretable ITRs for a target population based only on experimental data are biased. On the other hand, real-world data (RWD) are becoming popular and provide a representative sample of the target population. To learn the generalizable optimal interpretable ITRs, we propose an integrative transfer learning method based on weighting schemes to calibrate the covariate distribution of the experiment to that of the RWD. Theoretically, we establish the risk consistency for the proposed ITR estimator. Empirically, we evaluate the finite-sample performance of the transfer learner through simulations and apply it to a real data application of a job training program.
Affiliation(s)
- Lili Wu
  - Department of Statistics, North Carolina State University
- Shu Yang
  - Department of Statistics, North Carolina State University
40
Shan G. Monte Carlo cross-validation for a study with binary outcome and limited sample size. BMC Med Inform Decis Mak 2022; 22:270. [PMID: 36253749 PMCID: PMC9578204 DOI: 10.1186/s12911-022-02016-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2021] [Accepted: 10/10/2022] [Indexed: 11/26/2022] Open
Abstract
Cross-validation (CV) is a resampling approach to evaluate machine learning models when sample size is limited. The number of all possible combinations of folds for the training data, known as CV rounds, is often very small in leave-one-out CV. Alternatively, Monte Carlo cross-validation (MCCV) can be performed with a flexible number of simulations, computational resources permitting, for a study with limited sample size. We conduct extensive simulation studies to compare accuracy between MCCV and CV with the same number of simulations for a study with binary outcome (e.g., disease progression or not). Accuracy of MCCV is generally higher than that of CV, although the gain is small. The two methods have similar performance when sample size is large. Meanwhile, MCCV provides more reliable performance metrics as the number of simulations increases. Two real examples are used to illustrate the comparison between MCCV and CV.
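The difference between MCCV and fold-based CV lies in how the splits are drawn: MCCV repeatedly samples a random test subset, so the number of rounds is a free parameter. A minimal sketch follows; the test fraction, the seed handling, and the function name are illustrative assumptions rather than settings from the paper.

```python
import random


def monte_carlo_splits(n_samples, test_fraction, n_rounds, seed=0):
    """Yield (train_indices, test_indices) for n_rounds independent random splits."""
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    indices = list(range(n_samples))
    for _ in range(n_rounds):
        # Each round draws a fresh test set without replacement
        test = set(rng.sample(indices, n_test))
        train = [i for i in indices if i not in test]
        yield train, sorted(test)
```

Unlike k-fold CV, a given observation may appear in the test set of several rounds or of none, which is the price paid for being able to choose the number of rounds freely.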
Affiliation(s)
- Guogen Shan
  - Department of Biostatistics, University of Florida, Gainesville, FL, 32610, USA
41
Akbulut Özen S, Yesilkanat CM, Özen M, Başsarı A, Taşkın H. Health risk assessment of soil trace elements using the Sequential Gaussian Simulation approach. Environ Sci Pollut Res Int 2022; 29:72683-72698. [PMID: 35610455 DOI: 10.1007/s11356-022-20974-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2022] [Accepted: 05/17/2022] [Indexed: 06/15/2023]
Abstract
In this study, the performance of the Sequential Gaussian Simulation (SGS) approach was studied with the aim of accurately determining local health risk distributions associated with trace elements (V, Cr, Mn, Co, Ni, Cu, Zn, As, and Pb). This study plays a crucial role in determining the distribution of health risk levels, especially from heavy metals. In the SGS approach, health risk levels (non-carcinogenic and carcinogenic) were calculated for pixel sizes of 250 × 250 m². Results were compared with those of the conventional Ordinary Kriging (OK) method using the cross-validation performance of both methods. Non-carcinogenic health risks calculated according to SGS and OK for children were, respectively, ρc: 0.57 and 0.23, RMSE: 0.45 and 0.57, and MAE: 0.33 and 0.43. In the case of adults, non-carcinogenic SGS and OK results were, respectively, ρc: 0.53 and 0.24, RMSE: 0.06 and 0.07, and MAE: 0.04 and 0.05. Carcinogenic health risk estimates obtained by SGS and OK were, respectively, ρc: 0.72 and 0.31, RMSE: 4.1 × 10⁻⁵ and 5.8 × 10⁻⁵, and MAE: 3.2 × 10⁻⁵ and 4.3 × 10⁻⁵ in the case of children, and in the case of adults the results were, respectively, ρc: 0.71 and 0.30, RMSE: 5 × 10⁻⁶ and 4.3 × 10⁻⁶, and MAE: 4 × 10⁻⁶ and 5 × 10⁻⁶. These results indicated that SGS offered a more accurate approach in determining health risk distributions.
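The three cross-validation metrics reported above can be computed as in the sketch below. It assumes that ρc denotes Lin's concordance correlation coefficient, which the abstract does not spell out, and it uses population (1/n) variances as in Lin's original definition.

```python
def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n) ** 0.5


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def concordance_ccc(observed, predicted):
    """Lin's concordance correlation coefficient (assumed meaning of rho_c)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    # Penalises both weak correlation and systematic offset between the series
    return 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)
```

Under this definition ρc equals 1 only when predictions match observations exactly, so a method can correlate well with the observations and still score low if it is biased.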
Affiliation(s)
- Songül Akbulut Özen
  - Department of Physics, Faculty of Engineering and Natural Sciences, Bursa Technical University, Bursa, Turkey
- Murat Özen
  - Department of Chemistry, Faculty of Engineering and Natural Sciences, Bursa Technical University, Bursa, Turkey
- Asiye Başsarı
  - Cekmece Nuclear Research and Training Center, Turkish Atomic Energy Authority (TAEK), Istanbul, Turkey
- Halim Taşkın
  - Cekmece Nuclear Research and Training Center, Turkish Atomic Energy Authority (TAEK), Istanbul, Turkey
42
Vieira BH, Liem F, Dadi K, Engemann DA, Gramfort A, Bellec P, Craddock RC, Damoiseaux JS, Steele CJ, Yarkoni T, Langer N, Margulies DS, Varoquaux G. Predicting future cognitive decline from non-brain and multimodal brain imaging data in healthy and pathological aging. Neurobiol Aging 2022; 118:55-65. [PMID: 35878565 PMCID: PMC9853405 DOI: 10.1016/j.neurobiolaging.2022.06.008] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/11/2020] [Revised: 06/21/2022] [Accepted: 06/23/2022] [Indexed: 01/24/2023]
Abstract
Previous literature has focused on predicting a diagnostic label from structural brain imaging. Since subtle changes in the brain precede a cognitive decline in healthy and pathological aging, our study predicts future decline as a continuous trajectory instead. Here, we tested whether baseline multimodal neuroimaging data improve the prediction of future cognitive decline in healthy and pathological aging. Non-brain data (demographics, clinical, and neuropsychological scores), structural MRI, and functional connectivity data from OASIS-3 (N = 662; age = 46-96 years) were entered into cross-validated multitarget random forest models to predict future cognitive decline (measured by CDR and MMSE), on average 5.8 years into the future. The analysis was preregistered, and all analysis code is publicly available. Combining non-brain with structural data improved the continuous prediction of future cognitive decline (best test-set performance: R² = 0.42). Cognitive performance, daily functioning, and subcortical volume drove the performance of our model. Including functional connectivity did not improve predictive accuracy. In the future, the prognosis of age-related cognitive decline may enable earlier and more effective individualized cognitive, pharmacological, and behavioral interventions.
Affiliation(s)
- Bruno Hebling Vieira
  - Methods of Plasticity Research, Department of Psychology, University of Zurich, Zurich, Switzerland; Neuroscience Center Zurich (ZNZ), University of Zurich & ETH Zurich, Zurich, Switzerland (corresponding author)
- Franziskus Liem
  - University Research Priority Program “Dynamics of Healthy Aging”, University of Zurich, Zurich, Switzerland
- Denis A. Engemann
  - Université Paris-Saclay, Inria, CEA, Palaiseau, France; Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Pierre Bellec
  - Functional Neuroimaging Unit, Geriatric Institute, University of Montreal, Montreal, Quebec, Canada
- Jessica S. Damoiseaux
  - Institute of Gerontology and the Department of Psychology, Wayne State University, Detroit, MI, USA
- Tal Yarkoni
  - Department of Psychology, The University of Texas, Austin, TX, USA
- Nicolas Langer
  - Methods of Plasticity Research, Department of Psychology, University of Zurich, Zurich, Switzerland; Neuroscience Center Zurich (ZNZ), University of Zurich & ETH Zurich, Zurich, Switzerland; University Research Priority Program “Dynamics of Healthy Aging”, University of Zurich, Zurich, Switzerland
- Daniel S. Margulies
  - Cognitive Neuroanatomy Lab, Institut du Cerveau et de la Moelle épinière, Paris, France
43
Anilkumar C, Sunitha NC, Devate NB, Ramesh S. Advances in integrated genomic selection for rapid genetic gain in crop improvement: a review. Planta 2022; 256:87. [PMID: 36149531 DOI: 10.1007/s00425-022-03996-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/20/2021] [Accepted: 09/11/2022] [Indexed: 06/16/2023]
Abstract
Genomic selection and its importance in crop breeding. Integration of GS with new breeding tools and developing SOP for GS to achieve maximum genetic gain with low cost and time. The success of conventional breeding approaches is not sufficient to meet the demand of a growing population for nutritious food and other plant-based products. Moreover, marker-assisted selection (MAS) is not efficient in capturing all the favorable alleles responsible for economic traits in the process of crop improvement. Genomic selection (GS), developed in livestock breeding and then adapted to plant breeding, promised to overcome the drawbacks of MAS and significantly improve complicated traits controlled by genes/QTL with small effects. Large-scale deployment of GS in important crops, as well as simulation studies in a variety of contexts, addressed G × E interaction effects and non-additive effects, as well as lowering breeding costs and time. The current study provides a complete overview of genomic selection, its process, and importance in modern plant breeding, along with insights into its application. GS has been implemented in the improvement of complex traits including tolerance to biotic and abiotic stresses. Furthermore, this review hypothesises that using GS in conjunction with other crop improvement platforms accelerates the breeding process to increase genetic gain. The objective of this review is to highlight the development of an appropriate GS model, the global open source network for GS, and trans-disciplinary approaches for effective accelerated crop improvement. The current study focused on the application of data science, including machine learning and deep learning tools, to enhance the accuracy of prediction models. The present study emphasizes developing plant breeding strategies centered on GS combined with routine conventional breeding principles by developing a GS-SOP to achieve enhanced genetic gain.
Affiliation(s)
- C Anilkumar
- ICAR-National Rice Research Institute, Cuttack, India
- N C Sunitha
- University of Agricultural Sciences, Bangalore, India
- S Ramesh
- University of Agricultural Sciences, Bangalore, India.
|
44
|
Chen S, Dai M, Hu J, Cheng J, Duan Y, Zou X, Su Y, Liu N, Jingesi M, Chen Z, Yin P, Huang S, He Q, Wang P. Evaluating the predictive ability of temperature-related indices on the stroke morbidity in Shenzhen, China: Under cross-validation methods framework. Sci Total Environ 2022; 838:156425. [PMID: 35660600 DOI: 10.1016/j.scitotenv.2022.156425] [Received: 03/10/2022] [Revised: 05/29/2022] [Accepted: 05/30/2022] [Indexed: 06/15/2023]
Abstract
BACKGROUND Composite temperature-related indices have been used to comprehensively reflect the impact of multiple meteorological factors on health. We aimed to evaluate the predictive ability of temperature-related indices, choose the best predictor of stroke morbidity, and explore the association between them. METHODS We built distributed lag nonlinear models to estimate the associations between temperature-related indices and stroke morbidity and then applied two types of cross-validation (CV) methods to choose the best predictor. The effects of this index on overall stroke, intracerebral hemorrhage (ICH), and ischemic stroke (IS) morbidity were explored, and we used heatmaps to explain how this index worked. Stratified analyses were conducted to identify vulnerable populations. RESULTS Among 12 temperature-related indices, the alternative temperature-humidity index (THIa) had the best overall performance in terms of root mean square error when combining the results from the two CVs. With the median value of THIa (25.70 °C) as the reference, the relative risks (RRs) of low THIa (10th percentile) reached a maximum at lag 0-10, with RRs of 1.20 (95% CI: 1.10-1.31), 1.49 (95% CI: 1.29-1.73) and 1.12 (95% CI: 1.03-1.23) for total stroke, ICH and IS, respectively. Using the THIa formula, we matched the effects of THIa on stroke under various combinations of temperature and relative humidity. We found that, although low temperature (<20 °C) had the greatest adverse effect, the modifying effect of humidity on it was not evident. In contrast, at moderate-high temperatures (24 °C-27 °C), lower humidity could reverse the protective effect of temperature into a harmful one. Stratified analyses showed that females were more vulnerable to low THIa for IS. CONCLUSIONS THIa is the best temperature-related predictor of stroke morbidity. In addition to the most dangerous cold weather, the government should pay more attention to days with moderate-high temperature and low humidity, which have been overlooked in the past.
Affiliation(s)
- Siyi Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Mengyi Dai
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Jing Hu
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Jinquan Cheng
- Shenzhen Center for Disease Control and Prevention, Shenzhen, China
- Yanran Duan
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Xuan Zou
- Shenzhen Center for Disease Control and Prevention, Shenzhen, China
- Youpeng Su
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Ning Liu
- Department of Environment and Health, Shenzhen Center for Disease Control and Prevention, Shenzhen, China
- Maidina Jingesi
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Ziwei Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Ping Yin
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Suli Huang
- Shenzhen Center for Disease Control and Prevention, Shenzhen, China
- Qingqing He
- School of Resource and Environmental Engineering, Wuhan University of Technology, Wuhan, China
- Peng Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.
|
45
|
Dramé M, Hombert V, Cantegrit E, Proye E, Godaert L. Derivation and validation of a 90-day unplanned hospital readmission score in older patients discharged form a geriatric ward. Eur Geriatr Med 2022. [PMID: 36040646 DOI: 10.1007/s41999-022-00687-5] [Received: 05/31/2022] [Accepted: 08/02/2022] [Indexed: 10/25/2022]
Abstract
PURPOSE To derive and validate a 90-day unplanned hospital readmission (UHR) score based on information available to non-hospital-based care providers. METHODS Retrospective longitudinal study with a cross-validation method. Participants were older adults (≥ 65 years) admitted to a geriatric short-stay department in a general hospital in France. Patients were split into a derivation cohort and a validation cohort. We recorded demographic information, medical history, and concurrent clinical characteristics. The main outcome was 90-day UHR. Data obtained from hospital discharge letters were used in a logistic regression model to construct a predictive score and to identify risk groups for 90-day UHR. RESULTS In total, 750 and 250 older adults were included in the derivation and validation cohorts, respectively. Mean age was 87.2 ± 5.2 years, and most participants were women (68.1%). Independent risk factors for 90-day UHR were: use of mobility aids (p = .02), presence of dementia syndrome (p = .02), history of recent hospitalisation (p = .03), and discharge to domiciliary home (p = .005). From these four risk factors, three groups were determined: low-risk group (score < 4), medium-risk group (score between 4 and 6), and high-risk group (score ≥ 6). In the derivation cohort, the 90-day UHR rates increased significantly across risk groups (14%, 22%, and 30%, respectively). The 90-day UHR score had the same discriminant power in the derivation cohort (c-statistic = 0.63) as in the validation cohort (c-statistic = 0.63). CONCLUSIONS This score makes it possible to identify older adults at risk of 90-day UHR and to target multidisciplinary interventions to limit UHR for patients discharged from a Geriatric Short-Stay Unit.
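The c-statistic reported in the abstract above is the probability that a randomly chosen readmitted patient has a higher score than a randomly chosen non-readmitted patient. The following is a minimal sketch of that computation, not the authors' code; the function name `c_statistic` and the example scores and outcomes are hypothetical and are not data from the study.

```python
from itertools import product


def c_statistic(scores, outcomes):
    """Concordance statistic for a risk score against binary outcomes.

    Counts, over all (event, non-event) pairs, how often the event case
    has the higher score. Tied scores contribute 0.5.
    """
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e, n in product(events, non_events):
        if e > n:
            concordant += 1.0
        elif e == n:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical scores and readmission flags (1 = readmitted within 90 days).
example_scores = [2, 5, 7, 3, 6, 1]
example_outcomes = [0, 1, 1, 0, 0, 0]
example_c = c_statistic(example_scores, example_outcomes)
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect ranking, which puts the reported 0.63 in context as modest discrimination.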
|
46
|
Coakley KJ, Sanford NA. Learning Atom Probe Tomography time-of-flight peaks for mass-to-charge ratio spectrometry. Ultramicroscopy 2022; 237:113521. [PMID: 35452870 PMCID: PMC9844238 DOI: 10.1016/j.ultramic.2022.113521] [Received: 12/29/2021] [Revised: 03/11/2022] [Accepted: 03/27/2022] [Indexed: 01/19/2023]
Abstract
In laser-assisted atom probe tomography, an important goal is to reconstruct the mass-to-charge ratio (m/z) spectrum due to various ion species. In general, the probability mass function (pmf) associated with the time-of-flight (TOF) spectrum produced by each ion species is unknown and varies from species to species. Moreover, measuring pmfs for distinct ion species in calibration experiments is not practical. Here, we present a mixture model method to determine TOF pmfs that can vary from peak to peak. In this approach, we determine the weights of candidate pmfs with a maximum likelihood method. In a proof-of-principle study, we apply our method to a TOF spectrum acquired from a silicon sample and determine intensity estimates of singly charged isotopes of silicon.
Affiliation(s)
- Kevin J Coakley
- National Institute of Standards and Technology, 325 Broadway, Boulder CO 80305, USA.
- Norman A Sanford
- National Institute of Standards and Technology, 325 Broadway, Boulder CO 80305, USA.
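The Coakley and Sanford abstract above describes estimating the weights of candidate pmfs by maximum likelihood. The abstract does not say which optimizer is used, so the sketch below substitutes a standard expectation-maximization (EM) update for mixture weights with fixed components. The function name `em_mixture_weights`, the two candidate pmfs, and the binned counts are all hypothetical.

```python
def em_mixture_weights(counts, pmfs, n_iter=500):
    """Maximum likelihood mixture weights for fixed candidate pmfs via EM.

    counts[b] is the number of observations in bin b; pmfs[j][b] is the
    probability that candidate j assigns to bin b. Only the weights are
    estimated; the candidate pmfs themselves stay fixed.
    """
    n_total = sum(counts)
    k = len(pmfs)
    weights = [1.0 / k] * k  # start from equal weights
    for _ in range(n_iter):
        updated = [0.0] * k
        for b, c in enumerate(counts):
            if c == 0:
                continue  # empty bins carry no information
            mix = sum(w * p[b] for w, p in zip(weights, pmfs))
            for j in range(k):
                # Expected number of counts in bin b attributed to candidate j.
                updated[j] += c * weights[j] * pmfs[j][b] / mix
        weights = [v / n_total for v in updated]
    return weights


# Hypothetical example: counts drawn exactly from a 0.25 / 0.75 mixture.
candidate_pmfs = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
binned_counts = [25, 20, 55]
estimated = em_mixture_weights(binned_counts, candidate_pmfs)
```

Because the log-likelihood is concave in the weights when the components are fixed, this update converges to the maximum likelihood solution in the example; real TOF spectra would involve many more bins and candidates.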
|
47
|
Khan A, Sharma S, Chowdhury KR, Sharma P. A novel seasonal index-based machine learning approach for air pollution forecasting. Environ Monit Assess 2022; 194:429. [PMID: 35556182 DOI: 10.1007/s10661-022-10092-x] [Received: 11/08/2021] [Accepted: 05/02/2022] [Indexed: 06/15/2023]
Abstract
Novel machine learning models (MLMs) using a seasonal indexing approach, which captures the variation in air quality caused by meteorological changes, have been used to provide short-term, real-time forecasts of PM2.5 concentration for one of the most polluted air quality control regions (AQCR) in the capital city of Delhi. Two MLMs, multi-linear regression and random forest, were developed using time series data for 1-h and 24-h average PM2.5 concentration. Various model performance evaluation indices indicate satisfactory model performance. R² values ranged from 0.95 to 0.72 for the hourly models (1st to 5th hour) and from 0.76 to 0.68 for the daily models (1st to 5th day). The lagged values of PM2.5 concentration (persistence) and the hourly and daily indices are the most influential variables for forecasts at immediate time steps. In contrast, seasonal indices become more important as the forecasting horizon lengthens. The developed models can be used for making short-term, real-time air quality forecasts and for issuing a warning when pollution levels go beyond acceptable limits.
Affiliation(s)
- Adeel Khan
- Council On Energy, Environment and Water, New Delhi, 110016, India
- Sumit Sharma
- TERI, The Energy and Resources Institute, IHC Complex, Lodi Road, New Delhi, 110003, India.
- Prateek Sharma
- TERI School of Advanced Studies, New Delhi, 110070, India
|
48
|
Zeng L, Hang J, Wang X, Shao M. Influence of urban spatial and socioeconomic parameters on PM 2.5 at subdistrict level: A land use regression study in Shenzhen, China. J Environ Sci (China) 2022; 114:485-502. [PMID: 35459511 DOI: 10.1016/j.jes.2021.12.002] [Received: 05/31/2021] [Revised: 11/21/2021] [Accepted: 12/08/2021] [Indexed: 06/14/2023]
Abstract
The intraurban distribution of PM2.5 concentration is influenced by various spatial, socioeconomic, and meteorological parameters. This study investigated the influence of 37 parameters on monthly average PM2.5 concentration at the subdistrict level with Pearson correlation analysis and land-use regression (LUR), using data from a subdistrict-level air pollution monitoring network in Shenzhen, China. The performance of the LUR models was evaluated with leave-one-out cross-validation (LOOCV) and holdout cross-validation (holdout CV). Pearson correlation analysis revealed that the Normalized Difference Built-up Index, artificial land fraction, land surface temperature, and point-of-interest (POI) numbers of factories and industrial parks are significantly positively correlated with monthly average PM2.5 concentrations, while the Normalized Difference Vegetation Index and Green View Factor show significant negative correlations. For the sparse national stations, robust LUR modelling may rely on a priori assumptions about the direction of influence during the predictor selection process. The month-by-month spatial regression shows that random forest (RF) models, for both national stations and all stations, have significantly inflated mean R² values compared with the cross-validation results. For multiple linear regression (MLR) models, inflation of both R² and cross-validated R² (R²CV) was detected when using only national stations, which may indicate a restricted ability to predict the spatial distribution of PM2.5 levels. Inflated within-sample R² also exists in the spatiotemporal LUR models developed with only national stations, although the inflation is less pronounced than in the spatial LUR models. Our results suggest that a denser subdistrict-level air pollutant monitoring network may improve the accuracy and robustness of intraurban spatial/spatiotemporal prediction of PM2.5 concentrations.
Affiliation(s)
- Liyue Zeng
- School of Atmospheric Sciences, Sun Yat-sen University, and Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai 519082, China; Key Laboratory of Tropical Atmosphere-Ocean System (Sun Yat-sen University), Ministry of Education, Zhuhai 519000, China; Guangdong Provincial Field Observation and Research Station for Climate Environment and Air Quality Change in the Pearl River Estuary, Guangzhou 510275, China
- Jian Hang
- School of Atmospheric Sciences, Sun Yat-sen University, and Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai 519082, China; Key Laboratory of Tropical Atmosphere-Ocean System (Sun Yat-sen University), Ministry of Education, Zhuhai 519000, China; Guangdong Provincial Field Observation and Research Station for Climate Environment and Air Quality Change in the Pearl River Estuary, Guangzhou 510275, China.
- Xuemei Wang
- Institute for Environmental and Climate Research, Jinan University, Guangzhou 510632, China
- Min Shao
- Institute for Environmental and Climate Research, Jinan University, Guangzhou 510632, China
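The gap between within-sample R² and cross-validated R² that the Zeng et al. abstract reports can be illustrated with a small sketch. It uses a single-predictor least-squares line rather than the 37-parameter LUR, MLR, or RF models of the study; the helper names and the six data points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, preds):
    """1 - SS_res / SS_tot, with SS_tot taken around the overall mean."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def in_sample_r2(xs, ys):
    """R² when the model is scored on the same points it was fitted to."""
    slope, intercept = fit_line(xs, ys)
    return r_squared(ys, [slope * x + intercept for x in xs])


def loocv_r2(xs, ys):
    """R² when each point is predicted by a model fitted without it."""
    preds = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(slope * xs[i] + intercept)
    return r_squared(ys, preds)


# Hypothetical station data: one predictor, six monitoring sites.
site_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
site_y = [1.2, 1.9, 3.4, 3.8, 5.3, 5.9]
r2_within = in_sample_r2(site_x, site_y)
r2_loocv = loocv_r2(site_x, site_y)
```

For least-squares fits the leave-one-out residual is never smaller in magnitude than the within-sample residual, so `r2_loocv` cannot exceed `r2_within`; with few stations the gap tends to widen, which is consistent with the inflation the abstract describes.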
|
49
|
Abstract
Machine learning models may outperform traditional statistical regression algorithms for predicting clinical outcomes. Proper validation when building such models and tuning their underlying algorithms is necessary to avoid over-fitting and poor generalizability, to which smaller datasets are more prone. To educate readers interested in artificial intelligence and model-building based on machine-learning algorithms, we outline important details of cross-validation techniques that can enhance the performance and generalizability of such models.
Affiliation(s)
- Paris Charilaou
- Jill Roberts Center for Inflammatory Bowel Disease - Division of Gastroenterology & Hepatology, Weill Cornell Medicine, New York, NY 10021, United States
- Robert Battat
- Jill Roberts Center for Inflammatory Bowel Disease - Division of Gastroenterology & Hepatology, Weill Cornell Medicine, New York, NY 10021, United States
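As an illustration of the cross-validation and tuning workflow that the Charilaou and Battat abstract refers to, the sketch below uses k-fold cross-validation to choose a ridge penalty for a one-feature regression through the origin. This is an assumed toy setting, not an example from the article; the function names, the penalty grid, and the noiseless data are hypothetical.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_ridge_slope(xs, ys, lam):
    """One-feature ridge regression through the origin."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_mse(xs, ys, lam, k=5, seed=0):
    """Mean squared error averaged over k held-out folds."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit only on the training folds, then score on the held-out fold.
        slope = fit_ridge_slope([xs[i] for i in train], [ys[i] for i in train], lam)
        squared = [(slope * xs[i] - ys[i]) ** 2 for i in held_out]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / len(fold_errors)


# Hypothetical noiseless data, so shrinkage can only hurt held-out error.
feature = [float(i) for i in range(1, 21)]
target = [2.0 * x for x in feature]
penalty_grid = [0.0, 1.0, 10.0, 100.0]
cv_scores = {lam: cv_mse(feature, target, lam) for lam in penalty_grid}
best_penalty = min(cv_scores, key=cv_scores.get)
```

The key design point is that the penalty is chosen from held-out error only; selecting it on training error would always favor the least-regularized model and would hide over-fitting on noisy data.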
|
50
|
López-García D, Peñalver JMG, Górriz JM, Ruz M. MVPAlab: A machine learning decoding toolbox for multidimensional electroencephalography data. Comput Methods Programs Biomed 2022; 214:106549. [PMID: 34910975 DOI: 10.1016/j.cmpb.2021.106549] [Received: 07/28/2021] [Revised: 10/30/2021] [Accepted: 11/17/2021] [Indexed: 06/14/2023]
Abstract
BACKGROUND AND OBJECTIVE The study of brain function has recently expanded from classical univariate to multivariate analyses. These multivariate, machine learning-based algorithms allow neuroscientists to extract more detailed and richer information from the data. However, implementing these procedures is usually challenging, especially for researchers with no coding experience. To address this problem, we have developed MVPAlab, a MATLAB-based, flexible decoding toolbox for multidimensional electroencephalography and magnetoencephalography data. METHODS The MVPAlab Toolbox implements several machine learning algorithms to compute multivariate pattern analyses, cross-classification, temporal generalization matrices, and feature and frequency contribution analyses. It also provides access to an extensive set of preprocessing routines for, among others, data normalization, data smoothing, dimensionality reduction and supertrial generation. To draw statistical inferences at the group level, MVPAlab includes a non-parametric cluster-based permutation approach. RESULTS A sample electroencephalography dataset was compiled to test all the main MVPAlab functionalities. Significant clusters (p<0.01) were found for the proposed decoding analyses and different configurations, demonstrating the software's capability to discriminate between different experimental conditions. CONCLUSIONS This toolbox has been designed to include an easy-to-use and intuitive graphical user interface and data representation software, which makes MVPAlab a very convenient tool for users with little or no previous coding experience. In addition, MVPAlab is not for beginners only, as it implements several high- and low-level routines that allow more experienced users to design their own projects in a highly flexible manner.
Affiliation(s)
- José M G Peñalver
- Mind, Brain and Behavior Research Center, University of Granada, Spain
- Juan M Górriz
- Data Science & Computational Intelligence Institute, University of Granada, Spain
- María Ruz
- Mind, Brain and Behavior Research Center, Department of Experimental Psychology, University of Granada, Spain
|