1. Evaluation of different machine learning algorithms for extraction decision in orthodontic treatment. Orthod Craniofac Res 2024. PMID: 38764408. DOI: 10.1111/ocr.12811.
Abstract
INTRODUCTION The extraction decision significantly affects the treatment process and outcome. Therefore, it is crucial to make this decision with a more objective and standardized method. The objectives of this study were (1) to identify the best-performing model among seven machine learning (ML) models, which will standardize the extraction decision and serve as a guide for inexperienced clinicians, and (2) to determine the important variables for the extraction decision. METHODS This study included 1000 patients who received orthodontic treatment with or without extraction (500 extraction and 500 non-extraction). The reference standard was the decisions made by four experienced orthodontists. Seven ML models were trained using 36 variables, including demographic information and cephalometric and model measurements. First, the extraction decision was made, and then the extraction type was identified. Accuracy and the area under the receiver operating characteristic (ROC) curve (AUC) were used to measure the success of the ML models. RESULTS The Stacking Classifier model, which consists of Gradient Boosted Trees, Support Vector Machine, and Random Forest models, showed the highest performance in the extraction decision with 91.2% AUC. The most important features determining the extraction decision were maxillary and mandibular arch length discrepancy, Wits Appraisal, and ANS-Me length. Likewise, the Stacking Classifier showed the highest performance, with 76.3% accuracy, in extraction type decisions. The most important variables for the extraction type decision were mandibular arch length discrepancy, Class I molar relationship, cephalometric overbite, Wits Appraisal, and L1-NB distance. CONCLUSION The Stacking Classifier model exhibited the best performance for the extraction decision. While the ML models showed high performance in the extraction decision, they could not achieve the same level of performance in the extraction type decision.
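The stacking scheme this abstract describes (Gradient Boosted Trees, SVM, and Random Forest feeding a meta-learner) can be sketched with scikit-learn. This is not the authors' implementation: their 36 clinical variables, hyperparameters, and meta-learner are not given here, so synthetic data and a logistic-regression meta-learner are assumed.

```python
# Sketch of a GBT + SVM + RF stacking ensemble; data and settings are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the 36 cephalometric/model measurements.
X, y = make_classification(n_samples=1000, n_features=36, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("gbt", GradientBoostingClassifier(random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
        ("rf", RandomForestClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # meta-learner; an assumption
    cv=5,  # out-of-fold predictions train the meta-learner
)
stack.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, stack.predict_proba(X_te)[:, 1])
print(f"test AUC: {auc:.3f}")
```

`cv=5` ensures the meta-learner sees only out-of-fold base-model predictions, which is what keeps stacking from overfitting to its base learners.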

2. An Ensemble Framework for Projecting the Impact of Lymphatic Filariasis Interventions Across Sub-Saharan Africa at a Fine Spatial Scale. Clin Infect Dis 2024; 78:S108-S116. PMID: 38662704. PMCID: PMC11045016. DOI: 10.1093/cid/ciae071.
Abstract
BACKGROUND Lymphatic filariasis (LF) is a neglected tropical disease targeted for elimination as a public health problem by 2030. Although mass treatments have led to huge reductions in LF prevalence, some countries or regions may find it difficult to achieve elimination by 2030 owing to various factors, including local differences in transmission. Subnational projections of intervention impact are a useful tool in understanding these dynamics, but correctly characterizing their uncertainty is challenging. METHODS We developed a computationally feasible framework for providing subnational projections for LF across 44 sub-Saharan African countries using ensemble models, guided by historical control data, to allow assessment of the role of subnational heterogeneities in global goal achievement. Projected scenarios include ongoing annual treatment from 2018 to 2030, enhanced coverage, and biannual treatment. RESULTS Our projections suggest that progress is likely to continue well. However, highly endemic locations currently deploying strategies with the lower World Health Organization recommended coverage (65%) and frequency (annual) are expected to have slow decreases in prevalence. Increasing intervention frequency or coverage can accelerate progress by up to 5 or 6 years, respectively. CONCLUSIONS While projections based on baseline data have limitations, our methodological advancements provide assessments of potential bottlenecks for the global goals for LF arising from subnational heterogeneities. In particular, areas with high baseline prevalence may face challenges in achieving the 2030 goals, extending the "tail" of interventions. Enhancing intervention frequency and/or coverage will accelerate progress. Our approach facilitates preimplementation assessments of the impact of local interventions and is applicable to other regions and neglected tropical diseases.

3. An enhanced diabetes prediction amidst COVID-19 using ensemble models. Front Public Health 2023; 11:1331517. PMID: 38155892. PMCID: PMC10754515. DOI: 10.3389/fpubh.2023.1331517.
Abstract
In the contemporary landscape of healthcare, the early and accurate prediction of diabetes has garnered paramount importance, especially in the wake of the COVID-19 pandemic where individuals with diabetes exhibit increased vulnerability. This research embarked on a mission to enhance diabetes prediction by employing state-of-the-art machine learning techniques. Initial evaluations highlighted the Support Vector Machines (SVM) classifier as a promising candidate with an accuracy of 76.62%. To further optimize predictions, the study delved into advanced feature engineering techniques, generating interaction and polynomial features that unearthed hidden patterns in the data. Subsequent correlation analyses, visualized through heatmaps, revealed significant correlations, especially with attributes like Glucose. By integrating the strengths of Decision Trees, Gradient Boosting, and SVM in an ensemble model, we achieved an accuracy of 93.2%, showcasing the potential of harmonizing diverse algorithms. This research offers a robust blueprint for diabetes prediction, holding profound implications for early diagnosis, personalized treatments, and preventive care in the context of global health challenges and with the goal of increasing life expectancy.
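The pipeline this abstract outlines (interaction/polynomial feature engineering feeding a Decision Tree + Gradient Boosting + SVM ensemble) can be sketched as below. The dataset, ensemble weights, and soft-voting combiner are assumptions; the paper's Pima-style diabetes data and exact ensembling are not reproduced here.

```python
# Illustrative sketch: polynomial features + DT/GB/SVM soft-voting ensemble.
from sklearn.datasets import load_breast_cancer  # stand-in tabular medical dataset
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(
    StandardScaler(),
    # Degree-2 terms expose interaction patterns (e.g. Glucose x BMI analogues).
    PolynomialFeatures(degree=2, include_bias=False),
    VotingClassifier(
        estimators=[("dt", DecisionTreeClassifier(max_depth=4, random_state=0)),
                    ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
                    ("svm", SVC(probability=True, random_state=0))],
        voting="soft",  # average class probabilities across the three models
    ),
)
acc = cross_val_score(model, X, y, cv=3).mean()
print(f"mean CV accuracy: {acc:.3f}")
```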

4. SpiderLearner: An ensemble approach to Gaussian graphical model estimation. Stat Med 2023; 42:2116-2133. PMID: 37004994. DOI: 10.1002/sim.9714.
Abstract
Gaussian graphical models (GGMs) are a popular form of network model in which nodes represent features in multivariate normal data and edges reflect conditional dependencies between these features. GGM estimation is an active area of research. Currently available tools for GGM estimation require investigators to make several choices regarding algorithms, scoring criteria, and tuning parameters. An estimated GGM may be highly sensitive to these choices, and the accuracy of each method can vary based on structural characteristics of the network such as topology, degree distribution, and density. Because these characteristics are a priori unknown, it is not straightforward to establish universal guidelines for choosing a GGM estimation method. We address this problem by introducing SpiderLearner, an ensemble method that constructs a consensus network from multiple estimated GGMs. Given a set of candidate methods, SpiderLearner estimates the optimal convex combination of results from each method using a likelihood-based loss function. K-fold cross-validation is applied in this process, reducing the risk of overfitting. In simulations, SpiderLearner performs better than or comparably to the best candidate methods according to a variety of metrics, including relative Frobenius norm and out-of-sample likelihood. We apply SpiderLearner to publicly available ovarian cancer gene expression data including 2013 participants from 13 diverse studies, demonstrating our tool's potential to identify biomarkers of complex disease. SpiderLearner is implemented as flexible, extensible, open-source code in the R package ensembleGGM at https://github.com/katehoffshutta/ensembleGGM.
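The core idea (a likelihood-optimal convex combination of candidate precision-matrix estimates) can be sketched in a few lines. This is a simplified stand-in for the R package: the "candidate methods" here are just graphical lasso fits at different penalties, and a single held-out split replaces full K-fold cross-validation.

```python
# Minimal SpiderLearner-style sketch: convex-combine candidate GGM estimates
# to maximize held-out Gaussian log-likelihood. Simplified; illustrative only.
import numpy as np
from scipy.optimize import minimize
from sklearn.covariance import GraphicalLasso

# Ground truth: a sparse tridiagonal precision matrix (a chain graph).
p = 10
prec_true = np.eye(p)
for i in range(p - 1):
    prec_true[i, i + 1] = prec_true[i + 1, i] = 0.4

rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(prec_true), size=400)
X_tr, X_val = X[:300], X[300:]

# Candidate precision-matrix estimates (stand-ins for heterogeneous methods).
cands = [GraphicalLasso(alpha=a).fit(X_tr).precision_ for a in (0.05, 0.2, 0.5)]
S_val = np.cov(X_val, rowvar=False)

def neg_loglik(w):
    """Held-out negative Gaussian log-likelihood (up to constants) of the blend."""
    theta = sum(wi * P for wi, P in zip(w, cands))  # convex combination
    _, logdet = np.linalg.slogdet(theta)
    return -(logdet - np.trace(S_val @ theta))

res = minimize(neg_loglik, x0=np.ones(3) / 3,
               bounds=[(0, 1)] * 3,
               constraints=({"type": "eq", "fun": lambda w: w.sum() - 1},))
weights = res.x
print("ensemble weights:", np.round(weights, 3))
```

Because the objective is convex in the weights, the blend is guaranteed to score at least as well as the best single candidate on the held-out likelihood.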

5. Classification of Monkeypox Images Using LIME-Enabled Investigation of Deep Convolutional Neural Network. Diagnostics (Basel) 2023; 13:1639. PMID: 37175030. PMCID: PMC10178151. DOI: 10.3390/diagnostics13091639.
Abstract
In this research, we demonstrate a Deep Convolutional Neural Network-based classification model for the detection of monkeypox. Monkeypox can be difficult to diagnose clinically in its early stages, since its symptoms resemble both chickenpox and measles. Early diagnosis of monkeypox helps doctors treat it more quickly. Pre-trained models are therefore frequently used in the diagnosis of monkeypox, because the manual analysis of a large number of images is labor-intensive and prone to inaccuracy; detecting the monkeypox virus consequently calls for an automated process. The large layer count of convolutional neural network (CNN) architectures enables them to learn discriminative features on their own, thereby contributing to better performance in image classification. The scientific community has recently devoted significant attention to employing artificial intelligence (AI) to diagnose monkeypox from digital skin images, due primarily to AI's success in COVID-19 identification. The VGG16, VGG19, ResNet50, ResNet101, DenseNet201, and AlexNet models were used in our proposed method to distinguish patients with monkeypox symptoms from those with diseases of a similar kind (chickenpox, measles, and normal skin). The majority of the images in our research were collected from publicly available datasets. This study also proposes an adaptive k-means clustering image segmentation technique that delivers precise segmentation results with straightforward operation. Our preliminary computational findings reveal that the proposed model can accurately detect patients with monkeypox. The best overall accuracy, achieved by ResNet101, is 94.25%, with an AUC of 98.59%. Additionally, we describe the categorization of our model using feature extraction with Local Interpretable Model-Agnostic Explanations (LIME), which provides a more in-depth understanding of the particular properties that distinguish the monkeypox virus.
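The k-means segmentation step mentioned above can be sketched as plain intensity clustering. The toy image, k=3, and iteration count are assumptions; the paper's adaptive heuristics and real lesion images are not reproduced.

```python
# Illustrative k-means segmentation of pixel intensities (NumPy only).
import numpy as np

def kmeans_segment(img, k=3, iters=20, seed=0):
    """Cluster pixel intensities into k groups; return a label image."""
    rng = np.random.default_rng(seed)
    pix = img.reshape(-1, 1).astype(float)
    centers = rng.choice(pix[:, 0], size=k, replace=False).reshape(k, 1)
    for _ in range(iters):
        labels = np.argmin(np.abs(pix - centers.T), axis=1)  # nearest center
        for j in range(k):
            if np.any(labels == j):  # skip empty clusters
                centers[j, 0] = pix[labels == j].mean()
    return labels.reshape(img.shape)

# Toy "skin image": three intensity regions plus noise.
rng = np.random.default_rng(1)
img = np.concatenate([rng.normal(m, 5, (20, 60)) for m in (40, 120, 200)])
seg = kmeans_segment(img, k=3)
print("segment labels:", np.unique(seg))
```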

6. Data Diversity in Convolutional Neural Network Based Ensemble Model for Diabetic Retinopathy. Biomimetics (Basel) 2023; 8:187. PMID: 37218773. DOI: 10.3390/biomimetics8020187.
Abstract
With ongoing technological advancement, the medical and healthcare domains require automatic diagnosis systems (ADS) for the identification of health problems. Biomedical imaging is one of the techniques used in computer-aided diagnosis systems. Ophthalmologists examine fundus images (FI) to detect and classify stages of diabetic retinopathy (DR). DR is a chronic disease that appears in patients with long-term diabetes. Untreated patients can progress to severe conditions of DR, such as retinal detachment. Therefore, early detection and classification of DR are crucial to ward off its advanced stages and preserve vision. Data diversity in an ensemble model refers to the use of multiple models trained on different subsets of data to improve the ensemble's overall performance. In the context of an ensemble model based on a convolutional neural network (CNN) for diabetic retinopathy, this could involve training multiple CNNs on various subsets of retinal images, including images from different patients or those captured using distinct imaging techniques. By combining the predictions of these multiple models, the ensemble can potentially make more accurate predictions than any single model. In this paper, an ensemble model (EM) of three CNN models is proposed for limited and imbalanced DR data using data diversity. Detecting the Class 1 stage of DR is important to control this sight-threatening disease in time. The CNN-based EM classifies the five classes of DR while giving particular attention to the early stage, i.e., Class 1. Furthermore, data diversity is created by applying various augmentation and generation techniques with affine transformations. Compared to the single model and other existing work, the proposed EM achieved better multi-class classification accuracy, precision, sensitivity, and specificity of 91.06%, 91.00%, 95.01%, and 98.38%, respectively.
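The combining step above reduces, at its simplest, to averaging the per-class probabilities emitted by each CNN. The three "models" below are stand-in probability arrays, not trained networks, so the numbers are purely illustrative.

```python
# Soft-voting combination of per-model class probabilities (NumPy only).
import numpy as np

def soft_vote(prob_list):
    """Average class-probability matrices from several models.

    Returns the averaged matrix and the argmax class per sample.
    """
    avg = np.mean(prob_list, axis=0)
    return avg, avg.argmax(axis=1)

# Three models' probabilities for 4 samples over the 5 DR classes.
rng = np.random.default_rng(0)
probs = [rng.dirichlet(np.ones(5), size=4) for _ in range(3)]
avg, preds = soft_vote(probs)
print("ensemble predictions:", preds)
```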

7. Assigning trend-based conservation status despite high uncertainty. Conserv Biol 2023:e14084. PMID: 36919474. DOI: 10.1111/cobi.14084.
Abstract
Estimates of temporal trends in species' occupancy are essential for conservation policy and planning, but limitations to the data and models often result in very high trend uncertainty. A critical source of uncertainty that degrades scientific credibility is that caused by disagreement among studies or models. Modelers are aware of this uncertainty but usually only partially estimate it and communicate it to decision makers. At the same time, there is growing awareness that full disclosure of uncertainty is critical for effective translation of science into policies and plans. But what are the most effective approaches to estimating uncertainty and communicating uncertainty to decision makers? We explored how alternative approaches to estimating and communicating uncertainty of species trends could affect decisions concerning conservation status of freshwater fishes. We used ensemble models to propagate trend uncertainty within and among models and communicated this uncertainty with categorical distributions of trend direction and magnitude. All approaches were designed to fit an established decision-making system used to assign species conservation status by the New Zealand government. Our results showed how approaches that failed to fully disclose uncertainty, while simplifying the information presented, could hamper species conservation or lead to ineffective decisions. We recommend an approach that was recently used effectively to communicate trend uncertainty to a panel responsible for setting the conservation status of New Zealand's freshwater fishes.
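One concrete way to communicate trend uncertainty as categorical distributions, as described above, is to bin an ensemble's trend samples into ordered direction-and-magnitude classes. The cut-points and category names below are assumptions for illustration, not the New Zealand system's actual thresholds.

```python
# Convert ensemble trend samples into a categorical distribution (NumPy only).
import numpy as np

def trend_categories(samples, cuts=(-0.3, -0.1, 0.1, 0.3)):
    """Fraction of ensemble trend samples falling in each ordered category."""
    labels = ["strong decline", "decline", "stable", "increase", "strong increase"]
    idx = np.searchsorted(cuts, samples)          # bin index 0..len(cuts)
    counts = np.bincount(idx, minlength=len(cuts) + 1)
    return dict(zip(labels, counts / len(samples)))

# Ensemble samples for a species whose trend is an uncertain decline.
rng = np.random.default_rng(0)
dist = trend_categories(rng.normal(-0.15, 0.1, size=2000))
print(dist)
```

Presenting the whole distribution, rather than a point estimate, is what lets a status panel see how much probability mass sits in each decline class.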

8. Designing a large-scale track-based monitoring program to detect changes in species distributions in arid Australia. Ecol Appl 2023; 33:e2762. PMID: 36218186. DOI: 10.1002/eap.2762.
Abstract
Monitoring trends in animal populations in arid regions is challenging due to remoteness and low population densities. However, detecting species' tracks or signs is an effective survey technique for monitoring population trends across large spatial and temporal scales. In this study, we developed a simulation framework to evaluate the performance of alternative track-based monitoring designs at detecting change in species distributions in arid Australia. We collated presence-absence records from 550 2-ha track-based plots for 11 vertebrates over 13 years and fitted ensemble species distribution models to predict occupancy in 2018. We simulated plausible changes in species' distributions over the next 15 years and, with estimates of detectability, simulated monitoring to evaluate the statistical power of three alternative monitoring scenarios: (1) where surveys were restricted to existing 2-ha plots, (2) where surveys were optimized to target all species equally, and (3) where surveys were optimized to target two species of conservation concern. Across all monitoring designs and scenarios, we found that power was higher when detecting increasing occupancy trends compared to decreasing trends owing to the relatively low levels of initial occupancy. Our results suggest that surveying 200 of the existing plots annually (with a small subset resurveyed twice within a year) will have at least an 80% chance of detecting 30% declines in occupancy for four of the five invasive species modeled and one of the six native species. This increased to 10 of the 11 species assuming larger (50%) declines. When plots were positioned to target all species equally, power improved slightly for most compared to the existing survey network. When plots were positioned to target two species of conservation concern (crest-tailed mulgara and dusky hopping mouse), power to detect 30% declines increased by 29% and 31% for these species, respectively, at the cost of reduced power for the remaining species. The effect of varying survey frequency depended on its trade-off with the number of sites sampled and requires further consideration. Nonetheless, our research suggests that track-based surveying is an effective and logistically feasible approach to monitoring broad-scale occupancy trends in desert species with both widespread and restricted distributions.

9. IPPF-FE: an integrated peptide and protein function prediction framework based on fused features and ensemble models. Brief Bioinform 2023; 24:bbac476. PMID: 36403184. DOI: 10.1093/bib/bbac476.
Abstract
The prediction of peptide and protein function is important for research and industrial applications, and many machine learning methods have been developed for this purpose. The existing models have encountered many challenges, including the lack of effective and comprehensive features and the limited applicability of each model. Here, we introduce an Integrated Peptide and Protein function prediction Framework based on Fused features and Ensemble models (IPPF-FE), which can accurately capture the relationship between features and labels. The results indicated that IPPF-FE outperformed existing state-of-the-art (SOTA) models on more than 8 different categories of peptide and protein tasks. In addition, t-distributed Stochastic Neighbour Embedding demonstrated the advantages of IPPF-FE. We anticipate that our method will become a versatile tool for peptide and protein prediction tasks and shed light on the future development of related models. The model is open source and available in the GitHub repository https://github.com/Luo-SynBioLab/IPPF-FE.

10. Framework for environment perception: Ensemble method for vision-based scene understanding algorithms in agriculture. Front Robot AI 2023; 9:982581. PMID: 36714805. PMCID: PMC9878339. DOI: 10.3389/frobt.2022.982581.
Abstract
The safe and reliable operation of autonomous agricultural vehicles requires an advanced environment perception system. An important component of perception systems is vision-based algorithms for detecting objects and other structures in the fields. This paper presents an ensemble method for combining outputs of three scene understanding tasks: semantic segmentation, object detection and anomaly detection in the agricultural context. The proposed framework uses an object detector to detect seven agriculture-specific classes. The anomaly detector detects all other objects that do not belong to these classes. In addition, the segmentation map of the field is utilized to provide additional information if the objects are located inside or outside the field area. The detections of different algorithms are combined at inference time, and the proposed ensemble method is independent of underlying algorithms. The results show that combining object detection with anomaly detection can increase the number of detected objects in agricultural scene images.

11. Investigating the Bond Strength of FRP Laminates with Concrete Using LIGHT GBM and SHAPASH Analysis. Polymers (Basel) 2022; 14:4717. PMID: 36365710. PMCID: PMC9656809. DOI: 10.3390/polym14214717.
Abstract
The corrosion of steel reinforcement necessitates regular maintenance and repair of a variety of reinforced concrete structures. Retrofitting of beams, joints, columns, and slabs frequently involves the use of fiber-reinforced polymer (FRP) laminates. In order to develop simple prediction models for calculating the interfacial bond strength (IBS) of FRP laminates on a concrete prism containing grooves, this research evaluated the nonlinear capabilities of three machine learning (ML) ensemble methods: random forest (RF) regression, extreme gradient boosting (XGBoost), and the Light Gradient Boosting Machine (LIGHT GBM). In the present study, the IBS was the target variable, while the model comprised five input parameters: elastic modulus × thickness of FRP (EfTf), width of FRP plate (bf), concrete compressive strength (fc′), width of groove (bg), and depth of groove (hg). The optimal parameters for each ensemble model were selected by trial and error. The aforementioned models were trained on 70% of the entire dataset, while the remaining 30% was used for validation of the developed models. The evaluation was conducted on the basis of reliable accuracy indices. The lowest coefficient of determination (R2 = 0.82) was observed for the testing data of the RF regression model, whereas the highest (R2 = 0.942) was obtained for LIGHT GBM on the training data. Overall, the three models showed robust performance in terms of correlation and error evaluation; the order of accuracy was LIGHT GBM > XGBoost > RF regression. Owing to its superior performance, LIGHT GBM may be considered a reliable ML prediction technique for computing the bond strength between FRP laminates and concrete prisms. The performance of the models was further assessed by comparing the slopes of regression lines between observed and predicted values, along with error analysis (mean absolute error (MAE) and root-mean-square error (RMSE)), predicted-to-experimental ratios, and Taylor diagrams. Moreover, the SHAPASH analysis revealed that the elastic modulus × thickness of FRP and the width of the FRP plate are the factors most responsible for the IBS.

12. Prediction of Autogenous Shrinkage of Concrete Incorporating Super Absorbent Polymer and Waste Materials through Individual and Ensemble Machine Learning Approaches. Materials (Basel) 2022; 15:7412. PMID: 36363008. PMCID: PMC9656842. DOI: 10.3390/ma15217412.
Abstract
The use of superabsorbent polymers (SAP) is a highly effective method for reducing the amount of autogenous shrinkage (AS) that occurs in high-performance concrete. This study utilizes support vector regression (SVR) as a standalone machine-learning algorithm (MLA), which is then combined with boosting and bagging ensemble approaches to reduce bias and overfitting. In addition, these ensemble methods are tuned over twenty sub-models by varying the number of estimators to achieve a robust R2. Moreover, modified bagging in the form of random forest regression (RFR) is also employed to predict the AS of concrete containing supplementary cementitious materials (SCMs) and SAP. The data for modeling AS include the water-to-cement ratio (W/C), water-to-binder ratio (W/B), cement, silica fume, fly ash, slag, filler, metakaolin, super absorbent polymer, superplasticizer, super absorbent polymer size, curing time, and super absorbent polymer water intake. Statistical checks and k-fold validation, based on MAE and RMSE, are used to validate the models. Furthermore, SHAPLEY analysis is performed on the variables to identify the influential parameters. SVR with AdaBoost and modified bagging (RF) yield strong models, delivering R2 of approximately 0.95 and 0.98, respectively, compared with the individual SVR model. Relative to the standalone SVR model, RMSE and MAE improved by 67% and 63% for the RF model, and by 47% and 36% for SVR with AdaBoost. Thus, a strong learner can substantially improve the efficiency of the model.
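The boosting-and-bagging-around-SVR scheme described above can be sketched with scikit-learn. The synthetic mixture features, C value, and estimator counts are assumptions; the paper's concrete dataset and tuned sub-models are not reproduced.

```python
# Standalone SVR vs. SVR wrapped in AdaBoost and in bagging; illustrative only.
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, BaggingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 6))  # stand-ins for mixture variables
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.05, 300)  # fake AS target

models = {
    "SVR alone": SVR(C=10),
    "SVR + AdaBoost": AdaBoostRegressor(SVR(C=10), n_estimators=20, random_state=0),
    "SVR + bagging": BaggingRegressor(SVR(C=10), n_estimators=20, random_state=0),
}
r2 = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
print(r2)
```

Bagging averages SVRs fitted on bootstrap resamples (variance reduction), while AdaBoost reweights samples toward the hardest cases (bias reduction), which is the bias/overfitting trade-off the abstract refers to.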

13. A Stacked Generalization Model to Enhance Prediction of Earthquake-Induced Soil Liquefaction. Sensors (Basel) 2022; 22:7292. PMID: 36236392. PMCID: PMC9572518. DOI: 10.3390/s22197292.
Abstract
Earthquakes cause liquefaction, which disturbs the design phase of the building construction process. The potential for earthquake-induced liquefaction was initially estimated using analytical and numerical methods, but these conventional methods struggle to provide empirical formulations in the presence of uncertainties. Accordingly, machine learning (ML) algorithms have been implemented to predict liquefaction potential. Although ML models perform well on a specific liquefaction dataset, they fail to produce accurate results when used on other datasets. This study proposes a stacked generalization model (SGM), constructed by aggregating the best-performing algorithms, namely the multilayer perceptron regressor (MLPR), support vector regression (SVR), and a linear regressor, to build an efficient prediction model for the potential of earthquake-induced liquefaction on settlements. The dataset from the Korean Geotechnical Information database system and the standard penetration test data for the 2016 Pohang earthquake in South Korea were used. Model performance was evaluated using the R2 score, mean-square error (MSE), standard deviation, covariance, and root-mean-square error (RMSE). Model validation was performed to compare the performance of the proposed SGM with the SVR and MLPR models. The proposed SGM yielded the best performance compared with the other base models.
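The MLPR + SVR + linear stacked generalization described above maps directly onto scikit-learn's `StackingRegressor`. The synthetic site features, hyperparameters, and ridge meta-learner below are assumptions, not the paper's settings.

```python
# Stacked generalization (MLPR + SVR + linear base learners); illustrative only.
import numpy as np
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 4))  # stand-ins for SPT/site features
y = X[:, 0] ** 2 + np.sin(2 * X[:, 1]) + rng.normal(0, 0.05, 400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

sgm = StackingRegressor(
    estimators=[("mlp", MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                                     random_state=0)),
                ("svr", SVR(C=10)),
                ("lin", LinearRegression())],
    final_estimator=Ridge(),  # meta-learner; an assumption
)
sgm.fit(X_tr, y_tr)
rmse = mean_squared_error(y_te, sgm.predict(X_te)) ** 0.5
print(f"SGM test RMSE: {rmse:.3f}")
```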

14. Ensemble Models for Tick Vectors: Standard Surveys Compared with Convenience Samples. Diseases 2022; 10:32. PMID: 35735632. PMCID: PMC9222110. DOI: 10.3390/diseases10020032.
Abstract
Ensembles of Species Distribution Models (SDMs) represent the geographic ranges of pathogen vectors by combining alternative analytical approaches and merging information on vector occurrences with more extensive environmental data. Biased collection data impact SDMs, regardless of the target species, but no studies have compared the differences in the distributions predicted by the ensemble models when different sampling frameworks are used for the same species. We compared Ensemble SDMs for two important Ixodid tick vectors, Amblyomma americanum and Ixodes scapularis in mainland Florida, USA, when inputs were either convenience samples of ticks, or collections obtained using the standard protocols promulgated by the U.S. Centers for Disease Control and Prevention. The Ensemble SDMs for the convenience samples and standard surveys showed only a slight agreement (Kappa = 0.060, A. americanum; 0.053, I. scapularis). Convenience sample SDMs indicated A. americanum and I. scapularis should be absent from nearly one third (34.5% and 30.9%, respectively) of the state where standard surveys predicted the highest likelihood of occurrence. Ensemble models from standard surveys predicted 81.4% and 72.5% (A. americanum and I. scapularis) of convenience sample sites. Omission errors by standard survey SDMs of the convenience collections were associated almost exclusively with either adjacency to at least one SDM, or errors in geocoding algorithms that failed to correctly locate geographic locations of convenience samples. These errors emphasize commonly overlooked needs to explicitly evaluate and improve data quality for arthropod survey data that are applied to spatial models.
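The agreement statistic this abstract reports (Kappa ≈ 0.06, i.e. only slight agreement beyond chance) is Cohen's kappa between paired presence/absence predictions. The toy vectors below are illustrative, standing in for the two SDM rasters flattened over grid cells.

```python
# Cohen's kappa between two binary prediction maps; toy data, illustrative only.
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
convenience = rng.integers(0, 2, size=1000)  # predicted presence (0/1) per cell
# "Standard survey" predictions agreeing with the other map ~55% of the time.
standard = np.where(rng.random(1000) < 0.55, convenience, 1 - convenience)

kappa = cohen_kappa_score(convenience, standard)
print(f"kappa: {kappa:.3f}")
```

With raw agreement barely above the ~50% expected by chance, kappa lands near zero, which is the "slight agreement" regime described above.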

15. Ensemble Tree-Based Approach towards Flexural Strength Prediction of FRP Reinforced Concrete Beams. Polymers (Basel) 2022; 14:1303. PMID: 35406177. PMCID: PMC9003558. DOI: 10.3390/polym14071303.
Abstract
Due to the rise in infrastructure development and the demand for seawater and sea sand concrete, fiber-reinforced polymer (FRP) rebars are widely used in the construction industry. Flexural strength is an important component of reinforced concrete structural design. Therefore, this research focuses on estimating the flexural capacity of FRP-reinforced concrete beams using novel artificial intelligence (AI) decision tree (DT) and gradient boosting tree (GBT) approaches. For this purpose, six input parameters, namely the area of bottom flexural reinforcement, depth of the beam, width of the beam, concrete compressive strength, elastic modulus of the FRP rebar, and tensile strength of the rebar at failure, are considered to predict the moment-bearing capacity of the beam under bending loads. The models were trained on 60% of the database and validated on the remaining 40% using the correlation coefficient (R), error indices, namely the mean absolute error (MAE) and root mean square error (RMSE), and the slope of the regression line between observed and predicted results. The developed models were further validated using sensitivity and parametric analysis. Both models revealed comparable performance; however, based on the slope for the validation data (0.83 for the GBT model against 0.75 for the DT model) and the higher R in the validation phase for the GBT model, the GBT model can be considered more accurate and robust. The sensitivity analysis identified the depth of the beam as the most influential parameter contributing to the flexural strength of the beam, followed by the area of flexural reinforcement. The developed GBT model surpasses the existing gene expression programming (GEP) model in terms of accuracy; however, the current American Concrete Institute (ACI) model equations remain more reliable than the AI models in predicting the flexural strength of FRP-reinforced concrete beams.
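The DT-versus-GBT comparison and 60/40 split described above can be sketched as follows. The synthetic function stands in for the six real beam parameters, so the scores are illustrative, not the paper's results.

```python
# Decision tree vs. gradient-boosted trees on a 60/40 split; illustrative only.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0.5, 1.5, size=(400, 6))  # stand-ins for the six beam inputs
y = X[:, 0] * X[:, 1] ** 2 + 0.2 * X[:, 3] + rng.normal(0, 0.02, 400)  # fake moment capacity

# 60/40 train-validation split as in the study.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.6, random_state=0)
results = {}
for name, model in [("DT", DecisionTreeRegressor(random_state=0)),
                    ("GBT", GradientBoostingRegressor(random_state=0))]:
    results[name] = r2_score(y_te, model.fit(X_tr, y_tr).predict(X_te))
print(results)
```

Boosting typically edges out a single tree here because each new tree corrects the residuals of the previous ones rather than partitioning the space once.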
|
16
|
Deep Ensemble Model for COVID-19 Diagnosis and Classification Using Chest CT Images. BIOLOGY 2021; 11:43. [PMID: 35053041 PMCID: PMC8773139 DOI: 10.3390/biology11010043] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/19/2021] [Revised: 12/09/2021] [Accepted: 12/21/2021] [Indexed: 01/04/2023]
Abstract
Coronavirus disease 2019 (COVID-19) has spread worldwide, and medicinal resources have become inadequate in several regions. Computed tomography (CT) scans can achieve precise and rapid COVID-19 diagnosis compared with the RT-PCR test. At the same time, artificial intelligence (AI) techniques, including machine learning (ML) and deep learning (DL), are useful for designing COVID-19 diagnostic tools based on chest CT scans. In this context, this study concentrates on the design of an artificial intelligence-based ensemble model for the detection and classification (AIEM-DC) of COVID-19. The AIEM-DC technique aims to accurately detect and classify COVID-19 using an ensemble of DL models. In addition, a Gaussian filtering (GF)-based preprocessing technique is applied to remove noise and improve image quality. Moreover, a shark optimization algorithm (SOA) with an ensemble of DL models, namely recurrent neural networks (RNN), long short-term memory (LSTM), and gated recurrent units (GRU), is employed for feature extraction. Furthermore, an improved bat algorithm with a multiclass support vector machine (IBA-MSVM) model is applied for the classification of CT scans. The design of the ensemble model with optimal parameter tuning of the MSVM model for COVID-19 classification constitutes the novelty of the work. The AIEM-DC technique was evaluated on a benchmark CT image dataset, and the results showed promising classification performance compared with recent state-of-the-art approaches.
|
17
|
White matter hyperintensities segmentation using an ensemble of neural networks. Hum Brain Mapp 2021; 43:929-939. [PMID: 34704337 PMCID: PMC8764480 DOI: 10.1002/hbm.25695] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2021] [Accepted: 10/08/2021] [Indexed: 11/30/2022] Open
Abstract
White matter hyperintensities (WMHs) represent the most common neuroimaging marker of cerebral small vessel disease (CSVD). The volume and location of WMHs are important clinical measures. We present a pipeline using deep fully convolutional networks and ensemble models, combining U-Net, SE-Net, and multi-scale features, to automatically segment WMHs and estimate their volumes and locations. We evaluated our method on two datasets: a clinical routine dataset comprising 60 patients (selected from the Chinese National Stroke Registry, CNSR) and a research dataset composed of 60 patients (selected from the MICCAI WMH Challenge, MWC). The performance of our pipeline was compared with four freely available methods: LGA, LPA, the UBO detector, and U-Net, in terms of a variety of metrics. Additionally, to assess model generalization ability, another research dataset comprising 40 patients (from the Older Australian Twins Study and the Sydney Memory and Aging Study, OSM) was selected and tested. The pipeline achieved the best performance on both the research and clinical routine datasets, with DSC significantly higher than the other methods (p < .001), reaching .833 and .783, respectively. The generalization experiments showed that the model trained on the research dataset (DSC = 0.736) performed better than the one trained on the clinical dataset (DSC = 0.622). Our method outperformed widely used pipelines in WMH segmentation. The system can generate both image and text outputs for whole-brain, lobar, and anatomically labeled WMHs. Additionally, software and models of our method are made publicly available at https://www.nitrc.org/projects/what_v1.
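The Dice similarity coefficient (DSC) used throughout this abstract to compare segmentations can be sketched in a few lines of NumPy; the tiny masks below are illustrative toys, not brain images.

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary segmentation masks."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated here as a perfect match
    return 2.0 * np.logical_and(a, b).sum() / denom

# toy 4x4 "lesion" masks, illustrative only
truth = np.array([[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]])
pred  = np.array([[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]])
print(dice(truth, pred))  # 2*3 / (4+3) = 6/7
```

DSC ranges from 0 (no overlap) to 1 (identical masks), which is why values such as .833 versus .622 indicate a substantial gap in segmentation quality.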
|
18
|
Ensemble Models of Cutting-Edge Deep Neural Networks for Blood Glucose Prediction in Patients with Diabetes. SENSORS (BASEL, SWITZERLAND) 2021; 21:7090. [PMID: 34770397 PMCID: PMC8588394 DOI: 10.3390/s21217090] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/23/2021] [Revised: 10/15/2021] [Accepted: 10/20/2021] [Indexed: 12/21/2022]
Abstract
This article proposes two ensemble neural network-based models for blood glucose prediction at three prediction horizons (30, 60, and 120 min) and compares their performance with ten recently proposed neural networks. The twelve models are evaluated on the same OhioT1DM dataset, with the same preprocessing workflow and tools, at the three prediction horizons using the most common metrics in blood glucose prediction, and we rank the best-performing ones using three methods devised for the statistical comparison of the performance of multiple algorithms: scmamp, model confidence set, and superior predictive ability. Our analysis provides a comparison of state-of-the-art neural networks for blood glucose prediction, estimating each model's error, highlighting those with the highest probability of being the best predictors, and providing a guide for their use in clinical practice.
|
19
|
Modelling the spatial distribution of mycetoma in Sudan. Trans R Soc Trop Med Hyg 2021; 115:1144-1152. [PMID: 34037803 PMCID: PMC8486737 DOI: 10.1093/trstmh/trab076] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2021] [Revised: 04/18/2021] [Accepted: 04/26/2021] [Indexed: 11/12/2022] Open
Abstract
BACKGROUND Mycetoma is a neglected tropical disease that is reported worldwide and Sudan has the highest reported number of mycetoma infections across the globe. The incidence, prevalence and burden of mycetoma globally are not precisely known and its risk factors remain largely unelucidated. METHODS This study aimed to identify the environmental predictors of fungal and bacterial mycetoma in Sudan and to identify areas of the country where these niche predictors are met. Demographic and clinical data from confirmed mycetoma patients seen at the Mycetoma Research Centre from 1991 to 2018 were included in this study. Regression and machine learning techniques were used to model the relationships between mycetoma occurrence in Sudan and environmental predictors. RESULTS The strongest predictors of mycetoma occurrence were aridity, proximity to water, low soil calcium and sodium concentrations and the distribution of various species of thorny trees. The models predicted the occurrence of eumycetoma and actinomycetoma in the central and southeastern states of Sudan and along the Nile river valley and its tributaries. CONCLUSION Our results showed that the risk of mycetoma in Sudan varies geographically and is linked to identifiable environmental risk factors. Suitability maps are intended to guide health authorities, academic institutes and organisations involved in planning national scale surveys for early case detection and management, leading to better patient treatment, prevention and control of mycetoma.
|
20
|
On the relationship between COVID-19 reported fatalities early in the pandemic and national socio-economic status predating the pandemic. AIMS Public Health 2021; 8:439-455. [PMID: 34395694 PMCID: PMC8334639 DOI: 10.3934/publichealth.2021034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2021] [Accepted: 05/18/2021] [Indexed: 11/28/2022] Open
Abstract
This study investigates the relationship between socio-economic determinants pre-dating the pandemic and the reported number of cases, deaths, and the deaths/cases ratio in 199 countries/regions during the first months of the COVID-19 pandemic. The analysis is performed by means of machine learning methods, involving a portfolio/ensemble of 32 interpretable models, and considers both the case in which the outcome variables (number of cases, deaths, and their ratio) are treated as independent and the case in which their dependence is weighted based on geographical proximity. We build two measures of variable importance, the Absolute Importance Index (AII) and the Signed Importance Index (SII), which identify the socio-economic factors contributing most to the variability of the COVID-19 pandemic. Our results suggest that, together with the established influence of the level of mobility on cases and deaths, specific features of the health care system (smart versus poor allocation of resources), the economy of a country (equity versus non-equity), and the society (religious or not, community-based or not) might contribute heterogeneously across countries to the number of COVID-19 cases and deaths.
|
21
|
Flash-Flood Potential Mapping Using Deep Learning, Alternating Decision Trees and Data Provided by Remote Sensing Sensors. SENSORS 2021; 21:s21010280. [PMID: 33406613 PMCID: PMC7796316 DOI: 10.3390/s21010280] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/06/2020] [Revised: 12/02/2020] [Accepted: 12/22/2020] [Indexed: 11/23/2022]
Abstract
There is an evident increase in the importance that remote sensing sensors play in the monitoring and evaluation of natural hazard susceptibility and risk. The present study assesses flash-flood potential values in a small catchment in Romania, using information provided by remote sensing sensors and Geographic Information System (GIS) databases as input data for four ensemble models. In the first phase, with the help of high-resolution satellite images from the Google Earth application, 481 points affected by torrential processes were acquired, and another 481 points were randomly positioned in areas without torrential processes. Seventy percent of the dataset was kept as training data, while the other 30% was assigned to the validation sample. Next, to train the machine learning models, information regarding the 10 flash-flood predictors was extracted at the training sample locations. Finally, the following four ensembles were used to calculate the Flash-Flood Potential Index across the Bâsca Chiojdului river basin: Deep Learning Neural Network–Frequency Ratio (DLNN-FR), Deep Learning Neural Network–Weights of Evidence (DLNN-WOE), Alternating Decision Trees–Frequency Ratio (ADT-FR), and Alternating Decision Trees–Weights of Evidence (ADT-WOE). The models' performances were assessed using several statistical metrics. In terms of sensitivity, the highest value of 0.985 was achieved by the DLNN-FR model, while the lowest (0.866) corresponded to the ADT-FR ensemble. The specificity analysis shows that the highest value (0.991) was attributed to the DLNN-WOE algorithm, while the lowest (0.892) was achieved by ADT-FR. During training, the models achieved overall accuracies between 0.878 (ADT-FR) and 0.985 (DLNN-WOE), and the K-index again showed that the best-performing model was DLNN-WOE (0.97).
The Flash-Flood Potential Index (FFPI) values revealed that surfaces with high and very high flash-flood susceptibility cover between 46.57% (DLNN-FR) and 59.38% (ADT-FR) of the study zone. Validation of the results with the Receiver Operating Characteristic (ROC) curve highlights that the FFPI produced by DLNN-WOE is the most precise, with an Area Under the Curve of 0.96.
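The performance figures quoted here (sensitivity, specificity, overall accuracy, and the K-index, commonly the Cohen's kappa statistic) all derive from a binary confusion matrix; the sketch below uses made-up labels, not the study's torrential-process points.

```python
import numpy as np

def confusion_metrics(y_true, y_pred):
    # sensitivity, specificity, overall accuracy, and Cohen's kappa
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    n = tp + tn + fp + fn
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    acc = (tp + tn) / n
    # chance agreement expected from the marginal class frequencies
    p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
    kappa = (acc - p_e) / (1 - p_e)
    return sens, spec, acc, kappa

# illustrative labels only (1 = torrential point, 0 = non-torrential)
sens, spec, acc, kappa = confusion_metrics([1, 1, 1, 0, 0, 0, 1, 0],
                                           [1, 1, 0, 0, 0, 1, 1, 0])
print(sens, spec, acc, kappa)
```

Kappa discounts the agreement expected by chance, which is why it is reported alongside raw accuracy when the two classes are balanced by design, as in the 481/481 sampling above.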
|
22
|
Comparative Analysis of Machine Learning Models for Nanofluids Viscosity Assessment. NANOMATERIALS 2020; 10:nano10091767. [PMID: 32906742 PMCID: PMC7558292 DOI: 10.3390/nano10091767] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/23/2020] [Revised: 08/29/2020] [Accepted: 08/31/2020] [Indexed: 11/16/2022]
Abstract
Selecting a nanofluid for a particular application requires determining its thermophysical properties, such as viscosity. However, the experimental measurement of nanofluid viscosity is expensive. Several closed-form formulas for calculating viscosity have been proposed based on theoretical and empirical methods, but these produce inaccurate results. Recently, a machine learning model based on the combination of seven baselines, called the committee machine intelligent system (CMIS), was proposed to predict the viscosity of nanofluids. CMIS was applied to 3144 experimental measurements of relative viscosity from 42 different nanofluid systems, based on five features (temperature, viscosity of the base fluid, nanoparticle volume fraction, size, and density), and returned an average absolute relative error (AARE) of 4.036% on the test set. In this work, eight models were proposed on the same dataset used for CMIS: two multilayer perceptrons (MLPs), each with the Nesterov-accelerated adaptive moment (Nadam) optimizer; two MLPs, each with three hidden layers and the Adamax optimizer; a support vector regression (SVR) model with a radial basis function (RBF) kernel; a decision tree (DT); and the tree-based ensemble models random forest (RF) and extra trees (ET). The performance of these models over different ranges of the input variables was assessed and compared with results in the literature. All eight suggested models outperformed the baselines used in the literature, and five of them outperformed the CMIS, with two returning an AARE of less than 3% on the test data. In addition, the physical validity of the models was studied by examining the physically expected trends of nanofluid viscosity with changing volume fraction.
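The AARE metric used to rank all of these models is a simple percentage error averaged over samples; the values below are illustrative, not the nanofluid dataset.

```python
import numpy as np

def aare(observed, predicted):
    # average absolute relative error, expressed in percent
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((predicted - observed) / observed))

# illustrative relative-viscosity values: errors of 4% and 5% average to 4.5%
print(aare([100.0, 200.0], [104.0, 190.0]))
```

Because AARE is relative, it weights errors on low-viscosity samples as heavily as those on high-viscosity ones, which matters for a dataset spanning 42 different nanofluid systems.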
|
23
|
What and where? Predicting invasion hotspots in the Arctic marine realm. GLOBAL CHANGE BIOLOGY 2020; 26:4752-4771. [PMID: 32407554 PMCID: PMC7496761 DOI: 10.1111/gcb.15159] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/26/2019] [Revised: 05/03/2020] [Accepted: 05/04/2020] [Indexed: 06/11/2023]
Abstract
The risk of aquatic invasions in the Arctic is expected to increase with climate warming, greater shipping activity and resource exploitation in the region. Planktonic and benthic marine aquatic invasive species (AIS) with the greatest potential for invasion and impact in the Canadian Arctic were identified, and the 23 riskiest species were modelled to predict their potential spatial distributions at pan-Arctic and global scales. Modelling was conducted under present environmental conditions and two intermediate future (2050 and 2100) global warming scenarios. Invasion hotspots, regions of the Arctic where habitat is predicted to be suitable for a high number of potential AIS, were located in Hudson Bay, Northern Grand Banks/Labrador, the Chukchi/Eastern Bering seas and the Barents/White seas, suggesting that these regions could be more vulnerable to invasions. Globally, both benthic and planktonic organisms showed a future poleward shift in suitable habitat. At a pan-Arctic scale, all organisms showed suitable habitat gains under future conditions. However, at the global scale, habitat loss was predicted in more tropical regions for some taxa, particularly most planktonic species. Results from the present study can help prioritize management efforts in the face of climate change in the Arctic marine ecosystem. Moreover, this particular approach provides information to identify present and future high-risk areas for AIS in response to global warming.
|
24
|
Breast Cancer Histopathology Image Classification Using an Ensemble of Deep Learning Models. SENSORS 2020; 20:s20164373. [PMID: 32764398 PMCID: PMC7472736 DOI: 10.3390/s20164373] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/10/2020] [Revised: 08/01/2020] [Accepted: 08/03/2020] [Indexed: 12/13/2022]
Abstract
Breast cancer is one of the major public health issues and is considered a leading cause of cancer-related deaths among women worldwide. Early diagnosis can effectively increase the chances of survival. To this end, biopsy is the usual gold-standard approach, in which tissues are collected for microscopic analysis. However, the histopathological analysis of breast cancer is non-trivial, labor-intensive, and may lead to a high degree of disagreement among pathologists. An automatic diagnostic system could therefore assist pathologists and improve the effectiveness of the diagnostic process. This paper presents an ensemble deep learning approach for classifying breast cancer histopathology images from our collected dataset into non-carcinoma and carcinoma classes. We trained four different models based on pre-trained VGG16 and VGG19 architectures. Initially, we performed 5-fold cross-validation on each individual model, namely the fully-trained VGG16, fine-tuned VGG16, fully-trained VGG19, and fine-tuned VGG19 models. We then followed an ensemble strategy of averaging the predicted probabilities and found that the ensemble of fine-tuned VGG16 and fine-tuned VGG19 achieved competitive classification performance, especially on the carcinoma class. The ensemble of fine-tuned VGG16 and VGG19 models offered a sensitivity of 97.73% for the carcinoma class, an overall accuracy of 95.29%, and an F1 score of 95.29%. These experimental results demonstrate that the proposed deep learning approach is effective for the automatic classification of complex histopathology images of breast cancer, particularly carcinoma images.
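The ensemble strategy described, averaging the predicted probabilities of two models (soft voting), can be sketched as follows; the probability values are hypothetical, not outputs of the paper's fine-tuned networks.

```python
import numpy as np

def soft_vote(prob_a, prob_b):
    """Average the class-probability outputs of two models, then take argmax."""
    avg = (np.asarray(prob_a) + np.asarray(prob_b)) / 2.0
    return avg, avg.argmax(axis=1)

# hypothetical per-image probabilities for [non-carcinoma, carcinoma]
p_vgg16 = np.array([[0.30, 0.70],
                    [0.80, 0.20]])
p_vgg19 = np.array([[0.45, 0.55],
                    [0.60, 0.40]])
avg, labels = soft_vote(p_vgg16, p_vgg19)
print(avg)     # averaged probabilities per image
print(labels)  # 1 = carcinoma, 0 = non-carcinoma
```

Averaging probabilities (rather than hard votes) lets a confident model outweigh an uncertain one, which is one reason soft voting often improves over the individual classifiers.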
|
25
|
What is Machine Learning? A Primer for the Epidemiologist. Am J Epidemiol 2019; 188:2222-2239. [PMID: 31509183 DOI: 10.1093/aje/kwz189] [Citation(s) in RCA: 87] [Impact Index Per Article: 17.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2018] [Revised: 07/29/2019] [Accepted: 08/14/2019] [Indexed: 12/22/2022] Open
Abstract
Machine learning is a branch of computer science that has the potential to transform epidemiologic sciences. Amid a growing focus on "Big Data," it offers epidemiologists new tools to tackle problems for which classical methods are not well-suited. In order to critically evaluate the value of integrating machine learning algorithms and existing methods, however, it is essential to address language and technical barriers between the two fields that can make it difficult for epidemiologists to read and assess machine learning studies. Here, we provide an overview of the concepts and terminology used in machine learning literature, which encompasses a diverse set of tools with goals ranging from prediction to classification to clustering. We provide a brief introduction to 5 common machine learning algorithms and 4 ensemble-based approaches. We then summarize epidemiologic applications of machine learning techniques in the published literature. We recommend approaches to incorporate machine learning in epidemiologic research and discuss opportunities and challenges for integrating machine learning and existing epidemiologic research methods.
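As a concrete illustration of one ensemble-based approach such a primer typically covers, bootstrap aggregation (bagging), the sketch below averages regression stumps fit on bootstrap resamples of synthetic data; the stump learner and the data are chosen purely for brevity and are not from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_stump(x, y):
    # regression stump: one threshold, predicting the mean on each side
    best_sse, best_model = np.inf, None
    for t in np.unique(x)[:-1]:
        left = x <= t
        pred = np.where(left, y[left].mean(), y[~left].mean())
        sse = ((y - pred) ** 2).sum()
        if sse < best_sse:
            best_sse, best_model = sse, (t, y[left].mean(), y[~left].mean())
    return best_model

def predict_stump(model, x):
    t, lo, hi = model
    return np.where(x <= t, lo, hi)

# noisy nonlinear training data and a noiseless test grid
x = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 200)
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)

# bagging: average 50 stumps, each fit on a bootstrap resample
preds = []
for _ in range(50):
    idx = rng.integers(0, len(x), len(x))
    preds.append(predict_stump(fit_stump(x[idx], y[idx]), x_test))
bagged = np.mean(preds, axis=0)

single = predict_stump(fit_stump(x, y), x_test)
mse = lambda p: np.mean((p - y_test) ** 2)
print(mse(single), mse(bagged))
```

Averaging over resamples smooths out the variance of the individual stumps, the same intuition that underlies random forests and the other ensemble methods the primer surveys.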
|
26
|
Ensembles of Deep Learning Models and Transfer Learning for Ear Recognition. SENSORS 2019; 19:s19194139. [PMID: 31554303 PMCID: PMC6806105 DOI: 10.3390/s19194139] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/20/2019] [Revised: 09/09/2019] [Accepted: 09/22/2019] [Indexed: 11/29/2022]
Abstract
The recognition performance of visual recognition systems is highly dependent on extracting and representing the discriminative characteristics of image data. Convolutional neural networks (CNNs) have shown unprecedented success in a variety of visual recognition tasks due to their capability to provide in-depth representations exploiting visual image features of appearance, color, and texture. This paper presents a novel system for ear recognition based on ensembles of deep CNN-based models and more specifically the Visual Geometry Group (VGG)-like network architectures for extracting discriminative deep features from ear images. We began by training different networks of increasing depth on ear images with random weight initialization. Then, we examined pretrained models as feature extractors as well as fine-tuning them on ear images. After that, we built ensembles of the best models to further improve the recognition performance. We evaluated the proposed ensembles through identification experiments using ear images acquired under controlled and uncontrolled conditions from mathematical analysis of images (AMI), AMI cropped (AMIC) (introduced here), and West Pomeranian University of Technology (WPUT) ear datasets. The experimental results indicate that our ensembles of models yield the best performance with significant improvements over the recently published results. Moreover, we provide visual explanations of the learned features by highlighting the relevant image regions utilized by the models for making decisions or predictions.
|
27
|
Physically adjusted neutral detergent fiber system for lactating dairy cow rations. II: Development of feeding recommendations. J Dairy Sci 2017; 100:9569-9584. [PMID: 28987583 DOI: 10.3168/jds.2017-12766] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2017] [Accepted: 08/07/2017] [Indexed: 12/27/2022]
Abstract
The objective of this work was to leverage equations derived in a meta-analysis into an ensemble modeling system for estimating dietary physical and chemical characteristics required to maintain desired rumen conditions in lactating dairy cattle. Given the availability of data, responsiveness of ruminal pH to animal behaviors, and the chemical composition and physical form of the diet, mean ruminal pH was chosen as the primary rumen environment indicator. Physically effective fiber (peNDF) is defined as the fraction of neutral detergent fiber (NDF) that stimulates chewing activity and contributes to the floating mat of large particles in the rumen. The peNDF of feedstuffs is typically estimated by multiplying the NDF content by a particle size measure, resulting in an estimated index of effectiveness. We hypothesized that the utility of peNDF could be expanded and improved by dissociating NDF and particle size and considering other dietary factors, all integrated into a physically adjusted fiber system that can be used to estimate minimum particle sizes of TMR and diet compositions needed to maintain ruminal pH targets. Particle size measures of TMR were limited to those found with the Penn State particle separator (PSPS). Starting with specific diet characteristics, the system employed an ensemble of models that were integrated using a variable mixture of experts approach to generate more robust recommendations for the percentage of dietary DM material that should be retained on the 8-mm sieve of a PSPS. Additional continuous variables also integrated in the physically adjusted fiber system include the proportion of material (dry matter basis) retained on the 19- and 8-mm sieves of the PSPS, estimated mean particle size, the dietary concentrations of forage, forage NDF, starch, and NDF, and ruminally degraded starch and NDF. 
The system was able to predict that the minimum proportion of material (dry matter basis) retained on the 8-mm sieve should increase with decreasing forage NDF or dietary NDF. Additionally, the minimum proportion of dry matter material on the 8-mm sieve should increase with increasing dietary starch. Results of this study agreed with described interrelationships between the chemical and physical form of diets fed to dairy cows and quantified the links between NDF intake, diet particle size, and ruminal pH. Feeding recommendations can be interpolated from tables and figures included in this work.
|
28
|
Nonparametric survival analysis using Bayesian Additive Regression Trees (BART). Stat Med 2016; 35:2741-53. [PMID: 26854022 DOI: 10.1002/sim.6893] [Citation(s) in RCA: 56] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2015] [Revised: 01/11/2016] [Accepted: 01/12/2016] [Indexed: 11/06/2022]
Abstract
Bayesian additive regression trees (BART) provide a framework for flexible nonparametric modeling of relationships of covariates to outcomes. Recently, BART models have been shown to provide excellent predictive performance for both continuous and binary outcomes, often exceeding that of competing methods, and software is readily available for such outcomes. In this article, we introduce modeling that extends the usefulness of BART in medical applications by addressing needs arising in survival analysis. Simulation studies of one-sample and two-sample scenarios, in comparison with long-standing traditional methods, establish face validity of the new approach. We then demonstrate the model's ability to accommodate data from complex regression models with a simulation study of a nonproportional hazards scenario with crossing survival functions, and with survival function estimation in a scenario where hazards are multiplicatively modified by a highly nonlinear function of the covariates. Using data from a recently published study of patients undergoing hematopoietic stem cell transplantation, we illustrate the use and some advantages of the proposed method in medical investigations. Copyright © 2016 John Wiley & Sons, Ltd.
|
29
|
Modeling causes of death: an integrated approach using CODEm. Popul Health Metr 2012; 10:1. [PMID: 22226226 PMCID: PMC3315398 DOI: 10.1186/1478-7954-10-1] [Citation(s) in RCA: 278] [Impact Index Per Article: 23.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2011] [Accepted: 01/06/2012] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND Data on causes of death by age and sex are a critical input into health decision-making. Priority setting in public health should be informed not only by the current magnitude of health problems but also by trends in them. However, cause of death data are often not available or are subject to substantial problems of comparability. We propose five general principles for cause of death model development, validation, and reporting. METHODS We detail a specific implementation of these principles that is embodied in an analytical tool, the Cause of Death Ensemble model (CODEm), which explores a large variety of possible models to estimate trends in causes of death. Possible models are identified using a covariate selection algorithm that yields many plausible combinations of covariates, which are then run through four model classes. The model classes include mixed effects linear models and spatial-temporal Gaussian Process Regression models for cause fractions and death rates. All models for each cause of death are then assessed using out-of-sample predictive validity and combined into an ensemble with optimal out-of-sample predictive performance. RESULTS Ensemble models for cause of death estimation outperform any single component model in tests of root mean square error, frequency of predicting correct temporal trends, and achieving 95% coverage of the prediction interval. We present detailed results for CODEm applied to maternal mortality and summary results for several other causes of death, including cardiovascular disease and several cancers. CONCLUSIONS CODEm produces better estimates of cause of death trends than previous methods and is less susceptible to bias in model specification. We demonstrate the utility of CODEm for the estimation of several major causes of death.
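The core idea of combining component models by out-of-sample predictive validity can be illustrated with a deliberately simplified inverse-RMSE weighting; CODEm's actual weighting scheme is considerably more elaborate, and the data below are hypothetical.

```python
import numpy as np

def inverse_rmse_ensemble(preds, y_holdout):
    """Weight component models by inverse out-of-sample RMSE and return
    the weighted-average prediction (a simplification of performance-based
    ensemble weighting; not CODEm's exact algorithm)."""
    preds = np.asarray(preds, dtype=float)            # shape (n_models, n_points)
    rmses = np.sqrt(np.mean((preds - y_holdout) ** 2, axis=1))
    weights = (1.0 / rmses) / np.sum(1.0 / rmses)     # better models weigh more
    return weights @ preds, rmses, weights

# three hypothetical component models scored against held-out observations
y_holdout = np.array([10.0, 12.0, 9.0, 11.0])
component_preds = [
    [10.5, 11.5, 9.5, 11.0],   # accurate model
    [9.0, 13.0, 8.0, 12.0],    # mediocre model
    [14.0, 8.0, 13.0, 7.0],    # poor model
]
ensemble_pred, rmses, weights = inverse_rmse_ensemble(component_preds, y_holdout)
print(weights, ensemble_pred)
```

Because the ensemble prediction is a convex combination of the component predictions, its held-out RMSE can never exceed that of the worst component, and in practice it typically beats most of them, consistent with the abstract's results.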
|