1
Spatial distribution and main drivers of soil selenium in Taihu Lake Basin, Southeast China. Journal of Hazardous Materials 2024; 465:133091. [PMID: 38056274] [DOI: 10.1016/j.jhazmat.2023.133091]
Abstract
Selenium (Se) is an essential micronutrient that can be both hazardous and beneficial to living organisms. However, few studies have examined soil Se distribution and its driving mechanisms at a large basin scale. Thus, multivariate statistics, geostatistics, boosted regression trees, and structural equation models were used to investigate the spatial distribution, driving factors, and multivariate interactions of soil Se based on 1753 topsoil samples (0-20 cm) from the Taihu Lake Basin. The results indicated that the soil Se concentration ranged from 0.12 to 57.26 mg kg⁻¹, with a mean value of 0.90 mg kg⁻¹. Overall, soil Se gradually decreased from south to north, with approximately 1.06% of the soil contaminated with Se. Moisture index (MI), soil moisture (SM), and ≥0 °C accumulated temperature (AAT0) were the main determinants of soil Se accumulation. Additionally, the substantial effect of the SM∩AAT0 interaction on soil Se concentrations demonstrated that climate-soil interactions largely governed the spatial pattern of soil Se. Se-enriched and Se-contaminated soils occurred mainly in regions with high precipitation, MI, SM, AAT0, and soil organic matter. This study provides a theoretical basis and practical guidance for the remediation of soil Se contamination and the sustainable development of Se-enriched agriculture.
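Boosted regression trees of the kind used above fit an additive sequence of small trees to residuals, and driver importance emerges from how much prediction error each variable's splits remove. A minimal pure-Python sketch of that idea on synthetic data (not the authors' pipeline, which would typically use dedicated packages; the feature layout here is a placeholder):

```python
def best_stump(X, y):
    """Exhaustive depth-1 regression tree: best (feature, threshold) split by SSE."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((v - ml) ** 2 for v in left)
                   + sum((v - mr) ** 2 for v in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    return best

def fit_brt(X, y, n_trees=100, lr=0.1):
    """Gradient boosting with stumps; tallies SSE reduction per feature
    as a crude analogue of BRT relative influence."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps, influence = [], [0.0] * len(X[0])
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        sse0 = sum(r * r for r in resid)
        sse, j, t, ml, mr = best_stump(X, resid)
        influence[j] += sse0 - sse  # error explained by splitting on driver j
        stumps.append((j, t, ml, mr))
        pred = [p + lr * (ml if row[j] <= t else mr) for p, row in zip(pred, X)]
    return base, lr, stumps, influence

def predict(model, row):
    base, lr, stumps, _ = model
    return base + sum(lr * (ml if row[j] <= t else mr) for j, t, ml, mr in stumps)
```

Ranking the entries of `influence` mimics how BRT analyses identify dominant drivers such as MI, SM, and AAT0.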
2
Evaluation of the efficiency and drivers of complemented cropland in Southwest China over the past 30 years from the perspective of cropland abandonment. Journal of Environmental Management 2024; 351:119909. [PMID: 38154224] [DOI: 10.1016/j.jenvman.2023.119909]
Abstract
Complemented croplands are a crucial component of cropland resources and play a significant role in ensuring national food security. In recent decades, to counter the loss of prime farmland caused by urban construction, the Chinese government introduced a requisition-compensation balance policy, leading to the substantial expansion of new croplands. There is therefore an urgent need to determine whether these complemented croplands are being used effectively. Taking Southwest China as a case study, we used high-precision long-term land-use data from 1990 to 2020 to reveal the dynamics of complemented cropland utilization, evaluate its efficiency from the perspective of abandoned farmland, and identify the factors driving complemented cropland use efficiency based on more than 13 million land parcels. The results showed that: (1) From 1990 to 2020, complemented cropland amounted to approximately 1170.07 × 10⁴ hm², accounting for 32.67% of the total arable land area in 1990. The potential grain production capacity of these complemented croplands was significantly lower than that of base croplands. (2) Abandonment of complemented croplands was more serious than that of base croplands: 47.03% of the complemented croplands experienced abandonment at least once during the study period, and their average utilization efficiency was 75.61%. (3) The labor population ratio, elevation, and land parcel size played pivotal roles in influencing complemented cropland utilization efficiency, although there was substantial variation among provinces. Labor replacement, overcoming the farming difficulties posed by mountainous terrain, and improving farmers' incomes are the keys to alleviating cropland abandonment in mountainous areas and improving cropland utilization efficiency. This study provides novel insights into efficiency assessment and the mechanisms driving complemented cropland use, and can serve as a reference for cropland management.
3
Strategic restoration planning for land birds in the Colorado River Delta, Mexico. Journal of Environmental Management 2024; 351:119755. [PMID: 38086116] [DOI: 10.1016/j.jenvman.2023.119755]
Abstract
Ecological restoration is an essential strategy for mitigating the current biodiversity crisis, yet restoration actions are costly. We used systematic conservation planning principles to design an approach that prioritizes restoration sites for birds and tested it in a riparian forest restoration program in the Colorado River Delta. Restoration goals were to maximize the abundance and diversity of 15 priority birds with a variety of habitat preferences. We built abundance models for priority birds based on the current landscape, and predicted bird distributions and relative abundances under a scenario of complete riparian forest restoration throughout our study area. Then, we used Zonation conservation planning software to rank this restored landscape based on core areas for all priority birds. The locations with the highest ranks represented the highest priorities for restoration and were located throughout the river reach. We optimized how much of the available landscape to restore by simulating restoration of the top 10-90% of ranked sites in 10% intervals. We found that total diversity was maximized when 40% of the landscape was restored, and mean relative abundance was maximized when 80% of the landscape was restored. The results suggest that complete restoration is not optimal for this community of priority birds and restoration of approximately 60% of the landscape would provide a balance between maximum relative abundance and diversity. Subsequent planning efforts will combine our results with an assessment of restoration costs to provide further decision support for the restoration-siting process. Our approach can be applied to any landscape-scale restoration program to improve the return on investment of limited economic resources for restoration.
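The restoration-fraction sweep described above (restoring the top 10-90% of ranked sites and tracking abundance and diversity) can be sketched as follows. The site-by-species abundances here are invented stand-ins for model-predicted values; the actual prioritization used the Zonation software:

```python
import math

def shannon(abundances):
    """Shannon diversity H' of a community abundance vector (zeros skipped)."""
    total = sum(abundances)
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)

def sweep_restoration(site_abund, fractions):
    """site_abund: sites ordered by restoration rank (best first); each entry
    is a per-species abundance list predicted under full restoration.
    Returns {fraction: (total_abundance, diversity)} for each restored fraction."""
    n_species = len(site_abund[0])
    out = {}
    for f in fractions:
        k = max(1, round(f * len(site_abund)))  # restore the top-k ranked sites
        community = [sum(site[s] for site in site_abund[:k])
                     for s in range(n_species)]
        out[f] = (sum(community), shannon(community))
    return out

# hypothetical ranked sites x 3 priority species (illustrative values only)
sites = [[5, 1, 0], [4, 2, 0], [3, 2, 1], [1, 3, 1], [0, 1, 4], [0, 0, 2]]
results = sweep_restoration(sites, [round(0.1 * i, 1) for i in range(1, 10)])
best_div = max(results, key=lambda f: results[f][1])
```

Comparing where diversity and total abundance peak across fractions is the logic behind the study's finding that complete restoration is not optimal.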
4
Physical, biological and anthropogenic drivers of spatial patterns of coral reef fish assemblages at regional and local scales. Science of the Total Environment 2023; 904:166695. [PMID: 37660823] [DOI: 10.1016/j.scitotenv.2023.166695]
Abstract
Species abundance, diversity and community assemblage structure are determined by multiple physical, habitat and management drivers that operate across multiple spatial scales. Here we used a multi-scale coral reef monitoring dataset to examine regional and local differences in the abundance, species richness and composition of fish assemblages in no-take marine reserve (NTMR) and fished zones at four island groups in the Great Barrier Reef Marine Park, Australia. We applied boosted regression trees to quantify the influence of 20 potential drivers on the coral reef fish assemblages. Reefs in two locations, Magnetic Island and the Keppel Islands, had distinctive fish assemblages and low species richness, while the Palm and Whitsunday Islands had similar species composition and higher species richness. Overall, our analyses identified several important physical (temperature, wave exposure) and biological (coral, turf, macroalgal and unconsolidated substratum cover) drivers of inshore reef fish communities, some of which are being altered by human activities. Of these, sea surface temperature (SST) was more influential at large scales, while wave exposure was important both within and between island groups. Species richness declined with increasing macroalgal cover and exposure to cyclones, and increased with SST. Species composition was most strongly influenced by mean SST and percent cover of macroalgae. There was substantial regional variation in the local drivers of spatial patterns. Although NTMR zoning influenced total fish density in some regions, it had negligible effects on fish species richness, composition and trophic structure because of the relatively small number of species targeted by the fishery. These findings show that inshore reef fishes are directly influenced by disturbances typical of the nearshore Great Barrier Reef, highlighting the need to complement global action on climate change with more targeted localised efforts to maintain or improve the condition of coral reef habitats.
5
Subjective age of acquisition norms for 1604 English words by Spanish L2 speakers of English and their relationship with lexico-semantic, affective, sociolinguistic and proficiency variables. Behavior Research Methods 2023; 55:4437-4454. [PMID: 36477592] [PMCID: PMC10700429] [DOI: 10.3758/s13428-022-02026-9]
Abstract
Psycholinguistic studies have shown that there are many variables implicated in language comprehension and production. At the lexical level, subjective age of acquisition (AoA), the estimate of the age at which a word is acquired, is key for stimuli selection in psycholinguistic studies. AoA databases in English are often used when testing a variety of phenomena in second language (L2) speakers of English. However, these have limitations, as the norms are not provided by the target population (L2 speakers of English) but by native English speakers. In this study, we asked native Spanish L2 speakers of English to provide subjective AoA ratings for 1604 English words, and investigated whether factors related to 14 lexico-semantic and affective variables, both in Spanish and English, and to the speakers' profile (i.e., sociolinguistic variables and L2 proficiency), were related to the L2 AoA ratings. We used boosted regression trees, an advanced form of regression analysis based on machine learning and boosting algorithms, to analyse the data. Our results showed that the model accounted for a relevant proportion of deviance (58.56%), with the English AoA provided by native English speakers being the strongest predictor for L2 AoA. Additionally, L2 AoA correlated with L2 reaction times. Our database is a useful tool for the research community running psycholinguistic studies in L2 speakers of English. It adds knowledge about which factors, linked to the characteristics of both the linguistic stimuli and the speakers, affect L2 subjective AoA. The database and the data can be downloaded from https://osf.io/gr8xd/?view_only=73b01dccbedb4d7897c8d104d3d68c46.
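For a Gaussian response, the "proportion of deviance explained" reported above reduces to one minus the ratio of residual to null sums of squares. A small sketch of that computation (assuming Gaussian deviance; the authors' exact formulation may differ):

```python
def deviance_explained(y, y_pred):
    """Percent deviance explained for a Gaussian response:
    100 * (1 - SS_resid / SS_null)."""
    mean_y = sum(y) / len(y)
    ss_null = sum((yi - mean_y) ** 2 for yi in y)           # intercept-only model
    ss_resid = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    return 100.0 * (1.0 - ss_resid / ss_null)
```

A perfect model returns 100, a model no better than the mean returns 0, so a value such as 58.56% sits between those anchors.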
6
Effects of multiple stressors on benthic invertebrates using Water Framework Directive monitoring data. Science of the Total Environment 2023; 878:162952. [PMID: 36948311] [DOI: 10.1016/j.scitotenv.2023.162952]
Abstract
Multiple stressors affect freshwater systems and cause a deficient ecological status according to the European Water Framework Directive (WFD). To select effective mitigation measures and improve the ecological status, knowledge of the stressor hierarchy and of individual and joint effects is necessary. However, compared to common stressors like nutrient enrichment and morphological degradation, the relative importance of micropollutants such as pesticides and pharmaceuticals is largely unaddressed. We used WFD monitoring data from Saxony (Germany) to investigate the importance of 85 environmental variables (including 34 micropollutants) for 18 benthic invertebrate metrics at 108 sites. The environmental variables were assigned to five groups (natural factors, nutrient enrichment, metals, micropollutants and morphological degradation) and were ranked according to their relative importance as groups and individually within and across groups using Principal Component Analyses (PCAs) and Boosted Regression Trees (BRTs). Overall, natural factors contributed the most to the total explained deviance of the models. This variable group represented not only typological differences between sampling sites but also a gradient of human impact via strongly anthropogenically influenced variables such as electric conductivity and dissolved oxygen. These large-scale effects can mask the individual importance of the other variable groups, which may act more specifically at a subset of sites. Accordingly, micropollutants were not represented by a few dominant variables but rather by a diverse palette of different chemicals with similar contributions. As a group, micropollutants contributed about as much as metals, nutrient enrichment and morphological degradation. However, the importance of micropollutants might be underestimated due to limitations of current chemical monitoring practices.
7
Exploring the worldwide impact of COVID-19 on conflict risk under climate change. Heliyon 2023; 9:e17182. [PMID: 37332947] [PMCID: PMC10256592] [DOI: 10.1016/j.heliyon.2023.e17182]
Abstract
Objectives: To understand whether and how the COVID-19 pandemic affects the risk of different types of conflict worldwide in the context of climate change. Methodology: Based on armed conflict, COVID-19, and detailed climate and non-climate data covering the period 2020-2021, we applied structural equation modeling to reorganize the links between climate, COVID-19, and conflict risk. Moreover, we used the boosted regression tree method to simulate conflict risk under the influence of multiple factors. Findings: The transmission risk of COVID-19 appears to decrease as the temperature rises. Additionally, COVID-19 has a substantial worldwide impact on conflict risk, albeit with regional and conflict-type variations. Moreover, when testing a one-month lagged effect, we find consistency across regions, indicating a positive influence of COVID-19 on demonstrations (protests and riots) and a negative relationship with non-state and violent conflict risk. Conclusion: COVID-19 has a complex effect on conflict risk worldwide under climate change. Implications: This work lays a theoretical foundation for how COVID-19 affects conflict risk and offers guidance for the implementation of relevant policies.
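A one-month lagged effect like the one tested above can be illustrated with a shifted Pearson correlation on synthetic monthly series (illustrative only; the study itself used structural equation models and boosted regression trees):

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def lagged_corr(driver, response, lag):
    """Correlate driver[t] with response[t + lag] (lag in months)."""
    if lag == 0:
        return pearson(driver, response)
    return pearson(driver[:-lag], response[lag:])

# invented monthly series: demonstrations respond one month after case counts
cases = [10, 30, 60, 100, 80, 50, 20, 10]
protests = [0] + [c / 10 for c in cases[:-1]]
```

In this toy series the lag-1 correlation exceeds the contemporaneous one, which is the signature a lagged-effect test looks for.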
8
Natural and anthropogenic factors and their interactions drive stream community integrity in a North American river basin at a large spatial scale. Science of the Total Environment 2022; 835:155344. [PMID: 35460766] [DOI: 10.1016/j.scitotenv.2022.155344]
Abstract
Urbanization, agriculture, and other human activities can exert considerable influence on the health and integrity of stream ecosystems. These influences vary greatly over space, time, and scale. We investigated trends in stream biotic integrity over 19 years (1997-2016) in relation to natural and anthropogenic factors in their spatial context, using data from a stream biomonitoring program in a region dominated by agricultural land use. Macroinvertebrate and fish diversity and abundance data were used to calculate four multimetric indices (MMIs) that described the biotic integrity of streams from 1997 to 2016. Boosted regression trees (BRT), a machine learning technique, were used to model how stream integrity responded to catchment-level natural and anthropogenic drivers including land use, human population density, road density, runoff potential, and natural factors such as latitude and elevation. Neither natural nor anthropogenic factors were consistently more influential on the MMIs. Macroinvertebrate indices were most responsive to time, latitude, elevation, and road density. Fish indices were driven mostly by latitude and longitude, with agricultural land cover among the most influential anthropogenic factors. We concluded that 1) stream biotic integrity was mostly stable in the study region from 1997 to 2016, although macroinvertebrate MMIs had decreased approximately 10% since 2010; 2) stream biotic integrity was driven by a mix of factors including geography, human activity, and variability over yearly time intervals; 3) MMI responses to environmental drivers were nonlinear and often nonmonotonic; 4) MMI composition could influence causal inferences; and 5) although our findings were mostly consistent with the literature on drivers of stream integrity, some commonly seen patterns were not evident. Our findings highlight the utility of large-scale, publicly available spatial data for understanding drivers of stream biodiversity and illustrate some potential pitfalls of large-scale, integrative analyses.
9
Human-induced arsenic pollution modeling in surface waters - An integrated approach using machine learning algorithms and environmental factors. Journal of Environmental Management 2022; 305:114347. [PMID: 34954681] [DOI: 10.1016/j.jenvman.2021.114347]
Abstract
In recent years, assessment of sediment contamination by heavy metals and metalloids such as arsenic has attracted the interest of scientists worldwide. The present study provides a new methodology to better understand the factors influencing surface water vulnerability to arsenic pollution, using two advanced machine learning algorithms: boosted regression trees (BRT) and random forest (RF). Based on sediment quality guidelines (effects range low), polluted and non-polluted sediment samples were defined as those with arsenic concentrations >8 ppm and <8 ppm, respectively. Conditioning factors covering topographic, lithological, erosional, hydrological, and anthropogenic influences were acquired to model surface water vulnerability to arsenic. We trained and validated the models using 70% and 30% of both polluted and non-polluted samples, respectively, and generated surface vulnerability maps. To verify the maps, the receiver operating characteristic (ROC) curve was implemented. The results confirmed the acceptable performance of the RF and BRT algorithms, with areas under the ROC curve of 85% and 75.6%, respectively. Further, the findings showed the higher importance of precipitation, slope aspect, distance from residential areas, and slope length for arsenic pollution in the modeling process. Erosion, lithology, and land use were the least important factors. The introduced methodology can be used to identify the areas most vulnerable to arsenic pollution in advance and to implement proper remediation actions to reduce damage.
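The evaluation protocol above (a 70/30 split followed by the area under the ROC curve) can be sketched without external libraries, using the Mann-Whitney formulation of AUC; the data and seed below are placeholders, not the study's samples:

```python
import random

def train_test_split(samples, labels, test_frac=0.3, seed=42):
    """Shuffle indices reproducibly and split into train/test subsets."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = round(len(idx) * (1 - test_frac))
    tr, te = idx[:cut], idx[cut:]
    return ([samples[i] for i in tr], [labels[i] for i in tr],
            [samples[i] for i in te], [labels[i] for i in te])

def roc_auc(scores, labels):
    """AUC as the probability that a polluted sample (label 1) scores
    higher than a non-polluted one (label 0); ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.85 therefore means the model ranks a randomly chosen polluted site above a non-polluted one 85% of the time.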
10
Community structure and environmental factors affecting diatom abundance and diversity in a Mediterranean climate river system. Science of the Total Environment 2022; 810:152366. [PMID: 34915010] [DOI: 10.1016/j.scitotenv.2021.152366]
Abstract
Mediterranean climate river systems are among the most threatened ecosystems worldwide, owing to a long history of anthropogenic impacts and introductions of invasive alien species. Many such rivers naturally exhibit a non-perennial flow regime, with distinct seasonal, inter-annual and spatial heterogeneity. The present study seeks to detect diatom community patterns and to understand the processes that generate these structures in an Austral Mediterranean river system across different months and river sections. In general, most environmental variables showed an increasing trend downstream in both months, with the exception of pH, dissolved oxygen, PO₄³⁻ and substrate embeddedness, which decreased downstream. A total of 110 diatom species were identified across the two study months (October: 106 taxa; January: 78 taxa), dominated by 30 species with >2% abundance. Diatom community structure differed significantly across river zones, while no significant differences were observed between the study months. A boosted regression trees model showed that B (43.3%), Cu (20.8%), Fe (3.4%) and water depth (3.2%) were the most important variables structuring diatom communities. Diatom species communities reflected environmental variables (i.e., sediment and water chemistry) in this Mediterranean climate river system, as sediment metals such as B, Cu and Fe were found to be important in structuring diatom communities. Biotic influences from fish communities had little effect on diversity but shifted diatom community structure. The current study therefore highlights the complex interactions in river systems that play an important role in determining diatom species composition.
11
Age, time orientation and risk perception are major determinants of discretionary salt usage. Appetite 2022; 171:105924. [PMID: 35031381] [DOI: 10.1016/j.appet.2022.105924]
Abstract
The present work explored the relationship between discretionary salt usage and personal characteristics, using boosted regression trees (BRT). Specifically, the focus was on how socio-demographic characteristics and personality traits linked to risk perception and time orientation affect discretionary salt consumption patterns. For this purpose, an online cross-sectional survey with a convenience sample of 498 Uruguayan participants was carried out. Participants completed the consideration of future consequences (CFC) scale adapted for eating behaviour and a short survey about discretionary salt consumption patterns, and indicated their degree of agreement with statements measuring the perceived risk of sodium consumption. Finally, socio-demographic data were collected. BRT were applied to build predictive models that related discretionary salt usage to socio-demographic characteristics, the two factors of the CFC-Food scale (consideration of the future and consideration of the immediate consequences of eating behaviour), and the two factors of the perceived risk of sodium consumption scale (severity of perceived risks and risk compensation). Age, time orientation and perceived risk were the most relevant explanatory variables for discretionary salt usage. Older people had a lower likelihood of adding salt to food, either at home or when eating out. In addition, individuals who tend to be present rather than future oriented, as well as those with low perception of risk severity and susceptibility, were more likely to add salt to foods. Results from the present work suggest that communication campaigns to reduce discretionary salt intake should mainly focus on stressing the short-term health benefits of reducing sodium intake and raising perceived susceptibility.
12
Spatiotemporal patterns and driving forces of remotely sensed urban agglomeration heat islands in South China. Science of the Total Environment 2021; 800:149499. [PMID: 34426306] [DOI: 10.1016/j.scitotenv.2021.149499]
Abstract
Rapid urbanization and increasing population have widely caused the urban heat island effect. As the distance between cities decreases, there is an urgent need to reevaluate regional heat island intensity (RHII) at the urban agglomeration scale by considering all cities together, rather than from the conventional single-city perspective. Using cropland land surface temperature as the reference temperature, we assessed the diurnal and seasonal RHII variations in the Guangdong-Hong Kong-Macao Greater Bay Area (GBA) urban agglomeration in South China. The boosted regression trees (BRT) method was then used to analyze the relative influence and marginal effect of possible drivers to disentangle their underlying driving mechanisms. Results showed that the daytime RHII spatial patterns averaged over the period 2003-2017 exhibited higher intensity and greater heterogeneity than their nighttime counterparts, especially the stronger RHII in the central GBA around the estuary area. Seasonal dynamics of daytime RHII displayed a generally descending trend from summer to winter, with the opposite at night. BRT analyses indicated that at both annual and seasonal scales, vegetation fraction and background temperature had a dominant influence on RHII in daytime and nighttime, respectively. RHII variations were also considerably attributable to other drivers in different seasons. For daytime RHII, the other influential drivers included anthropogenic heat emissions and precipitation in summer, anthropogenic heat emissions and terrain in the transition season, and temperature and albedo in winter. For nighttime RHII, anthropogenic heat emissions for all seasons, vegetation activity for summer and the transition season, and precipitation for winter also had important contributions. The marginal effects detected different nonlinear responses of diurnal and seasonal RHII to potential drivers, suggesting contrasting driving mechanisms. Results of this study support more targeted and informed strategies for RHII mitigation in the GBA and provide helpful insights into regional heat island evaluation in other urban agglomerations.
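The marginal (partial-dependence) effects mentioned above average a fitted model's prediction while one driver is swept across a grid of values. A self-contained sketch with a toy linear stand-in for the BRT model (coefficients and data are illustrative, not from the study):

```python
def marginal_effect(model, data, feature, grid):
    """Partial-dependence curve: average model prediction over the data
    while forcing one driver (column `feature`) to each grid value in turn."""
    curve = []
    for v in grid:
        preds = []
        for row in data:
            row = list(row)          # copy so the original data are untouched
            row[feature] = v
            preds.append(model(row))
        curve.append(sum(preds) / len(preds))
    return curve

def toy_model(r):
    # illustrative stand-in: RHII rises with anthropogenic heat (r[0])
    # and falls with vegetation fraction (r[1])
    return 0.8 * r[0] - 1.2 * r[1]

data = [[1, 0.2], [2, 0.5], [3, 0.8], [4, 0.3]]
veg_curve = marginal_effect(toy_model, data, 1, [0.0, 0.25, 0.5, 0.75, 1.0])
```

A declining `veg_curve` is the kind of marginal effect that would support vegetation fraction as a daytime cooling driver.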
13
Atmospheric water vapor and soil moisture jointly determine the spatiotemporal variations of CO2 fluxes and evapotranspiration across the Qinghai-Tibetan Plateau grasslands. Science of the Total Environment 2021; 791:148379. [PMID: 34412395] [DOI: 10.1016/j.scitotenv.2021.148379]
Abstract
Alpine grasslands play important functions in mitigating climate change and regulating water resources. However, the spatiotemporal variability of their carbon and water budgets remains unquantified. Here, 47 site-year observations of CO2 and water vapor fluxes (ET) are analyzed at sites situated along a hydrothermal gradient across the Qinghai-Tibetan Plateau, including an alpine wetland (wettest), an alpine shrub (coldest), an alpine meadow, an alpine meadow-steppe, and an alpine steppe (driest and warmest). The results show that the benchmarks for annual net ecosystem exchange (NEE) are -79.3, -77.8, -66.7, 20.2, and 100.9 g C m⁻² year⁻¹ at the meadow, shrub, meadow-steppe, steppe, and wetland, respectively. The peak daily NEE normalized by peak leaf area index converges to 0.93 g C m⁻² d⁻¹ at the 5 sites. Except in the wetland (722.8 mm), the benchmarks of annual ET fluctuate from 511.0 mm in the steppe to 589.2 mm in the meadow. Boosted regression trees-based analysis suggests that the enhanced vegetation index (EVI) and net radiation (Rn) determine the variations of growing season monthly CO2 fluxes and ET, respectively, although the effect is to some extent site-specific. Inter-annual variability in NEE, ecosystem respiration (RES), and ET are tightly (R² > 0.60) related to the inter-growing season NEE, RES, and ET, respectively. Both annual RES and annual NEE are significantly constrained by annual gross primary productivity (GPP), with 85% of the per-unit GPP contributing to RES (R² = 0.84) and 15% to NEE (R² = 0.12). Annual GPP significantly correlates with annual ET alone at the drier sites of the meadow-steppe and the steppe, suggesting the coupling of carbon and water is moisture-dependent in alpine grasslands. Over half of the inter-annual spatial variability in GPP, RES, NEE, and ET is explained by EVI, atmospheric water vapor, topsoil water content, and bulk surface resistance (rs), respectively. Because the spatial variations of EVI and rs are strongly regulated by atmospheric water vapor (R² = 0.48) and topsoil water content (R² = 0.54), respectively, we conclude that atmospheric water vapor and topsoil water content, rather than the expected air/soil temperatures, drive the spatiotemporal variations in CO2 fluxes and ET across temperature-limited grasslands. These findings are critical for improving predictions of the carbon sequestration and water holding capacity of alpine grasslands.
14
Investigating the drivers of the spatio-temporal heterogeneity in COVID-19 hospital incidence - Belgium as a study case. International Journal of Health Geographics 2021; 20:29. [PMID: 34127000] [PMCID: PMC8200785] [DOI: 10.1186/s12942-021-00281-1]
Abstract
Background: The COVID-19 pandemic is affecting nations globally, but with an impact exhibiting significant spatial and temporal variation at the sub-national level. Identifying and disentangling the drivers of resulting hospitalisation incidence at the local scale is key to predict, mitigate and manage epidemic surges, but also to develop targeted measures. However, this type of analysis is often not possible because of the lack of spatially-explicit health data and spatial uncertainties associated with infection. Methods: To overcome these limitations, we propose an analytical framework to investigate potential drivers of the spatio-temporal heterogeneity in COVID-19 hospitalisation incidence when data are only available at the hospital level. Specifically, the approach is based on the delimitation of hospital catchment areas, which allows analysing associations between hospitalisation incidence and spatial or temporal covariates. We illustrate and apply our analytical framework to Belgium, a country heavily impacted by two COVID-19 epidemic waves in 2020, both in terms of mortality and hospitalisation incidence. Results: Our spatial analyses reveal an association between the hospitalisation incidence and the local density of nursing home residents, which confirms the important impact of COVID-19 in elderly communities of Belgium. Our temporal analyses further indicate a pronounced seasonality in hospitalisation incidence associated with the seasonality of weather variables. Taking advantage of these associations, we discuss the feasibility of predictive models based on machine learning to predict future hospitalisation incidence. Conclusion: Our reproducible analytical workflow allows performing spatially-explicit analyses of data aggregated at the hospital level and can be used to explore potential drivers and dynamics of COVID-19 hospitalisation incidence at regional or national scales.
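A crude stand-in for the catchment-area step is a nearest-hospital assignment, after which incidence can be aggregated per catchment. The coordinates, admissions and populations below are invented for illustration; real catchment delimitation would account for road networks and patient flows:

```python
def assign_catchments(units, hospitals):
    """Assign each spatial unit to its nearest hospital (squared Euclidean
    distance); a simple proxy for hospital catchment delimitation."""
    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return {name: min(hospitals, key=lambda h: dist2(xy, hospitals[h]))
            for name, xy in units.items()}

def catchment_incidence(assignment, admissions, population):
    """Hospitalisation incidence per catchment: admissions divided by the
    pooled population of the units assigned to that hospital."""
    pop = {}
    for unit, hosp in assignment.items():
        pop[hosp] = pop.get(hosp, 0) + population[unit]
    return {h: admissions[h] / pop[h] for h in admissions}

# hypothetical municipalities (x, y) and hospitals
units = {"A": (0, 0), "B": (1, 0), "C": (5, 5)}
hospitals = {"H1": (0, 1), "H2": (5, 4)}
assignment = assign_catchments(units, hospitals)
```

With catchment-level incidence in hand, covariates such as nursing home density can be aggregated over the same units and tested for association.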
|
15
|
Taxonomic resolution affects host-parasite association model performance. Parasitology 2021; 148:584-590. [PMID: 33342442 PMCID: PMC10950372 DOI: 10.1017/s0031182020002371] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2020] [Revised: 12/07/2020] [Accepted: 12/09/2020] [Indexed: 11/07/2022]
Abstract
Identifying the factors that structure host–parasite interactions is fundamental to understand the drivers of species distributions and to predict novel cross-species transmission events. More phylogenetically related host species tend to have more similar parasite associations, but parasite specificity may vary as a function of transmission mode, parasite taxonomy or life history. Accordingly, analyses that attempt to infer host–parasite associations using combined data on different parasite groups may perform quite differently relative to analyses on each parasite subset. In essence, are more data always better when predicting host–parasite associations, or does parasite taxonomic resolution matter? Here, we explore how taxonomic resolution affects predictive models of host–parasite associations using the London Natural History Museum's database of host–helminth interactions. Using boosted regression trees, we demonstrate that taxon-specific models (i.e. of Acanthocephalans, Nematodes and Platyhelminthes) consistently outperform full models in predicting mammal–helminth associations. At finer spatial resolutions, full and taxon-specific model performance does not vary, suggesting tradeoffs between phylogenetic and spatial scales of analysis. Although all models identify similar host and parasite covariates as important to such patterns, our results emphasize the importance of phylogenetic scale in the study of host–parasite interactions and suggest that using taxonomic subsets of data may improve predictions of parasite distributions and cross-species transmission. Predictive models of host–pathogen interactions should thus attempt to encompass the spatial resolution and phylogenetic scale desired for inference and prediction and potentially use model averaging or ensemble models to combine predictions from separately trained models.
|
16
|
Predicting deseasonalised serum 25 hydroxy vitamin D concentrations in the D-Health Trial: An analysis using boosted regression trees. Contemp Clin Trials 2021; 104:106347. [PMID: 33684596 DOI: 10.1016/j.cct.2021.106347] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2020] [Revised: 02/16/2021] [Accepted: 03/01/2021] [Indexed: 10/22/2022]
Abstract
BACKGROUND The D-Health Trial aims to determine whether monthly high-dose vitamin D supplementation can reduce the mortality rate and prevent cancer. We did not have adequate statistical power for subgroup analyses, so could not justify the high cost of collecting blood samples at baseline. To enable future exploratory analyses stratified by baseline vitamin D status, we developed models to predict baseline serum 25 hydroxy vitamin D [25(OH)D] concentration. METHODS We used data and serum 25(OH)D concentrations from participants who gave a blood sample during the trial for compliance monitoring and were randomised to placebo. Data were partitioned into training (80%) and validation (20%) datasets. Deseasonalised serum 25(OH)D concentrations were dichotomised using cut-points of 50, 60 and 75 nmol/L. We fitted boosted regression tree models, based on 13 predictors, and evaluated model performance using the validation data. RESULTS The training and validation datasets had 1788 (10.5% <50 nmol/L, 23.1% <60 nmol/L, 48.8% <75 nmol/L) and 447 (11.9% <50 nmol/L, 25.7% <60 nmol/L, and 49.2% <75 nmol/L) samples, respectively. Ambient UV radiation and total intake of vitamin D were the strongest predictors of 'low' serum 25(OH)D concentration. The areas under the receiver operating characteristic curves were 0.71, 0.70, and 0.66 for cut-points of <50, <60 and <75 nmol/L, respectively. CONCLUSIONS We exploited compliance monitoring data to develop models to predict serum 25(OH)D concentration for D-Health participants at baseline. This approach may prove useful in other trial settings where there is an obstacle to exhaustive data collection.
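The modelling recipe in this entry (boosted trees on a dichotomised outcome, an 80/20 split, AUC on held-out data) can be sketched with scikit-learn. This is a generic illustration on synthetic data, not the D-Health code; the two predictors stand in loosely for ambient UV radiation and vitamin D intake:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2235  # roughly the combined training + validation size reported

# Hypothetical predictors standing in for UV radiation and vitamin D intake
X = rng.normal(size=(n, 2))
# Synthetic outcome: 'low' 25(OH)D more likely when both predictors are low
p = 1 / (1 + np.exp(1.5 + 1.0 * X[:, 0] + 0.8 * X[:, 1]))
y = rng.binomial(1, p)

# 80/20 training/validation partition, as in the trial's analysis
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=0)
brt = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05, max_depth=3)
brt.fit(X_tr, y_tr)

# Evaluate discrimination on the held-out validation set
auc = roc_auc_score(y_va, brt.predict_proba(X_va)[:, 1])
print(f"validation AUC: {auc:.2f}")
```

The same pattern extends to multiple cut-points by refitting the model once per dichotomisation of the outcome.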
|
17
|
Investigating the spatio-temporal variability of soil organic carbon stocks in different ecosystems of China. THE SCIENCE OF THE TOTAL ENVIRONMENT 2021; 758:143644. [PMID: 33248754 DOI: 10.1016/j.scitotenv.2020.143644] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/13/2020] [Revised: 11/06/2020] [Accepted: 11/07/2020] [Indexed: 06/12/2023]
Abstract
Soil organic carbon (SOC) significantly influences soil fertility, soil water holding capacity, and plant productivity. In this study, we applied two boosted regression tree (BRT) models to map SOC stocks across China in the 1980s and the 2010s. The models incorporated nine environmental variables (climate, topography, and biology) and 8897 (in the 1980s) and 4534 (in the 2010s) topsoil (0-20 cm) samples. During the two study periods, 20% of the soil samples were randomly selected for model testing, and the remaining samples were used as a training set to construct the models. The verification results showed that incorporating climate environment variables significantly improved the model prediction in both study periods. Mean annual temperature, mean annual precipitation, elevation, and the normalized difference vegetation index were the dominant environmental factors affecting the spatial distribution of SOC stocks. The full-variable model predicted similar spatial distributions of SOC stocks for the 1980s and the 2010s. SOC stocks in China showed an increasing trend over the past 30 years, from 3.9 kg m-2 in the 1980s to 4.6 kg m-2 in the 2010s. In both periods, topsoil SOC stocks were mainly stored in agroecosystems, forests, and grasslands; in the 1980s these pools held 9.5, 12.0, and 11.4 Pg C, respectively. Our study provides reliable information on China's carbon distribution, which can be used by land managers and the national government to formulate relevant land use and carbon sequestration policies.
|
18
|
Larval connectivity and water quality explain spatial distribution of crown-of-thorns starfish outbreaks across the Great Barrier Reef. ADVANCES IN MARINE BIOLOGY 2020; 87:223-258. [PMID: 33293012 DOI: 10.1016/bs.amb.2020.08.007] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Outbreaks of the coral-eating crown-of-thorns starfish (COTS; Acanthaster cf. solaris) occur in cyclical waves along the Great Barrier Reef (GBR), contributing significantly to the decline in hard coral cover over the past 30 years. One main difficulty faced by scientists and managers alike is understanding the relative importance of factors contributing to COTS outbreaks, such as increased nutrients and declining water quality, larval connectivity, fishing pressure, and abiotic conditions. We analysed COTS abundances from the most recent outbreak (2010-2018) using both boosted regression trees and generalised additive models to identify key predictors of COTS outbreaks. We used this approach to predict the suitability of each reef on the GBR for COTS outbreaks at three different levels: (1) reefs with COTS present intermittently (Presence); (2) reefs with COTS widespread and present in most samples (Prevalence); and (3) reefs experiencing outbreak levels of COTS (Outbreak). We also compared the utility of two auto-covariates accounting for spatial autocorrelation among observations, built using weighted inverse distance and weighted larval connectivity to reefs supporting COTS populations, respectively. Boosted regression trees (BRT) and generalised additive mixed models (GAMM) were combined in an ensemble model to reduce the effect of model uncertainty on predictions of COTS presence, prevalence and outbreaks. Our results from the best-performing models indicate that temperature (Degree Heating Week exposure: relative importance=13.1%) and flood plume exposure (13.0%) are the best predictors of COTS presence; variability in chlorophyll concentration (12.6%) and flood plume exposure (8.2%) best predicted COTS prevalence; and larval connectivity potential (22.7%) and minimum sea surface temperature (8.0%) are the best predictors of COTS outbreaks. Whether the reef was open or closed to fishing, however, had no significant effect on COTS presence, prevalence, or outbreaks in BRT results (<0.5%). We identified major hotspots of COTS activity primarily on the mid-shelf central GBR and on the southern Swains reefs. This study provides the first empirical comparison of the major hypotheses of COTS outbreaks and the first validated predictions of COTS outbreak potential at the GBR scale incorporating connectivity, nutrient, biophysical and spatial variables, providing a useful aid to management of this pest species on the GBR.
|
19
|
Understanding the importance of spatial scale in the patterns of grassland invasions. THE SCIENCE OF THE TOTAL ENVIRONMENT 2020; 727:138669. [PMID: 32325319 DOI: 10.1016/j.scitotenv.2020.138669] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/21/2020] [Revised: 04/04/2020] [Accepted: 04/10/2020] [Indexed: 06/11/2023]
Abstract
The invasion of alien plant species is a serious problem for conservation and the maintenance of biodiversity in grasslands. Therefore, it is important to find environmental factors correlated with the distribution of invasive species in such areas. In this study, we examined the impacts of environmental factors operating at different spatial scales on the distribution of invasive species. The study area was located in the Sudetes Mountains, Poland (3800 km2). We sampled field data from 163 random plots located in grassland, among which there were 94 plots with invasive species and 69 plots without invasive species. For each plot, we collected data on resident vegetation (species richness, community structure), geodiversity (topography, soil type), environmental heterogeneity (landscape structure) and climate (temperature and precipitation). Since the factors examined are likely to operate at different spatial scales, we calculated values of environmental variables with different spatial scopes (10 m2 plot and buffers with 50, 250 and 1250 m radii). The probability of invasive plant presence was modeled using boosted regression trees (BRT). The results of our study showed that the distribution of invasive species is explained by factors operating at different spatial scales: at the fine scale, the presence of invasive species was driven predominantly by the average Ellenberg indicator value for soil moisture; at the medium scale, by the average topographic wetness index and the sum of edges; and at the coarse scale, by temperature. We also found that the effect of drivers operating at the fine spatial scale is overwhelmed by the effect of drivers operating at the coarse scale. From a practical point of view, the results demonstrate that effective grassland management should be planned in a larger spatial context, because focusing on the management of a single site cannot be successful.
|
20
|
Modeling the impact of 2D/3D urban indicators on the urban heat island over different seasons: A boosted regression tree approach. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2020; 266:110424. [PMID: 32392133 DOI: 10.1016/j.jenvman.2020.110424] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/06/2019] [Revised: 03/01/2020] [Accepted: 03/08/2020] [Indexed: 06/11/2023]
Abstract
Understanding how complex urban factors affect the Urban Heat Island (UHI) is crucial for assessing the impacts of urban planning and environmental management on the thermal environment. This paper investigates the relationships between two-dimensional (2D) and three-dimensional (3D) factors and land surface temperatures (LST) within the Olympic Area of Beijing in different seasons, using the boosted regression tree (BRT) model. The BRT model captures the specific contribution of each urban factor to LST in each season and across a continuum of magnitudes for that factor. The results show that these relationships are complex and highly nonlinear. The four most common dominant factors are the Normalized Difference Built-up Index (NDBI), the Normalized Difference Vegetation Index (NDVI), a gravity index for parks (GPI), and average building height (BH). The most important factor in spring is NDBI, with a 45.5% contribution rate. In the other seasons, NDVI is the dominant factor, with contributions of 40% in summer, 21% in autumn, and 19% in winter. NDVI has an overall negative impact on LST in spring and summer, following a quadratic nonlinear decreasing curve, but a positive one in autumn and winter. The 2D land-use variables are most strongly related to LST in summer and spring, but 3D building-related variables have stronger impacts in colder weather. The Sky View Factor (SVF), a 3D measure of urban morphology, also has strong impacts in summer and winter. Both building-based and DSM-based SVFs are computed; the latter accounts for buildings, bridges, and trees. In contrast to the building-based SVF, the DSM-based SVF reduces LST when it varies between 0 and 0.75, reflecting the effects of high-density tree canopies that increase shade and evapotranspiration while blocking sky view. The marginal effect curves produced by the BRT are often characterized by thresholds. For instance, the maximal NDVI effect in summer takes place when NDVI = 0.7, suggesting that very intense green coverage is not necessary to achieve maximal thermal results. Implications for urban planning and environmental management are outlined, including the increased use of evergreen trees that provide thermal benefits in both summer and winter.
|
21
|
Application of learning vector quantization and different machine learning techniques to assessing forest fire influence factors and spatial modelling. ENVIRONMENTAL RESEARCH 2020; 184:109321. [PMID: 32199317 DOI: 10.1016/j.envres.2020.109321] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/27/2019] [Revised: 02/21/2020] [Accepted: 02/28/2020] [Indexed: 06/10/2023]
Abstract
This study assesses forest-fire susceptibility (FFS) in Fars Province, Iran using three geographic information system (GIS)-based machine-learning algorithms: boosted regression tree (BRT), general linear model (GLM), and mixture discriminant analysis (MDA). Recently, BRT, GLM, and MDA have become important machine-learning algorithms, and their use has been enriched by application to various fields of research. A database of historical FFs identified using Landsat-8 OLI and MODIS satellite images (at 358 locations) and ten influencing factors (elevation, slope, topographical wetness index, aspect, distance from urban areas, annual mean temperature, land use, distance from road, annual mean rainfall, and distance from river) were input into a GIS. The 358 sites were divided into two sets for training (70%) and validation (30%). BRT, GLM, and MDA models were used to analyze the spatial relationships between the factors influencing FFs and the locations of fires to generate an FFS map. The prediction success of each modelled FFS map was determined with the help of the ROC curve, accuracy, overall accuracy, true skill statistic (TSS), F-measures, correctly classified instances (CCI), and K-fold cross-validation (4-fold). The accuracies on the training and validation datasets show that the BRT (AUC = 88.9% and 88.2%) and MDA (AUC = 86.4% and 85.6%) models are more effective than the GLM (AUC = 86.6% and 82.5%) model. Also, the outcome of the 4-fold measure confirmed the results from the other accuracy measures. Therefore, the accuracies of the BRT and MDA models are satisfactory and are suitable for FFS mapping in Fars Province. Finally, the well-accepted neural-network application of learning vector quantization (LVQ) reveals that land use, annual mean rainfall, and slope angle were the most useful determinants of FFS. The resulting FFS maps can enhance the effectiveness of planning and management of forest resources and ecological balances in this province.
|
22
|
Evaluating the impact of land uses on stream integrity using machine learning algorithms. THE SCIENCE OF THE TOTAL ENVIRONMENT 2019; 696:133858. [PMID: 31465920 DOI: 10.1016/j.scitotenv.2019.133858] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/12/2019] [Revised: 08/05/2019] [Accepted: 08/08/2019] [Indexed: 06/10/2023]
Abstract
A general pattern of declining aquatic ecological integrity with increasing urban land use has been well established for a number of watersheds worldwide. A more nuanced characterization of the influence of different urban land uses and the determination of cumulative thresholds will further inform watershed planning and management. To this end, we investigated the utility of two machine learning algorithms (Random Forests (RF) and Boosted Regression Trees (BRT)) to model stream impairment through a multimetric macroinvertebrate index known as the High Gradient Macroinvertebrate Index (HGMI) in an urbanizing watershed located in north-central New Jersey, United States. These machine learning algorithms were able to explain at least 50% of the variability in stream integrity based on watershed land use/land cover. While comparable in results, RF was found to be easier to train and was somewhat more robust to model overfitting compared to BRT. Our results document the negative influence of increasing high-medium density urban (>30% impervious surface cover (ISC)), low density urban (15-30% ISC), and transitional/barren land on stream biological integrity. The thresholds generated by partial plots suggest that stream integrity decreased abruptly when the percentage of high-medium density urban, low density urban, and transitional/barren land rose above 10%, 8%, and 2% of the watershed, respectively. Additionally, when rural residential land surpassed a 30% threshold, it behaved similarly to low density urban land with respect to stream integrity. Identification of such cumulative thresholds can help watershed managers and policymakers to craft land use zoning regulations and design restoration programs that are grounded in objective scientific criteria.
|
23
|
[Epidemiological characteristics and environmental risk factors of hemorrhagic fever with renal syndrome in Wei River basin, China, 2005-2015]. ZHONGHUA LIU XING BING XUE ZA ZHI = ZHONGHUA LIUXINGBINGXUE ZAZHI 2019; 39:1159-1164. [PMID: 30293303 DOI: 10.3760/cma.j.issn.0254-6450.2018.09.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
Abstract
Objective: To understand the epidemiological characteristics of hemorrhagic fever with renal syndrome (HFRS) in the Wei River Basin from 2005 to 2015, and to analyze the environmental factors underlying differences in its spatial distribution. Methods: HFRS cases reported in the Wei River Basin from 2005 to 2015 were collected from the National Disease Reporting Information System, and the epidemiological features of HFRS were analyzed. A boosted regression trees (BRT) model was applied to evaluate the effects of environmental factors on the geographical distribution of HFRS in the Wei River Basin at a 5 km×5 km grid scale. Results: A total of 18 629 HFRS cases were reported, and the average annual incidence from 2005 to 2015 in the Wei River Basin was 7.24/100 000. The highest morbidity was 15.18/100 000, in 2012. The middle and lower reaches of the Wei River Basin, such as Xi'an and Weinan cities, had a high incidence of HFRS. Patients were mainly aged between 16 and 60 years, and morbidity was highest in people over 60 years old. The boosted regression trees model identified building land, farmland coverage percentage, and altitude as the factors contributing most to the distribution of HFRS. Conclusions: The epidemiological characteristics of HFRS changed significantly. Patients older than 60 years had the highest incidence rates. Environmental factors such as built-up land, farmland, and altitude played important roles in the geographical distribution of HFRS in the Wei River Basin.
|
24
|
Modelling built infrastructure heights to evaluate common assumptions in aquatic conservation. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2019; 232:131-137. [PMID: 30471546 DOI: 10.1016/j.jenvman.2018.11.040] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/08/2018] [Revised: 11/07/2018] [Accepted: 11/13/2018] [Indexed: 06/09/2023]
Abstract
Built infrastructure, such as dams and weirs, is among the most impactful stressors affecting aquatic ecosystems. However, data on the distribution and characteristics of small built infrastructure that often restricts fish movement, impedes flows, and retains sediments and materials remain limited. Collection of this necessary information is challenged by the large number of built structures with unknown dimensions (e.g., height), which means scientists and practitioners need to make assumptions about these characteristics in research and decision-making. Evaluating these common assumptions is essential for advancing conservation that is more effective. We use a statistical modelling approach to double the number of small (≤5 m high) built infrastructure with height values in France. Using two scenarios depicting common assumptions (all infrastructure without height data are impassable, or all are passable for all species) and one based on our modelled heights, we demonstrate how assumptions can influence our understanding of river fragmentation. Assuming all built infrastructure without height data are passable results in a 5-fold reduction in estimated river fragmentation for fish species that cannot pass built infrastructure ≥1.0 m. The opposite is true for fish species that cannot pass ≥2.0 m, where assuming all built infrastructure without height data are impassable results in a 7-fold increase in fragmentation compared to the scenario using modelled heights to attribute passability. Our findings suggest that modelled height data lead to a better understanding of river fragmentation, and that knowledge of different fish species' abilities to pass a variety of built infrastructure is essential to guide more effective management strategies. Our modelling approach, and results, are of particular relevance to regions where efforts to both remediate and remove built infrastructure are occurring, but where gaps in data on the characteristics of built infrastructure remain and limit effective decision-making.
|
25
|
Effects of urbanization on direct runoff characteristics in urban functional zones. THE SCIENCE OF THE TOTAL ENVIRONMENT 2018; 643:301-311. [PMID: 29940442 DOI: 10.1016/j.scitotenv.2018.06.211] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/05/2017] [Revised: 05/21/2018] [Accepted: 06/17/2018] [Indexed: 06/08/2023]
Abstract
As urbanization proceeds, the increasing direct runoff caused by land use change has become a major challenge for urban hydrological systems. In this study, the impact of urbanization on direct runoff in the Shenyang urban area was investigated using a modified Soil Conservation Service Curve Number model combined with remote sensing. The urban functional zone (UFZ) was used as the basic unit for hydrological analysis. The hydrological changes in runoff were analyzed by calculating the runoff difference between the current condition and the pre-urbanization condition. Moran's I was used to estimate the spatial autocorrelation of the entire area. Then we assessed the relative influence and marginal effects of factors affecting direct runoff using boosted regression trees (BRT). Our results showed that direct runoff was significantly related to urbanization. Under current conditions, the direct runoff increment depth attributable to urbanization in the study area was 68.02 mm. Among UFZs, high-density residential, business and industrial zones tended to have large runoff volumes and high runoff coefficients. Through flooding hazard analysis, we found that about 6.53% of the study area fell into a significant hazard category. The industrial zone had the largest area of significant-hazard land (40.97 km2) and the business zone had the largest significant-hazard percentage (21.19%). Moran's I results illustrated that the high-high clusters in Shenyang were mainly concentrated in the urban center. BRT analysis indicated that runoff had the strongest correlation with rainfall (52.07%), followed by impervious ratio (27.28%), normalized difference vegetation index (14.31%), antecedent 5-day rainfall (3.02%), and UFZ (1.70%). The industrial, business and high-density residential zones tend to have greater influence on runoff. Our study presents a method for recognizing hotspots of direct runoff in large cities, and may provide potential implications for green infrastructure selection and urban planning.
|
26
|
Are all data useful? Inferring causality to predict flows across sewer and drainage systems using directed information and boosted regression trees. WATER RESEARCH 2018; 145:697-706. [PMID: 30216864 DOI: 10.1016/j.watres.2018.09.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/11/2018] [Revised: 08/03/2018] [Accepted: 09/04/2018] [Indexed: 06/08/2023]
Abstract
As more sensor data become available across urban water systems, it is often unclear which of these new measurements are actually useful and how they can be efficiently ingested to improve predictions. We present a data-driven approach for modeling and predicting flows across combined sewer and drainage systems, which fuses sensor measurements with output of a large numerical simulation model. Rather than adjusting the structure and parameters of the numerical model, as is commonly done when new data become available, our approach instead learns causal relationships between the numerically-modeled outputs, distributed rainfall measurements, and measured flows. By treating an existing numerical model - even one that may be outdated - as just another data stream, we illustrate how to automatically select and combine features that best explain flows for any given location. This allows for new sensor measurements to be rapidly fused with existing knowledge of the system without requiring recalibration of the underlying physics. Our approach, based on Directed Information (DI) and Boosted Regression Trees (BRT), is evaluated by fusing measurements across nearly 30 rain gages, 15 flow locations, and the outputs of a numerical sewer model in the city of Detroit, Michigan: one of the largest combined sewer systems in the world. The results illustrate that the Boosted Regression Trees provide skillful predictions of flow, especially when compared to an existing numerical model. The innovation of this paper is the use of the Directed Information step, which selects only those inputs that are causal with measurements at locations of interest. Better predictions are achieved when the Directed Information step is used because it reduces overfitting during the training phase of the predictive algorithm. 
In the age of "big water data", this finding highlights the importance of screening all available data sources before using them as inputs to data-driven models, since more may not always be better. We discuss the generalizability of the case study and the requirements of transferring the approach to other systems.
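The screen-then-predict workflow described in this entry can be sketched generically. Mutual information is used below as a simple stand-in for the paper's directed-information criterion, and the sensor data are synthetic rather than the Detroit system's measurements:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(2)
n = 800
# Synthetic "sensors": two informative rain-gauge-like inputs plus
# eight irrelevant data streams
X = rng.normal(size=(n, 10))
flow = 2.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(0, 0.3, n)

# Step 1: screen inputs, keeping only those most informative about flow
# (the paper screens with Directed Information; mutual information is a
# simpler, non-causal proxy used here for illustration)
mi = mutual_info_regression(X, flow, random_state=0)
keep = np.argsort(mi)[-3:]  # retain the top-scoring inputs

# Step 2: train boosted regression trees only on the screened inputs,
# which limits overfitting to irrelevant data streams
brt = GradientBoostingRegressor(n_estimators=200).fit(X[:, keep], flow)
r2 = brt.score(X[:, keep], flow)
print(f"kept inputs {sorted(keep.tolist())}, training R^2 = {r2:.2f}")
```

Unlike directed information, mutual information is symmetric and cannot distinguish cause from effect, so this sketch captures only the screening mechanics, not the causal-selection step the authors emphasize.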
|
27
|
Distribution, habitat associations, and conservation status updates for the pilose crayfish Pacifastacus gambelii (Girard, 1852) and Snake River pilose crayfish Pacifastacus connectens (Faxon, 1914) of the western United States. PeerJ 2018; 6:e5668. [PMID: 30280038 PMCID: PMC6166635 DOI: 10.7717/peerj.5668] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2018] [Accepted: 08/29/2018] [Indexed: 11/23/2022] Open
Abstract
Our study evaluates the distribution, habitat associations, and current conservation status of the Snake River pilose crayfish Pacifastacus connectens (Faxon, 1914) and pilose crayfish Pacifastacus gambelii (Girard, 1852), two little-studied and data-deficient species endemic to the western United States. We first developed a species distribution model (SDM) for the pilose crayfishes based on their historical occurrence records using boosted regression trees and freshwater GIS data layers. We then sampled 163 sites in the summers of 2016 and 2017 within the distribution of these crayfishes, including 50 where these species were observed historically. We next compared our field results to modeled predictions of suitable habitat from the SDM. Our SDM predicted 73 sites (45%) we sampled as suitable for the pilose crayfishes, with a moderate AUC value of 0.824. The pilose crayfishes were generally predicted to occur in larger streams and rivers with less extreme upstream temperature and precipitation seasonality. We found the pilose crayfishes at only 20 (12%) of the 163 total sites we sampled, 14 (20%) of the 73 sites predicted as suitable for them by our SDM, and 12 (24%) of 50 historical sites that we sampled. We found the invasive virile crayfish Faxonius virilis (Hagen, 1870) at 22 sites total and 12 (24%) historical sites for the pilose crayfishes, and we found the “native invader” signal crayfish Pacifastacus leniusculus (Dana, 1852) at 29 sites total and 6 (12%) historical sites for the pilose crayfishes. We subsequently used a single classification tree to identify factors associated with our high rate of false positives for contemporary pilose crayfish distributions relative to our SDM. This classification tree identified the presence of invasive crayfishes, impairment of the benthic community, and sampling method as some of the factors differentiating false positives relative to true positives for the pilose crayfishes. 
Our study identified the historical distribution and habitat associations for P. connectens and P. gambelii using an SDM and contrasted this prediction to results of contemporary field sampling. We found that the pilose crayfishes have seemingly experienced substantial range declines, attributable to apparent displacement by invasive crayfishes and impairment or change to stream communities and habitat. We recommend increased conservation and management attention to P. connectens and P. gambelii in response to these findings.
Collapse
|
28
|
Environmental conditions and herbivore biomass determine coral reef benthic community composition: implications for quantitative baselines. CORAL REEFS (ONLINE) 2018; 37:1157-1168. [PMID: 30930680 PMCID: PMC6404665 DOI: 10.1007/s00338-018-01737-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/28/2018] [Accepted: 09/20/2018] [Indexed: 05/30/2023]
Abstract
Our ability to understand natural constraints on coral reef benthic communities requires quantitative assessment of the relative strengths of abiotic and biotic processes across large spatial scales. Here, we combine underwater images, visual censuses and remote sensing data for 1566 sites across 34 islands spanning the central-western Pacific Ocean, to empirically assess the relative roles of abiotic and grazing processes in determining the prevalence of calcifying organisms and fleshy algae on coral reefs. We used regression trees to identify the major predictors of benthic composition and to test whether anthropogenic stress at inhabited islands decouples natural relationships. We show that sea surface temperature, wave energy, oceanic productivity and aragonite saturation strongly influence benthic community composition; overlooking these factors may bias expectations of calcified reef states. Maintenance of grazing biomass above a relatively low threshold (~ 10-20 kg ha-1) may also prevent transitions to algal-dominated states, providing a tangible management target for rebuilding overexploited herbivore populations. Biophysical relationships did not decouple at inhabited islands, indicating that abiotic influences remain important macroscale processes, even at chronically disturbed reefs. However, spatial autocorrelation among inhabited reefs was substantial and exceeded abiotic and grazing influences, suggesting that natural constraints on reef benthos were superseded by unmeasured anthropogenic impacts. Evidence of strong abiotic influences on reef benthic communities underscores their importance in specifying quantitative targets for coral reef management and restoration that are realistic within the context of local conditions.
Collapse
|
29
|
Understory upheaval: factors influencing Japanese stiltgrass invasion in forestlands of Tennessee, United States. BOTANICAL STUDIES 2018; 59:20. [PMID: 30083978 PMCID: PMC6079111 DOI: 10.1186/s40529-018-0236-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/30/2017] [Accepted: 08/02/2018] [Indexed: 06/08/2023]
Abstract
BACKGROUND Invasions by non-native plants contribute to loss of ecosystem biodiversity and productivity, modification of biogeochemical cycles, and inhibition of natural regeneration of native species. Japanese stiltgrass (Microstegium vimineum (Trin.) A. Camus) is one of the most prevalent invasive grasses in the forestlands of Tennessee, United States. We measured the extent of invasion, identified potential factors affecting invasion, and quantified the relative importance of each factor. We analyzed field data collected by the Forest Inventory and Analysis Program of the U.S. Forest Service to measure the extent of invasion from 2005 to 2011 and identified potential factors affecting invasion during this period using boosted regression trees. RESULTS Our results indicated that presence of Japanese stiltgrass on sampled plots increased 50% (from 269 to 404 plots) during the time period. The probability of invasion was correlated with one landscape condition (elevation; 20.5%) and five forest features (tree species diversity, basal area, stand age, site productivity, and natural regeneration; 79.5%). Boosted regression trees identified the most influential (highly correlated) variables as tree species diversity (30.7%), basal area (22.9%), elevation (20.5%), and stand age (16.7%). Our results suggest that Japanese stiltgrass is likely to continue its invasion in Tennessee forests. CONCLUSIONS The present model, in addition to correlating the probability of Japanese stiltgrass invasions with current climatic conditions and landscape attributes, could aid in the ongoing development of control strategies by identifying vulnerable areas that might emerge as a result of likely changes in climatic conditions and land use patterns.
Collapse
|
30
|
The spread of mosquito-borne viruses in modern times: A spatio-temporal analysis of dengue and chikungunya. Spat Spatiotemporal Epidemiol 2018; 26:113-125. [PMID: 30390927 DOI: 10.1016/j.sste.2018.06.002] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/29/2017] [Revised: 01/12/2018] [Accepted: 06/08/2018] [Indexed: 01/06/2023]
Abstract
Since the 1970s, mosquito-borne pathogens have spread to previously disease-free areas and have caused increased illness in endemic areas. In particular, dengue and chikungunya viruses, transmitted primarily by Aedes aegypti and secondarily by Aedes albopictus mosquitoes, represent a threat to up to a third of the world's population and are a growing public health concern. In this study, we assess the spatial and temporal factors related to the occurrence of historic dengue and chikungunya outbreaks in 76 nations, focused geographically on the Indian Ocean, with outbreak data from 1959 to 2009. First, we describe the historical spatial and temporal patterns of dengue and chikungunya outbreaks in the focal nations. Second, we use a boosted regression tree approach to assess the statistical relationships of nations' concurrent and annual outbreak occurrences with their spatial proximity to prior infections and with climatic and socio-economic characteristics. We demonstrate that higher population density and shorter distances among nations with outbreaks are the dominant factors characterizing both dengue and chikungunya outbreaks. In conclusion, our analysis provides crucial insights that can be applied to improve nations' surveillance and preparedness for future vector-borne disease epidemics.
Collapse
|
31
|
Factors controlling the three-decade long rise in cyanobacteria biomass in a eutrophic shallow lake. THE SCIENCE OF THE TOTAL ENVIRONMENT 2018; 621:352-359. [PMID: 29190558 DOI: 10.1016/j.scitotenv.2017.11.250] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/24/2017] [Revised: 10/25/2017] [Accepted: 11/21/2017] [Indexed: 06/07/2023]
Abstract
We aimed to quantify the importance of limnological variables in the decadal rise of cyanobacteria biomass in shallow hemiboreal lakes. We constructed estimates of cyanobacteria (blue-green algae) biomass in a large, eutrophic lake (Estonia, Northeastern Europe) from a database comprising 28 limnological variables and spanning more than 50 years of monitoring. Using a dual-model approach consisting of boosted regression trees (BRT) followed by a generalized least squares (GLS) model, our results revealed that six variables were most influential for explaining the variance of cyanobacteria biomass. The cyanobacteria response to nitrate concentration and rotifer abundance was negative, whereas it was positive to pH, temperature, and cladoceran and copepod biomass. The response to total phosphorus (TP) and to the total phosphorus to total nitrogen ratio was very weak, which suggests that the actual in-lake TP concentration is still above limiting values. The most efficient GLS model, which explained nearly two thirds (r² = 0.65) of the variance of cyanobacteria biomass, included nitrate concentration, water temperature and pH. The very high number of observations (maximum n = 525) supports the robustness of the models. Our results suggest that the decadal rise of blue-green algae in shallow lakes lies in the interaction between cultural eutrophication and global warming, which together bring in-lake physical and chemical conditions closer to cyanobacteria optima.
Collapse
|
32
|
Urban wild boars prefer fragmented areas with food resources near natural corridors. THE SCIENCE OF THE TOTAL ENVIRONMENT 2018; 615:282-288. [PMID: 28982077 DOI: 10.1016/j.scitotenv.2017.09.277] [Citation(s) in RCA: 74] [Impact Index Per Article: 12.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/08/2017] [Revised: 09/25/2017] [Accepted: 09/25/2017] [Indexed: 06/07/2023]
Abstract
Wild boar populations are expanding throughout the world and intruding into periurban and urban areas. In recent years, wild boar have colonized several European cities, including our study area, the city of Barcelona. The main factors driving wild boar into urban areas must be identified before management measures can be established. We built boosted regression tree (BRT) models using 3148 wild boar presences registered in the urban area of Barcelona from 2010 to 2014 to identify the variables correlated with these presences. The variables analysed included proxies for distance to the source population, urban food resources, climate and urban habitat structure. Wild boar enter the urban area from nearby natural habitat using corridors such as streams, preferably through fragmented urban environments, in search of food sources such as urban green areas or dry pet food from cat colonies. Wild boar presence is higher in spring, possibly due to the births of piglets and the dispersal of yearlings during that season, and also when natural resources in the Mediterranean habitat fail to satisfy the nutritional requirements of the wild boar population during the summer season. Management measures derived from this study are currently being applied in the city of Barcelona, including vegetation clearing in the wild boar entrance areas and an awareness campaign aimed at reducing the anthropogenic food available to wild boars. The methodology used can be applied to other cities with wild boar or other wildlife issues. Comparing the factors attracting wild boars into different urban areas would help in understanding this global phenomenon.
Collapse
|
33
|
Taxonomic affinity, habitat and seed mass strongly predict seed desiccation response: a boosted regression trees analysis based on 17 539 species. ANNALS OF BOTANY 2018; 121:71-83. [PMID: 29267906 PMCID: PMC5786232 DOI: 10.1093/aob/mcx128] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/01/2017] [Accepted: 09/27/2017] [Indexed: 05/26/2023]
Abstract
BACKGROUND AND AIMS Seed desiccation response plays an important role in plant regeneration ecology, and has significant implications for species conservation. The majority of seed plants produce desiccation-tolerant (orthodox) seeds, whilst comparatively few produce desiccation-sensitive (recalcitrant) seeds that are unable to survive dehydration, and which cannot be conserved in traditional seed banks. This study develops a set of models to predict seed desiccation response in unstudied species. METHODS Taxonomy, trait, location and climate data were compiled to form a global data set of 17 539 species. Three boosted regression trees models were then developed to predict species' seed desiccation responses based on habitat and trait information for the species, and the seed desiccation responses of close relatives (either members of the same genus, family or order, depending on the model). Ten-fold cross-validation was used to test model predictive success. The utility of the models was then demonstrated by predicting seed desiccation response for two floras: Ecuador, and Britain and Ireland. KEY RESULTS The three models had varying success rates for identifying the desiccation-sensitive species: 89 % for the genus-level model, 79 % for the family-level model and 60 % for the order-level model. The most important predictor variables were the seed desiccation responses of a species' relatives, seed mass and annual precipitation. It is predicted that 10 % of seed plants from Ecuador and 1.2 % of those from Britain and Ireland produce desiccation-sensitive seeds. Due to data availability, prediction accuracy is likely to be higher for the British and Irish flora, where it is estimated that a desiccation-sensitive species had a 96.7 % chance of being correctly identified, compared with 80.8 % in the Ecuador flora. 
CONCLUSIONS These models can utilize existing data to predict species' likely seed desiccation responses, providing a gap-filling tool for global studies of plant traits, as well as critical decision-making support for plant conservation activities.
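The ten-fold cross-validation used above to test predictive success can be sketched like this. It is a rough caricature on synthetic data with assumed predictor names (seed mass, precipitation, relatives' responses), not the authors' models.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 600
# Illustrative predictors: log seed mass, annual precipitation, and the
# proportion of desiccation-sensitive relatives in the same genus.
log_seed_mass = rng.normal(size=n)
precip = rng.normal(size=n)
relative_prop = rng.uniform(size=n)
X = np.column_stack([log_seed_mass, precip, relative_prop])
# Sensitivity made more likely for large-seeded, wet-habitat species with
# sensitive relatives: a simplified caricature of the reported pattern.
y = (1.2 * log_seed_mass + 0.8 * precip + 2.0 * (relative_prop - 0.5)
     + rng.normal(scale=0.7, size=n)) > 0

clf = GradientBoostingClassifier(n_estimators=150, max_depth=2, random_state=0)
scores = cross_val_score(clf, X, y, cv=10)   # ten-fold cross-validation
print(f"mean CV accuracy = {scores.mean():.2f}")
```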
Collapse
|
34
|
A hybrid machine learning model to predict and visualize nitrate concentration throughout the Central Valley aquifer, California, USA. THE SCIENCE OF THE TOTAL ENVIRONMENT 2017; 601-602:1160-1172. [PMID: 28599372 DOI: 10.1016/j.scitotenv.2017.05.192] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/06/2017] [Revised: 05/19/2017] [Accepted: 05/20/2017] [Indexed: 05/22/2023]
Abstract
Intense demand for water in the Central Valley of California and related increases in groundwater nitrate concentration threaten the sustainability of the groundwater resource. To assess contamination risk in the region, we developed a hybrid, non-linear, machine learning model within a statistical learning framework to predict nitrate contamination of groundwater to depths of approximately 500 m below ground surface. A database of 145 predictor variables representing well characteristics, historical and current field- and landscape-scale nitrogen mass balances, historical and current land use, oxidation/reduction conditions, groundwater flow, climate, soil characteristics, depth to groundwater, and groundwater age was assigned to over 6000 private supply and public supply wells measured previously for nitrate and located throughout the study area. The boosted regression tree (BRT) method was used to screen and rank variables for predicting nitrate concentration at the depths of domestic and public well supplies. A novel aspect of the approach was the inclusion, as predictor variables, of outputs from existing physically based models of the Central Valley. The top five most important predictor variables were two oxidation/reduction variables (the probability of manganese concentration exceeding 50 ppb and the probability of dissolved oxygen concentration being below 0.5 ppm), field-scale adjusted unsaturated-zone nitrogen input for the 1975 time period, the average difference between precipitation and evapotranspiration during 1971-2000, and 1992 total landscape nitrogen input. Twenty-five variables were selected for the final model for log-transformed nitrate. In general, an increasing probability of anoxic conditions and increasing precipitation relative to potential evapotranspiration corresponded to decreases in predicted nitrate concentration.
Conversely, increasing 1975 unsaturated zone nitrogen leaching flux and 1992 total landscape nitrogen input had an increasing relative impact on nitrate predictions. Three-dimensional visualization indicates that nitrate predictions depend on the probability of anoxic conditions and other factors, and that nitrate predictions generally decreased with increasing groundwater age.
Collapse
|
35
|
Mapping regional risks from climate change for rainfed rice cultivation in India. AGRICULTURAL SYSTEMS 2017; 156:76-84. [PMID: 28867871 PMCID: PMC5555444 DOI: 10.1016/j.agsy.2017.05.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/11/2016] [Revised: 10/26/2016] [Accepted: 05/20/2017] [Indexed: 06/07/2023]
Abstract
Global warming is predicted to intensify in the future, with detrimental consequences for rainfed crops that depend on natural rainfall (i.e. are non-irrigated). Given that many crops grown under rainfed conditions support the livelihoods of low-income farmers, it is important to highlight the vulnerability of rainfed areas to climate change in order to anticipate potential risks to food security. In this paper, we focus on India, where ~ 50% of rice is grown under rainfed conditions, and we employ statistical models (climate envelope models (CEMs) and boosted regression trees (BRTs)) to map changes in climate suitability for rainfed rice cultivation at a regional level (~ 18 × 18 km cell resolution) under projected future (2050) climate change (IPCC RCPs 2.6 and 8.5, using three GCMs: BCC-CSM1.1, MIROC-ESM-CHEM, and HadGEM2-ES). We quantify the occurrence of rice (whether or not rainfed rice is commonly grown, using CEMs) and rice extent (area under cultivation, using BRTs) during the summer monsoon in relation to four climate variables that affect rice growth and yield, namely the ratio of precipitation to evapotranspiration (PER), maximum and minimum temperatures (Tmax and Tmin), and total rainfall during harvesting. Our models described the occurrence and extent of rice very well (CEMs for occurrence, ensemble AUC = 0.92; BRTs for extent, Pearson's r = 0.87). PER was the most important predictor of rainfed rice occurrence and was positively related to rainfed rice area, but all four climate variables were important in determining the extent of rice cultivation. Our models project that 15%-40% of current rainfed rice growing areas will be at risk (i.e. will decline in climate suitability or become completely unsuitable). However, our models project considerable variation across India in the impact of future climate change: eastern and northern India are the locations most at risk, but parts of central and western India may benefit from increased precipitation.
Hence our CEM and BRT models agree on the locations most at risk, but there is less consensus about the degree of risk at these locations. Our results help to identify locations where livelihoods of low-income farmers and regional food security may be threatened in the next few decades by climate changes. The use of more drought-resilient rice varieties and better irrigation infrastructure in these regions may help to reduce these impacts and reduce the vulnerability of farmers dependent on rainfed cropping.
Collapse
|
36
|
Spatial prediction and validation of zoonotic hazard through micro-habitat properties: where does Puumala hantavirus hole - up? BMC Infect Dis 2017; 17:523. [PMID: 28747170 PMCID: PMC5530527 DOI: 10.1186/s12879-017-2618-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2016] [Accepted: 07/18/2017] [Indexed: 01/12/2023] Open
Abstract
Background To predict the risk of infectious diseases originating in wildlife, it is important to identify habitats that allow the co-occurrence of pathogens and their hosts. Puumala hantavirus (PUUV) is a directly transmitted RNA virus that causes hemorrhagic fever in humans, and is carried and transmitted by the bank vole (Myodes glareolus). In northern Sweden, bank voles undergo 3–4 year population cycles, during which their spatial distribution varies greatly. Methods We used boosted regression trees, a machine learning technique, on a 10-year time series (fall 2003–2013) to develop a spatial predictive model assessing seasonal PUUV hazard from micro-habitat variables in a landscape heavily modified by forestry. We validated the models in an independent study area approx. 200 km away by predicting the seasonal presence of infected bank voles over a five-year period (2007–2010 and 2015). Results The distribution of PUUV-infected voles varied seasonally and inter-annually. In spring, micro-habitat variables related to cover and food availability in forests predicted both bank vole and infected bank vole presence. In fall, the presence of PUUV-infected voles was generally restricted to spruce forests where cover was abundant, despite the broad landscape distribution of bank voles in general. We hypothesize that the discrepancy in distribution between infected and uninfected hosts in fall was related to higher survival of PUUV and/or PUUV-infected voles in the environment, especially where cover is plentiful. Conclusions Moist and mesic old spruce forests with abundant cover, such as large holes and bilberry shrubs that also provide food, were most likely to harbor infected bank voles. The models, developed using long-term and spatially extensive data, can be extrapolated to other areas in northern Fennoscandia.
To predict the hazard of directly transmitted zoonoses in areas with unknown risk status, models based on micro-habitat variables and developed through machine learning techniques in well-studied systems, could be used. Electronic supplementary material The online version of this article (doi:10.1186/s12879-017-2618-z) contains supplementary material, which is available to authorized users.
Collapse
|
37
|
The impact of urbanization and population density on childhood Plasmodium falciparum parasite prevalence rates in Africa. Malar J 2017; 16:49. [PMID: 28125996 PMCID: PMC5270336 DOI: 10.1186/s12936-017-1694-2] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2016] [Accepted: 01/13/2017] [Indexed: 11/18/2022] Open
Abstract
Background Although malaria has traditionally been regarded as less of a problem in urban areas than in neighbouring rural areas, the risk of malaria infection persists in densely populated, urban areas of Africa. Despite the recognition that urbanization influences the epidemiology of malaria, there is little consensus on a definition of urbanization relevant for malaria parasite mapping. Previous studies examining the relationship between urbanization and malaria transmission have used products defining urbanization at global/continental scales that were developed in the early 2000s and overestimate actual urban extents, while the population estimates are over 15 years old and estimated at the administrative unit level. Methods and results This study sought to identify the definition of urbanization most relevant for malaria parasite mapping using individual-level malaria infection data obtained from nationally representative household-based surveys. Boosted regression tree (BRT) modelling was used to determine the effect of urbanization on malaria transmission and whether this effect varied with the definition of urbanization. In addition, the most recent high-resolution population distribution data were used to determine whether population density had a significant effect on malaria parasite prevalence and, if so, whether population density could replace urban classifications in modelling malaria transmission patterns. The risk of malaria infection was shown to decline from rural areas through peri-urban settlements to urban central areas. Population density was found to be an important predictor of malaria risk. The final BRT model with both urbanization and population density gave the best model fit (Tukey test p value < 0.05) compared with the models with urbanization only.
Conclusion Given the challenges in uniformly classifying urban areas across different countries, population density provides a reliable metric to adjust for the patterns of malaria risk in densely populated urban areas. Future malaria risk models can, therefore, be improved by including both population density and urbanization which have both been shown to have significant impact on malaria risk in this study. Electronic supplementary material The online version of this article (doi:10.1186/s12936-017-1694-2) contains supplementary material, which is available to authorized users.
Collapse
|
38
|
Tracking cyanobacteria blooms: Do different monitoring approaches tell the same story? THE SCIENCE OF THE TOTAL ENVIRONMENT 2017; 575:294-308. [PMID: 27744157 DOI: 10.1016/j.scitotenv.2016.10.023] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/02/2016] [Revised: 09/21/2016] [Accepted: 10/03/2016] [Indexed: 06/06/2023]
Abstract
Cyanobacteria blooms are a major environmental issue worldwide. Our understanding of the biophysical processes driving cyanobacterial proliferation and the ability to develop predictive models that inform resource managers and policy makers rely upon the accurate characterization of bloom dynamics. Models quantifying relationships between bloom severity and environmental drivers are often calibrated to an individual set of bloom observations, and few studies have assessed whether differences among observing platforms could lead to contrasting results in terms of relevant bloom predictors and their estimated influence on bloom severity. The aim of this study was to assess the degree of coherence of different monitoring methods in (1) capturing short- and long-term cyanobacteria bloom dynamics and (2) identifying environmental drivers associated with bloom variability. Using western Lake Erie as a case study, we applied boosted regression tree (BRT) models to long-term time series of cyanobacteria bloom estimates from multiple in-situ and remote sensing approaches to quantify the relative influence of physico-chemical and meteorological drivers on bloom variability. Results of BRT models showed remarkable consistency with known ecological requirements of cyanobacteria (e.g., nutrient loading, water temperature, and tributary discharge). However, discrepancies in inter-annual and intra-seasonal bloom dynamics across monitoring approaches led to some inconsistencies in the relative importance, shape, and sign of the modeled relationships between select environmental drivers and bloom severity. This was especially true for variables characterized by high short-term variability, such as wind forcing. These discrepancies might have implications for our understanding of the role of different environmental drivers in regulating bloom dynamics, and subsequently for the development of models capable of informing management and decision making. 
Our results highlight the need to develop methods to integrate multiple data sources to better characterize bloom spatio-temporal variability and improve our ability to understand and predict cyanobacteria blooms.
Collapse
|
39
|
Analysing the impact of multiple stressors in aquatic biomonitoring data: A 'cookbook' with applications in R. THE SCIENCE OF THE TOTAL ENVIRONMENT 2016; 573:1320-1339. [PMID: 27499499 DOI: 10.1016/j.scitotenv.2016.06.243] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/14/2016] [Revised: 06/29/2016] [Accepted: 06/29/2016] [Indexed: 06/06/2023]
Abstract
Multiple stressors threaten biodiversity and ecosystem integrity, imposing new challenges to ecosystem management and restoration. Ecosystem managers are required to address and mitigate the impact of multiple stressors, yet the knowledge required to disentangle multiple-stressor effects is still incomplete. Experimental studies have advanced the understanding of single and combined stressor effects, but a robust analytical framework for addressing the impact of multiple stressors based on monitoring data is lacking. Since 2000, the monitoring of Europe's waters has resulted in a vast amount of biological and environmental (stressor) data for about 120,000 water bodies. For many reasons, these data are rarely exploited in the multiple-stressor context, probably because of their rather heterogeneous nature: stressors vary and are mixed with broad-scale proxies of environmental stress (e.g. land cover), missing values and zero-inflated data limit the application of statistical methods, and biological indicators are often aggregated (e.g. taxon richness) and do not respond to specific stressors. Here, we present a 'cookbook' for analysing the biological response to multiple stressors using data from biomonitoring schemes. Our 'cookbook' includes guidance for the analytical process and the interpretation of results, and is accompanied by scripts which allow users to run a stepwise analysis on their own data in R, an open-source language and environment for statistical computing and graphics. Using simulated and real data, we show that the recommended procedure is capable of identifying stressor hierarchy (importance) and interactions in large datasets. We recommend a minimum of 150 independent observations and a minimum stressor gradient length of 75% (of the most relevant stressor's gradient in nature) to reliably rank the stressors' importance, detect relevant interactions and estimate their standardised effect sizes.
We conclude with a brief discussion of the advantages and limitations of this protocol.
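The core step of such a protocol, ranking stressor importance from monitoring data with a boosted model, can be sketched as below. The published 'cookbook' scripts are in R; this is a rough Python analogue on simulated data with invented stressor names, not the protocol itself.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(4)
n = 300   # at least the ~150 independent observations the protocol recommends
nutrients = rng.normal(size=n)        # hypothetical stressor 1
fine_sediment = rng.normal(size=n)    # hypothetical stressor 2
land_cover = rng.normal(size=n)       # broad-scale proxy with a weak effect
# Simulated taxon richness: driven mainly by nutrients, plus a
# nutrients x sediment interaction of the kind the protocol aims to detect.
richness = (-1.5 * nutrients - 0.5 * fine_sediment
            - 0.8 * nutrients * fine_sediment + 0.1 * land_cover
            + rng.normal(scale=0.5, size=n))

X = np.column_stack([nutrients, fine_sediment, land_cover])
brt = GradientBoostingRegressor(n_estimators=300, max_depth=3,
                                random_state=0).fit(X, richness)
imp = permutation_importance(brt, X, richness, n_repeats=10,
                             random_state=0).importances_mean
order = np.argsort(imp)[::-1]   # stressor hierarchy, most important first
print("hierarchy (0=nutrients, 1=sediment, 2=land cover):", order.tolist())
```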
Collapse
|
40
|
Life-history strategies constrain invertebrate community tolerance to multiple stressors: A case study in the Ebro basin. THE SCIENCE OF THE TOTAL ENVIRONMENT 2016; 572:196-206. [PMID: 27498381 DOI: 10.1016/j.scitotenv.2016.07.227] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/14/2016] [Revised: 07/31/2016] [Accepted: 07/31/2016] [Indexed: 06/06/2023]
Abstract
CONTEXT Multiple stressors constitute a serious threat to aquatic ecosystems, particularly in the Mediterranean region where water scarcity is likely to interact with other anthropogenic stressors. Biological traits potentially allow the unravelling of the effects of multiple stressors. However, thus far, trait-based approaches have failed to fully deliver on their promise and still lack strong predictive power when multiple stressors are present. GOAL We aimed to quantify specific community tolerances against six anthropogenic stressors and investigate the responses of the underlying macroinvertebrate biological traits and their combinations. METHODS We built and calibrated boosted regression tree models to predict community tolerances using multiple biological traits with a priori hypotheses regarding their individual responses to specific stressors. We analysed the combinations of traits underlying community tolerance and the effect of trait association on this tolerance. RESULTS Our results validated the following three hypotheses: (i) the community tolerance models efficiently and robustly related trait combinations to stressor intensities and, to a lesser extent, to stressors related to the presence of dams and insecticides; (ii) the effects of traits on community tolerance not only depended on trait identity but also on the trait associations emerging at the community level from the co-occurrence of different traits in species; and (iii) the community tolerances and the underlying trait combinations were specific to the different stressors. CONCLUSION This study takes a further step towards predictive tools in community ecology that consider combinations and associations of traits as the basis of stressor tolerance. Additionally, the community tolerance concept has potential application to help stream managers in the decision process regarding management options.
Collapse
|
41
|
Spatial models reveal the microclimatic buffering capacity of old-growth forests. SCIENCE ADVANCES 2016; 2:e1501392. [PMID: 27152339 PMCID: PMC4846426 DOI: 10.1126/sciadv.1501392] [Citation(s) in RCA: 96] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/06/2015] [Accepted: 03/25/2016] [Indexed: 05/21/2023]
Abstract
Climate change is predicted to cause widespread declines in biodiversity, but these predictions are derived from coarse-resolution climate models applied at global scales. Such models lack the capacity to incorporate microclimate variability, which is critical to biodiversity microrefugia. In forested montane regions, microclimate is thought to be influenced by combined effects of elevation, microtopography, and vegetation, but their relative effects at fine spatial scales are poorly known. We used boosted regression trees to model the spatial distribution of fine-scale, under-canopy air temperatures in mountainous terrain. Spatial models predicted observed independent test data well (r = 0.87). As expected, elevation strongly predicted temperatures, but vegetation and microtopography also exerted critical effects. Old-growth vegetation characteristics, measured using LiDAR (light detection and ranging), appeared to have an insulating effect; maximum spring monthly temperatures decreased by 2.5°C across the observed gradient in old-growth structure. These cooling effects across a gradient in forest structure are of similar magnitude to 50-year forecasts of the Intergovernmental Panel on Climate Change and therefore have the potential to mitigate climate warming at local scales. Management strategies to conserve old-growth characteristics and to curb current rates of primary forest loss could maintain microrefugia, enhancing biodiversity persistence in mountainous systems under climate warming.
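A BRT spatial model of this kind can be sketched minimally. The example below is illustrative only: it uses scikit-learn's `GradientBoostingRegressor` as a stand-in for the study's boosted regression trees, and the predictors (`elevation`, `lidar_height`, `topo_index`) and their synthetic effects are invented, not the study's actual variables or coefficients.

```python
# Hypothetical sketch: BRT-style model of under-canopy temperature.
# All predictor names and effect sizes are invented for illustration.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 2000
elevation = rng.uniform(200, 1600, n)      # m
lidar_height = rng.uniform(5, 70, n)       # canopy-structure proxy, m
topo_index = rng.normal(0, 1, n)           # microtopography proxy

# Synthetic "truth": a strong lapse-rate effect of elevation, a small
# cooling effect of tall canopy, plus noise.
temp = (25 - 0.006 * elevation - 0.04 * lidar_height
        + 0.5 * topo_index + rng.normal(0, 0.8, n))

X = np.column_stack([elevation, lidar_height, topo_index])
train, test = slice(0, 1500), slice(1500, None)

brt = GradientBoostingRegressor(n_estimators=500, learning_rate=0.05,
                                max_depth=3)
brt.fit(X[train], temp[train])
pred = brt.predict(X[test])

# Pearson r between predicted and observed held-out temperatures,
# analogous to the r = 0.87 test-data agreement reported above
r = np.corrcoef(pred, temp[test])[0, 1]
print(f"held-out r = {r:.2f}")
```

Holding out an independent test set, as in the study, guards against the overfitting that flexible tree ensembles are prone to.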
Collapse
|
42
|
How the choice of safety performance function affects the identification of important crash prediction variables. ACCIDENT; ANALYSIS AND PREVENTION 2016; 88:1-8. [PMID: 26710265 DOI: 10.1016/j.aap.2015.12.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/14/2015] [Revised: 11/11/2015] [Accepted: 12/04/2015] [Indexed: 06/05/2023]
Abstract
Across the nation, researchers and transportation engineers are developing safety performance functions (SPFs) to predict crash rates and develop crash modification factors to improve traffic safety at roadway segments and intersections. Generalized linear models (GLMs), such as Poisson or negative binomial regression, are most commonly used to develop SPFs with annual average daily traffic as the primary roadway characteristic to predict crashes. However, while more complex to interpret, data mining models such as boosted regression trees have improved upon the crash prediction performance of GLMs due to their ability to handle more data characteristics, accommodate non-linearities, and include interaction effects between the characteristics. An intersection data inventory of 36 safety relevant parameters for three- and four-legged non-signalized intersections along state routes in Alabama was used to study the importance of intersection characteristics on crash rate and the interaction effects between key characteristics. Four different SPFs were investigated and compared: Poisson regression, negative binomial regression, regularized generalized linear model, and boosted regression trees. The models did not agree on which intersection characteristics were most related to the crash rate. The boosted regression tree model significantly outperformed the other models and identified several intersection characteristics as having strong interaction effects.
Collapse
|
43
|
Mapping the zoonotic niche of Lassa fever in Africa. Trans R Soc Trop Med Hyg 2015; 109:483-92. [PMID: 26085474 PMCID: PMC4501400 DOI: 10.1093/trstmh/trv047] [Citation(s) in RCA: 85] [Impact Index Per Article: 9.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2015] [Accepted: 05/29/2015] [Indexed: 02/05/2023] Open
Abstract
Background Lassa fever is a viral haemorrhagic illness responsible for disease outbreaks across West Africa. It is a zoonosis, with the primary reservoir species identified as the Natal multimammate mouse, Mastomys natalensis. The host is distributed across sub-Saharan Africa while the virus' range appears to be restricted to West Africa. The majority of infections result from interactions between the animal reservoir and human populations, although secondary transmission between humans can occur, particularly in hospital settings. Methods Using a species distribution model, the locations of confirmed human and animal infections with Lassa virus (LASV) were used to generate a probabilistic surface of zoonotic transmission potential across sub-Saharan Africa. Results Our results predict that 37.7 million people in 14 countries, across much of West Africa, live in areas where conditions are suitable for zoonotic transmission of LASV. Four of these countries, where at-risk populations are predicted, have yet to report any cases of Lassa fever. Conclusions These maps act as a spatial guide for future surveillance activities to better characterise the geographical distribution of the disease and understand the anthropological, virological and zoological interactions necessary for viral transmission. Combining this zoonotic niche map with detailed patient travel histories can aid differential diagnoses of febrile illnesses, enabling a more rapid response in providing care and reducing the risk of onward transmission.
Collapse
|
44
|
Prioritizing Highway Safety Manual's crash prediction variables using boosted regression trees. ACCIDENT; ANALYSIS AND PREVENTION 2015; 79:133-144. [PMID: 25823903 DOI: 10.1016/j.aap.2015.03.011] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/15/2014] [Revised: 02/14/2015] [Accepted: 03/10/2015] [Indexed: 06/04/2023]
Abstract
The Highway Safety Manual (HSM) recommends using the empirical Bayes (EB) method with locally derived calibration factors to predict an agency's safety performance. However, the data needs for deriving these local calibration factors are significant, requiring very detailed roadway characteristics information. Many of the data variables identified in the HSM are currently unavailable in the states' databases. Moreover, the process of collecting and maintaining all the HSM data variables is cost-prohibitive. Prioritization of the variables based on their impact on crash predictions would, therefore, help to identify influential variables for which data could be collected and maintained for continued updates. This study aims to determine the impact of each independent variable identified in the HSM on crash predictions. A relatively recent data mining approach called boosted regression trees (BRT) is used to investigate the association between the variables and crash predictions. The BRT method can effectively handle different types of predictor variables, identify very complex and non-linear association among variables, and compute variable importance. Five years of crash data from 2008 to 2012 on two urban and suburban facility types, two-lane undivided arterials and four-lane divided arterials, were analyzed for estimating the influence of variables on crash predictions. Variables were found to exhibit non-linear and sometimes complex relationship to predicted crash counts. In addition, only a few variables were found to explain most of the variation in the crash data.
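The variable-prioritization idea above can be sketched with a toy example. This is a hedged illustration, not the study's model: scikit-learn's `GradientBoostingRegressor` stands in for the BRT implementation, and the ten generic predictors (only three of which are informative) are invented rather than actual HSM roadway characteristics.

```python
# Sketch of BRT variable importance: with many candidate predictors,
# only a few explain most of the variation. Predictors are synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
n, p = 2000, 10
X = rng.normal(size=(n, p))

# Only the first three variables drive the response; note the
# nonlinear (squared) effect of the second one.
y = 3 * X[:, 0] + 2 * X[:, 1] ** 2 + X[:, 2] + rng.normal(0, 0.5, n)

brt = GradientBoostingRegressor(n_estimators=300, max_depth=3).fit(X, y)
imp = brt.feature_importances_   # normalized to sum to 1

# Share of total importance captured by the three informative variables
top3 = imp[:3].sum()
print(f"importance share of 3 informative variables: {top3:.2f}")
```

Ranking variables by such importance scores is how a cost-constrained agency could decide which characteristics are worth collecting and maintaining.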
Collapse
|
45
|
Species distribution modelling for conservation of an endangered endemic orchid. AOB PLANTS 2015; 7:plv039. [PMID: 25900746 PMCID: PMC4463238 DOI: 10.1093/aobpla/plv039] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/29/2014] [Accepted: 03/31/2015] [Indexed: 05/14/2023]
Abstract
Concerns regarding the long-term viability of threatened and endangered plant species are increasingly warranted given the potential impacts of climate change and habitat fragmentation on unstable and isolated populations. Orchidaceae is the largest and most diverse family of flowering plants, but it is currently facing unprecedented risks of extinction. Despite substantial conservation emphasis on rare orchids, populations continue to decline. Spiranthes parksii (Navasota ladies' tresses) is a federally and state-listed endangered terrestrial orchid endemic to central Texas. Hence, we aimed to identify potential factors influencing the distribution of the species, quantify the relative importance of each factor and determine suitable habitat for future surveys and targeted conservation efforts. We analysed several geo-referenced variables describing climatic conditions and landscape features to identify potential factors influencing the likelihood of occurrence of S. parksii using boosted regression trees. Our model classified 97 % of the cells correctly with regard to species presence and absence, and indicated that probability of existence was correlated with climatic conditions and landscape features. The most influential variables were mean annual precipitation, mean elevation, mean annual minimum temperature and mean annual maximum temperature. The most likely suitable range for S. parksii was the eastern portions of Leon and Madison Counties, the southern portion of Brazos County, a portion of northern Grimes County and along the borders between Burleson and Washington Counties. Our model can assist in the development of an integrated conservation strategy through: (i) focussing future survey and research efforts on areas with a high likelihood of occurrence, (ii) aiding in selection of areas for conservation and restoration and (iii) framing future research questions including those necessary for predicting responses to climate change. Our model could also incorporate new information on S. parksii as it becomes available to improve prediction accuracy, and our methodology could be adapted to develop distribution maps for other rare species of conservation concern.
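A presence/absence BRT of the kind described above can be sketched as follows. This is purely illustrative: the climate predictors, thresholds, and occurrence mechanism are invented, not fitted to S. parksii data, and scikit-learn's `GradientBoostingClassifier` stands in for the study's boosted regression trees.

```python
# Hedged sketch of a BRT species-distribution model on synthetic
# presence/absence data. All predictors and effects are invented.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(3)
n = 2000
precip = rng.uniform(600, 1200, n)   # mean annual precipitation, mm
elev = rng.uniform(50, 150, n)       # mean elevation, m
tmin = rng.uniform(5, 12, n)         # mean annual minimum temp, deg C

# Synthetic occurrence: suitability rises with precipitation, peaks at
# mid elevation, and increases mildly with minimum temperature.
logit = (0.02 * (precip - 900) - 0.05 * np.abs(elev - 100)
         + 0.3 * (tmin - 8))
present = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([precip, elev, tmin])
sdm = GradientBoostingClassifier(n_estimators=200)
sdm.fit(X[:1500], present[:1500])

# Held-out classification accuracy, analogous to the 97 % of cells
# classified correctly in the study
acc = sdm.score(X[1500:], present[1500:])
# Per-cell probability of presence = the habitat-suitability surface
suitability = sdm.predict_proba(X[1500:])[:, 1]
print(f"held-out accuracy = {acc:.2f}")
```

The `predict_proba` surface is what would be mapped over a spatial grid to target future surveys at high-likelihood cells.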
Collapse
|
46
|
Mapping the zoonotic niche of Marburg virus disease in Africa. Trans R Soc Trop Med Hyg 2015; 109:366-78. [PMID: 25820266 PMCID: PMC4447827 DOI: 10.1093/trstmh/trv024] [Citation(s) in RCA: 78] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2014] [Accepted: 02/23/2015] [Indexed: 11/12/2022] Open
Abstract
Background Marburg virus disease (MVD) describes a viral haemorrhagic fever responsible for a number of outbreaks across eastern and southern Africa. It is a zoonotic disease, with the Egyptian rousette (Rousettus aegyptiacus) identified as a reservoir host. Infection is suspected to result from contact between this reservoir and human populations, with occasional secondary human-to-human transmission. Methods Index cases of previous human outbreaks were identified and reports of infection in animals recorded. These data were modelled within a species distribution modelling framework in order to generate a probabilistic surface of zoonotic transmission potential of MVD across sub-Saharan Africa. Results Areas suitable for zoonotic transmission of MVD are predicted in 27 countries inhabited by 105 million people. Regions are suggested for exploratory surveys to better characterise the geographical distribution of the disease, as well as for directing efforts to communicate the risk of practices enhancing zoonotic contact. Conclusions These maps can inform future contingency and preparedness strategies for MVD control, especially where secondary transmission is a risk. Coupling this risk map with patient travel histories could be used to guide the differential diagnosis of highly transmissible pathogens, enabling more rapid response to outbreaks of haemorrhagic fever.
Collapse
|
47
|
Factor complexity of crash occurrence: An empirical demonstration using boosted regression trees. ACCIDENT; ANALYSIS AND PREVENTION 2013; 61:107-118. [PMID: 22975365 DOI: 10.1016/j.aap.2012.08.015] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/28/2011] [Revised: 04/24/2012] [Accepted: 08/16/2012] [Indexed: 06/01/2023]
Abstract
Factor complexity is a characteristic of traffic crashes. This paper proposes a novel method, namely boosted regression trees (BRT), to investigate the complex and nonlinear relationships in high-variance traffic crash data. The Taiwanese 2004-2005 single-vehicle motorcycle crash data are used to demonstrate the utility of BRT. Traditional logistic regression and classification and regression tree (CART) models are also used to compare their estimation results and external validities. Both the in-sample cross-validation and out-of-sample validation results show that an increase in tree complexity provides improved, although declining, classification performance, indicating a limited factor complexity of single-vehicle motorcycle crashes. The effects of crucial variables, including geographical, temporal, and sociodemographic factors, explain some fatal crashes. Relatively unique fatal crashes are better approximated by interactive terms, especially combinations of behavioral factors. BRT models generally provide better transferability than conventional logistic regression and CART models. This study also discusses the implications of the results for devising safety policies.
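The diminishing returns from tree complexity described above can be sketched on synthetic data. This is an illustration, not the paper's analysis: the fatality mechanism below contains one invented two-way (night-by-rural) interaction, so cross-validated performance should improve from depth-1 stumps to depth-2 trees and then level off, and scikit-learn's `GradientBoostingClassifier` stands in for the BRT implementation.

```python
# Illustrative "factor complexity" check: cross-validated AUC of boosted
# trees as tree depth (maximum interaction order) grows. Synthetic data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n = 3000
night = rng.integers(0, 2, n)    # time factor
rural = rng.integers(0, 2, n)    # geographical factor
speed = rng.normal(0, 1, n)      # behavioural factor

# Fatal-crash log-odds: a main effect of speed plus one
# night x rural interaction (the only higher-order structure)
logit = -2 + 0.8 * speed + 1.5 * night * rural
fatal = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([night, rural, speed])
for depth in (1, 2, 3):
    auc = cross_val_score(
        GradientBoostingClassifier(max_depth=depth, n_estimators=200),
        X, fatal, cv=5, scoring="roc_auc").mean()
    print(f"max_depth={depth}: mean AUC = {auc:.3f}")
```

Because the true structure contains nothing deeper than a two-way interaction, depth-3 trees add little: the same pattern the paper reads as limited factor complexity.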
Collapse
|