1
Gillespie LE, Ruffley M, Exposito-Alonso M. Deep learning models map rapid plant species changes from citizen science and remote sensing data. Proc Natl Acad Sci U S A 2024; 121:e2318296121. [PMID: 39236239 PMCID: PMC11406280 DOI: 10.1073/pnas.2318296121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Accepted: 07/17/2024] [Indexed: 09/07/2024]
Abstract
Anthropogenic habitat destruction and climate change are reshaping the geographic distribution of plants worldwide. However, we are still unable to map species shifts at high spatial, temporal, and taxonomic resolution. Here, we develop a deep learning model trained using remote sensing images from California paired with half a million citizen science observations that can map the distribution of over 2,000 plant species. Our model, Deepbiosphere, not only outperforms many common species distribution modeling approaches (AUC 0.95 vs. 0.88) but can map species at up to a few meters resolution and finely delineate plant communities with high accuracy, including the pristine and clear-cut forests of Redwood National Park. These fine-scale predictions can further be used to map the intensity of habitat fragmentation and sharp ecosystem transitions across human-altered landscapes. In addition, from frequent collections of remote sensing data, Deepbiosphere can detect the rapid effects of severe wildfire on plant community composition across a 2-year time period. These findings demonstrate that integrating public earth observations and citizen science with deep learning can pave the way toward automated systems for monitoring biodiversity change in real time worldwide.
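The AUC values quoted above (0.95 vs. 0.88) can be read as the probability that a model scores a randomly chosen presence site higher than a randomly chosen background or absence site. The following is a minimal pure-Python sketch of that rank-based definition. The function name and the scores are invented for illustration; nothing here is taken from the Deepbiosphere code, and the abstract does not say how the evaluation sites were chosen.

```python
def auc(presence_scores, background_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (presence, background) pairs in which the
    presence site receives the higher score; ties count as half a win.
    """
    if not presence_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical model outputs for three presence and three background sites.
presence = [0.9, 0.8, 0.4]
background = [0.7, 0.3, 0.2]
print(round(auc(presence, background), 3))  # 8 of 9 pairs ranked correctly -> 0.889
```

The pairwise loop is quadratic in the number of sites, which is fine for a worked example; sorting-based formulations are the usual choice for large evaluation sets.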
Affiliation(s)
- Lauren E Gillespie
- Department of Plant Biology, Carnegie Science, Stanford, CA 94305
- Department of Computer Science, Stanford University, Stanford, CA 94305
- Department of Integrative Biology, University of California, Berkeley, CA 94720
- Megan Ruffley
- Department of Plant Biology, Carnegie Science, Stanford, CA 94305
- Moises Exposito-Alonso
- Department of Plant Biology, Carnegie Science, Stanford, CA 94305
- Department of Integrative Biology, University of California, Berkeley, CA 94720
- Department of Biology, Stanford University, Stanford, CA 94305
- Department of Global Ecology, Carnegie Science, Stanford, CA 94305
- HHMI, University of California, Berkeley, CA 94720
2
de Klerk J, Tildesley M, Labuschagne K, Gorsich E. Modelling bluetongue and African horse sickness vector (Culicoides spp.) distribution in the Western Cape in South Africa using random forest machine learning. Parasit Vectors 2024; 17:354. [PMID: 39169433 PMCID: PMC11340078 DOI: 10.1186/s13071-024-06446-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2024] [Accepted: 08/12/2024] [Indexed: 08/23/2024]
Abstract
BACKGROUND: Culicoides biting midges exhibit a global spatial distribution and are the main vectors of several viruses of veterinary importance, including bluetongue (BT) and African horse sickness (AHS). Many environmental and anthropological factors contribute to their ability to live in a variety of habitats, which have the potential to change over the years as the climate changes. Therefore, as new habitats emerge, the risk of new introductions of these diseases increases. The aim of this study was to model distributions for two primary vectors for BT and AHS (Culicoides imicola and Culicoides bolitinos) using random forest (RF) machine learning and explore the relative importance of environmental and anthropological factors in a region of South Africa with frequent AHS and BT outbreaks.
METHODS: Culicoides capture data were collected between 1996 and 2022 across 171 different capture locations in the Western Cape. Predictor variables included climate-related variables (temperature, precipitation, humidity), environment-related variables (normalised difference vegetation index [NDVI], soil moisture) and farm-related variables (livestock densities). RF models were developed to explore the spatial distributions of C. imicola, C. bolitinos and a merged species map, where both competent vectors were combined. The maps were then compared to interpolation maps using the same capture data as well as historical locations of BT and AHS outbreaks.
RESULTS: Overall, the RF models performed well with 75.02%, 61.6% and 74.01% variance explained for C. imicola, C. bolitinos and merged species models respectively. Cattle density was the most important predictor for C. imicola and water vapour pressure the most important for C. bolitinos. Compared to interpolation maps, the RF models had higher predictive power throughout most of the year when species were modelled individually; however, when merged, the interpolation maps performed better in all seasons except winter. Finally, midge densities did not show any conclusive correlation with BT or AHS outbreaks.
CONCLUSION: This study yielded novel insight into the spatial abundance and drivers of abundance of competent vectors of BT and AHS. It also provided valuable data to inform mathematical models exploring disease outbreaks so that Culicoides-transmitted diseases in South Africa can be further analysed.
Affiliation(s)
- Joanna de Klerk
- The Zeeman Institute for Systems Biology and Infectious Disease Epidemiology Research, School of Life Sciences and Mathematics Institute, University of Warwick, Coventry, CV4 7AL, UK.
- Michael Tildesley
- The Zeeman Institute for Systems Biology and Infectious Disease Epidemiology Research, School of Life Sciences and Mathematics Institute, University of Warwick, Coventry, CV4 7AL, UK
- Karien Labuschagne
- Epidemiology, Parasites and Vectors, Agricultural Research Council, Onderstepoort Veterinary Research, Onderstepoort, 0110, South Africa
- Erin Gorsich
- The Zeeman Institute for Systems Biology and Infectious Disease Epidemiology Research, School of Life Sciences and Mathematics Institute, University of Warwick, Coventry, CV4 7AL, UK
3
Brun P, Karger DN, Zurell D, Descombes P, de Witte LC, de Lutio R, Wegner JD, Zimmermann NE. Multispecies deep learning using citizen science data produces more informative plant community models. Nat Commun 2024; 15:4421. [PMID: 38789424 PMCID: PMC11126635 DOI: 10.1038/s41467-024-48559-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Accepted: 05/03/2024] [Indexed: 05/26/2024]
Abstract
In the age of big data, scientific progress is fundamentally limited by our capacity to extract critical information. Here, we map fine-grained spatiotemporal distributions for thousands of species, using deep neural networks (DNNs) and ubiquitous citizen science data. Based on 6.7 M observations, we jointly model the distributions of 2477 plant species and species aggregates across Switzerland with an ensemble of DNNs built with different cost functions. We find that, compared to commonly-used approaches, multispecies DNNs predict species distributions and especially community composition more accurately. Moreover, their design allows investigation of understudied aspects of ecology. Including seasonal variations of observation probability explicitly allows approximating flowering phenology; reweighting predictions to mirror cover-abundance allows mapping potentially canopy-dominant tree species nationwide; and projecting DNNs into the future allows assessing how distributions, phenology, and dominance may change. Given their skill and their versatility, multispecies DNNs can refine our understanding of the distribution of plants and well-sampled taxa in general.
Affiliation(s)
- Philipp Brun
- Swiss Federal Research Institute WSL, 8903, Birmensdorf, Switzerland.
- Dirk N Karger
- Swiss Federal Research Institute WSL, 8903, Birmensdorf, Switzerland
- Damaris Zurell
- Institute of Biochemistry and Biology, University of Potsdam, 14469, Potsdam, Germany
- Patrice Descombes
- Muséum cantonal des sciences naturelles, département de botanique, 1007, Lausanne, Switzerland
- Department of Ecology and Evolution, University of Lausanne, 1015, Lausanne, Switzerland
- Riccardo de Lutio
- EcoVision Lab, Photogrammetry and Remote Sensing, ETH Zurich, 8092, Zürich, Switzerland
- Jan Dirk Wegner
- Department of Mathematical Modeling and Machine Learning, University of Zurich, 8057, Zurich, Switzerland
4
Oh G, Wi Y, Kang HJ, Cheon SJ, Sung HC, Kim Y, Jin HS. Assessment of American Bullfrog (Lithobates catesbeianus) spreading in the Republic of Korea using rule learning of elementary cellular automata. Sci Rep 2024; 14:11548. [PMID: 38773141 PMCID: PMC11109106 DOI: 10.1038/s41598-024-62139-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 05/14/2024] [Indexed: 05/23/2024]
Abstract
The spread of the American Bullfrog has a significant impact on the surrounding ecosystem. It is important to study the mechanisms of its spreading so that proper mitigation can be applied when needed. This study analyzes data from national surveys on bullfrog distribution. We divided the data into 25 regional clusters. To assess the spread within each cluster, we constructed temporal sequences of spatial distribution using the agglomerative clustering method. We employed Elementary Cellular Automata (ECA) to identify rules governing the changes in spatial patterns. Each cell in the ECA grid represents either the presence or absence of bullfrogs based on observations. For each cluster, we counted the number of presence locations in the sequence to quantify spreading intensity. We used a Convolutional Neural Network (CNN) to learn the ECA rules and predict future spreading intensity by estimating the expected number of presence locations over 400 simulated generations. We incorporated environmental factors by obtaining habitat suitability maps using Maxent. We multiplied spreading intensity by habitat suitability to create an overall assessment of bullfrog invasion risk. We estimated the relative spreading assessment and classified it into four categories: rapidly spreading, slowly spreading, stable populations, and declining populations.
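To make the ECA terminology concrete, here is a small pure-Python sketch of how an elementary cellular automaton rule updates a row of presence/absence cells and how the number of presence cells can be tracked across generations. It illustrates the general ECA mechanism only: the rule number, the starting row, the zero-padded boundary and the function names are all assumptions made for the example, and the CNN-based rule learning described in the abstract is not reproduced.

```python
def eca_step(cells, rule):
    """Advance a row of 0/1 cells by one generation under a Wolfram rule (0-255).

    Each new cell depends on its left neighbour, itself and its right neighbour.
    Cells beyond the edges are treated as 0 (absence); this boundary choice is
    an assumption of the sketch.
    """
    padded = [0] + list(cells) + [0]
    new_cells = []
    for i in range(1, len(padded) - 1):
        # Encode the three-cell neighbourhood as a number from 0 to 7.
        neighbourhood = (padded[i - 1] << 2) | (padded[i] << 1) | padded[i + 1]
        # The matching bit of the rule number gives the cell's next state.
        new_cells.append((rule >> neighbourhood) & 1)
    return new_cells


def presence_counts(cells, rule, generations):
    """Return the number of presence cells at generation 0, 1, ..., generations."""
    counts = [sum(cells)]
    for _ in range(generations):
        cells = eca_step(cells, rule)
        counts.append(sum(cells))
    return counts


# Hypothetical starting row: one occupied cell in the middle.
start = [0, 0, 0, 1, 0, 0, 0]
# Rule 254 turns a cell on whenever it or a neighbour is on, so presence spreads outward.
print(presence_counts(start, rule=254, generations=3))  # [1, 3, 5, 7]
```

A growing count sequence like the one printed corresponds to a spreading pattern, whereas a rule such as 0 would drive every cell to absence after one step.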
Affiliation(s)
- Gyujin Oh
- Department of Mathematics and Statistics, Chonnam National University, 77 Yongbongro, Bukgu, Gwangju, 61186, Republic of Korea
- Yunju Wi
- Department of Mathematics and Statistics, Chonnam National University, 77 Yongbongro, Bukgu, Gwangju, 61186, Republic of Korea
- Hee-Jin Kang
- School of Biological Sciences and Biotechnology, Chonnam National University, 77 Yongbongro, Bukgu, Gwangju, 61186, Republic of Korea
- Seung-Ju Cheon
- School of Biological Sciences and Biotechnology, Chonnam National University, 77 Yongbongro, Bukgu, Gwangju, 61186, Republic of Korea
- Ha-Cheol Sung
- Department of Biological Sciences, College of Natural Sciences, Chonnam National University, 77 Yongbongro, Bukgu, Gwangju, 61186, Republic of Korea
- Yena Kim
- Department of Mathematics, Hawaii Pacific University, 1 Aloha Tower Drive, Honolulu, HI, 96813, USA
- Hong-Sung Jin
- Department of Mathematics and Statistics, Chonnam National University, 77 Yongbongro, Bukgu, Gwangju, 61186, Republic of Korea.
5
Kass JM, Fukaya K, Thuiller W, Mori AS. Biodiversity modeling advances will improve predictions of nature's contributions to people. Trends Ecol Evol 2024; 39:338-348. [PMID: 37968219 DOI: 10.1016/j.tree.2023.10.011] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/26/2023] [Revised: 10/17/2023] [Accepted: 10/17/2023] [Indexed: 11/17/2023]
Abstract
Accurate predictions of ecosystem functions and nature's contributions to people (NCP) are needed to prioritize environmental protection and restoration in the Anthropocene. However, our ability to predict NCP is undermined by approaches that rely on biophysical variables and ignore those describing biodiversity, which have strong links to NCP. To foster predictive mapping of NCP, we should harness the latest methods in biodiversity modeling. This field advances rapidly, and new techniques with promising applications for predicting NCP are still underutilized. Here, we argue that employing recent advances in biodiversity modeling can enhance the accuracy and scope of NCP maps and predictions. This enhancement will contribute significantly to the achievement of global objectives to preserve NCP, for both the present and an unpredictable future.
Affiliation(s)
- Jamie M Kass
- Macroecology Laboratory, Graduate School of Life Sciences, Tohoku University, Sendai, Miyagi, Japan; Biodiversity and Biocomplexity Unit, Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa, Japan.
- Keiichi Fukaya
- Biodiversity Division, National Institute for Environmental Studies, Tsukuba, Ibaraki, Japan
- Wilfried Thuiller
- Université Grenoble Alpes, Université Savoie Mont Blanc, CNRS, LECA, F-38000 Grenoble, France
- Akira S Mori
- Research Center for Advanced Science and Technology, The University of Tokyo, Tokyo, Japan
6
Thuiller W. Ecological niche modelling. Curr Biol 2024; 34:R225-R229. [PMID: 38531309 DOI: 10.1016/j.cub.2024.02.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/28/2024]
Abstract
One of the central research questions in ecology and biogeography revolves around understanding the spatial distribution patterns of organisms, the factors influencing species abundance, and why in certain areas there are more species or individuals than in others. Addressing these questions not only forms the bedrock of scientific research in ecology and evolution but also has critical implications for biodiversity conservation and management. To safeguard species, restore habitats, prevent invasions and anticipate future impacts, it is imperative to identify optimal areas for species or biodiversity under current and future conditions, such as changes in climate or land use. Ecologists have long tried to discern which conditions enable species to maintain viable populations in a given area (Figure 1). Broadly speaking, three main conditions must be met for a species to inhabit a site: successful dispersal throughout its biogeographic history; environmental conditions suitable for sustaining a population; and biotic conditions conducive to species persistence, including resource availability and absence of strong competitors. Ecological niche modelling, also known as species distribution modelling or habitat suitability modelling, primarily focuses on environmental factors, though models are increasingly integrating dispersal and biotic interactions. In the following sections, we will delve into the basic structure and hypotheses of ecological niche modelling, their applications and potential future improvements.
Affiliation(s)
- Wilfried Thuiller
- University Grenoble Alpes, University Savoie Mont Blanc, CNRS, LECA, Laboratoire d'Ecologie Alpine, F-38000 Grenoble, France.
7
Garcia‐Quintas A, Roy A, Barbraud C, Demarcq H, Denis D, Lanco Bertrand S. Machine and deep learning approaches to understand and predict habitat suitability for seabird breeding. Ecol Evol 2023; 13:e10549. [PMID: 37727776 PMCID: PMC10505760 DOI: 10.1002/ece3.10549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2023] [Revised: 08/31/2023] [Accepted: 09/05/2023] [Indexed: 09/21/2023]
Abstract
The way animals select their breeding habitat may have great impacts on individual fitness. This complex process depends on the integration of information on various environmental factors, over a wide range of spatiotemporal scales. For seabirds, breeding habitat selection integrates both land and sea features over several spatial scales. Seabirds explore these features prior to breeding, assessing habitat quality. However, the information-gathering and decision-making process of seabirds when choosing a breeding habitat remains poorly understood. We compiled 49 historical records of larid colonies in Cuba from 1980 to 2020. Then, we predicted potentially suitable breeding sites for larids and assessed their breeding macrohabitat selection, using deep and machine learning algorithms respectively. Using a convolutional neural network and Landsat satellite images, we predicted the nesting suitability of non-monitored sites in this archipelago. Furthermore, we assessed the relative contribution of 18 land- and marine-based environmental covariates describing macrohabitats at three spatial scales (i.e. 10, 50 and 100 km) using random forests. The convolutional neural network exhibited good performance in training, validation and testing (F1-scores >85%). Sites with higher habitat suitability (p > .75) covered 20.3% of the prediction area. Larid breeding macrohabitats were sites relatively close to main islands, featuring sparse vegetation cover and high chlorophyll-a concentration at sea within 50 and 100 km around colonies. Lower sea surface temperature at larger spatial scales was decisive in distinguishing breeding from non-breeding sites. A more comprehensive understanding of seabird breeding macrohabitat selection can be reached from the complementary use of convolutional neural networks and random forest models. Our analysis provides crucial knowledge in tropical regions that lack complete and regular monitoring of seabirds' breeding sites.
Affiliation(s)
- Antonio Garcia‐Quintas
- Institut de Recherche pour le Développement (IRD), MARBEC (Université de Montpellier, Ifremer, CNRS, IRD), Sète, France
- Centro de Investigaciones de Ecosistemas Costeros (CIEC), Cayo Coco, Cuba
- Amédée Roy
- Institut de Recherche pour le Développement (IRD), MARBEC (Université de Montpellier, Ifremer, CNRS, IRD), Sète, France
- Christophe Barbraud
- Centres d'Etudes Biologiques de Chizé UMR7372, Centre National de la Recherche Scientifique, Villiers en Bois, France
- Hervé Demarcq
- Institut de Recherche pour le Développement (IRD), MARBEC (Université de Montpellier, Ifremer, CNRS, IRD), Sète, France
- Sophie Lanco Bertrand
- Institut de Recherche pour le Développement (IRD), MARBEC (Université de Montpellier, Ifremer, CNRS, IRD), Sète, France
8
Pichler M, Hartig F. Machine learning and deep learning—A review for ecologists. Methods Ecol Evol 2023. [DOI: 10.1111/2041-210x.14061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2023]
Affiliation(s)
- Florian Hartig
- Theoretical Ecology, University of Regensburg, Regensburg, Germany
9
Lippert F, Kranstauber B, Forré PD, van Loon EE. Learning to predict spatiotemporal movement dynamics from weather radar networks. Methods Ecol Evol 2022. [DOI: 10.1111/2041-210x.14007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
Affiliation(s)
- Fiona Lippert
- AI4Science Lab, University of Amsterdam, Amsterdam, The Netherlands
- Amsterdam Machine Learning Lab, University of Amsterdam, Amsterdam, The Netherlands
- Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, The Netherlands
- Bart Kranstauber
- Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, The Netherlands
- Patrick D. Forré
- AI4Science Lab, University of Amsterdam, Amsterdam, The Netherlands
- Amsterdam Machine Learning Lab, University of Amsterdam, Amsterdam, The Netherlands
- E. Emiel van Loon
- Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, The Netherlands
10
Bonannella C, Hengl T, Heisig J, Parente L, Wright MN, Herold M, de Bruin S. Forest tree species distribution for Europe 2000-2020: mapping potential and realized distributions using spatiotemporal machine learning. PeerJ 2022; 10:e13728. [PMID: 35910765 PMCID: PMC9332400 DOI: 10.7717/peerj.13728] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/11/2022] [Accepted: 06/22/2022] [Indexed: 01/17/2023]
Abstract
This article describes a data-driven framework based on spatiotemporal machine learning to produce distribution maps for 16 tree species (Abies alba Mill., Castanea sativa Mill., Corylus avellana L., Fagus sylvatica L., Olea europaea L., Picea abies L. H. Karst., Pinus halepensis Mill., Pinus nigra J. F. Arnold, Pinus pinea L., Pinus sylvestris L., Prunus avium L., Quercus cerris L., Quercus ilex L., Quercus robur L., Quercus suber L. and Salix caprea L.) at high spatial resolution (30 m). Tree occurrence data for a total of three million points were used to train different algorithms: random forest, gradient-boosted trees, generalized linear models, k-nearest neighbors, CART and an artificial neural network. A stack of 305 coarse and high resolution covariates representing spectral reflectance, different biophysical conditions and biotic competition was used as predictors for realized distributions, while potential distribution was modelled with environmental predictors only. Logloss and computing time were used to select the three best algorithms to tune and train an ensemble model based on stacking with a logistic regressor as a meta-learner. An ensemble model was trained for each species: probability and model uncertainty maps of realized distribution were produced for each species using a time window of 4 years for a total of six distribution maps per species, while for potential distributions only one map per species was produced. Results of spatial cross validation show that the ensemble model consistently outperformed or performed as well as the best individual model in both potential and realized distribution tasks, with potential distribution models achieving higher predictive performances (TSS = 0.898, R2 logloss = 0.857) than realized distribution ones on average (TSS = 0.874, R2 logloss = 0.839). Ensemble models for Q. suber achieved the best performances in both potential (TSS = 0.968, R2 logloss = 0.952) and realized (TSS = 0.959, R2 logloss = 0.949) distribution, while P. sylvestris (TSS = 0.731, 0.785, R2 logloss = 0.585, 0.670, respectively, for potential and realized distribution) and P. nigra (TSS = 0.658, 0.686, R2 logloss = 0.623, 0.664) achieved the worst. Importance of predictor variables differed across species and models, with the green band for summer and the Normalized Difference Vegetation Index (NDVI) for fall being the most frequent and important for realized distribution, and the diffuse irradiation and precipitation of the driest quarter (BIO17) being the most frequent and important for potential distribution. On average, fine-resolution models outperformed coarse resolution models (250 m) for realized distribution (TSS = +6.5%, R2 logloss = +7.5%). The framework shows how combining continuous and consistent Earth Observation time series data with state of the art machine learning can be used to derive dynamic distribution maps. The produced predictions can be used to quantify temporal trends of potential forest degradation and species composition change.
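TSS, the true skill statistic reported throughout this abstract, is sensitivity plus specificity minus one, so it ranges from -1 to 1 with 0 meaning no better than chance. The sketch below computes it from binary presence/absence labels in plain Python. The function name and the toy labels are illustrative assumptions; the abstract does not describe how probabilities were thresholded before scoring, so the inputs here are already binary.

```python
def true_skill_statistic(observed, predicted):
    """Compute TSS = sensitivity + specificity - 1 from paired 0/1 labels."""
    tp = fn = tn = fp = 0
    for obs, pred in zip(observed, predicted):
        if obs and pred:
            tp += 1  # presence correctly predicted
        elif obs and not pred:
            fn += 1  # presence missed
        elif not obs and pred:
            fp += 1  # absence wrongly predicted as presence
        else:
            tn += 1  # absence correctly predicted
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one observed presence and one observed absence")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


# Hypothetical evaluation set: four presences and four absences.
observed = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1]
print(true_skill_statistic(observed, predicted))  # 0.75 + 0.75 - 1 = 0.5
```

Because TSS weights presences and absences equally, it is unaffected by the ratio of presence to absence points in the evaluation set.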
Affiliation(s)
- Carmelo Bonannella
- Laboratory of Geo-Information Science and Remote Sensing, Wageningen University and Research, Wageningen, The Netherlands
- OpenGeoHub, Wageningen, The Netherlands
- Johannes Heisig
- Institute for Geoinformatics, University of Münster, Münster, Germany
- Marvin N. Wright
- Leibniz Institute for Prevention Research and Epidemiology – BIPS, Bremen, Germany
- University of Bremen, Bremen, Germany
- Martin Herold
- Laboratory of Geo-Information Science and Remote Sensing, Wageningen University and Research, Wageningen, The Netherlands
- Section 1.4 Remote Sensing and Geoinformatics, GFZ German Research Centre for Geosciences, Potsdam, Germany
- Sytze de Bruin
- Laboratory of Geo-Information Science and Remote Sensing, Wageningen University and Research, Wageningen, The Netherlands
11
Flück B, Mathon L, Manel S, Valentini A, Dejean T, Albouy C, Mouillot D, Thuiller W, Murienne J, Brosse S, Pellissier L. Applying convolutional neural networks to speed up environmental DNA annotation in a highly diverse ecosystem. Sci Rep 2022; 12:10247. [PMID: 35715444 PMCID: PMC9205931 DOI: 10.1038/s41598-022-13412-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/13/2021] [Accepted: 05/24/2022] [Indexed: 01/04/2023]
Abstract
High-throughput DNA sequencing is becoming an increasingly important tool to monitor and better understand biodiversity responses to environmental changes in a standardized and reproducible way. Environmental DNA (eDNA) from organisms can be captured in ecosystem samples and sequenced using metabarcoding, but processing large volumes of eDNA data and annotating sequences to recognized taxa remains computationally expensive. Speed and accuracy are two major bottlenecks in this critical step. Here, we evaluated the ability of convolutional neural networks (CNNs) to process short eDNA sequences and associate them with taxonomic labels. Using a unique eDNA data set collected in highly diverse Tropical South America, we compared the speed and accuracy of CNNs with that of a well-known bioinformatic pipeline (OBITools) in processing a small region (60 bp) of the 12S ribosomal DNA targeting freshwater fishes. We found that the taxonomic labels from the CNNs were comparable to those from OBITools, with high correlation levels for the composition of the regional fish fauna. The CNNs enabled the processing of raw fastq files at a rate of approximately 1 million sequences per minute, which was about 150 times faster than with OBITools. Given the good performance of CNNs in the highly diverse ecosystem considered here, the development of more elaborate CNNs promises fast deployment for future biodiversity inventories using eDNA.
Affiliation(s)
- Benjamin Flück
- Department of Environmental System Science, ETH Zürich, 8092, Zurich, Switzerland.
- Swiss Federal Research Institute WSL, 8903, Birmensdorf, Switzerland.
- Laëtitia Mathon
- CEFE, Univ. Montpellier, CNRS, EPHE-PSL University, IRD, Montpellier, France
- Stéphanie Manel
- CEFE, Univ. Montpellier, CNRS, EPHE-PSL University, IRD, Montpellier, France
- Camille Albouy
- DECOD (Ecosystem Dynamics and Sustainability), IFREMER, INRAE, Institut Agro - Agrocampus Ouest, Rue de l'Ile d'Yeu, BP21105, 44311, Nantes Cedex 3, France
- David Mouillot
- MARBEC, Univ. Montpellier, CNRS, IRD, Ifremer, Montpellier, France
- Institut Universitaire de France, IUF, 75231, Paris, France
- Wilfried Thuiller
- CNRS, LECA, Laboratoire d'Écologie Alpine, Univ. Grenoble Alpes, Univ. Savoie Mont Blanc, 38000, Grenoble, France
- Jérôme Murienne
- Laboratoire Evolution et Diversité Biologique (UMR5174), CNRS, IRD, Université Paul Sabatier, Toulouse, France
- Sébastien Brosse
- Laboratoire Evolution et Diversité Biologique (UMR5174), CNRS, IRD, Université Paul Sabatier, Toulouse, France
- Loïc Pellissier
- Department of Environmental System Science, ETH Zürich, 8092, Zurich, Switzerland.
- Swiss Federal Research Institute WSL, 8903, Birmensdorf, Switzerland.
12
Coro G, Bove P, Ellenbroek A. Habitat distribution change of commercial species in the Adriatic Sea during the COVID-19 pandemic. Ecol Inform 2022; 69:101675. [PMID: 35615467 PMCID: PMC9123804 DOI: 10.1016/j.ecoinf.2022.101675] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/11/2022] [Revised: 05/10/2022] [Accepted: 05/11/2022] [Indexed: 12/31/2022]
Abstract
The COVID-19 pandemic has led to reduced anthropogenic pressure on ecosystems in several world areas, but resulting ecosystem responses in these areas have not been investigated. This paper presents an approach to make quick assessments of potential habitat changes in 2020 of eight marine species of commercial importance in the Adriatic Sea. Measurements from floating probes are interpolated through an advection-equation based model. The resulting distributions are then combined with species observations through an ecological niche model to estimate habitat distributions in the past years (2015–2018) at 0.1° spatial resolution. Habitat patterns over 2019 and 2020 are then extracted and explained in terms of specific environmental parameter changes. These changes are finally assessed for their potential dependency on climate change patterns and anthropogenic pressure change due to the pandemic. Our results demonstrate that the combined effect of climate change and the pandemic could have heterogeneous effects on habitat distributions: three species (Squilla mantis, Engraulis encrasicolus, and Solea solea) did not show significant niche distribution change; habitat suitability positively changed for Sepia officinalis, but negatively for Parapenaeus longirostris, due to increased temperature and decreasing dissolved oxygen (in the Adriatic) generally correlated with climate change; the combination of these trends with an average decrease in chlorophyll, probably due to the pandemic, extended the habitat distributions of Merluccius merluccius and Mullus barbatus but reduced Sardina pilchardus distribution. Although our results are based on approximated data and reliable at a macroscopic level, we present a very early insight into modifications that will possibly be observed years after the end of the pandemic, when complete data become available. Our approach is entirely based on Findable, Accessible, Interoperable, and Reusable (FAIR) data and is general enough to be used for other species and areas.
Affiliation(s)
- Gianpaolo Coro
- Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - CNR, Pisa, Italy
- Pasquale Bove
- Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - CNR, Pisa, Italy
- Anton Ellenbroek
- Food and Agriculture Organization of the United Nations, Viale delle Terme di Caracalla, 00153 Rome, Italy
13
Estopinan J, Servajean M, Bonnet P, Munoz F, Joly A. Deep Species Distribution Modeling From Sentinel-2 Image Time-Series: A Global Scale Analysis on the Orchid Family. Front Plant Sci 2022; 13:839327. [PMID: 35528931 PMCID: PMC9072833 DOI: 10.3389/fpls.2022.839327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2021] [Accepted: 02/28/2022] [Indexed: 06/14/2023]
Abstract
Species distribution models (SDMs) are widely used numerical tools that rely on correlations between geolocated presences (and possibly absences) and environmental predictors to model the ecological preferences of species. Recently, SDMs exploiting deep learning and remote sensing images have emerged and have demonstrated high predictive performance. In particular, it has been shown that one of the key advantages of these models (called deep-SDMs) is their ability to capture the spatial structure of the landscape, unlike prior models. In this paper, we examine whether the temporal dimension of remote sensing images can also be exploited by deep-SDMs. Indeed, satellites such as Sentinel-2 are now providing data with a high temporal revisit, and it is likely that the resulting time-series of images contain relevant information about the seasonal variations of the environment and vegetation. To confirm this hypothesis, we built a substantial and original dataset (called DeepOrchidSeries) aimed at modeling the distribution of orchids on a global scale based on Sentinel-2 image time series. It includes around 1 million occurrences of orchids worldwide, each being paired with a 12-month-long time series of high-resolution images (640 x 640 m RGB+IR patches centered on the geolocated observations). This ambitious dataset enabled us to train several deep-SDMs based on convolutional neural networks (CNNs) whose input was extended to include the temporal dimension. To quantify the contribution of the temporal dimension, we designed a novel interpretability methodology based on temporal permutation tests, temporal sampling, and temporal averaging. We show that the predictive performance of the model is greatly increased by the seasonality information contained in the temporal series. In particular, occurrence-poor species and diversity-rich regions are the ones that benefit the most from this improvement, revealing the importance of habitat's temporal dynamics to characterize species distribution.
Affiliation(s)
- Joaquim Estopinan
- INRIA, Montpellier, France
- LIRMM, Univ Montpellier, CNRS, Montpellier, France
- Maximilien Servajean
- LIRMM, Univ Montpellier, CNRS, Montpellier, France
- AMIS, Université Paul Valéry Montpellier, Univ Montpellier, CNRS, Montpellier, France
- Pierre Bonnet
- AMAP, Univ Montpellier, CIRAD, CNRS, INRAE, IRD, Montpellier, France
- CIRAD, UMR AMAP, Montpellier, France
- Alexis Joly
- INRIA, Montpellier, France
- LIRMM, Univ Montpellier, CNRS, Montpellier, France
|
14
|
Kruger SE, Lorah PA, Okamoto KW. Mapping climate change's impact on cholera infection risk in Bangladesh. PLOS GLOBAL PUBLIC HEALTH 2022; 2:e0000711. [PMID: 36962590 PMCID: PMC10021506 DOI: 10.1371/journal.pgph.0000711] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/07/2022] [Accepted: 09/10/2022] [Indexed: 03/26/2023]
Abstract
Several studies have investigated how Vibrio cholerae infection risk changes with increased rainfall, temperature, and water pH levels for coastal Bangladesh, which experiences seasonal surges in cholera infections associated with heavy rainfall events. While coastal environmental conditions are understood to influence V. cholerae propagation within brackish waters and transmission to and within human populations, it remains unknown how changing climate regimes impact the risk of cholera infection throughout Bangladesh. To address this, we developed a random forest species distribution model to predict the probability of cholera occurrence within Bangladesh for 2015 and 2050. The model was trained on cholera incidence data and spatial environmental raster data, then applied to environmental data for the training year (2015) and for 2050. From the model's predictions, we generated risk maps for cholera occurrence for 2015 and 2050. Our best-fitting model predicted cholera occurrence given elevation and distance to water. Generally, we find that regions within every district in Bangladesh experience an increase in infection risk from 2015 to 2050. We also find that although high-risk cells cluster predominantly along the coastline in 2015, by 2050 high-risk areas expand from the coast inland, conglomerating around surface waters across Bangladesh and reaching all but the northwestern-most district. Mapping the geographic distribution of cholera infections given projected environmental conditions provides a valuable tool for guiding proactive public health policy tailored to areas most at risk of future disease outbreaks.
Affiliation(s)
- Sophia E Kruger
- Department of Biology, University of St. Thomas, St. Paul, Minnesota, United States of America
- School of Public Health, University of Michigan, Ann Arbor, Michigan, United States of America
- Paul A Lorah
- Department of Earth, Environment and Society, University of St. Thomas, St. Paul, Minnesota, United States of America
- Kenichi W Okamoto
- Department of Biology, University of St. Thomas, St. Paul, Minnesota, United States of America
|
15
|
Deneu B, Joly A, Bonnet P, Servajean M, Munoz F. Very High Resolution Species Distribution Modeling Based on Remote Sensing Imagery: How to Capture Fine-Grained and Large-Scale Vegetation Ecology With Convolutional Neural Networks? FRONTIERS IN PLANT SCIENCE 2022; 13:839279. [PMID: 35599901 PMCID: PMC9122285 DOI: 10.3389/fpls.2022.839279] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/19/2021] [Accepted: 03/22/2022] [Indexed: 05/22/2023]
Abstract
Species Distribution Models (SDMs) are fundamental tools in ecology for predicting the geographic distribution of species based on environmental data. They are also very useful in applications, whether for implementing conservation plans for threatened species or for monitoring invasive species. The generalizability and spatial accuracy of an SDM depend very strongly on the type of model used and on the environmental data used as explanatory variables. In this article, we study a country-wide species distribution model based on very high resolution (VHR) (1 m) remote sensing images processed by a convolutional neural network. We demonstrate that this model can capture landscape and habitat information at very fine spatial scales while providing overall better predictive performance than conventional models. Moreover, to demonstrate the ecological significance of the model, we propose an original analysis based on the t-distributed Stochastic Neighbor Embedding (t-SNE) dimension reduction technique. It allows visualizing the relations that the model has learned between the input data and species traits or the environment, as well as conducting statistical tests to verify these relations. We also analyze the spatial mapping of the t-SNE dimensions at both national and local levels, showing the benefit of a model that automatically learns environmental variation at multiple scales.
Affiliation(s)
- Benjamin Deneu
- Inria, Montpellier, France
- UMR LIRMM, Université de Montpellier, Montpellier, France
- UMR AMAP, Université de Montpellier, Cirad, CNRS, INRAE, IRD, Montpellier, France
- *Correspondence: Benjamin Deneu
- Alexis Joly
- Inria, Montpellier, France
- UMR LIRMM, Université de Montpellier, Montpellier, France
- Pierre Bonnet
- UMR AMAP, Université de Montpellier, Cirad, CNRS, INRAE, IRD, Montpellier, France
- Cirad, Montpellier, France
- Maximilien Servajean
- UMR LIRMM, Université de Montpellier, Montpellier, France
- Université Paul Valéry, Montpellier, France
|