1
Bemmelen JV, Smyth DS, Baaijens JA. Amplidiff: an optimized amplicon sequencing approach to estimating lineage abundances in viral metagenomes. BMC Bioinformatics 2024; 25:126. [PMID: 38521945 PMCID: PMC10960382 DOI: 10.1186/s12859-024-05735-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 03/08/2024] [Indexed: 03/25/2024] Open
Abstract
BACKGROUND Metagenomic profiling algorithms commonly rely on genomic differences between lineages, strains, or species to infer the relative abundances of sequences present in a sample. This observation plays an important role in the analysis of diverse microbial communities, where targeted sequencing of 16S and 18S rRNA, both well-known hypervariable genomic regions, has led to insights into microbial diversity and the discovery of novel organisms. However, the variable nature of discriminatory regions can also act as a double-edged sword, as the sought-after variability can make it difficult to design primers for their amplification through PCR. Moreover, the most variable regions are not necessarily the most informative regions for the purpose of differentiation; one should focus on regions that maximize the number of lineages that can be distinguished. RESULTS Here we present AmpliDiff, a computational tool that simultaneously finds highly discriminatory genomic regions in viral genomes of a single species, as well as primers allowing for the amplification of these regions. We show that regions and primers found by AmpliDiff can be used to accurately estimate relative abundances of SARS-CoV-2 lineages, for example in wastewater sequencing data. We obtain errors that are comparable with those obtained when using whole genome information to estimate relative abundances. Furthermore, our results show that AmpliDiff is robust against incomplete input data and that primers designed by AmpliDiff also bind to genomes sampled months after the primers were selected. CONCLUSIONS With AmpliDiff we provide an effective, cost-efficient alternative to whole genome sequencing for estimating lineage abundances in viral metagenomes.
Affiliation(s)
- Jasper van Bemmelen
  - Intelligent Systems Department, Delft University of Technology, Delft, Netherlands
- Davida S Smyth
  - Department of Natural Sciences, Texas A&M University-San Antonio, San Antonio, TX, USA
- Jasmijn A Baaijens
  - Intelligent Systems Department, Delft University of Technology, Delft, Netherlands.
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
2
Yan A, Baricordi C, Nguyen Q, Barbarossa L, Loperfido M, Biasco L. IS-Seq: a bioinformatics pipeline for integration sites analysis with comprehensive abundance quantification methods. BMC Bioinformatics 2023; 24:286. [PMID: 37464281 DOI: 10.1186/s12859-023-05390-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Accepted: 06/16/2023] [Indexed: 07/20/2023] Open
Abstract
BACKGROUND Integration site (IS) analysis is a fundamental analytical platform for evaluating the safety and efficacy of viral vector-based preclinical and clinical Gene Therapy (GT). A handful of groups have developed standardized bioinformatics pipelines to process IS sequencing data, to generate reports, and/or to perform comparative studies across different GT trials. Keeping up with the technological advances in the field of IS analysis, different computational pipelines have been published over the past decade. These pipelines focus on identifying IS from single-read or paired-end sequencing data using either read-based or sonication fragment-based methods, but there is no bioinformatics tool that automatically includes unique molecular identifiers (UMI) for IS abundance estimation and allows multiple quantification methods to be compared in one integrated pipeline. RESULTS Here we present IS-Seq, a bioinformatics pipeline that can process data from paired-end sequencing of both old restriction site-based IS collection methods and new sonication-based IS retrieval systems while allowing the selection of different abundance estimation methods, including read-based, fragment-based and UMI-based systems. CONCLUSIONS We validated the performance of IS-Seq by testing it against the most popular analytical workflow available in the literature (INSPIIRED) and using different scenarios. Lastly, by performing extensive simulation studies and a comprehensive wet-lab assessment of our IS-Seq pipeline, we could show that in clinically relevant scenarios, UMI quantification provides better accuracy than the currently most widely used sonication fragment counts as a method for IS abundance estimation.
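The difference between the three abundance measures named in the abstract can be illustrated with a minimal sketch. It is not the IS-Seq implementation: the record layout `(site, fragment_end, umi)` and the function name `quantify_integration_sites` are hypothetical, and the alignment, filtering and UMI error-correction steps a real pipeline needs are left out.

```python
from collections import defaultdict


def quantify_integration_sites(reads):
    """Summarise each integration site with three abundance measures.

    `reads` is an iterable of (site, fragment_end, umi) tuples. This record
    layout is a hypothetical simplification, not the IS-Seq input format.
    """
    read_counts = defaultdict(int)     # read-based: every read counts
    fragment_ends = defaultdict(set)   # fragment-based: distinct shear points
    umis = defaultdict(set)            # UMI-based: distinct molecular tags
    for site, fragment_end, umi in reads:
        read_counts[site] += 1
        fragment_ends[site].add(fragment_end)
        umis[site].add(umi)
    return {
        site: {
            "reads": read_counts[site],
            "fragments": len(fragment_ends[site]),
            "umis": len(umis[site]),
        }
        for site in read_counts
    }


example_reads = [
    ("chr1:1000", 1250, "AACG"),
    ("chr1:1000", 1250, "AACG"),  # PCR duplicate of the read above
    ("chr1:1000", 1250, "TTGA"),  # same shear point, different UMI
    ("chr1:1000", 1310, "GGCT"),
    ("chr2:500", 640, "CATG"),
]
abundances = quantify_integration_sites(example_reads)
```

In this toy input the site `chr1:1000` is supported by four reads, two distinct fragment ends and three distinct UMIs, which shows how the three measures can disagree for the same site.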
Affiliation(s)
- Luca Biasco
  - AVROBIO, Inc., Cambridge, MA, USA.
  - Infection, Immunity and Inflammation Department, Great Ormond Street Institute of Child Health, University College London, London, UK.
3
Jo TS. Pooling of intra-site measurements inflates variability of the correlation between environmental DNA concentration and organism abundance. Environ Monit Assess 2023; 195:936. [PMID: 37436641 DOI: 10.1007/s10661-023-11539-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Accepted: 06/19/2023] [Indexed: 07/13/2023]
Abstract
Environmental DNA (eDNA) analysis can promote efficient ecosystem monitoring and resource management. However, limited knowledge of the factors affecting the relationship between eDNA concentration and organism abundance causes uncertainty in relative abundance estimates based on eDNA concentration. Pooling of data points obtained from multiple locations within a site has been used to mitigate intra-site variation in eDNA and abundance estimates, but decreases the sample size used for estimating the relationship. I here assessed how the pooling of intra-site measurements of eDNA concentration and organism abundance impacted the reliability of the correlative relationship between eDNA concentration and organism abundance. Mathematical models were developed to simulate measurements of eDNA concentrations and organism abundances from multiple locations in a given survey site, and the CVs (coefficient of variability) of the correlations were compared depending on whether data points from different locations were individually treated or pooled. Although the mean and median values of the correlation coefficients were similar between the scenarios, the CVs of the simulated correlations were substantially higher under the pooled scenario than the individual scenario. Additionally, I re-analyzed two empirical studies conducted in lakes, both showing higher CVs of the correlations by pooling intra-site measurements. This study suggests that it would make eDNA-based abundance estimation more reliable and reproducible to individually analyze target eDNA concentrations and organism abundance estimates.
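The quantities compared in this abstract can be written down in a few lines. The sketch below is not the author's simulation model: it only shows, under simplifying assumptions, how intra-site measurements might be pooled into site means, how a Pearson correlation is computed, and how the CV of a set of correlation coefficients is obtained. All function names are hypothetical.

```python
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.fmean(values)


def pool_by_site(sites):
    """Collapse each site's (eDNA, abundance) measurements into one mean pair.

    `sites` is a list of sites, each a list of (edna, abundance) tuples
    measured at different locations within that site.
    """
    return [
        (
            statistics.fmean(edna for edna, _ in locations),
            statistics.fmean(abundance for _, abundance in locations),
        )
        for locations in sites
    ]
```

Pooling reduces the number of data points from one per location to one per site, so each correlation under the pooled scenario rests on a smaller sample. Repeating the correlation over many simulated surveys and passing the resulting coefficients to `coefficient_of_variation` would give the kind of CV the abstract compares between scenarios.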
Affiliation(s)
- Toshiaki S Jo
  - Research Fellow of Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo, 102-0083, Japan.
  - Ryukoku Center for Biodiversity Science, 1-5, Yokotani, Oe-cho, Seta, Otsu City, Shiga, 520-2194, Japan.
  - Faculty of Advanced Science and Technology, Ryukoku University, 1-5, Yokotani, Oe-cho, Seta, Otsu City, Shiga, 520-2194, Japan.
4
Ferguson JM, Jiménez L, Keyes AA, Hilding A, McCartney MA, St. Clair K, Johnson DH, Fieberg JR. A comparison of survey method efficiencies for estimating densities of zebra mussels (Dreissena polymorpha). PeerJ 2023; 11:e15528. [PMID: 37456873 PMCID: PMC10340101 DOI: 10.7717/peerj.15528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Accepted: 05/19/2023] [Indexed: 07/18/2023] Open
Abstract
Abundance surveys are commonly used to estimate plant or animal densities and frequently require estimating detection probabilities to account for imperfect detection. The estimation of detection probabilities requires additional measurements that take time, potentially reducing the efficiency of the survey when applied to high-density populations. We conducted quadrat, removal, and distance surveys of zebra mussels (Dreissena polymorpha) in three central Minnesota lakes and determined how much survey effort would be required to achieve a pre-specified level of precision for each abundance estimator, allowing us to directly compare survey design efficiencies across a range of conditions. We found that the required sampling effort needed to achieve our precision goal depended on both the survey design and population density. At low densities, survey designs that could cover large areas but with lower detection probabilities, such as distance surveys, were more efficient (i.e., required less sampling effort to achieve the same level of precision). However, at high densities, quadrat surveys, which tend to cover less area but with high detection rates, were more efficient. These results demonstrate that the best survey design is likely to be context-specific, requiring some prior knowledge of the underlying population density and the cost/time needed to collect additional information for estimating detection probabilities.
Affiliation(s)
- Jake M. Ferguson
  - School of Life Sciences, University of Hawaii at Manoa, Honolulu, Hawaii, United States
  - Department of Fisheries, Wildlife, and Conservation Biology, University of Minnesota—Twin Cities Campus, St Paul, Minnesota, United States
  - Minnesota Aquatic Invasive Species Research Center, University of Minnesota—Twin Cities Campus, St Paul, Minnesota, United States
- Laura Jiménez
  - School of Life Sciences, University of Hawaii at Manoa, Honolulu, Hawaii, United States
- Aislyn A. Keyes
  - Department of Ecology and Evolutionary Biology, University of Colorado at Boulder, Boulder, Colorado, United States
- Austen Hilding
  - Minnesota Aquatic Invasive Species Research Center, University of Minnesota—Twin Cities Campus, St Paul, Minnesota, United States
- Michael A. McCartney
  - Minnesota Aquatic Invasive Species Research Center, University of Minnesota—Twin Cities Campus, St Paul, Minnesota, United States
- Katherine St. Clair
  - Department of Mathematics, Carleton College, Northfield, Minnesota, United States
- Douglas H. Johnson
  - Department of Fisheries, Wildlife, and Conservation Biology, University of Minnesota—Twin Cities Campus, St Paul, Minnesota, United States
- John R. Fieberg
  - Department of Fisheries, Wildlife, and Conservation Biology, University of Minnesota—Twin Cities Campus, St Paul, Minnesota, United States
  - Minnesota Aquatic Invasive Species Research Center, University of Minnesota—Twin Cities Campus, St Paul, Minnesota, United States
5
Iwaszkiewicz-Eggebrecht E, Granqvist E, Buczek M, Prus M, Kudlicka J, Roslin T, Tack AJ, Andersson AF, Miraldo A, Ronquist F, Łukasik P. Optimizing insect metabarcoding using replicated mock communities. Methods Ecol Evol 2023; 14:1130-1146. [PMID: 37876735 PMCID: PMC10593422 DOI: 10.1111/2041-210x.14073] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/24/2022] [Accepted: 01/20/2023] [Indexed: 10/26/2023]
Abstract
1: Metabarcoding (high-throughput sequencing of marker gene amplicons) has emerged as a promising and cost-effective method for characterizing insect community samples. Yet, the methodology varies greatly among studies and its performance has not been systematically evaluated to date. In particular, it is unclear how accurately metabarcoding can resolve species communities in terms of presence-absence, abundances, and biomass. 2: Here we use mock community experiments and a simple probabilistic model to evaluate the effect of different DNA extraction protocols on metabarcoding performance. Specifically, we ask four questions: (Q1) How consistent are the recovered community profiles across replicate mock communities?; (Q2) How does the choice of lysis buffer affect the recovery of the original community?; (Q3) How are community estimates affected by differing lysis times and homogenization?; and (Q4) Is it possible to obtain adequate species abundance estimates through the use of biological spike-ins? 3: We show that estimates are quite variable across community replicates. In general, a mild lysis protocol is better at reconstructing species lists and approximate counts, while homogenization is better at retrieving biomass composition. Small insects are more likely to be detected in lysates, while some tough species require homogenization to be detected. Results are less consistent across biological replicates for lysates than for homogenates. Some species are associated with strong PCR amplification bias, which complicates the reconstruction of species counts. Yet, with adequate spike-in data, species abundance can be determined with roughly 40% standard error for homogenates, and with roughly 50% standard error for lysates, under ideal conditions. In the latter case, however, this often requires species-specific reference data, while spike-in data generalizes better across species for homogenates. 
4: We conclude that a non-destructive, mild lysis approach shows the highest promise for presence/absence description of the community, while also allowing future morphological or molecular work on the material. However, homogenization protocols perform better for characterizing community composition, in particular in terms of biomass.
Affiliation(s)
- Emma Granqvist
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Box 50007, SE-104 05 Stockholm, Sweden
- Mateusz Buczek
  - Institute of Environmental Sciences, Faculty of Biology, Jagiellonian University, ul. Gronostajowa 7, 30-387 Kraków, Poland
- Monika Prus
  - Institute of Environmental Sciences, Faculty of Biology, Jagiellonian University, ul. Gronostajowa 7, 30-387 Kraków, Poland
- Jan Kudlicka
  - Department of Data Science and Analytics, BI Norwegian Business School, NO-0442 Oslo, Norway
- Tomas Roslin
  - Department of Ecology; Box 7044, Swedish University of Agricultural Sciences, SE-750 07 Uppsala, Sweden
- Ayco J.M. Tack
  - Department of Ecology, Environment and Plant Sciences, Stockholm University, SE-114 18 Stockholm, Sweden
- Anders F. Andersson
  - KTH Royal Institute of Technology, Science for Life Laboratory, Department of Gene Technology, School of Engineering Sciences in Chemistry, Biotechnology and Health, Stockholm, Sweden
- Andreia Miraldo
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Box 50007, SE-104 05 Stockholm, Sweden
- Fredrik Ronquist
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Box 50007, SE-104 05 Stockholm, Sweden
- Piotr Łukasik
  - Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Box 50007, SE-104 05 Stockholm, Sweden
  - Institute of Environmental Sciences, Faculty of Biology, Jagiellonian University, ul. Gronostajowa 7, 30-387 Kraków, Poland
6
Jo TS, Tsuri K, Yamanaka H. Can nuclear aquatic environmental DNA be a genetic marker for the accurate estimation of species abundance? Naturwissenschaften 2022; 109:38. [PMID: 35861927 DOI: 10.1007/s00114-022-01808-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/25/2021] [Revised: 06/22/2022] [Accepted: 07/11/2022] [Indexed: 12/19/2022]
Abstract
Environmental DNA (eDNA) analysis is a promising tool for the sensitive and effective monitoring of species distribution and abundance. Traditional eDNA analysis has targeted mitochondrial DNA (mtDNA) fragments due to their abundance in cells; however, the quantification may vary depending on cell type and physiology. Conversely, some recent eDNA studies have targeted multi-copy nuclear DNA (nuDNA) fragments, such as ribosomal RNA genes, in water, and reported a higher detectability and more rapid degradation than mitochondrial eDNA (mt-eDNA). These properties suggest that nuclear eDNA (nu-eDNA) may be more useful than mt-eDNA for the accurate estimation of species abundance, but this remains unclear. In this study, we compiled previous studies and re-analyzed the relationships between mt- and nu-eDNA concentration and species abundance by comparing the R² values of the linear regression. We then performed an aquarium experiment using zebrafish (Danio rerio) to compare the relationships across genetic regions, including single-copy nuDNA. We found more accurate relationships between multi-copy nu-eDNA and species abundance than mt-eDNA in these datasets, although the difference was not significant upon weighted-averaging the R² values. Moreover, we compared the decay rate constants of zebrafish eDNA across genetic regions and found that multi-copy nu-eDNA degraded faster than mt-eDNA under pH 7, implying a quick turnover of multi-copy nu-eDNA in the field. Although further empirical studies of nu-eDNA applications are necessary to support our findings, this study provides the groundwork for improving the estimation accuracy of species abundance via eDNA analysis.
7
Houa NA, Cappelle N, Bitty EA, Normand E, Kablan YA, Boesch C. Animal reactivity to camera traps and its effects on abundance estimate using distance sampling in the Taï National Park, Côte d'Ivoire. PeerJ 2022; 10:e13510. [PMID: 35651744 PMCID: PMC9150689 DOI: 10.7717/peerj.13510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2022] [Accepted: 05/06/2022] [Indexed: 01/17/2023] Open
Abstract
The use of camera traps (CTs) has become an increasingly popular method of studying wildlife, as CTs are able to detect rare, nocturnal, and elusive species in remote and difficult-to-access areas. This makes them well suited to estimating animal density and abundance and to identifying activity patterns and new animal behaviours. However, animals can react when they see the CTs and this can lead to bias in the animal population estimates. While CTs may provide many advantages, an improved understanding of their impacts on individuals' behaviour is necessary to avoid erroneous density estimates. Yet, the impact of CTs on detected individuals, such as human odour near the device and the environment, or the infrared illumination, has received relatively little attention. To date, there is no clear procedure to remove this potential bias. Here, we use camera trap distance sampling (CTDS) to (1) quantify the bias resulting from the different animal responses to the CTs when determining animal density and abundance, and (2) test if olfactory, visual and auditory signals have an influence on the animals' reaction to CTs. Between March 2019 and March 2020, we deployed CTs at 267 locations distributed systematically over the entire Taï National Park. We obtained 58,947 videos from which we analysed four medium- to large-bodied species (Maxwell's duiker (Philantomba maxwellii), Jentink's duiker (Cephalophus jentinki), pygmy hippopotamus (Choeropsis liberiensis) and Western chimpanzee (Pan troglodytes verus)) displaying different behaviours towards the CTs. We then established species-specific ethograms describing the behavioural responses to the CTs. Using these species-specific responses, we observed that the Maxwell's duiker reacted weakly to CTs (about 0.11% of the distance data), contrary to Jentink's duiker, pygmy hippopotamus and Western chimpanzee, which reacted with relatively high frequencies, representing 32.82%, 52.96% and 16.14% of the distance data, respectively.
Not taking into account the species-specific responses to the CTs can lead to an artificial doubling or tripling of the populations' sizes. All species reacted more to the CTs at close distances. In addition, the Jentink's duiker and the pygmy hippopotamus reacted significantly more to the CTs at night than during the day. Finally, as for olfactory signals, the probability of reaction to the CTs during the first days after CT installation was weak in Maxwell's duiker; it concerned 18% of the video captures in Western chimpanzees and decreased with time, but remained high in pygmy hippopotamus and Jentink's duiker (65% and 70% of the video captures, respectively). Careful consideration should be given to animals' responses to CTs during the analysis and in the field, by reducing human impact around CT installations.
Affiliation(s)
- Noël Adiko Houa
  - Unité de Formation et de Recherches Biosciences, Université Felix Houphouët-Boigny, Abidjan, Côte d’Ivoire; Wild Chimpanzee Foundation, Abidjan, Côte d’Ivoire
- Eloi Anderson Bitty
  - Unité de Formation et de Recherches Biosciences, Université Felix Houphouët-Boigny, Abidjan, Côte d’Ivoire; Centre Suisse de Recherches Scientifiques, Abidjan, Côte d’Ivoire
- Christophe Boesch
  - Wild Chimpanzee Foundation, Abidjan, Côte d’Ivoire; Department of Primatology, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
8
Commander CJC, Barnett LAK, Ward EJ, Anderson SC, Essington TE. The shadow model: how and why small choices in spatially explicit species distribution models affect predictions. PeerJ 2022; 10:e12783. [PMID: 35186453 PMCID: PMC8852273 DOI: 10.7717/peerj.12783] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/20/2021] [Accepted: 12/21/2021] [Indexed: 01/10/2023] Open
Abstract
The use of species distribution models (SDMs) has rapidly increased over the last decade, driven largely by increasing observational evidence of distributional shifts of terrestrial and aquatic populations. These models permit, for example, the quantification of range shifts, the estimation of species co-occurrence, and the association of habitat to species distribution and abundance. The increasing complexity of contemporary SDMs presents new challenges: as the choices among modeling options increase, it is essential to understand how these choices affect model outcomes. Using a combination of original analysis and literature review, we synthesize the effects of three common model choices in semi-parametric predictive process species distribution modeling: model structure, spatial extent of the data, and spatial scale of predictions. To illustrate the effects of these choices, we develop a case study centered around sablefish (Anoplopoma fimbria) distribution on the west coast of the USA. The three modeling choices represent decisions necessary in virtually all ecological applications of these methods, and are important because the consequences of these choices impact derived quantities of interest (e.g., estimates of population size and their management implications). Truncating the spatial extent of data near the observed range edge, or using a model that is misspecified in terms of covariates and spatial and spatiotemporal fields, led to bias in population biomass trends and mean distribution compared to estimates from models using the full dataset and appropriate model structure. In some cases, these suboptimal modeling decisions may be unavoidable, but understanding the tradeoffs of these choices and impacts on predictions is critical.
We illustrate how seemingly small model choices, often made out of necessity or simplicity, can affect scientific advice informing management decisions, potentially leading to erroneous conclusions about changes in abundance or distribution and the precision of such estimates. For example, we show how incorrect decisions could cause overestimation of abundance, which could lead to management advice that results in overfishing. Based on these findings and literature gaps, we outline important frontiers in SDM development.
Affiliation(s)
- Christian J. C. Commander
  - Department of Biological Science, Florida State University, Tallahassee, Florida, United States of America; School of Aquatic and Fishery Sciences, University of Washington, Seattle, Washington, United States
- Lewis A. K. Barnett
  - Resource Assessment and Conservation Engineering Division, Alaska Fisheries Science Center, National Marine Fisheries Service, NOAA, Seattle, Washington, United States
- Eric J. Ward
  - Conservation Biology Division, Northwest Fisheries Science Center, National Marine Fisheries Service, NOAA, Seattle, Washington, United States
- Sean C. Anderson
  - Pacific Biological Station, Fisheries and Oceans Canada, Nanaimo, British Columbia, Canada
- Timothy E. Essington
  - School of Aquatic and Fishery Sciences, University of Washington, Seattle, Washington, United States
9
Miller DL, Fifield D, Wakefield E, Sigourney DB. Extending density surface models to include multiple and double-observer survey data. PeerJ 2021; 9:e12113. [PMID: 34557355 PMCID: PMC8418794 DOI: 10.7717/peerj.12113] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/23/2021] [Accepted: 08/14/2021] [Indexed: 11/30/2022] Open
Abstract
Spatial models of density and abundance are widely used in both ecological research (e.g., to study habitat use) and wildlife management (e.g., for population monitoring and environmental impact assessment). Increasingly, modellers are tasked with integrating data from multiple sources, collected via different observation processes. Distance sampling is an efficient and widely used survey and analysis technique. Within this framework, observation processes are modelled via detection functions. We seek to take multiple data sources and fit them in a single spatial model. Density surface models (DSMs) are a two-stage approach: first accounting for detectability via distance sampling methods, then modelling distribution via a generalized additive model. However, current software and theory do not address the issue of multiple data sources. We extend the DSM approach to accommodate data from multiple surveys, collected via conventional distance sampling, double-observer distance sampling (used to account for incomplete detection at zero distance) and strip transects. Variance propagation ensures that uncertainty is correctly accounted for in final estimates of abundance. Methods described here are implemented in the dsm R package. We briefly analyse two datasets to illustrate these new developments. Our new methodology enables data from multiple distance sampling surveys of different types to be treated in a single spatial model, enabling more robust abundance estimation, potentially over wider geographical or temporal domains.
Affiliation(s)
- David L Miller
  - Centre for Research into Ecological and Environmental Modelling and School of Mathematics and Statistics, University of St Andrews, St Andrews, Scotland
- David Fifield
  - Wildlife Research Division, Science and Technology Branch, Environment and Climate Change Canada, Mount Pearl, NL, Canada
- Ewan Wakefield
  - Institute of Biodiversity Animal Health and Comparative Medicine, University of Glasgow, Glasgow, Scotland
10
Li M, Bai M, Wu Y, Shao W, Zheng L, Sun L, Wang S, Yu C, Huang Y. AGTAR: A novel approach for transcriptome assembly and abundance estimation using an adapted genetic algorithm from RNA-seq data. Comput Biol Med 2021; 135:104646. [PMID: 34274894 DOI: 10.1016/j.compbiomed.2021.104646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2021] [Revised: 06/20/2021] [Accepted: 07/07/2021] [Indexed: 11/25/2022]
Abstract
BACKGROUND Recently, the rapid development of RNA-seq technologies has accelerated transcriptomics research. The accurate identification and quantification of transcripts based on RNA-seq data will facilitate the exploration of various potential biological mechanisms. However, due to the limitations of the current data analysis tools and RNA-seq technologies, full and accurate reconstruction of the transcriptome still faces many challenges. RESULTS We developed the adapted genetic algorithm (AGTAR) program, which can reliably assemble transcriptomes and estimate abundance based on RNA-seq data with or without genome annotation files. We defined a new concept, isoform junction abundance, to help enhance the accuracy of isoform identification and quantification. Isoform abundance and isoform junction abundance are estimated by an adapted genetic algorithm. The crossover and mutation probabilities of the algorithm can be adaptively adjusted to effectively prevent premature convergence. Both simulated and real data indicated that AGTAR's comprehensive ability to assemble transcripts is significantly superior to that achievable by the currently widely used tools with similar functions. CONCLUSIONS AGTAR is a tool for identifying and quantifying transcripts from RNA-seq data. It has the advantages of higher accuracy and ease of use. The AGTAR package is freely available at https://github.com/v4yuezi/AGTAR.git.
Affiliation(s)
- Mingyue Li
  - National Engineering Laboratory for Druggable Gene and Protein Screening, Northeast Normal University, Changchun, 130024, China
- Miao Bai
  - National Engineering Laboratory for Druggable Gene and Protein Screening, Northeast Normal University, Changchun, 130024, China
- Yulun Wu
  - National Engineering Laboratory for Druggable Gene and Protein Screening, Northeast Normal University, Changchun, 130024, China
- Wenjun Shao
  - National Engineering Laboratory for Druggable Gene and Protein Screening, Northeast Normal University, Changchun, 130024, China
- Lihua Zheng
  - Research Center of Agriculture and Medicine Gene Engineering of Ministry of Education, Northeast Normal University, Changchun, 130024, China
- Luguo Sun
  - National Engineering Laboratory for Druggable Gene and Protein Screening, Northeast Normal University, Changchun, 130024, China
- Shuyue Wang
  - National Engineering Laboratory for Druggable Gene and Protein Screening, Northeast Normal University, Changchun, 130024, China
- Chunlei Yu
  - Research Center of Agriculture and Medicine Gene Engineering of Ministry of Education, Northeast Normal University, Changchun, 130024, China
- Yanxin Huang
  - National Engineering Laboratory for Druggable Gene and Protein Screening, Northeast Normal University, Changchun, 130024, China.
11
|
LaPierre N, Alser M, Eskin E, Koslicki D, Mangul S. Metalign: efficient alignment-based metagenomic profiling via containment min hash. Genome Biol 2020; 21:242. [PMID: 32912225 PMCID: PMC7488264 DOI: 10.1186/s13059-020-02159-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 12/17/2019] [Accepted: 08/26/2020] [Indexed: 12/31/2022] Open
Abstract
Metagenomic profiling, predicting the presence and relative abundances of microbes in a sample, is a critical first step in microbiome analysis. Alignment-based approaches are often considered accurate yet computationally infeasible. Here, we present a novel method, Metalign, that performs efficient and accurate alignment-based metagenomic profiling. We use a novel containment min hash approach to pre-filter the reference database prior to alignment and then process both uniquely aligned and multi-aligned reads to produce accurate abundance estimates. In performance evaluations on both real and simulated datasets, Metalign is the only method evaluated that maintained high performance and competitive running time across all datasets.
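The pre-filtering idea in the abstract above can be sketched as follows: estimate what fraction of a reference genome's k-mers occur in the sample, and keep only genomes whose estimate is high enough for alignment. This is a simplified illustration, not Metalign's implementation. The function names, the values of `k` and `size`, and the use of a plain Python set in place of a compact membership structure such as a Bloom filter are all assumptions.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_hash(kmer):
    """Deterministic 64-bit hash of a k-mer (stand-in for a fast hash)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_sketch(kmer_set, size):
    """Keep the `size` k-mers with the smallest hash values."""
    return sorted(kmer_set, key=kmer_hash)[:size]


def containment_estimate(genome, sample_kmers, k=5, size=50):
    """Estimate the fraction of the genome's k-mers present in the sample.

    Only the sketched k-mers are tested for membership, so the cost
    depends on `size` rather than on the genome length.
    """
    sketch = bottom_sketch(kmers(genome, k), size)
    if not sketch:
        return 0.0
    hits = sum(1 for km in sketch if km in sample_kmers)
    return hits / len(sketch)


if __name__ == "__main__":
    sample = kmers("ACGTTG", 5)          # k-mers seen in the reads
    print(containment_estimate("ACGTTGCA", sample))
```

In a pre-filter built this way, a reference genome would be passed on to the alignment stage only if its estimate exceeds a chosen cutoff; the cutoff itself is a tuning parameter that this sketch leaves open.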
Affiliation(s)
- Nathan LaPierre: Department of Computer Science, University of California, Los Angeles, CA, 90095, USA
- Mohammed Alser: Department of Computer Science, ETH Zurich, Rämistrasse 101, CH-8092, Zurich, Switzerland
- Eleazar Eskin: Department of Computer Science, University of California, Los Angeles, CA, 90095, USA; Department of Computational Medicine, University of California, Los Angeles, CA, 90095, USA; Department of Human Genetics, University of California, Los Angeles, CA, 90095, USA
- David Koslicki: Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA; Department of Biology, The Pennsylvania State University, University Park, PA, USA; Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Serghei Mangul: Department of Clinical Pharmacy, University of Southern California, Los Angeles, CA, 90089, USA
|
12
|
Fukaya K, Murakami H, Yoon S, Minami K, Osada Y, Yamamoto S, Masuda R, Kasai A, Miyashita K, Minamoto T, Kondoh M. Estimating fish population abundance by integrating quantitative data on environmental DNA and hydrodynamic modelling. Mol Ecol 2020; 30:3057-3067. [PMID: 32608023 DOI: 10.1111/mec.15530] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 05/20/2020] [Revised: 06/03/2020] [Accepted: 06/23/2020] [Indexed: 01/03/2023]
Abstract
Molecular analysis of DNA left in the environment, known as environmental DNA (eDNA), has proven to be a powerful and cost-effective approach to infer occurrence of species. Nonetheless, relating measurements of eDNA concentration to population abundance remains difficult because detailed knowledge on the processes that govern spatial and temporal distribution of eDNA should be integrated to reconstruct the underlying distribution and abundance of a target species. In this study, we propose a general framework of abundance estimation for aquatic systems on the basis of spatially replicated measurements of eDNA. The proposed method explicitly accounts for production, transport and degradation of eDNA by utilizing numerical hydrodynamic models that can simulate the distribution of eDNA concentrations within an aquatic area. It turns out that, under certain assumptions, population abundance can be estimated via a Bayesian inference of a generalized linear model. Application to a Japanese jack mackerel (Trachurus japonicus) population in Maizuru Bay revealed that the proposed method gives an estimate of population abundance comparable to that of a quantitative echo sounder method. Furthermore, the method successfully identified a source of exogenous input of eDNA (a fish market), which may render a quantitative application of eDNA difficult to interpret unless its effect is taken into account. These findings indicate the ability of eDNA to reliably reflect population abundance of aquatic macroorganisms; when the "ecology of eDNA" is adequately accounted for, population abundance can be quantified on the basis of measurements of eDNA concentration.
Affiliation(s)
- Keiichi Fukaya: National Institute for Environmental Studies, Tsukuba, Japan; The Institute of Statistical Mathematics, Tachikawa, Japan
- Hiroaki Murakami: Maizuru Fisheries Research Station, Field Science Education and Research Center, Kyoto University, Maizuru, Japan
- Seokjin Yoon: Faculty of Fisheries Sciences, Hokkaido University, Hakodate, Japan
- Kenji Minami: Estuary Research Center, Shimane University, Matsue, Japan
- Yutaka Osada: Graduate School of Life Sciences, Tohoku University, Sendai, Japan; National Research Institute of Fisheries Science, Japan Fisheries Research and Education Agency, Kanazawa, Yokohama, Japan
- Satoshi Yamamoto: Laboratory of Animal Ecology, Department of Zoology, Graduate School of Science, Kyoto University, Kyoto, Japan
- Reiji Masuda: Maizuru Fisheries Research Station, Field Science Education and Research Center, Kyoto University, Maizuru, Japan
- Akihide Kasai: Faculty of Fisheries Sciences, Hokkaido University, Hakodate, Japan
- Kazushi Miyashita: Field Science Center for Northern Biosphere, Hokkaido University, Hakodate, Japan
- Toshifumi Minamoto: Graduate School of Human Development and Environment, Kobe University, Kobe, Japan
- Michio Kondoh: Graduate School of Life Sciences, Tohoku University, Sendai, Japan
|
13
|
LaPierre N, Mangul S, Alser M, Mandric I, Wu NC, Koslicki D, Eskin E. MiCoP: microbial community profiling method for detecting viral and fungal organisms in metagenomic samples. BMC Genomics 2019; 20:423. [PMID: 31167634 PMCID: PMC6551237 DOI: 10.1186/s12864-019-5699-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 01/05/2023] Open
Abstract
Background High throughput sequencing has spurred the development of metagenomics, which involves the direct analysis of microbial communities in various environments such as soil, ocean water, and the human body. Many existing methods based on marker genes or k-mers have limited sensitivity or are too computationally demanding for many users. Additionally, most work in metagenomics has focused on bacteria and archaea, neglecting to study other key microbes such as viruses and eukaryotes. Results Here we present a method, MiCoP (Microbiome Community Profiling), that uses fast-mapping of reads to build a comprehensive reference database of full genomes from viruses and eukaryotes to achieve maximum read usage and enable the analysis of the virome and eukaryome in each sample. We demonstrate that mapping of metagenomic reads is feasible for the smaller viral and eukaryotic reference databases. We show that our method is accurate on simulated and mock community data and identifies many more viral and fungal species than previously-reported results on real data from the Human Microbiome Project. Conclusions MiCoP is a mapping-based method that proves more effective than existing methods at abundance profiling of viruses and eukaryotes in metagenomic samples. MiCoP can be used to detect the full diversity of these communities. The code, data, and documentation are publicly available on GitHub at: https://github.com/smangul1/MiCoP.
Affiliation(s)
- Nathan LaPierre: Department of Computer Science, University of California, Los Angeles, 90095, CA, USA
- Serghei Mangul: Department of Computer Science, University of California, Los Angeles, 90095, CA, USA
- Mohammed Alser: Department of Computer Science, ETH Zürich, Zürich, 8092, Switzerland
- Igor Mandric: Department of Computer Science, University of California, Los Angeles, 90095, CA, USA
- Nicholas C Wu: Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
- David Koslicki: Department of Mathematics, Oregon State University, Corvallis, 97331, OR, USA
- Eleazar Eskin: Department of Computer Science, University of California, Los Angeles, 90095, CA, USA; Department of Human Genetics, University of California, Los Angeles, 90095, CA, USA
|
14
|
Arona L, Dale J, Heaslip SG, Hammill MO, Johnston DW. Assessing the disturbance potential of small unoccupied aircraft systems (UAS) on gray seals (Halichoerus grypus) at breeding colonies in Nova Scotia, Canada. PeerJ 2018; 6:e4467. [PMID: 29576950 PMCID: PMC5863716 DOI: 10.7717/peerj.4467] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 11/25/2016] [Accepted: 02/16/2018] [Indexed: 11/20/2022] Open
Abstract
The use of small unoccupied aircraft systems (UAS) for ecological studies and wildlife population assessments is increasing. These methods can provide significant benefits in terms of costs and reductions in human risk, but little is known if UAS-based approaches cause disturbance of animals during operations. To address this knowledge gap, we conducted a series of UAS flights at gray seal breeding colonies on Hay and Saddle Islands in Nova Scotia, Canada. Using a small fixed-wing UAS, we assessed both immediate and short-term effects of surveys using sequential image analysis and between-flight seal counts in ten, 50 m² random quadrats at each colony. Counts of adult gray seals and young-of-the-year animals between first and second flights revealed no changes in abundance in quadrats (matched pair t-test p > 0.69) and slopes approaching 1 for linear regression comparisons (r² > 0.80). Sequential image analysis revealed no changes in orientation or posture of imaged animals. We also assessed the acoustic properties of the small UAS in relation to low ambient noise conditions using sound equivalent level (Leq) measurements with a calibrated U-MIK 1 and a 1/3 octave band soundscape approach. The results of Leq measurements indicate that small fixed-wing UAS are quiet, with most energy above 160 Hz, and that levels across 1/3 octave bands do not greatly exceed ambient acoustic measurements in a quiet field during operations at standard survey altitudes. As such, this platform is unlikely to acoustically disturb gray seals at breeding colonies during population surveys. The results of the present study indicate that the effects of small fixed-wing UAS on gray seals at breeding colonies are negligible, and that fixed-wing UAS-based approaches should be considered amongst best practices for assessing gray seal colonies.
Affiliation(s)
- Lauren Arona: Division of Marine Science and Conservation, Nicholas School of the Environment, Duke University Marine Laboratory, Beaufort, NC, United States of America
- Julian Dale: Division of Marine Science and Conservation, Nicholas School of the Environment, Duke University Marine Laboratory, Beaufort, NC, United States of America
- Susan G Heaslip: Division of Marine Science and Conservation, Nicholas School of the Environment, Duke University Marine Laboratory, Beaufort, NC, United States of America
- Michael O Hammill: Institut Maurice-Lamontagne/Maurice Lamontagne Institute, Pêches et Océans Canada/Fisheries and Oceans Canada, Mont-Joli, QC, Canada
- David W Johnston: Division of Marine Science and Conservation, Nicholas School of the Environment, Duke University Marine Laboratory, Beaufort, NC, United States of America
|
15
|
Borchers DL, Langrock R. Double-observer line transect surveys with Markov-modulated Poisson process models for animal availability. Biometrics 2015; 71:1060-9. [PMID: 26134283 DOI: 10.1111/biom.12341] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 03/01/2014] [Revised: 04/01/2015] [Accepted: 04/01/2015] [Indexed: 11/30/2022]
Abstract
We develop maximum likelihood methods for line transect surveys in which animals go undetected at distance zero, either because they are stochastically unavailable while within view or because they are missed when they are available. These incorporate a Markov-modulated Poisson process model for animal availability, allowing more clustered availability events than is possible with Poisson availability models. They include a mark-recapture component arising from the independent-observer survey, leading to more accurate estimation of detection probability given availability. We develop models for situations in which (a) multiple detections of the same individual are possible and (b) some or all of the availability process parameters are estimated from the line transect survey itself, rather than from independent data. We investigate estimator performance by simulation, and compare the multiple-detection estimators with estimators that use only initial detections of individuals, and with a single-observer estimator. Simultaneous estimation of detection function parameters and availability model parameters is shown to be feasible from the line transect survey alone with multiple detections and double-observer data but not with single-observer data. Recording multiple detections of individuals improves estimator precision substantially when estimating the availability model parameters from survey data, and we recommend that these data be gathered. We apply the methods to estimate detection probability from a double-observer survey of North Atlantic minke whales, and find that double-observer data greatly improve estimator precision here too.
Affiliation(s)
- D L Borchers: Centre for Research into Ecological and Environmental Modelling, The Observatory, Buchanan Gardens, University of St Andrews, Fife, KY16 9LZ, Scotland
- R Langrock: Centre for Research into Ecological and Environmental Modelling, The Observatory, Buchanan Gardens, University of St Andrews, Fife, KY16 9LZ, Scotland
|
16
|
Borchers DL, Stevenson BC, Kidney D, Thomas L, Marques TA. A Unifying Model for Capture-Recapture and Distance Sampling Surveys of Wildlife Populations. J Am Stat Assoc 2015; 110:195-204. [PMID: 26063947 PMCID: PMC4440664 DOI: 10.1080/01621459.2014.893884] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 02/01/2013] [Revised: 02/01/2014] [Indexed: 12/05/2022]
Abstract
A fundamental problem in wildlife ecology and management is estimation of population size or density. The two dominant methods in this area are capture–recapture (CR) and distance sampling (DS), each with its own largely separate literature. We develop a class of models that synthesizes them. It accommodates a spectrum of models ranging from nonspatial CR models (with no information on animal locations) through to DS and mark-recapture distance sampling (MRDS) models, in which animal locations are observed without error. Between these lie spatially explicit capture–recapture (SECR) models that include only capture locations, and a variety of models with less location data than are typical of DS surveys but more than are normally used on SECR surveys. In addition to unifying CR and DS models, the class provides a means of improving inference from SECR models by adding supplementary location data, and a means of incorporating measurement error into DS and MRDS models. We illustrate their utility by comparing inference on acoustic surveys of gibbons and frogs using only capture locations, using estimated angles (gibbons) and combinations of received signal strength and time-of-arrival data (frogs), and on a visual MRDS survey of whales, comparing estimates with exact and estimated distances. Supplementary materials for this article are available online.
|
17
|
Abstract
The N-mixture model is widely used to estimate the abundance of a population in the presence of unknown detection probability from only a set of counts subject to spatial and temporal replication (Royle, 2004, Biometrics 60, 105-115). We explain and exploit the equivalence of N-mixture and multivariate Poisson and negative-binomial models, which provides powerful new approaches for fitting these models. We show that particularly when detection probability and the number of sampling occasions are small, infinite estimates of abundance can arise. We propose a sample covariance as a diagnostic for this event, and demonstrate its good performance in the Poisson case. Infinite estimates may be missed in practice, due to numerical optimization procedures terminating at arbitrarily large values. It is shown that the use of a bound, K, for an infinite summation in the N-mixture likelihood can result in underestimation of abundance, so that default values of K in computer packages should be avoided. Instead we propose a simple automatic way to choose K. The methods are illustrated by analysis of data on Hermann's tortoise Testudo hermanni.
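The role of the bound K described in the abstract above can be made concrete with a small sketch of the Poisson N-mixture log-likelihood, in which the infinite sum over the latent abundance N at each site is cut off at K. The data, parameter values, and function names below are illustrative assumptions. Raising K until the value stabilises is shown only as a generic check and is not necessarily the automatic rule the authors propose.

```python
import math


def poisson_pmf(n, lam):
    """Poisson probability of n, computed on the log scale for stability."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))


def binom_pmf(y, n, p):
    """Binomial probability of detecting y out of n animals."""
    return math.comb(n, y) * p ** y * (1 - p) ** (n - y)


def nmix_loglik(counts, lam, p, K):
    """Poisson N-mixture log-likelihood with the sum over N truncated at K.

    counts : list of sites, each a list of replicated counts
    lam    : mean abundance per site
    p      : detection probability
    K      : upper bound replacing the infinite summation
    """
    total = 0.0
    for site in counts:
        if K < max(site):
            raise ValueError("K must be at least the largest count at a site")
        site_lik = 0.0
        for N in range(max(site), K + 1):
            term = poisson_pmf(N, lam)
            for y in site:
                term *= binom_pmf(y, N, p)
            site_lik += term
        total += math.log(site_lik)
    return total


if __name__ == "__main__":
    counts = [[2, 3, 1], [4, 2, 3], [1, 0, 2]]   # made-up example data
    for K in (10, 50, 200):
        print(K, nmix_loglik(counts, lam=5.0, p=0.3, K=K))
```

Because every term in the sum is positive, the truncated likelihood can only increase with K; a bound that is too small discards probability mass at large N, which is where low detection probabilities place much of the likelihood.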
Affiliation(s)
- Emily B Dennis: School of Mathematics, Statistics and Actuarial Science, University of Kent, Canterbury, Kent CT2 7NF, UK
- Byron J T Morgan: School of Mathematics, Statistics and Actuarial Science, University of Kent, Canterbury, Kent CT2 7NF, UK
- Martin S Ridout: School of Mathematics, Statistics and Actuarial Science, University of Kent, Canterbury, Kent CT2 7NF, UK
|