1
Li G, Yang L, Chen J, Zhang X. Robust Differential Abundance Analysis of Microbiome Sequencing Data. Genes (Basel) 2023; 14:2000. [PMID: 38002943 PMCID: PMC10671797 DOI: 10.3390/genes14112000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 10/20/2023] [Accepted: 10/24/2023] [Indexed: 11/26/2023]
Abstract
It is well known that the microbiome data are ridden with outliers and have heavy distribution tails, but the impact of outliers and heavy-tailedness has yet to be examined systematically. This paper investigates the impact of outliers and heavy-tailedness on differential abundance analysis (DAA) using the linear models for the differential abundance analysis (LinDA) method and proposes effective strategies to mitigate their influence. The presence of outliers and heavy-tailedness can significantly decrease the power of LinDA. We investigate various techniques to address outliers and heavy-tailedness, including generalizing LinDA into a more flexible framework that allows for the use of robust regression and winsorizing the data before applying LinDA. Our extensive numerical experiments and real-data analyses demonstrate that robust Huber regression has overall the best performance in addressing outliers and heavy-tailedness.
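As a rough illustration of the winsorizing step mentioned in this abstract, the sketch below caps values above an upper quantile in plain Python. The function name `winsorize_upper`, the nearest-rank quantile rule, the 0.9 cutoff, and the toy counts are all assumptions made for this example. It is not the LinDA implementation, and the cutoff used in the paper is not stated here.

```python
import math


def winsorize_upper(values, quantile=0.9):
    """Cap every value above the given upper quantile (nearest-rank rule).

    Hypothetical helper for illustration only; not taken from LinDA.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < quantile <= 1:
        raise ValueError("quantile must be in (0, 1]")
    ordered = sorted(values)
    # Nearest-rank quantile: the smallest value with at least
    # quantile * n observations at or below it (1-based rank).
    rank = math.ceil(quantile * len(ordered))
    cap = ordered[rank - 1]
    # Values above the cap are pulled down to it; the rest are unchanged.
    return [min(v, cap) for v in values]


# Made-up counts for one taxon across ten samples, with a single outlier.
counts = [3, 5, 4, 6, 2, 5, 4, 3, 250, 4]
winsorized = winsorize_upper(counts, quantile=0.9)
print(winsorized)
```

Only the extreme count is changed, so a downstream linear model is no longer dominated by that one sample. Robust Huber regression, which the abstract reports as the best performer overall, instead down-weights large residuals during fitting and is not shown here.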
Affiliation(s)
- Guanxun Li
- Department of Statistics, Texas A&M University, College Station, TX 77843, USA
- Lu Yang
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905, USA
- Jun Chen
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905, USA
- Xianyang Zhang
- Department of Statistics, Texas A&M University, College Station, TX 77843, USA
2
Misher C, Vats G, Vanak AT. Differential Responses of Small Mammals to Woody Encroachment in a Semi-Arid Grassland. Front Ecol Evol 2022. [DOI: 10.3389/fevo.2022.755903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Encroachment by woody invasive plants is a major threat to grasslands and savannah ecosystems worldwide. Rodents, being primary consumers, are likely to be the first to respond to changes in the structure and composition of native vegetation. We examined the effect of an invasive shrub Prosopis juliflora (hereafter Prosopis) on the native rodent community of an arid grassland system of Western India. Our sampling plots were divided into five categories representing different stages of Prosopis invasion and other land cover types. These consisted of restored native grassland, agriculture fallow, open brushland, sparse-Prosopis plots, and Prosopis-dominated plots. We also examined the impact of woody invasion on the response of native rodents toward moonlight and temperature. As hypothesized, we found a significantly higher abundance of rodent species in the native grassland habitat compared to sparse-Prosopis habitats. However, there was no significant difference in rodent abundance and diversity between the grassland and Prosopis-dominated habitats. Thus, species richness and abundance of rodents were the highest in the restored grasslands and dense Prosopis thickets, and the lowest in the sparse Prosopis, potentially showing a “U” shaped response to Prosopis invasion. We observed a species-specific effect of Prosopis on the activity of Tatera indica, Bandicota bengalensis, and Millardia meltada. Habitat type mediated the effect of different environmental factors (moonlight and temperature) on the activity of the most commonly occurring species, T. indica, while the activity of M. meltada showed a weak association with environmental factors. B. bengalensis was the most generalist species, showing similar activity across all habitat types. Thus, the impact of Prosopis invasion on the rodent community was uneven, and depended on species as well as on local environmental characteristics.
3
Jones MT, Willey LL, Mays JD, Dodd CK. Wildfire, Depredation, and Synergistic Management Challenges Contribute to the Decline of a Significant Population of Florida Box Turtles (Terrapene bauri). Chelonian Conservation and Biology 2021. [DOI: 10.2744/ccb-1480.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
Affiliation(s)
- Michael T. Jones
- Department of Environmental Conservation, University of Massachusetts, Amherst, Massachusetts 01003 USA
- Lisabeth L. Willey
- American Turtle Observatory, 90 Whitaker Road, New Salem, Massachusetts 01355 USA
- Jonathan D. Mays
- Fish and Wildlife Research Institute, Florida Fish and Wildlife Conservation Commission, Gainesville, Florida 32601 USA
- C. Kenneth Dodd
- Division of Herpetology, Florida Museum of Natural History, University of Florida, Gainesville, Florida 32611 USA
4
Matthias J, du Bernard S, Schillinger JA, Hong J, Pearson V, Peterman TA. Estimating Neonatal Herpes Simplex Virus Incidence and Mortality Using Capture-recapture, Florida. Clin Infect Dis 2021; 73:506-512. [PMID: 32507882 DOI: 10.1093/cid/ciaa727] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/30/2019] [Accepted: 06/03/2020] [Indexed: 12/28/2022]
Abstract
BACKGROUND Neonatal herpes simplex virus infection (nHSV) leads to severe morbidity and mortality, but national incidence is uncertain. Florida regulations require that healthcare providers report cases, and clinical laboratories report test results when herpes simplex virus (HSV) is detected. We estimated nHSV incidence using laboratory-confirmed provider-reported cases and electronic laboratory reports (ELR) stored separately from provider-reported cases. Mortality was estimated using provider-reported cases, ELR, and vital statistics death records. METHODS For 2011-2017, we reviewed: provider-reported cases (infants ≤ 60 days of age with HSV infection confirmed by culture or polymerase chain reaction [PCR]), ELR of HSV-positive culture or PCR results in the same age group, and death certificates containing International Classification of Disease, Tenth Revision, codes for herpes infection: P35.2, B00.0-B00.9, and A60.0-A60.9. Provider-reported cases were matched against ELR reports. Death certificates were matched with provider and ELR reports. Chapman's capture-recapture method was used to estimate nHSV incidence and mortality. Mortality from all 3 sources was estimated using log-linear modeling. RESULTS Providers reported 114 nHSV cases, and ELR identified 197 nHSV cases. Forty-six cases were common to both datasets, leaving 265 unique nHSV reports. Chapman's estimate suggests 483 (95% confidence interval [CI], 383-634) nHSV cases occurred (31.5 infections per 100 000 live births). The nHSV deaths were reported by providers (n = 9), ELR (n = 18), and vital statistics (n = 31), totaling 34 unique reports. Log-linear modeling estimates 35.8 fatal cases occurred (95% CI, 34-40). CONCLUSIONS Chapman's estimates using data collected over 7 years in Florida conclude nHSV infections occurred at a rate of 1 per 3000 live births.
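The point estimate reported in this abstract can be reproduced from the three counts it gives. The sketch below assumes the standard form of Chapman's two-source estimator, N = (n1 + 1)(n2 + 1)/(m + 1) - 1; the function and variable names are illustrative, and the confidence interval is not recomputed because the abstract does not say how it was derived.

```python
def chapman_estimate(n1: int, n2: int, m: int) -> float:
    """Chapman's two-source capture-recapture estimate of population size.

    n1, n2: number of cases found by each source; m: cases found by both.
    """
    if min(n1, n2, m) < 0 or m > min(n1, n2):
        raise ValueError("need 0 <= m <= min(n1, n2)")
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Counts taken from the abstract: provider reports, ELR, and their overlap.
providers, elr, both = 114, 197, 46

unique_reports = providers + elr - both            # cases seen at least once
estimate = chapman_estimate(providers, elr, both)  # estimated total cases

print(unique_reports)   # 265
print(round(estimate))  # 483
```

Both values match the abstract: 265 unique reports and an estimated 483 cases, which implies that roughly 218 cases were missed by both reporting systems.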
Affiliation(s)
- James Matthias
- Centers for Disease Control and Prevention, Atlanta, Georgia, USA; Florida Department of Health, Tallahassee, Florida, USA
- Julia A Schillinger
- Centers for Disease Control and Prevention, Atlanta, Georgia, USA; New York City Department of Health and Mental Hygiene, New York City, New York, USA
- Jaeyoung Hong
- Centers for Disease Control and Prevention, Atlanta, Georgia, USA
5
On the estimation of population sizes in capture–recapture experiments. J Multivariate Anal 2019. [DOI: 10.1016/j.jmva.2019.04.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
6
Yauck M, Rivest LP, Rothman G. Capture-Recapture Methods for Data on the Activation of Applications on Mobile Phones. J Am Stat Assoc 2019. [DOI: 10.1080/01621459.2018.1469991] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Affiliation(s)
- Mamadou Yauck
- Department of Mathematics and Statistics, Université Laval, Quebec City, QC, Canada
- Louis-Paul Rivest
- Department of Mathematics and Statistics, Université Laval, Quebec City, QC, Canada
7
Affiliation(s)
- Kyle Vincent
- Currency Department, Bank of Canada, Ottawa, Canada
- Steve Thompson
- Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, British Columbia, Canada
8
Braeye T, Verheagen J, Mignon A, Flipse W, Pierard D, Huygen K, Schirvel C, Hens N. Capture-Recapture Estimators in Epidemiology with Applications to Pertussis and Pneumococcal Invasive Disease Surveillance. PLoS One 2016; 11:e0159832. [PMID: 27529167 PMCID: PMC4987016 DOI: 10.1371/journal.pone.0159832] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 11/09/2015] [Accepted: 07/08/2016] [Indexed: 11/17/2022]
Abstract
Introduction Surveillance networks are often not exhaustive nor completely complementary. In such situations, capture-recapture methods can be used for incidence estimation. The choice of estimator and their robustness with respect to the homogeneity and independence assumptions are however not well documented. Methods We investigated the performance of five different capture-recapture estimators in a simulation study. Eight different scenarios were used to detect and combine case-information. The scenarios increasingly violated assumptions of independence of samples and homogeneity of detection probabilities. Belgian datasets on invasive pneumococcal disease (IPD) and pertussis provided motivating examples. Results No estimator was unbiased in all scenarios. Performance of the parametric estimators depended on how much of the dependency and heterogeneity were correctly modelled. Model building was limited by parameter estimability, availability of additional information (e.g. covariates) and the possibilities inherent to the method. In the most complex scenario, methods that allowed for detection probabilities conditional on previous detections estimated the total population size within a 20–30% error-range. Parametric estimators remained stable if individual data sources lost up to 50% of their data. The investigated non-parametric methods were more susceptible to data loss and their performance was linked to the dependence between samples; overestimating in scenarios with little dependence, underestimating in others. Issues with parameter estimability made it impossible to model all suggested relations between samples for the IPD and pertussis datasets. For IPD, the estimates for the Belgian incidence for cases aged 50 years and older ranged from 44 to 58/100,000 in 2010. The estimates for pertussis (all ages, Belgium, 2014) ranged from 24.2 to 30.8/100,000.
Conclusion We encourage the use of capture-recapture methods, but epidemiologists should preferably include datasets for which the underlying dependency structure is not too complex, a priori investigate this structure, compensate for it within the model and interpret the results with the remaining unmodelled heterogeneity in mind.
Affiliation(s)
- Toon Braeye
- Department Epidemiology of infectious diseases, Epidemiology, Scientific Institute of Public Health, Brussels, Belgium
- Jan Verheagen
- Department of Clinical Microbiology, University Clinic Leuven, Leuven, Belgium
- Wim Flipse
- Infectious Disease Control, Flemish Agency for Care and Health, Brussels, Belgium
- Denis Pierard
- Institute of Medical Microbiology, University Hospital of Brussels, Brussels, Belgium
- Kris Huygen
- Department immunology, Communicable and Infectious Diseases, Scientific Institute of Public Health, Brussels, Belgium
- Carole Schirvel
- Cellule de surveillance des maladies infectieuses, Direction générale de la santé, Brussels, Belgium
- Niel Hens
- Interuniversity Institute for Biostatistics and statistical Bioinformatics, Hasselt University, Hasselt, Belgium; Centre for Health Economic Research and Modelling Infectious Diseases (CHERMID), Vaccine & Infectious Disease Institute (WHO Collaborating Centre), University of Antwerp, Wilrijk, Belgium; Epidemiology and social medicine (ESOC), University of Antwerp, Wilrijk, Belgium
9
Barker RJ, Forsyth DM, Wood M. Modeling sighting heterogeneity and abundance in spatially replicated multiple-observer surveys. J Wildl Manage 2014. [DOI: 10.1002/jwmg.694] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 11/08/2022]
Affiliation(s)
- Richard J. Barker
- Department of Mathematics and Statistics; University of Otago; P.O. Box 56 Dunedin New Zealand
- David M. Forsyth
- Department of Sustainability and Environment; Arthur Rylah Institute for Environmental Research; 123 Brown Street Heidelberg VIC 3084 Australia
- Matthew Wood
- Australian Ecological Research Services Ltd.; 341 Princes Highway Portland VIC 3305 Australia
10
Huggins R, Hwang WH. A Review of the Use of Conditional Likelihood in Capture-Recapture Experiments. Int Stat Rev 2011. [DOI: 10.1111/j.1751-5823.2011.00157.x] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 11/30/2022]
11
Why a time effect often has a limited impact on capture-recapture estimates in closed populations. Can J Stat 2008. [DOI: 10.1002/cjs.5550360108] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/07/2022]
12
Rivest LP, Baillargeon S. Applications and Extensions of Chao's Moment Estimator for the Size of a Closed Population. Biometrics 2007; 63:999-1006. [PMID: 17425635 DOI: 10.1111/j.1541-0420.2007.00779.x] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 12/01/2022]
Abstract
This article revisits Chao's (1989, Biometrics 45, 427-438) lower bound estimator for the size of a closed population in a mark-recapture experiment where the capture probabilities vary between animals (model M(h)). First, an extension of the lower bound to models featuring a time effect and heterogeneity in capture probabilities (M(th)) is proposed. The biases of these lower bounds are shown to be a function of the heterogeneity parameter for several loglinear models for M(th). Small-sample bias reduction techniques for Chao's lower bound estimator are also derived. The application of the loglinear model underlying Chao's estimator when heterogeneity has been detected in the primary periods of a robust design is then investigated. A test for the null hypothesis that Chao's loglinear model provides unbiased abundance estimators is provided. The strategy of systematically using Chao's loglinear model in the primary periods of a robust design where heterogeneity has been detected is investigated in a Monte Carlo experiment. Its impact on the estimation of the population sizes and of the survival rates is evaluated in a Monte Carlo experiment.
Affiliation(s)
- Louis-Paul Rivest
- Département de mathématiques et de statistique, Université Laval, Ste-Foy, Québec, Canada G1K 7P4.
13
14
15
Abstract
The robust design is a method for implementing a mark-recapture experiment featuring a nested sampling structure. The first level consists of primary sampling sessions; the population experiences mortality and immigration between primary sessions so that open population models apply at this level. The second level of sampling has a short mark-recapture study within each primary session. Closed population models are used at this stage to estimate the animal abundance at each primary session. This article suggests a loglinear technique to fit the robust design. Loglinear models for the analysis of mark-recapture data from closed and open populations are first reviewed. These two types of models are then combined to analyze the data from a robust design. The proposed loglinear approach to the robust design allows incorporating parameters for a heterogeneity in the capture probabilities of the units within each primary session. Temporary emigration out of the study area can also be accounted for in the loglinear framework. The analysis is relatively simple; it relies on a large Poisson regression with the vector of frequencies of the capture histories as dependent variable. An example concerned with the estimation of abundance and survival of the red-back vole in an area of southeastern Québec is presented.
Affiliation(s)
- Louis-Paul Rivest
- Département de Mathématiques et de Statistique, Université Laval, Ste-Foy, Québec G1K 7P4, Canada.