1
|
Bifidobacteria define gut microbiome profiles of golden lion tamarin (Leontopithecus rosalia) and marmoset (Callithrix sp.) metagenomic shotgun pools. Sci Rep 2023; 13:15679. [PMID: 37735195 PMCID: PMC10514281 DOI: 10.1038/s41598-023-42059-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2022] [Accepted: 09/05/2023] [Indexed: 09/23/2023] Open
Abstract
Gut microbiome disruptions may lead to adverse effects on wildlife fitness and viability; thus, maintaining host microbiota biodiversity needs to become an integral part of wildlife conservation. The highly endangered callitrichid golden lion tamarin (GLT; Leontopithecus rosalia) is a rare conservation success, but allochthonous callitrichid marmosets (Callithrix) serve as principal ecological GLT threats. However, incorporation of microbiome approaches to GLT conservation is impeded by limited gut microbiome studies of Brazilian primates. Here, we carried out analysis of gut metagenomic pools from 114 individuals of wild and captive GLTs and marmosets. More specifically, we analyzed the bacterial component of ultrafiltered samples originally collected as part of a virome profiling study. The major findings of this study are consistent with previous studies in showing that Bifidobacterium, a bacterial genus important for the metabolism of tree gums consumed by callitrichids, is an important component of the callitrichid gut microbiome, although GLTs and marmosets were enriched for different species of Bifidobacterium. Additionally, the composition of GLT and marmoset gut microbiota is sensitive to host environmental factors. Overall, our data expand baseline gut microbiome data for callitrichids to allow for the development of new tools to improve their management and conservation.
|
2
|
Expanding the stdpopsim species catalog, and lessons learned for realistic genome simulations. eLife 2023; 12:RP84874. [PMID: 37342968 DOI: 10.7554/elife.84874] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 06/23/2023] Open
Abstract
Simulation is a key tool in population genetics for both methods development and empirical research, but producing simulations that recapitulate the main features of genomic datasets remains a major obstacle. Today, more realistic simulations are possible thanks to large increases in the quantity and quality of available genetic data, and the sophistication of inference and simulation software. However, implementing these simulations still requires substantial time and specialized knowledge. These challenges are especially pronounced for simulating genomes for species that are not well-studied, since it is not always clear what information is required to produce simulations with a level of realism sufficient to confidently answer a given question. The community-developed framework stdpopsim seeks to lower this barrier by facilitating the simulation of complex population genetic models using up-to-date information. The initial version of stdpopsim focused on establishing this framework using six well-characterized model species (Adrion et al., 2020). Here, we report on major improvements made in the new release of stdpopsim (version 0.2), which includes a significant expansion of the species catalog and substantial additions to simulation capabilities. Features added to improve the realism of the simulated genomes include non-crossover recombination and provision of species-specific genomic annotations. Through community-driven efforts, we expanded the number of species in the catalog more than threefold and broadened coverage across the tree of life. During the process of expanding the catalog, we have identified common sticking points and developed the best practices for setting up genome-scale simulations. We describe the input data required for generating a realistic simulation, suggest good practices for obtaining the relevant information from the literature, and discuss common pitfalls and major considerations. 
These improvements to stdpopsim aim to further promote the use of realistic whole-genome population genetic simulations, especially in non-model organisms, making them available, transparent, and accessible to everyone.
|
3
|
Federated learning and Indigenous genomic data sovereignty. NAT MACH INTELL 2022; 4:909-911. [PMID: 36504698 PMCID: PMC9731328 DOI: 10.1038/s42256-022-00551-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Indigenous peoples are under-represented in genomic datasets, which can lead to limited accuracy and utility of machine learning models in precision health. While open data sharing undermines rights of Indigenous communities to govern data decisions, federated learning may facilitate secure and community-consented data sharing.
|
4
|
The first steps toward a global pandemic: Reconstructing the demographic history of parasite host switches in its native range. Mol Ecol 2022; 31:1358-1374. [PMID: 34882860 PMCID: PMC11105409 DOI: 10.1111/mec.16322] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/28/2021] [Revised: 11/23/2021] [Accepted: 11/29/2021] [Indexed: 12/14/2022]
Abstract
Host switching allows parasites to expand their niches. However, successful switching may require suites of adaptations and also may decrease performance on the old host. As a result, reductions in gene flow accompany many host switches, driving speciation. Because host switches tend to be rapid, it is difficult to study them in real-time, and their demographic parameters remain poorly understood. As a result, fundamental factors that control subsequent parasite evolution, such as the size of the switching population or the extent of immigration from the original host, remain largely unknown. To shed light on the host switching process, we explored how host switches occur in independent host shifts by two ectoparasitic honey bee mites (Varroa destructor and V. jacobsoni). Both switched to the western honey bee (Apis mellifera) after being brought into contact with their ancestral host (Apis cerana), ~70 and ~12 years ago, respectively. Varroa destructor subsequently caused worldwide collapses of honey bee populations. Using whole-genome sequencing on 63 mites collected in their native ranges from both the ancestral and novel hosts, we were able to reconstruct the known temporal dynamics of the switch. We further found multiple previously undiscovered mitochondrial lineages on the novel host, along with the genetic equivalent of tens of individuals that were involved in the initial host switch. Despite being greatly reduced, some gene flow remains between mites adapted to different hosts. Our findings suggest that while reproductive isolation may facilitate the fixation of traits beneficial for exploiting the new host, ongoing genetic exchange may allow genetic amelioration of inbreeding effects.
|
5
|
Abstract
Neutral evolution is a fundamental concept in evolutionary biology, but teaching this and other non-adaptive concepts is especially challenging. Here we present Genie, a browser-based educational tool that demonstrates population-genetic concepts such as genetic drift, population isolation, gene flow, and genetic mutation. Because it does not need to be downloaded and installed, Genie can scale to large groups of students and is useful for both in-person and online instruction. Genie was used to teach genetic drift to Evolution students at Arizona State University during Spring 2016 and Spring 2017. The effectiveness of Genie in teaching key genetic drift concepts and addressing misconceptions was assessed with the Genetic Drift Inventory developed by Price et al. (CBE Life Sci Educ 13(1):65-75, 2014). Overall, Genie performed comparably to traditional static methods across all evaluated classes. We have empirically demonstrated that Genie can be successfully integrated with traditional instruction to reduce misconceptions about genetic drift.
|
6
|
Abstract
Insertions and deletions (indels) are common molecular evolutionary events. However, probabilistic models for indel evolution are under-developed due to their computational complexity. Here, we introduce several improvements to indel modeling: 1) While previous models for indel evolution assumed that the rates and length distributions of insertions and deletions are equal, here we propose a richer model that explicitly distinguishes between the two; 2) we introduce numerous summary statistics that allow approximate Bayesian computation-based parameter estimation; 3) we develop a method to correct for biases introduced by alignment programs, when inferring indel parameters from empirical data sets; and 4) using a model-selection scheme, we test whether the richer model better fits biological data compared with the simpler model. Our analyses suggest that both our inference scheme and the model-selection procedure achieve high accuracy on simulated data. We further demonstrate that our proposed richer model better fits a large number of empirical data sets and that, for the majority of these data sets, the deletion rate is higher than the insertion rate.
|
7
|
Genomic skimming and nanopore sequencing uncover cryptic hybridization in one of world's most threatened primates. Sci Rep 2021; 11:17279. [PMID: 34446741 PMCID: PMC8390465 DOI: 10.1038/s41598-021-96404-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/27/2021] [Accepted: 08/10/2021] [Indexed: 12/28/2022] Open
Abstract
The Brazilian buffy-tufted-ear marmoset (Callithrix aurita), one of the world's most endangered primates, is threatened by anthropogenic hybridization with exotic, invasive marmoset species. As there are few genetic data available for C. aurita, we developed a PCR-free protocol with minimal technical requirements to rapidly generate genomic data with genomic skimming and portable nanopore sequencing. With this direct DNA sequencing approach, we successfully determined the complete mitogenome of a marmoset that we initially identified as C. aurita. The obtained nanopore-assembled sequence was highly concordant with a Sanger sequenced version of the same mitogenome. Phylogenetic analyses unexpectedly revealed that our specimen was a cryptic hybrid, with a C. aurita phenotype and C. penicillata mitogenome lineage. We also used publicly available mitogenome data to determine diversity estimates for C. aurita and three other marmoset species. Mitogenomics holds great potential to address deficiencies in genomic data for endangered, non-model species such as C. aurita. However, we discuss why mitogenomic approaches should be used in conjunction with other data for marmoset species identification. Finally, we discuss the utility and implications of our results and genomic skimming/nanopore approach for conservation and evolutionary studies of C. aurita and other marmosets.
|
8
|
Abstract
The explosion in population genomic data demands ever more complex modes of analysis, and increasingly, these analyses depend on sophisticated simulations. Recent advances in population genetic simulation have made it possible to simulate large and complex models, but specifying such models for a particular simulation engine remains a difficult and error-prone task. Computational genetics researchers currently re-implement simulation models independently, leading to inconsistency and duplication of effort. This situation presents a major barrier to empirical researchers seeking to use simulations for power analyses of upcoming studies or sanity checks on existing genomic data. Population genetics, as a field, also lacks standard benchmarks by which new tools for inference might be measured. Here, we describe a new resource, stdpopsim, that attempts to rectify this situation. Stdpopsim is a community-driven open-source project that provides easy access to a growing catalog of published simulation models from a range of organisms and supports multiple simulation engine backends. This resource is available as a well-documented Python library with a simple command-line interface. We share some examples demonstrating how stdpopsim can be used to systematically compare demographic inference methods, and we encourage a broader community of developers to contribute to this growing resource.
|
9
|
Abstract
Somatic mutations can have important effects on the life history, ecology, and evolution of plants, but the rate at which they accumulate is poorly understood and difficult to measure directly. Here, we develop a method to measure somatic mutations in individual plants and use it to estimate the somatic mutation rate in a large, long-lived, phenotypically mosaic Eucalyptus melliodora tree. Despite being 100 times larger than Arabidopsis, this tree has a per-generation mutation rate only ten times greater, which suggests that this species may have evolved mechanisms to reduce the mutation rate per unit of growth. This adds to a growing body of evidence that illuminates the correlated evolutionary shifts in mutation rate and life history in plants.
|
10
|
accuMUlate: a mutation caller designed for mutation accumulation experiments. Bioinformatics 2019; 34:2659-2660. [PMID: 29566129 DOI: 10.1093/bioinformatics/bty165] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/10/2017] [Accepted: 03/15/2018] [Indexed: 11/13/2022] Open
Abstract
Summary: Mutation accumulation (MA) is the most widely used method for directly studying the effects of mutation. By sequencing whole genomes from MA lines, researchers can directly study the rate and molecular spectra of spontaneous mutations and use these results to understand how mutation contributes to biological processes. At present there is no software designed specifically for identifying mutations from MA lines. Here we describe accuMUlate, a probabilistic mutation caller that reflects the design of a typical MA experiment while being flexible enough to accommodate properties unique to any particular experiment.
Availability and implementation: accuMUlate is available from https://github.com/dwinter/accuMUlate.
Supplementary information: Supplementary data are available at Bioinformatics online.
|
11
|
SpartaABC: a web server to simulate sequences with indel parameters inferred using an approximate Bayesian computation algorithm. Nucleic Acids Res 2019; 45:W453-W457. [PMID: 28460062 PMCID: PMC5570005 DOI: 10.1093/nar/gkx322] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/06/2017] [Accepted: 04/15/2017] [Indexed: 11/22/2022] Open
Abstract
Many analyses for the detection of biological phenomena rely on a multiple sequence alignment as input. The results of such analyses are often further studied through parametric bootstrap procedures, using sequence simulators. One of the problems with conducting such simulation studies is that users currently have no means to decide which insertion and deletion (indel) parameters to choose so that the resulting sequences mimic biological data. Here, we present SpartaABC, a web server that aims to solve this issue. SpartaABC implements an approximate-Bayesian-computation rejection algorithm to infer indel parameters from sequence data. It does so by extracting summary statistics from the input. It then performs numerous sequence simulations under randomly sampled indel parameters. By computing a distance between the summary statistics extracted from the input and each simulation, SpartaABC retains only parameters behind simulations close to the real data. As output, SpartaABC provides point estimates and approximate posterior distributions of the indel parameters. In addition, SpartaABC allows simulating sequences with the inferred indel parameters. To this end, the sequence simulators Dawg 2.0 and INDELible were integrated. Using SpartaABC, we demonstrate the differences in indel dynamics among three protein-coding genes across mammalian orthologs. SpartaABC is freely available for use at http://spartaabc.tau.ac.il/webserver.
|
12
|
The role of gene flow in rapid and repeated evolution of cave-related traits in Mexican tetra, Astyanax mexicanus. Mol Ecol 2018; 27:4397-4416. [PMID: 30252986 DOI: 10.1111/mec.14877] [Citation(s) in RCA: 101] [Impact Index Per Article: 16.8] [Received: 05/22/2018] [Revised: 08/08/2018] [Accepted: 08/19/2018] [Indexed: 12/13/2022]
Abstract
Understanding the molecular basis of repeatedly evolved phenotypes can yield key insights into the evolutionary process. Quantifying gene flow between populations is especially important in interpreting mechanisms of repeated phenotypic evolution, and genomic analyses have revealed that admixture occurs more frequently between diverging lineages than previously thought. In this study, we resequenced 47 whole genomes of the Mexican tetra from three cave populations, two surface populations and outgroup samples. We confirmed that cave populations are polyphyletic and two Astyanax mexicanus lineages are present in our data set. The two lineages likely diverged much more recently than previous mitochondrial estimates of 5-7 mya. Divergence of cave populations from their phylogenetically closest surface population likely occurred between ~161 and 191 k generations ago. The favoured demographic model for most population pairs accounts for divergence with secondary contact and heterogeneous gene flow across the genome, and we rigorously identified gene flow among all lineages sampled. Therefore, the evolution of cave-related traits occurred more rapidly than previously thought, and troglomorphic traits are maintained despite gene flow with surface populations. The recency of these estimated divergence events suggests that selection may drive the evolution of cave-derived traits, as opposed to disuse and drift. Finally, we show that a key troglomorphic phenotype QTL is enriched for genomic regions with low divergence between caves, suggesting that regions important for cave phenotypes may be transferred between caves via gene flow. Our study shows that gene flow must be considered in studies of independent, repeated trait evolution.
|
13
|
Inferring Rates and Length-Distributions of Indels Using Approximate Bayesian Computation. Genome Biol Evol 2018; 9:1280-1294. [PMID: 28453624 PMCID: PMC5438127 DOI: 10.1093/gbe/evx084] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Accepted: 04/25/2017] [Indexed: 02/07/2023] Open
Abstract
The most common evolutionary events at the molecular level are single-base substitutions, as well as insertions and deletions (indels) of short DNA segments. A large body of research has been devoted to developing probabilistic substitution models and to inferring their parameters using likelihood and Bayesian approaches. In contrast, relatively little has been done to model indel dynamics, probably due to the difficulty in writing explicit likelihood functions. Here, we contribute to the effort of modeling indel dynamics by presenting SpartaABC, an approximate Bayesian computation (ABC) approach to infer indel parameters from sequence data (either aligned or unaligned). SpartaABC circumvents the need to use an explicit likelihood function by extracting summary statistics from simulated sequences. First, summary statistics are extracted from the input sequence data. Second, SpartaABC samples indel parameters from a prior distribution and uses them to simulate sequences. Third, it computes summary statistics from the simulated sets of sequences. By computing a distance between the summary statistics extracted from the input and each simulation, SpartaABC can provide an approximation to the posterior distribution of indel parameters as well as point estimates. We study the performance of our methodology and show that it provides accurate estimates of indel parameters in simulations. We next demonstrate the utility of SpartaABC by studying the impact of alignment errors on the inference of positive selection. A C++ program implementing SpartaABC is freely available at http://spartaabc.tau.ac.il.
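The three steps in this abstract follow the general pattern of ABC rejection sampling. The sketch below illustrates that pattern only and is not SpartaABC's implementation. It assumes a toy simulator in which indel lengths are geometric with parameter p, a single summary statistic (mean length), a uniform prior, and a fixed tolerance. All function names and numeric values are hypothetical.

```python
import random

def simulate_lengths(p, n, rng):
    """Toy simulator (assumption): n indel lengths, geometric with parameter p."""
    lengths = []
    for _ in range(n):
        k = 1
        while rng.random() > p:
            k += 1
        lengths.append(k)
    return lengths

def summary(lengths):
    """Single summary statistic: the mean indel length."""
    return sum(lengths) / len(lengths)

def abc_rejection(observed, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary is within tolerance of the observed one."""
    s_obs = summary(observed)                      # step 1: summarize the input
    accepted = []
    for _ in range(n_draws):
        p = rng.uniform(0.05, 0.95)                # step 2: sample from the prior and simulate
        simulated = simulate_lengths(p, len(observed), rng)
        if abs(summary(simulated) - s_obs) <= tolerance:   # step 3: compare summaries
            accepted.append(p)
    return accepted

rng = random.Random(1)
observed = simulate_lengths(0.4, 200, rng)         # stand-in "data" generated with p = 0.4
posterior = abc_rejection(observed, 2000, 0.1, rng)
estimate = sum(posterior) / len(posterior)         # point estimate: posterior mean
```

The accepted values approximate the posterior distribution of p. Tightening the tolerance sharpens the approximation but lowers the acceptance rate.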
|
14
|
The impact of self-incompatibility systems on the prevention of biparental inbreeding. PeerJ 2017; 5:e4085. [PMID: 29188143 PMCID: PMC5703146 DOI: 10.7717/peerj.4085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2017] [Accepted: 11/02/2017] [Indexed: 12/05/2022] Open
Abstract
Inbreeding in hermaphroditic plants can occur through two different mechanisms: biparental inbreeding, when a plant mates with a related individual, or self-fertilization, when a plant mates with itself. To avoid inbreeding, many hermaphroditic plants have evolved self-incompatibility (SI) systems which prevent or limit self-fertilization. One particular SI system-homomorphic SI-can also reduce biparental inbreeding. Homomorphic SI is found in many angiosperm species, and it is often assumed that the additional benefit of reduced biparental inbreeding may be a factor in the success of this SI system. To test this assumption, we developed a spatially-explicit, individual-based simulation of plant populations that displayed three different types of homomorphic SI. We measured the total level of inbreeding avoidance by comparing each population to a self-compatible population (NSI), and we measured biparental inbreeding avoidance by comparing to a population of self-incompatible plants that were free to mate with any other individual (PSI). Because biparental inbreeding is more common when offspring dispersal is limited, we examined the levels of biparental inbreeding over a range of dispersal distances. We also tested whether the introduction of inbreeding depression affected the level of biparental inbreeding avoidance. We found that there was a statistically significant decrease in autozygosity in each of the homomorphic SI populations compared to the PSI population and, as expected, this was more pronounced when seed and pollen dispersal was limited. However, levels of homozygosity and inbreeding depression were not reduced. At low dispersal, homomorphic SI populations also suffered reduced female fecundity and had smaller census population sizes. 
Overall, our simulations showed that the homomorphic SI systems had little impact on the amount of biparental inbreeding in the population especially when compared to the overall reduction in inbreeding compared to the NSI population. With further study, this observation may have important consequences for research into the origin and evolution of homomorphic self-incompatibility systems.
|
15
|
Estimating error models for whole genome sequencing using mixtures of Dirichlet-multinomial distributions. Bioinformatics 2017; 33:2322-2329. [PMID: 28334373 PMCID: PMC5860108 DOI: 10.1093/bioinformatics/btx133] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 09/19/2016] [Revised: 01/22/2017] [Accepted: 03/07/2017] [Indexed: 12/30/2022] Open
Abstract
MOTIVATION: Accurate identification of genotypes is an essential part of the analysis of genomic data, including in identification of sequence polymorphisms, linking mutations with disease and determining mutation rates. Biological and technical processes that adversely affect genotyping include copy-number variation, paralogous sequences, library preparation, sequencing error and reference-mapping biases, among others.
RESULTS: We modeled the read depth for all data as a mixture of Dirichlet-multinomial distributions, resulting in significant improvements over previously used models. In most cases the best model comprised two distributions. The major-component distribution is similar to a binomial distribution with low error and low reference bias. The minor-component distribution is overdispersed with higher error and reference bias. We also found that sites fitting the minor component are enriched for copy number variants and low complexity regions, which can produce erroneous genotype calls. By removing sites that do not fit the major component, we can improve the accuracy of genotype calls.
AVAILABILITY AND IMPLEMENTATION: Methods and data files are available at https://github.com/CartwrightLab/WuEtAl2017/ (doi:10.5281/zenodo.256858).
CONTACT: cartwright@asu.edu.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
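The idea of assigning a site to a major or minor mixture component can be illustrated with a small sketch. It is a simplified two-category version (reference and alternate read counts at a heterozygous site) and is not the authors' code. The mixture weights and Dirichlet parameters are hypothetical values chosen so that the major component is nearly binomial and the minor component is overdispersed with reference bias.

```python
from math import exp, lgamma, log

def dm_log_pmf(counts, alphas):
    """Log-probability of read counts under a Dirichlet-multinomial distribution."""
    n = sum(counts)
    a = sum(alphas)
    out = lgamma(n + 1) + lgamma(a) - lgamma(n + a)
    for x, alpha in zip(counts, alphas):
        out += lgamma(x + alpha) - lgamma(alpha) - lgamma(x + 1)
    return out

def responsibilities(counts, weights, alpha_sets):
    """Posterior probability that a site belongs to each mixture component."""
    terms = [log(w) + dm_log_pmf(counts, a) for w, a in zip(weights, alpha_sets)]
    peak = max(terms)                               # log-sum-exp for numerical stability
    total = peak + log(sum(exp(t - peak) for t in terms))
    return [exp(t - total) for t in terms]

# Hypothetical parameters, for illustration only.
WEIGHTS = [0.9, 0.1]
ALPHAS = [(100.0, 100.0),    # major: tight around 50/50, close to binomial
          (2.0, 0.5)]        # minor: overdispersed and biased toward the reference

balanced = responsibilities((15, 15), WEIGHTS, ALPHAS)   # favors the major component
skewed = responsibilities((28, 2), WEIGHTS, ALPHAS)      # favors the minor component
```

Filtering sites whose major-component responsibility falls below a chosen cutoff is one way to act on the abstract's suggestion of removing sites that do not fit the major component. The cutoff itself would be a further assumption.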
|
16
|
The importance of selection in the evolution of blindness in cavefish. BMC Evol Biol 2017; 17:45. [PMID: 28173751 PMCID: PMC5297207 DOI: 10.1186/s12862-017-0876-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 07/08/2016] [Accepted: 01/09/2017] [Indexed: 12/04/2022] Open
Abstract
Background: Blindness has evolved repeatedly in cave-dwelling organisms, and many hypotheses have been proposed to explain this observation, including both accumulation of neutral loss-of-function mutations and adaptation to darkness. Investigating the loss of sight in cave dwellers presents an opportunity to understand the operation of fundamental evolutionary processes, including drift, selection, mutation, and migration.
Results: Here we model the evolution of blindness in caves. This model captures the interaction of three forces: (1) selection favoring alleles causing blindness, (2) immigration of sightedness alleles from a surface population, and (3) mutations creating blindness alleles. We investigated the dynamics of this model and determined selection-strength thresholds that result in blindness evolving in caves despite immigration of sightedness alleles from the surface. We estimate that the selection coefficient for blindness would need to be at least 0.005 (and maybe as high as 0.5) for blindness to evolve in the model cave-organism, Astyanax mexicanus.
Conclusions: Our results indicate that strong selection is required for the evolution of blindness in cave-dwelling organisms, which is consistent with recent work suggesting a high metabolic cost of eye development.
Electronic supplementary material: The online version of this article (doi:10.1186/s12862-017-0876-4) contains supplementary material, which is available to authorized users.
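The interplay of the three forces can be illustrated with a deterministic one-locus recursion. This is a deliberately simplified haploid sketch, not the model analyzed in the paper. The order of events within a generation, the assumption that the surface population carries no blindness alleles, and all parameter values are assumptions made for illustration.

```python
def next_freq(q, s, m, u, q_surface=0.0):
    """Advance the cave frequency q of the blindness allele by one generation."""
    q = q * (1 + s) / (1 + s * q)        # (1) selection favoring blindness
    q = (1 - m) * q + m * q_surface      # (2) immigration from the surface
    q = q + u * (1 - q)                  # (3) mutation creating blindness alleles
    return q

def long_run_freq(s, m, u, q0=0.0, generations=20_000):
    """Iterate the recursion long enough to approach its equilibrium."""
    q = q0
    for _ in range(generations):
        q = next_freq(q, s, m, u)
    return q

# Hypothetical parameter values: same migration and mutation, different selection.
weak = long_run_freq(s=0.001, m=0.01, u=1e-6)    # selection weaker than migration
strong = long_run_freq(s=0.05, m=0.01, u=1e-6)   # selection stronger than migration
```

In this toy recursion the blindness allele stays rare when s is well below m and rises to high frequency when s clearly exceeds m. This mirrors the qualitative threshold behavior described in the abstract without reproducing its numerical estimates.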
|
17
|
Low Base-Substitution Mutation Rate in the Germline Genome of the Ciliate Tetrahymena thermophila. Genome Biol Evol 2016; 8:3629-3639. [PMID: 27635054 PMCID: PMC5585995 DOI: 10.1093/gbe/evw223] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Accepted: 09/12/2016] [Indexed: 12/28/2022] Open
Abstract
Mutation is the ultimate source of all genetic variation and is, therefore, central to evolutionary change. Previous work on Paramecium tetraurelia found an unusually low germline base-substitution mutation rate in this ciliate. Here, we tested the generality of this result among ciliates using Tetrahymena thermophila. We sequenced the genomes of 10 lines of T. thermophila that had each undergone approximately 1,000 generations of mutation accumulation (MA). We applied an existing mutation-calling pipeline and developed a new probabilistic mutation detection approach that directly models the design of an MA experiment and accommodates the noise introduced by mismapped reads. Our probabilistic mutation-calling method provides a straightforward way of estimating the number of sites at which a mutation could have been called if one was present, providing the denominator for our mutation rate calculations. From these methods, we find that T. thermophila has a germline base-substitution mutation rate of 7.61 × 10⁻¹² per site per cell division, which is consistent with the low base-substitution mutation rate in P. tetraurelia. Over the course of the evolution experiment, genomic exclusion lines derived from the MA lines experienced a fitness decline that cannot be accounted for by germline base-substitution mutations alone, suggesting that other genetic or epigenetic factors must be involved. Because selection can only operate to reduce mutation rates based upon the "visible" mutational load, asexual reproduction with a transcriptionally silent germline may allow ciliates to evolve extremely low germline mutation rates.
|
18
|
The effect of the dispersal kernel on isolation-by-distance in a continuous population. PeerJ 2016; 4:e1848. [PMID: 27069794 PMCID: PMC4824897 DOI: 10.7717/peerj.1848] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 11/28/2015] [Accepted: 03/04/2016] [Indexed: 11/29/2022] Open
Abstract
Under models of isolation-by-distance, population structure is determined by the probability of identity-by-descent between pairs of genes according to the geographic distance between them. Well-established analytical results indicate that the relationship between geographical and genetic distance depends mostly on the neighborhood size of the population, which represents a standardized measure of gene flow. To test this prediction, we model local dispersal of haploid individuals on a two-dimensional landscape using seven dispersal kernels: Rayleigh, exponential, half-normal, triangular, gamma, Lomax and Pareto. When neighborhood size is held constant, the distributions produce similar patterns of isolation-by-distance, confirming predictions. Considering this, we propose that the triangular distribution is the appropriate null distribution for isolation-by-distance studies. Under the triangular distribution, dispersal is uniform over the neighborhood area, which suggests that the common description of neighborhood size as a measure of an effective, local panmictic population is valid for popular families of dispersal distributions. We further show how to draw random variables from the triangular distribution efficiently and argue that it should be utilized in other studies in which computational efficiency is important.
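The statement that dispersal is uniform over the neighborhood area can be made concrete. A point drawn uniformly from a disk of radius R has a distance density that rises linearly with r, f(r) = 2r/R², and so has a triangular shape. The sketch below assumes this reading of the triangular kernel and uses inverse-transform sampling, r = R√U. That is a standard technique and not necessarily the efficient method the paper describes. The function name and the radius are hypothetical.

```python
import math
import random

def draw_offset(radius, rng):
    """Draw a dispersal offset (dx, dy) uniformly from a disk of the given radius."""
    r = radius * math.sqrt(rng.random())        # inverse transform of F(r) = (r / R)^2
    theta = rng.uniform(0.0, 2.0 * math.pi)     # direction is uniform on the circle
    return r * math.cos(theta), r * math.sin(theta)

rng = random.Random(42)
offsets = [draw_offset(1.0, rng) for _ in range(20_000)]
distances = [math.hypot(dx, dy) for dx, dy in offsets]
mean_distance = sum(distances) / len(distances)  # expectation under f(r) is 2R/3
```

Each draw costs one square root and two trigonometric calls, with no rejection step. This kind of efficiency matters when a simulation draws a dispersal distance for every offspring in every generation.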
|
19
|
Equations of the End: Teaching Mathematical Modeling Using the Zombie Apocalypse. JOURNAL OF MICROBIOLOGY & BIOLOGY EDUCATION 2016; 17:137-42. [PMID: 27047611 PMCID: PMC4798798 DOI: 10.1128/jmbe.v17i1.1066] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 05/16/2023]
Abstract
Mathematical models of infectious diseases are a valuable tool in understanding the mechanisms and patterns of disease transmission. Mathematical modeling is, however, a difficult subject to teach, requiring both mathematical expertise and extensive subject-matter knowledge of a variety of disease systems. In this article, we explore several uses of zombie epidemics to make mathematical modeling and infectious disease epidemiology more accessible to public health professionals, students, and the general public. We further introduce a web-based simulation, White Zed (http://cartwrig.ht/apps/whitezed/), that can be deployed in classrooms to allow students to explore models before implementing them. In our experience, zombie epidemics are familiar, approachable, flexible, and an ideal way to introduce basic concepts of infectious disease epidemiology.
|
20
|
Whole Genome Sequencing of Field Isolates Reveals Extensive Genetic Diversity in Plasmodium vivax from Colombia. PLoS Negl Trop Dis 2015; 9:e0004252. [PMID: 26709695 PMCID: PMC4692395 DOI: 10.1371/journal.pntd.0004252] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 08/21/2015] [Accepted: 10/30/2015] [Indexed: 11/24/2022] Open
Abstract
Plasmodium vivax is the most prevalent malarial species in South America and exerts a substantial burden on the populations it affects. The control and eventual elimination of P. vivax are global health priorities. Genomic research contributes to this objective by improving our understanding of the biology of P. vivax and through the development of new genetic markers that can be used to monitor efforts to reduce malaria transmission. Here we analyze whole-genome data from eight field samples from a region in Córdoba, Colombia where malaria is endemic. We find considerable genetic diversity within this population, a result that contrasts with earlier studies suggesting that P. vivax had limited diversity in the Americas. We also identify a selective sweep around a substitution known to confer resistance to sulphadoxine-pyrimethamine (SP). This is the first observation of a selective sweep for SP resistance in this species. These results indicate that P. vivax has been exposed to SP pressure even when the drug is not in use as a first-line treatment for patients afflicted by this parasite. We identify multiple non-synonymous substitutions in three other genes known to be involved with drug resistance in Plasmodium species. Finally, we found extensive microsatellite polymorphisms. Using this information we developed 18 polymorphic and easy-to-score microsatellite loci that can be used in epidemiological investigations in South America. Although P. vivax is not as deadly as the more widely studied P. falciparum, it remains a pressing global health problem. Here we report the results of a whole-genome study of P. vivax from Córdoba, Colombia, in South America. This parasite is the most prevalent in this region. We show that the parasite population is genetically diverse, which is contrary to expectations from earlier studies from the Americas. We also find molecular evidence that resistance to an anti-malarial drug has arisen recently in this region. 
This selective sweep indicates that the parasite has been exposed to a drug that is not used as first-line treatment for this malaria parasite. In addition to extensive single nucleotide and microsatellite polymorphism, we report 18 new genetic loci that might be helpful for fine-scale studies of this species in the Americas.
|
21
|
Abstract
In this study, we present a novel methodology to infer indel parameters from multiple sequence alignments (MSAs) based on simulations. Our algorithm searches for the set of evolutionary parameters describing indel dynamics which best fits a given input MSA. In each step of the search, we use parametric bootstraps and the Mahalanobis distance to estimate how well a proposed set of parameters fits input data. Using simulations, we demonstrate that our methodology can accurately infer the indel parameters for a large variety of plausible settings. Moreover, using our methodology, we show that indel parameters substantially vary between three genomic data sets: mammals, bacteria, and retroviruses. Finally, we demonstrate how our methodology can be used to simulate MSAs based on indel parameters inferred from real data sets.
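The fit measure named in this abstract can be sketched as a Mahalanobis distance between the summary statistics of the input alignment and those of alignments simulated under a proposed parameter set. The two summary statistics and all numbers below are invented for illustration, the helper names are hypothetical, and the abstract does not say which statistics the authors use.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance(rows):
    """Sample covariance matrix (n - 1 denominator) of the rows."""
    n = len(rows)
    mu = mean_vector(rows)
    dim = len(mu)
    cov = [[0.0] * dim for _ in range(dim)]
    for row in rows:
        dev = [x - m for x, m in zip(row, mu)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += dev[i] * dev[j] / (n - 1)
    return cov

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def mahalanobis(observed, simulated):
    """Distance of the observed statistics from the cloud of simulated ones."""
    diff = [o - m for o, m in zip(observed, mean_vector(simulated))]
    weighted = solve(covariance(simulated), diff)
    return math.sqrt(sum(d * w for d, w in zip(diff, weighted)))

# Invented statistics per simulated MSA: [mean gap length, gaps per column].
simulated_stats = [[2.0, 0.10], [4.0, 0.10], [2.0, 0.30], [4.0, 0.30]]
observed_stats = [3.0, 0.40]
distance = mahalanobis(observed_stats, simulated_stats)
```

A search over indel parameters would presumably favor the parameter set that minimizes this distance. The sketch assumes a non-singular covariance matrix.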
|
22
|
A composite genome approach to identify phylogenetically informative data from next-generation sequencing. BMC Bioinformatics 2015; 16:193. [PMID: 26062548 PMCID: PMC4464851 DOI: 10.1186/s12859-015-0632-y] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 12/09/2014] [Accepted: 05/29/2015] [Indexed: 11/16/2022] Open
Abstract
Background Improvements in sequencing technology now allow easy acquisition of large datasets; however, analyzing these data for phylogenetics can be challenging. We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time-consuming steps of de novo whole genome assembly, multiple genome alignment, and annotation. Results In simulations, SISRS is able to identify large numbers of loci containing variable sites with phylogenetic signal. For genomic data from apes, SISRS identified thousands of variable sites, from which we produced an accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers that we used to estimate the phylogeny of placental mammals. We recovered eight phylogenies that resolved the basal relationships among mammals using datasets with different levels of missing data. The three alternate resolutions of the basal relationships are consistent with the major hypotheses for the relationships among mammals, all of which have been supported previously by different molecular datasets. Conclusions SISRS has the potential to transform phylogenetic research. This method eliminates the need for expensive marker development in many studies by using whole genome shotgun sequence data directly. SISRS is open source and freely available at https://github.com/rachelss/SISRS/releases.
|
23
|
Abstract
Domestication and plant breeding are ongoing 10,000-year-old evolutionary experiments that have radically altered wild species to meet human needs. Maize has undergone a particularly striking transformation. Researchers have sought for decades to identify the genes underlying maize evolution, but these efforts have been limited in scope. Here, we report a comprehensive assessment of the evolution of modern maize based on the genome-wide resequencing of 75 wild, landrace and improved maize lines. We find evidence of recovery of diversity after domestication, likely introgression from wild relatives, and evidence for stronger selection during domestication than improvement. We identify a number of genes with stronger signals of selection than those previously shown to underlie major morphological changes. Finally, through transcriptome-wide analysis of gene expression, we find evidence both consistent with removal of cis-acting variation during maize domestication and improvement and suggestive of modern breeding having increased dominance in expression while targeting highly expressed genes.
|
24
|
Abstract
Although most of the important evolutionary events in the history of biology can only be studied via interspecific comparisons, it is challenging to apply the rich body of population genetic theory to the study of interspecific genetic variation. Probabilistic modeling of the substitution process would ideally be derived from first principles of population genetics, allowing a quantitative connection to be made between the parameters describing mutation, selection, drift, and the patterns of interspecific variation. There has been progress in reconciling population genetics and interspecific evolution for the case where mutation rates are sufficiently low, but when mutation rates are higher, reconciliation has been hampered due to complications from how the loss or fixation of new mutations can be influenced by linked nonneutral polymorphisms (i.e., the Hill-Robertson effect). To investigate the generation of interspecific genetic variation when concurrent fitness-affecting polymorphisms are common and the Hill-Robertson effect is thereby potentially strong, we used the Wright-Fisher model of population genetics to simulate very many generations of mutation, natural selection, and genetic drift. This was done so that the chronological history of advantageous, deleterious, and neutral substitutions could be traced over time along the ancestral lineage. Our simulations show that the process by which a nonrecombining sequence changes over time can markedly deviate from the Markov assumption that is ubiquitous in molecular phylogenetics. In particular, we find tendencies for advantageous substitutions to be followed by deleterious ones and for deleterious substitutions to be followed by advantageous ones. Such non-Markovian patterns reflect the fact that the fate of the ancestral lineage depends not only on its current allelic state but also on gene copies not belonging to the ancestral lineage. 
Although our simulations describe nonrecombining sequences, we conclude by discussing how non-Markovian behavior of the ancestral lineage is plausible even when recombination rates are not low. As a result, we believe that increased attention needs to be devoted to the robustness of evolutionary inference procedures that rely upon the Markov assumption.
|
25
|
The multiple personalities of Watson and Crick strands. Biol Direct 2011; 6:7. [PMID: 21303550 PMCID: PMC3055211 DOI: 10.1186/1745-6150-6-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 10/04/2010] [Accepted: 02/08/2011] [Indexed: 11/18/2022] Open
Abstract
Background In genetics it is customary to refer to double-stranded DNA as containing a "Watson strand" and a "Crick strand." However, there seems to be no consensus in the literature on the exact meaning of these two terms, and the many usages contradict one another as well as the original definition. Here, we review the history of the terminology and suggest retaining a single sense that is currently the most useful and consistent. Proposal The Saccharomyces Genome Database defines the Watson strand as the strand which has its 5'-end at the short-arm telomere and the Crick strand as its complement. The Watson strand is always used as the reference strand in their database. Using this as the basis of our standard, we recommend that Watson and Crick strand terminology only be used in the context of genomics. When possible, the centromere or other genomic feature should be used as a reference point, dividing the chromosome into two arms of unequal lengths. Under our proposal, the Watson strand is standardized as the strand whose 5'-end is on the short arm of the chromosome, and the Crick strand as the one whose 5'-end is on the long arm. Furthermore, the Watson strand should be retained as the reference (plus) strand in a genomic database. This usage not only makes the determination of Watson and Crick unambiguous, but also allows unambiguous selection of reference strands for genomics. Reviewers This article was reviewed by John M. Logsdon, Igor B. Rogozin (nominated by Andrey Rzhetsky), and William Martin.
|
26
|
Abstract
Mutational robustness describes the extent to which a phenotype remains unchanged in the face of mutations. Theory predicts that the strength of direct selection for mutational robustness is at most the magnitude of the rate of deleterious mutation. As far as nucleic acid sequences are concerned, only long sequences in organisms with high deleterious mutation rates and large population sizes are expected to evolve mutational robustness. Surprisingly, recent studies have concluded that molecules that meet none of these conditions--the microRNA precursors (pre-miRNAs) of multicellular eukaryotes--show signs of selection for mutational and/or environmental robustness. To resolve the apparent disagreement between theory and these studies, we have reconstructed the evolutionary history of Drosophila pre-miRNAs and compared the robustness of each sequence to that of its reconstructed ancestor. In addition, we "replayed the tape" of pre-miRNA evolution via simulation under different evolutionary assumptions and compared these alternative histories with the actual one. We found that Drosophila pre-miRNAs have evolved under strong purifying selection against changes in secondary structure. Contrary to earlier claims, there is no evidence that these RNAs have been shaped by either direct or congruent selection for any kind of robustness. Instead, the high robustness of Drosophila pre-miRNAs appears to be mostly intrinsic and likely a consequence of selection for functional structures.
|
27
|
PICS-Ord: unlimited coding of ambiguous regions by pairwise identity and cost scores ordination. BMC Bioinformatics 2011; 12:10. [PMID: 21214904 PMCID: PMC3024941 DOI: 10.1186/1471-2105-12-10] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 12/20/2010] [Accepted: 01/07/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND We present a novel method to encode ambiguously aligned regions in fixed multiple sequence alignments by 'Pairwise Identity and Cost Scores Ordination' (PICS-Ord). The method works via ordination of sequence identity or cost scores matrices by means of Principal Coordinates Analysis (PCoA). After identification of ambiguous regions, the method computes pairwise distances as sequence identities or cost scores, ordinates the resulting distance matrix by means of PCoA, and encodes the principal coordinates as ordered integers. Three biological and 100 simulated datasets were used to assess the performance of the new method. RESULTS Including ambiguous regions coded by means of PICS-Ord increased topological accuracy, resolution, and bootstrap support in real biological and simulated datasets compared to the alternative of excluding such regions from the analysis a priori. In terms of accuracy, PICS-Ord performs as well as or better than previously available methods of ambiguous region coding (e.g., INAASE), with the advantages of a practically unlimited alignment size, increased analytical speed, and the possibility of analyzing PICS-Ord scores together with DNA data in a partitioned maximum likelihood model. CONCLUSIONS Advantages of PICS-Ord over step matrix-based ambiguous region coding with INAASE include a practically unlimited number of OTUs and seamless integration of PICS-Ord codes into phylogenetic datasets, as well as the increased speed of phylogenetic analysis. Contrary to word- and frequency-based methods, PICS-Ord maintains the advantage of pairwise sequence alignment to derive distances, and the method is flexible with respect to the calculation of distance scores. In addition to distance and maximum parsimony, PICS-Ord codes can be analyzed in a Bayesian or maximum likelihood framework. RAxML (version 7.2.6 or higher, developed for this study) allows up to 32-state ordered or unordered characters. 
A GTR, MK, or ORDERED model can be applied to analyse the PICS-Ord codes partition, with GTR performing slightly better than MK and ORDERED. AVAILABILITY An implementation of the PICS-Ord algorithm is available from http://scit.us/projects/ngila/wiki/PICS-Ord. It requires both the statistical software, R http://www.r-project.org and the alignment software Ngila http://scit.us/projects/ngila.
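The ordinate-then-encode step described in this abstract can be sketched for a single principal coordinate. The sketch uses classical PCoA (Gower double-centring followed by power iteration for the leading axis) and then equal-width binning into ordered integer states. The binning rule, the function names and the toy distance matrix are assumptions for illustration; the published implementation relies on R and Ngila and may differ in every detail.

```python
import math

def pcoa_first_axis(dist, iterations=200):
    """Leading principal coordinate of a symmetric distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    grand = sum(row_mean) / n
    # Gower double-centring: B = -1/2 * J * D^2 * J.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
          for j in range(n)] for i in range(n)]
    vec = [float(i + 1) for i in range(n)]  # arbitrary deterministic start
    eigval = 0.0
    for _ in range(iterations):
        nxt = [sum(b[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
        eigval = norm  # converges to the dominant eigenvalue magnitude
    return [math.sqrt(eigval) * x for x in vec]

def encode_ordered(coords, states=32):
    """Map coordinates to integers 0..states-1 by equal-width binning (assumed rule)."""
    lo, hi = min(coords), max(coords)
    if hi == lo:
        return [0] * len(coords)
    return [min(states - 1, int((c - lo) / (hi - lo) * states)) for c in coords]

# Toy distances between four sequences that lie on a line at 0, 1, 2, 3.
positions = [0.0, 1.0, 2.0, 3.0]
toy_dist = [[abs(p - q) for q in positions] for p in positions]
coords = pcoa_first_axis(toy_dist)
codes = encode_ordered(coords, states=4)
```

The sign of a principal coordinate is arbitrary, and power iteration assumes that the dominant eigenvalue of the centred matrix is positive, which can fail for strongly non-Euclidean cost scores.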
|
28
|
Antagonism between local dispersal and self-incompatibility systems in a continuous plant population. Mol Ecol 2009; 18:2327-36. [PMID: 19389171 DOI: 10.1111/j.1365-294x.2009.04180.x] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/29/2022]
Abstract
Many self-incompatible plant species exist in continuous populations in which individuals disperse locally. Local dispersal of pollen and seeds facilitates inbreeding because pollen pools are likely to contain relatives. Self-incompatibility promotes outbreeding because relatives are likely to carry incompatible alleles. Therefore, populations can experience an antagonism between these forces. In this study, a novel computational model is used to explore the effects of this antagonism on gene flow, allelic diversity, neighbourhood sizes, and identity by descent. I confirm that this antagonism is sensitive to dispersal levels and linkage. However, the results suggest that there is little to no difference between the effects of gametophytic and sporophytic self-incompatibility systems (GSI and SSI) on unlinked loci. More importantly, both GSI and SSI affect unlinked loci in a manner similar to obligate outcrossing without mating types. This suggests that the primary evolutionary impact of self-incompatibility systems may be to prevent selfing, and prevention of biparental inbreeding might be a beneficial side-effect.
|
29
|
Abstract
Insertions and deletions (indels) are fundamental but understudied components of molecular evolution. Here we present an expectation-maximization algorithm built on a pair hidden Markov model that is able to properly handle indels in neutrally evolving DNA sequences. From a data set of orthologous introns, we estimate relative rates and length distributions of indels among primates and rodents. This technique has the advantage of potentially handling large genomic data sets. We find that a zeta power-law model of indel lengths provides a much better fit than the traditional geometric model and that indel processes are conserved between our taxa. The estimated relative rates are about 12-16 indels per 100 substitutions, and the estimated power-law magnitudes are about 1.6-1.7. More significantly, we find that using the traditional geometric/affine model of indel lengths introduces artifacts into evolutionary analysis, casting doubt on studies of the evolution and diversity of indel formation using traditional models and invalidating measures of species divergence that include indel lengths.
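To make the zeta-versus-geometric comparison concrete, the sketch below scores a small sample of indel lengths under both length models. The sample is invented, the exponent 1.65 is simply taken from the middle of the range quoted in the abstract, and the zeta function is approximated by a partial sum with an Euler-Maclaurin tail correction. None of this reproduces the paper's pair-HMM estimation; it only shows the two length distributions side by side.

```python
import math

def riemann_zeta(a, terms=10_000):
    """Approximate zeta(a) for a > 1: partial sum plus an Euler-Maclaurin tail."""
    head = sum(k ** -a for k in range(1, terms))
    return head + terms ** (1.0 - a) / (a - 1.0) + 0.5 * terms ** -a

def zeta_log_pmf(k, a, norm):
    """Zeta (power-law) lengths on k = 1, 2, ...: P(k) = k**-a / zeta(a)."""
    return -a * math.log(k) - math.log(norm)

def geometric_log_pmf(k, q):
    """Geometric lengths on k = 1, 2, ...: P(k) = (1 - q) * q**(k - 1)."""
    return math.log(1.0 - q) + (k - 1) * math.log(q)

# Invented indel lengths with a long tail, for illustration only.
lengths = [1, 1, 1, 1, 1, 1, 2, 2, 3, 5, 12, 40]

a = 1.65  # within the 1.6-1.7 range quoted in the abstract
norm = riemann_zeta(a)
zeta_ll = sum(zeta_log_pmf(k, a, norm) for k in lengths)

# Maximum-likelihood q for a geometric on 1, 2, ... is 1 - 1/mean.
q = 1.0 - len(lengths) / sum(lengths)
geom_ll = sum(geometric_log_pmf(k, q) for k in lengths)
```

Comparing `zeta_ll` with `geom_ll` shows how a few long indels penalize the geometric model, whose tail decays exponentially, far more than the power law.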
|
30
|
Abstract
MOTIVATION Relationships amongst taxa are inferred from biological data using phylogenetic methods and procedures. Very few known phylogenies exist against which to test the accuracy of our inferences. Therefore, in the absence of biological data, simulated data must be used to test the accuracy of methods which produce these inferences. Researchers have limited or non-existent options for simulations useful for studying the impact of insertions, deletions, and alignments on phylogenetic accuracy. RESULTS To fill this gap I have developed a new algorithm of indel formation and incorporated it into a new, flexible, and portable application for sequence simulation. The application, called Dawg, simulates phylogenetic evolution of DNA sequences in continuous time using the robust general time reversible model with gamma and invariant rate heterogeneity and a novel length-dependent model of indel formation. On completion, Dawg produces the true alignment of the simulated sequences. Unlike other applications, Dawg allows indel lengths to be explicitly distributed via a biologically realistic power law. Many options are available to allow users to customize their simulations and results. Because simulating with indels would be problematic if biologically realistic parameters could not be estimated, a script is provided with Dawg that can estimate the parameters of indel formation from sequence data. Dawg was applied to the sequences of four chloroplast trnK introns. It was used to parametrically bootstrap an estimation of the rate of indel formation for the phylogeny. Because Dawg can assist in parametric bootstrapping of sequence data, it is useful beyond phylogenetics, such as studying alignment algorithms or parameters of molecular evolution. AVAILABILITY Dawg 1.0.0 can be obtained at the following websites: http://www.genetics.uga.edu/sw/ or http://scit.us/dawg/. The package includes source code, example files, a brief manual and helper scripts. 
Binary distributions are available for Windows and Macintosh OS X. A development page for Dawg exists at http://scit.us/dawg/, with links to a Subversion repository, mailing lists and updated versions.
|
31
|
Abstract
Ngila is an application that will find the best alignment of a pair of sequences using log-affine gap costs, which are the most biologically realistic gap costs. AVAILABILITY Portable source code for Ngila can be downloaded from its development website, http://scit.us/projects/ngila/. It compiles on most operating systems.
|
32
|
Abstract
Background Studies on the distribution of indel sizes have consistently found that they obey a power law. This finding has led several scientists to propose that logarithmic gap costs, G(k) = a + c ln k, are more biologically realistic than affine gap costs, G(k) = a + bk, for sequence alignment. Since quick and efficient affine costs are currently the most popular way to globally align sequences, the goal of this paper is to determine whether logarithmic gap costs improve alignment accuracy significantly enough to merit their use over the faster affine gap costs. Results A group of simulated sequence pairs were globally aligned using affine, logarithmic, and log-affine gap costs. Alignment accuracy was calculated by comparing resulting alignments to actual alignments of the sequence pairs. Gap costs were then compared based on average alignment accuracy. Log-affine gap costs had the best accuracy, followed closely by affine gap costs, while logarithmic gap costs performed poorly. Subsequently a model was developed to explain the results. Conclusion In contrast to initial expectations, logarithmic gap costs produce poor alignments and are actually not implied by the power-law behavior of gap sizes, given typical match and mismatch costs. Furthermore, affine gap costs not only produce accurate alignments but are also good approximations to biologically realistic gap costs. This work provides added confidence for the biological relevance of existing alignment algorithms.
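The gap cost families compared in this abstract can be written out directly. The affine and logarithmic forms are the ones given in the text. The log-affine form G(k) = a + bk + c ln k is assumed here as the natural combination of the two, since the abstract does not spell it out, and the parameter values are placeholders rather than fitted costs.

```python
import math

def affine_cost(k, a, b):
    """Affine gap cost from the abstract: G(k) = a + b*k."""
    return a + b * k

def logarithmic_cost(k, a, c):
    """Logarithmic gap cost from the abstract: G(k) = a + c*ln(k)."""
    return a + c * math.log(k)

def log_affine_cost(k, a, b, c):
    """Assumed log-affine form: G(k) = a + b*k + c*ln(k)."""
    return a + b * k + c * math.log(k)

# Placeholder parameters, chosen only to show how the curves diverge.
a, b, c = 4.0, 0.25, 1.0
table = [
    (k, affine_cost(k, a, b), logarithmic_cost(k, a, c), log_affine_cost(k, a, b, c))
    for k in (1, 10, 100)
]
```

With these placeholder values the logarithmic cost grows far more slowly than the other two as the gap length increases, so long gaps become comparatively cheap under that scheme.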
|
33
|
A toxic mutator and selection alternative to the non-Mendelian RNA cache hypothesis for hothead reversion. The Plant Cell 2005; 17:2856-8. [PMID: 16267378 PMCID: PMC1276014 DOI: 10.1105/tpc.105.036293] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 05/05/2023]
|
34
|
Abstract
Selection in which fitnesses vary with the changing genetic composition of the population may facilitate the maintenance of genetic diversity in a wide range of organisms. Here, a detailed theoretical investigation is made of a frequency-dependent selection model, in which fitnesses are based on pairwise interactions between the two phenotypes at a diploid, diallelic, autosomal locus with complete dominance. The allele frequency dynamics are fully delimited analytically, along with all possible shapes of the mean fitness function in terms of where it increases or decreases as a function of the current allele frequency in the population. These results in turn allow possibly the first complete characterization of the dynamical behavior by the mean fitness through time under frequency-dependent selection. Here the mean fitness (i) monotonically increases, (ii) monotonically decreases, (iii) initially increases and then decreases, or (iv) initially decreases and then increases as equilibrium is approached. We analytically derive the exact initial and fitness conditions that produce each dynamic and how often each arises. Computer simulations with random initial conditions and fitnesses reveal that the potential decline in mean fitness is not negligible; on average a net decrease occurs 20% of the time and reduces the mean fitness by >17%.
|
35
|
Abstract
The descriptive and aetiological epidemiology of Hodgkin's Disease (HD) are reviewed. Key issues which are highlighted include the evidence suggesting that HD is a complex of related conditions that are in part mediated by infectious diseases, immune deficits and genetic susceptibilities. There is little convincing evidence to suggest any other environmental factors are involved in the aetiology. The apparent changing pattern of disease over time and from country to country needs careful future study.
|
36
|
Abstract
OBJECTIVE To evaluate the age-standardized incidence rate of bladder cancer in patients with spinal cord injury (SCI) and the overall risk for this population. PATIENTS AND METHODS We reviewed 1334 patients with SCI whose dates of SCI, or first attendance at our centre, were between 1940 and 1998. The length of follow-up was calculated for each patient and age-specific incidence rates of bladder cancer calculated using 5-year age bands. This was used to calculate the overall incidence rate, using direct standardization with the European standard population. The cancers were analysed histochemically to characterize the phenotype. RESULTS The 1324 patients contributed a total of 12 444 person-years of follow-up. There were four cases of bladder cancer, giving an age-standardized incidence rate of 30.7 per 100 000 person-years. Histochemistry showed areas were positive for cytokeratin 14, which was also positive in the undifferentiated areas. Immunohistochemical staining was positive for cytokeratin 14 and consistently negative for cytokeratin 20, suggesting a pure squamous phenotype. CONCLUSIONS The age-standardized incidence of invasive bladder cancer in patients in our SCI unit is not statistically different from that of the general population. However, the incidence of invasive bladder cancer in the present study appears to be lower than that reported in other series. Histochemical analysis confirmed a squamous cell phenotype in these tumours.
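Direct standardization, as named in the methods above, weights each age-specific rate by the share of a standard population in that age band. The sketch below uses three invented age bands and invented weights. The study used 5-year bands and the European standard population, neither of which is reproduced here, and the function name is hypothetical.

```python
def age_standardized_rate(strata, per=100_000):
    """Directly standardized rate from (cases, person_years, standard_weight) tuples."""
    total_weight = sum(weight for _, _, weight in strata)
    # Weight each age-specific rate by its share of the standard population.
    weighted = sum(weight * cases / person_years
                   for cases, person_years, weight in strata)
    return per * weighted / total_weight

# Invented strata: (cases, person-years of follow-up, standard population weight).
toy_strata = [
    (0, 5_000, 40_000),
    (1, 4_000, 35_000),
    (3, 3_000, 25_000),
]
standardized = age_standardized_rate(toy_strata)
crude = 100_000 * sum(c for c, _, _ in toy_strata) / sum(py for _, py, _ in toy_strata)
```

The standardized and crude rates differ whenever the age structure of the cohort differs from that of the standard population, which is why the former is used for comparison with general-population figures.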
|
37
|
Occupational exposure to electromagnetic fields and acute leukaemia: analysis of a case-control study. Occup Environ Med 2003; 60:577-83. [PMID: 12883018 PMCID: PMC1740585 DOI: 10.1136/oem.60.8.577] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Indexed: 11/04/2022]
Abstract
AIMS To investigate whether the risk of acute leukaemia among adults is associated with occupational exposure to electromagnetic fields. METHODS Probable occupational exposure to electromagnetic fields at higher than typical residential levels was investigated among 764 patients diagnosed with acute leukaemia during 1991-96 and 1510 sex and age matched controls. A job exposure matrix was applied to the self reported employment histories to determine whether or not a subject was exposed to electromagnetic fields. Risks were assessed using conditional logistic regression for a matched analysis. RESULTS Study subjects considered probably ever exposed to electromagnetic fields at work were not at increased risk of acute leukaemia compared to those considered never exposed. Generally, no associations were observed on stratification by sex, leukaemia subtype, number of years since exposure stopped, or occupation; there was no evidence of a dose-response effect using increasing number of years exposed. However, relative to women considered never exposed, a significant excess of acute lymphoblastic leukaemia was observed among women probably exposed to electromagnetic fields at work that remained increased irrespective of time prior to diagnosis or job ever held. CONCLUSION This large population based case-control study found little evidence to support an association between occupational exposure to electromagnetic fields and acute leukaemia. While an excess of acute lymphoblastic leukaemia among women was observed, it is unlikely that occupational exposure to electromagnetic fields was responsible, given that increased risks remained during periods when exposure above background levels was improbable.
|
38
|
Childhood cancers and radon. Lancet 2003; 361:1658. [PMID: 12747919 DOI: 10.1016/s0140-6736(03)13289-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
39
|
Leukaemia incidence near coastal features. J Public Health (Oxf) 2002; 24:255-60. [PMID: 12546201 DOI: 10.1093/pubmed/24.4.255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] Open
Abstract
BACKGROUND The aim of the study was to independently test the hypothesis that leukaemia incidence is higher in proximity to estuaries. METHODS Electoral wards were classified as to whether they included estuarine, coastal or only inland features. Rates of different adult and childhood leukaemias were computed for each ward category; that is, acute lymphoblastic leukaemia (ALL), acute myeloid leukaemia (AML), chronic myeloid leukaemia (CML) aged 0-79 and for all childhood leukaemias combined (aged 0-14). RESULTS Poisson regression analysis controlling for the effects of sex, age, and socioeconomic and urban-rural status, showed no statistically significant differences in incidence between wards with different levels of estuarine classification. CONCLUSION The hypothesis created from an earlier dataset that a link exists between leukaemia and residence near estuaries is not upheld.
|
40
|
Patients entered into MRC AML trials are biologically representative of the totality of the disease in the UK. Clinical and Laboratory Haematology 2002; 24:263-5. [PMID: 12181033 DOI: 10.1046/j.1365-2257.2002.00445.x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 11/20/2022]
|
41
|
Age-specific incidence rates for cytogenetically-defined subtypes of acute myeloid leukaemia. Br J Cancer 2002; 86:1061-3. [PMID: 11953849 PMCID: PMC2364184 DOI: 10.1038/sj.bjc.6600195] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Received: 11/14/2001] [Revised: 01/15/2002] [Accepted: 01/21/2002] [Indexed: 11/30/2022] Open
Abstract
It is generally considered that most cancers arise following the accumulation of several genetic events and that, as a consequence, their incidence increases with age. We report a cytogenetic subgroup of acute myeloid leukaemia whose incidence is independent of age. This observation indicates that acute myeloid leukaemia can develop via multiple pathways, and underlines the importance of cytogenetics in understanding this disease.
|
42
|
Smoking and the risk of acute myeloid leukaemia in cytogenetic subgroups. Br J Cancer 2002; 86:60-2. [PMID: 11857012 PMCID: PMC2746540 DOI: 10.1038/sj.bjc.6600010] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.4] [Received: 08/06/2001] [Revised: 10/26/2001] [Accepted: 10/26/2001] [Indexed: 11/09/2022] Open
Abstract
Cytogenetically-defined subgroups of acute myeloid leukaemia have distinct biologies, clinical features and outcomes. Evidence from therapy-related leukaemia suggests that chromosomal abnormalities are also markers of exposure. Our results suggest that the smoking-associated risk for acute myeloid leukaemia is restricted to the t(8;21)(q22;q22) subgroup. This supports the hypothesis that distinct cytogenetic subgroups of acute myeloid leukaemia have separate aetiologies.
|
43
|
Abstract
BACKGROUND The objective of this study was to formally investigate the onset of the Seascale cluster of childhood and young person's cancer. This has not previously been attempted. METHODS A mortality study within the Whitehaven registration district was set up and death records were abstracted for 1906-1970. They were categorized as death from leukaemias, lymphomas, other cancers and all other causes in persons aged 0-14, 0-24 and 25-84. The number of deaths, death rates and standardized mortality ratios were calculated. RESULTS The mortality of persons aged 25-84 in Seascale civil parish, Gosforth civil parish and the rest of the Whitehaven district was unremarkable compared with national data 1906-1970. There were no cancer deaths aged 0-24 in Gosforth civil parish during 1906-1970. In Seascale civil parish a hitherto unrecorded childhood cancer case was revealed, dying in 1954. No cancer deaths aged 0-24 were found before that date. In the period 1946-1955 three cancer deaths gave a statistically significant excess owing to non-leukaemia cases, whereas in the period 1956-1965 a statistical excess of all types of leukaemia occurred as a result of two deaths. There was no case excess (based on one leukaemia death) in the period 1966-1970. CONCLUSION We found no clear temporal associations of the case excesses either with the periods of significant nuclear activity on the Sellafield site or with the main periods of population growth in the area.
|
44
|
Polymorphism in glutathione S-transferase P1 is associated with susceptibility to chemotherapy-induced leukemia. Proc Natl Acad Sci U S A 2001; 98:11592-7. [PMID: 11553769 PMCID: PMC58774 DOI: 10.1073/pnas.191211198] [Citation(s) in RCA: 172] [Impact Index Per Article: 7.5] [Received: 04/30/2001] [Indexed: 01/02/2023] Open
Abstract
Glutathione S-transferases (GSTs) detoxify potentially mutagenic and toxic DNA-reactive electrophiles, including metabolites of several chemotherapeutic agents, some of which are suspected human carcinogens. Functional polymorphisms exist in at least three genes that encode GSTs, including GSTM1, GSTT1, and GSTP1. We hypothesize, therefore, that polymorphisms in genes that encode GSTs alter susceptibility to chemotherapy-induced carcinogenesis, specifically to therapy-related acute myeloid leukemia (t-AML), a devastating complication of long-term cancer survival. Elucidation of genetic determinants may help to identify individuals at increased risk of developing t-AML. To this end, we have examined 89 cases of t-AML, 420 cases of de novo AML, and 1,022 controls for polymorphisms in GSTM1, GSTT1, and GSTP1. Gene deletion of GSTM1 or GSTT1 was not specifically associated with susceptibility to t-AML. Individuals with at least one GSTP1 codon 105 Val allele were significantly over-represented in t-AML cases compared with de novo AML cases [odds ratio (OR), 1.81; 95% confidence interval (CI), 1.11-2.94]. Moreover, relative to de novo AML, the GSTP1 codon 105 Val allele occurred more often among t-AML patients with prior exposure to chemotherapy (OR, 2.66; 95% CI, 1.39-5.09), particularly among those with prior exposure to known GSTP1 substrates (OR, 4.34; 95% CI, 1.43-13.20), but not among those t-AML patients with prior exposure to radiotherapy alone (OR, 1.01; 95% CI, 0.50-2.07). These data suggest that inheritance of at least one Val allele at GSTP1 codon 105 confers a significantly increased risk of developing t-AML after cytotoxic chemotherapy, but not after radiotherapy.
|
45
|
Epstein-Barr Virus and HLA-DPB1-*0301 in young adult Hodgkin's disease: evidence for inherited susceptibility to Epstein-Barr Virus in cases that are EBV(+ve). Cancer Epidemiol Biomarkers Prev 2001; 10:705-9. [PMID: 11401923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2023] Open
Abstract
Cases of Hodgkin's disease (HD) may be distinguished by whether they do [EBV-positive (+ve) cases] or do not [EBV-negative (-ve) cases] have evidence of EBV DNA in the Reed-Sternberg cells. Only one study has attempted to distinguish epidemiological risk factors for EBV(+ve) and EBV(-ve) HD, and none have compared inherited susceptibility. The present study involves a population-based case series of HD, diagnosed in patients between 16 and 24 years of age in the United Kingdom (n = 118), of whom 87% were classified by EBV status (EBV(+ve), 19; EBV(-ve), 84). History of infectious illness, EBV antibody titers, and HLA-DPB1 type have been compared in EBV(+ve) and EBV(-ve) cases. Reported infectious mononucleosis was more frequent in EBV(+ve) cases (odds ratio (OR), 5.10; 95% confidence interval (CI), 1.12-24.4). EBV antibody titers to viral capsid antigen were significantly higher in EBV(+ve) cases (P for trend = 0.02). Higher proportions of EBV(+ve) (43%) than EBV(-ve) (31%) cases typed positive for HLA-DPB1*0301, but this was not statistically significant; the association of infectious mononucleosis with EBV(+ve) cases was stronger in this HLA subgroup (OR, 17.1; 95% CI, 1.06-1177) than in other cases (OR, 1.24; 95% CI, 0.02-15.4). Although these results are based on small numbers of HD cases, they provide suggestive evidence that the etiology of EBV(+ve) HD may involve inherited susceptibility to EBV.
|
46
|
Abstract
A novel hierarchical cytogenetic classification for acute myeloid leukemia (AML) has been developed. Patients with successful cytogenetics and a diagnosis of AML were categorized into four mutually exclusive karyotype groups: normal, translocation, deletion and trisomy. Patients with more than one chromosomal abnormality were classified using the hierarchy: established translocation > established deletion > established trisomy > non-established translocation > non-established deletion > non-established trisomy. A total of 593 AML patients from a large population-based case-control study of acute leukemia were classified according to their diagnostic karyotype. The four karyotype groups showed different age distributions. Overall the frequency of patients increased with age, as did the frequency of patients with a deletion, trisomy or normal karyotype, although the increase with age was much sharper for patients with a deletion. In contrast, the distribution of patients with a translocation was roughly constant with age. We concluded that there was a link between karyotype and the age of the patient at diagnosis. Furthermore, two karyotype groups, translocations and deletions, may define disease entities with different etiologies. This novel cytogenetic classification will allow other studies to examine whether AML cases with very different types of chromosomal abnormality have the same etiology.
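The priority rule described in this abstract can be illustrated with a minimal Python sketch. The hierarchy order is taken from the abstract; the function name `classify_karyotype` and the encoding of each abnormality as a `(status, kind)` pair are assumptions made for illustration only, not the authors' implementation.

```python
from typing import Iterable, Tuple

# Priority order as listed in the abstract; earlier entries win.
HIERARCHY = [
    ("established", "translocation"),
    ("established", "deletion"),
    ("established", "trisomy"),
    ("non-established", "translocation"),
    ("non-established", "deletion"),
    ("non-established", "trisomy"),
]


def classify_karyotype(abnormalities: Iterable[Tuple[str, str]]) -> str:
    """Assign a patient to one of four mutually exclusive groups.

    `abnormalities` is a hypothetical encoding: one (status, kind) pair per
    chromosomal abnormality found in the diagnostic karyotype.
    """
    found = set(abnormalities)
    if not found:
        return "normal"
    # Walk the hierarchy from highest to lowest priority.
    for status, kind in HIERARCHY:
        if (status, kind) in found:
            return kind
    raise ValueError(f"abnormality not covered by the hierarchy: {found}")


# An established trisomy outranks a non-established translocation.
group = classify_karyotype(
    [("non-established", "translocation"), ("established", "trisomy")]
)
print(group)
```

Because every patient with several abnormalities is resolved to the single highest-ranked one, the four groups stay mutually exclusive, which is what allows the age distributions to be compared group by group.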
|
47
|
Childhood cancer and parental use of tobacco: findings from the inter-regional epidemiological study of childhood cancer (IRESCC). Br J Cancer 2001; 84:141-6. [PMID: 11139329 PMCID: PMC2363626 DOI: 10.1054/bjoc.2000.1556] [Citation(s) in RCA: 64] [Impact Index Per Article: 2.8] [Indexed: 12/05/2022] Open
Abstract
Parental smoking data have been re-abstracted from the interview records of the Inter-Regional Epidemiological Study of Childhood Cancer (IRESCC) to test further the hypothesis that paternal cigarette smoking is a risk factor for the generality of childhood cancer. Reported cigarette smoking habits for the parents of 555 children diagnosed with cancer in the period 1980-1983 were compared, in two separate matched pairs analyses, with similar information for the parents of 555 children selected from GP lists (GP controls) and for the parents of 555 hospitalized children (hospital controls). When cases were compared with GP controls there was a statistically significant positive trend (P = 0.02) between the risk of childhood cancer and paternal daily consumption of cigarettes before the pregnancy; there was no significant trend for maternal smoking habit. When cases were compared with hospital controls there was a statistically significant negative trend (P< 0.001) between the risk of childhood cancer and maternal daily consumption of cigarettes before the pregnancy; there was no significant trend for paternal smoking habit. Neither of the significant trends could be explained by adjustment for socioeconomic grouping, ethnic origin or parental age at the birth of the child, or by simultaneous analysis of parental smoking habits. Relations between maternal consumption of cigarettes and birth weights suggested that (maternal) smoking data were equally reliable for case and control subjects, although comparisons with national data suggested that the hospital control parents were unusually heavy smokers. These findings give some support for the hypothesis that paternal cigarette smoking is a potential risk factor for the generality of childhood cancers.
|
48
|
Abstract
Approaches to the management of adolescents and young adults with acute leukaemia were investigated by sending a questionnaire to hospitals identified as having diagnosed or treated patients aged 15-29 years. The responses demonstrated the types of hospital treating these patients, the haematologists' perceived practice for entry of patients to Medical Research Council (MRC) leukaemia trials and reasons for non-entry. Data were linked to MRC trials data to determine the proportion of patients aged 15-29 years at diagnosis in responding hospitals actually treated in MRC leukaemia trials in the 5 years preceding the questionnaire. Eighty-two per cent of haematologists stated that they entered patients 'always' or 'whenever possible' for acute myeloid leukaemia (AML) and 76% for acute lymphoblastic leukaemia (ALL), but actual entry rates from the study hospitals were 46% of 239 AML patients and 36% of 182 ALL patients. The reasons most commonly reported for not entering eligible patients to national leukaemia trials were clinician preference for one arm of an MRC trial, a regional study or non-trial protocol, and concern about workload and ethical approval.
|
49
|
Risk factors for Hodgkin's disease by Epstein-Barr virus (EBV) status: prior infection by EBV and other agents. Br J Cancer 2000; 82:1117-21. [PMID: 10737396 PMCID: PMC2374437 DOI: 10.1054/bjoc.1999.1049] [Citation(s) in RCA: 84] [Impact Index Per Article: 3.5] [Indexed: 01/11/2023] Open
Abstract
A UK population-based case-control study of Hodgkin's disease (HD) in young adults (16-24 years) included 118 cases and 237 controls matched on year of birth, gender and county of residence. The majority (103) of the cases were classified by Epstein-Barr virus (EBV) status (EBV present in Reed-Sternberg cells), with 19 being EBV-positive. Analyses using conditional logistic regression are presented of subject reports of prior infectious disease (infectious mononucleosis (IM), chicken pox, measles, mumps, pertussis and rubella). In these analyses HD cases are compared with matched controls, EBV-positive cases and EBV-negative cases are compared separately with their controls, and formal tests of differences of association by EBV status are applied. A prior history of IM was positively associated with HD (odds ratio (OR) = 2.43, 95% confidence interval (CI) = 1.10-5.33) and with EBV-positive HD (OR = 9.16, 95% CI = 1.07-78.31), and the difference between EBV-positive and EBV-negative HD was statistically significant (P = 0.013). The remaining infectious illnesses (combined) were negatively associated with HD, EBV-positive HD and EBV-negative HD (in the total series, for ≥2 episodes compared with ≤1, OR = 0.45, 95% CI = 0.25-0.83). These results support previous evidence that early exposure to infection protects against HD and that IM increases subsequent risk; the comparisons of EBV-positive and EBV-negative HD are new and generate hypotheses for further study.
|
50
|
Abstract
The object of this study was to examine cases of Hodgkin's Disease (HD) for evidence of space-time clustering of onsets by age group, sex and disease subtype. Data comprised 2024 cases of HD aged 0-79 years arising throughout the period 1984 to 1993 in the areas covered by a specialist population-based register of leukaemias and lymphomas. Knox space-time analysis was used separately for 3 different age groups: childhood (0-14 years), young adult (15-34 years) and older adults (35-79 years); for adult cases separate analysis was carried out by sex and for the nodular sclerosing and non-nodular sclerosing subtypes. Results showed that space-time clustering of onsets was limited to the nodular sclerosing cases. It was more prominent in young adult nodular sclerosing cases aged 15-34 years (particularly females) diagnosed in the period 1984-88 than in those diagnosed in 1989-93. We conclude that clustering may provide further evidence that an infectious process is involved in the aetiology of young adult nodular sclerosing cases of HD.
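For readers unfamiliar with the Knox approach named in this abstract, the following Python sketch shows the general idea: count pairs of cases that are close in both space and time, then judge whether that count is unusually high. The abstract does not give the distance or time thresholds or the significance procedure used, so the cut-offs, the function names and the permutation-based p-value here are all assumptions for illustration (the permutation step is a commonly used substitute, not necessarily what the study did).

```python
import itertools
import math
import random


def knox_statistic(points, times, space_cut, time_cut):
    """Count case pairs that are close in both space and time."""
    count = 0
    for i, j in itertools.combinations(range(len(points)), 2):
        close_in_space = math.dist(points[i], points[j]) <= space_cut
        close_in_time = abs(times[i] - times[j]) <= time_cut
        if close_in_space and close_in_time:
            count += 1
    return count


def knox_permutation_p(points, times, space_cut, time_cut, n_perm=999, seed=0):
    """Monte Carlo p-value: shuffle onset times across fixed locations.

    Shuffling breaks any space-time link while keeping the separate
    spatial and temporal distributions of the cases unchanged.
    """
    rng = random.Random(seed)
    observed = knox_statistic(points, times, space_cut, time_cut)
    shuffled = list(times)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if knox_statistic(points, shuffled, space_cut, time_cut) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_perm + 1)


# Toy data (hypothetical): two tight space-time pairs far from each other.
coords = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
onsets = [0, 1, 100, 101]
observed, p_value = knox_permutation_p(coords, onsets, space_cut=2.0, time_cut=5.0)
print(observed, p_value)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a register of a few thousand cases but would need a spatial index for much larger data sets.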
|