1
Forsdyke DR. When acting as a reproductive barrier for sympatric speciation, hybrid sterility can only be primary. Biol J Linn Soc Lond 2019. [DOI: 10.1093/biolinnean/blz135] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022]
Abstract
Animal gametes unite to form a zygote that develops into an adult with gonads that, in turn, produce gametes. Interruption of this germinal cycle by prezygotic or postzygotic reproductive barriers can result in two cycles, each with the potential to evolve into a new species. When the speciation process is complete, members of each species are fully reproductively isolated from those of the other. During speciation a primary barrier may be supported and eventually superseded by a later-appearing secondary barrier. For those holding certain cases of prezygotic isolation to be primary (e.g. elephant cannot copulate with mouse), the onus is to show that they had not been preceded over evolutionary time by periods of postzygotic hybrid inviability (genically determined) or sterility (genically or chromosomally determined). Likewise, the onus is upon those holding cases of hybrid inviability to be primary (e.g. Dobzhansky–Muller epistatic incompatibilities) to show that they had not been preceded by periods, however brief, of hybrid sterility. The latter, when acting as a sympatric barrier causing reproductive isolation, can only be primary. In many cases, hybrid sterility may result from incompatibilities between parental chromosomes that attempt to pair during meiosis in the gonad of their offspring (Winge-Crowther-Bateson incompatibilities). While such incompatibilities have long been observed on a microscopic scale, there is growing evidence for a role of dispersed finer DNA sequence differences (i.e. in base k-mers).
Affiliation(s)
- Donald R Forsdyke
- Department of Biomedical and Molecular Sciences, Queen’s University, Kingston, ON K7L3N6, Canada
2
Forsdyke DR. Success of alignment-free oligonucleotide (k-mer) analysis confirms relative importance of genomes not genes in speciation and phylogeny. Biol J Linn Soc Lond 2019. [DOI: 10.1093/biolinnean/blz096] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022]
Abstract
The utility of DNA sequence substrings (k-mers) in alignment-free phylogenetic classification, including that of bacteria and viruses, is increasingly recognized. However, its biological basis eludes many 21st-century practitioners. A path from the 19th-century recognition of the informational basis of heredity to the modern era can be discerned. Crick’s DNA ‘unpairing postulate’ predicted that recombinational pairing of homologous DNAs during meiosis would be mediated by short k-mers in the loops of stem-loop structures extruded from classical duplex helices. The complementary ‘kissing’ duplex loops – like tRNA anticodon–codon k-mer duplexes – would seed a more extensive pairing that would then extend until limited by lack of homology or other factors. Indeed, this became the principle behind alignment-based methods that assessed similarity by degree of DNA–DNA reassociation in vitro. These are now seen as less sensitive than alignment-free methods that are closely consistent, both theoretically and mechanistically, with chromosomal anti-recombination models for the initiation of divergence into new species. The analytical power of k-mer differences supports the theses that evolutionary advance sometimes serves the needs of nucleic acids (genomes) rather than proteins (genes), and that such differences can play a role in early speciation events.
Affiliation(s)
- Donald R Forsdyke
- Department of Biomedical and Molecular Sciences, Queen’s University, Kingston, Ontario, Canada
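The alignment-free k-mer comparison described in the abstract above can be illustrated with a minimal sketch. It is not the method of the cited paper: the function names `kmer_profile` and `profile_distance`, the default `k = 3`, and the choice of Euclidean distance are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq: str, k: int) -> dict[str, float]:
    """Return the relative frequency of every overlapping k-mer in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def profile_distance(a: str, b: str, k: int = 3) -> float:
    """Euclidean distance between the k-mer frequency profiles of a and b.

    No alignment is performed: only the k-mer composition is compared.
    """
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    kmers = set(pa) | set(pb)
    return sqrt(sum((pa.get(m, 0.0) - pb.get(m, 0.0)) ** 2 for m in kmers))
```

Identical sequences give a distance of zero, and sequences that share no k-mers give the largest distances, so the value can serve as a rough, alignment-free dissimilarity score.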
3
Forsdyke DR. Base Composition, Speciation, and Why the Mitochondrial Barcode Precisely Classifies. Biol Theory 2017. [DOI: 10.1007/s13752-017-0267-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 01/08/2023]
4
Forsdyke DR. Implications of HIV RNA structure for recombination, speciation, and the neutralism-selectionism controversy. Microbes Infect 2013; 16:96-103. [PMID: 24211872 DOI: 10.1016/j.micinf.2013.10.017] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 08/22/2013] [Revised: 10/24/2013] [Accepted: 10/24/2013] [Indexed: 11/29/2022]
Abstract
The conflict between the needs to encode both a protein (impaired by non-synonymous mutation), and nucleic acid structure (impaired by synonymous or non-synonymous mutation), can sometimes be resolved in favour of the nucleic acid because its structure is critical for a selectively advantageous genome-wide activity--recombination. However, above a sequence difference threshold, recombination is impaired. It may then be advantageous for new species to arise. Building on the work of Grantham and others critical of the neutralist viewpoint, heuristic support for this hypothesis emerged from studies of the base composition and structure of retroviral genomes. The extreme enrichment in the purine A of the RNA of human immunodeficiency virus (HIV-1), parallels the mild purine-loading of the RNAs of most organisms, for which there is an adaptive explanation--immune evasion. However, human T cell leukaemia virus (HTLV-1), with the potential to invade the same host cell, shows extreme enrichment in the pyrimidine C. Assuming the low GC% HIV and the high GC% HTLV-1 to share a common ancestor, it was postulated that differences in GC% had arisen to prevent homologous recombination between these emerging lentiviral species. Sympatrically isolated by this intracellular reproductive barrier, prototypic HIV-1 seized the AU-rich (low GC%) high ground (thus committing to purine A rather than purine G). Prototypic HTLV-1 forwent this advantage and evolved an independent evolutionary strategy--similar to that of the GC%-rich Epstein-Barr virus--profound latency maintained by transcription of one purine-rich mRNA. The evidence supporting these interpretations is reviewed.
Affiliation(s)
- Donald R Forsdyke
- Department of Biomedical and Molecular Sciences, Queen's University, Kingston, ON K7L3N6, Canada.
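The base-composition measures named in the abstract above (GC%, purine loading, enrichment in A or C) are simple to compute. The following is a minimal sketch under stated assumptions: the function name `base_composition` and the returned keys are hypothetical, RNA input is handled by treating U as T, and any character other than A, C, G, T or U is ignored.

```python
def base_composition(seq: str) -> dict[str, float]:
    """Return GC%, purine (A+G)%, A% and C% for a DNA or RNA sequence."""
    seq = seq.upper().replace("U", "T")  # treat RNA as DNA for counting
    bases = [b for b in seq if b in "ACGT"]  # skip gaps and ambiguity codes
    n = len(bases)
    if n == 0:
        raise ValueError("no A, C, G or T/U bases found")
    freq = {b: bases.count(b) / n for b in "ACGT"}
    return {
        "GC%": 100 * (freq["G"] + freq["C"]),
        "AG%": 100 * (freq["A"] + freq["G"]),  # purine loading when above 50
        "A%": 100 * freq["A"],
        "C%": 100 * freq["C"],
    }
```

Applied to two genomes, a large gap in `GC%` combined with opposite `A%` and `C%` enrichment would correspond to the kind of contrast the abstract describes between HIV-1 and HTLV-1.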
5
Abstract
To detect positive Darwinian selection it is thought essential to compare two sequences. Despite its defects, "the comparative method rules." However, genes evolving rapidly under positive selection conflict more with internal forces (the genome phenotype) than genes evolving slowly under negative selection. In particular, there is conflict with stem-loop potential. The conflict between protein-encoding potential (primary information) and stem-loop potential (secondary information) permits detection of positive selection in a single sequence. The degree to which secondary information is compromised provides a measure of the speed of transmission of primary information. Thus, the sovereignty of the comparative method is challenged not only by its own defects, but also by the availability of a single-sequence method. However, while of limited utility for positive selection, the comparative method casts new light on Darwin's great question — the origin of species. Comparison of rates of synonymous and non-synonymous mutation suggests that branching into new species begins with synonymous mutations.
Affiliation(s)
- Donald R Forsdyke
- Department of Biochemistry, Queen's University, Kingston, Ontario K7L3N6, Canada
6
Kelley DR, Salzberg SL. Clustering metagenomic sequences with interpolated Markov models. BMC Bioinformatics 2010; 11:544. [PMID: 21044341 PMCID: PMC3098094 DOI: 10.1186/1471-2105-11-544] [Citation(s) in RCA: 79] [Impact Index Per Article: 5.6] [Received: 05/13/2010] [Accepted: 11/02/2010] [Indexed: 01/16/2023]
Abstract
BACKGROUND Sequencing of environmental DNA (often called metagenomics) has shown tremendous potential to uncover the vast number of unknown microbes that cannot be cultured and sequenced by traditional methods. Because the output from metagenomic sequencing is a large set of reads of unknown origin, clustering reads together that were sequenced from the same species is a crucial analysis step. Many effective approaches to this task rely on sequenced genomes in public databases, but these genomes are a highly biased sample that is not necessarily representative of environments interesting to many metagenomics projects. RESULTS We present SCIMM (Sequence Clustering with Interpolated Markov Models), an unsupervised sequence clustering method. SCIMM achieves greater clustering accuracy than previous unsupervised approaches. We examine the limitations of unsupervised learning on complex datasets, and suggest a hybrid of SCIMM and supervised learning method Phymm called PHYSCIMM that performs better when evolutionarily close training genomes are available. CONCLUSIONS SCIMM and PHYSCIMM are highly accurate methods to cluster metagenomic sequences. SCIMM operates entirely unsupervised, making it ideal for environments containing mostly novel microbes. PHYSCIMM uses supervised learning to improve clustering in environments containing microbial strains from well-characterized genera. SCIMM and PHYSCIMM are available open source from http://www.cbcb.umd.edu/software/scimm.
Affiliation(s)
- David R Kelley
- Center for Bioinformatics and Computational Biology, Institute for Advanced Computer Studies, College Park, MD 20742, USA
- Department of Computer Science, University of Maryland, A.V. Williams Building, College Park, MD 20742, USA
- Steven L Salzberg
- Center for Bioinformatics and Computational Biology, Institute for Advanced Computer Studies, College Park, MD 20742, USA
- Department of Computer Science, University of Maryland, A.V. Williams Building, College Park, MD 20742, USA
7
Ancient, recurrent phage attacks and recombination shaped dynamic sequence-variable mosaics at the root of phytoplasma genome evolution. Proc Natl Acad Sci U S A 2008; 105:11827-32. [PMID: 18701718 DOI: 10.1073/pnas.0805237105] [Citation(s) in RCA: 70] [Impact Index Per Article: 4.4] [Indexed: 11/18/2022]
Abstract
Mobile genetic elements have impacted biological evolution across all studied organisms, but evidence for a role in evolutionary emergence of an entire phylogenetic clade has not been forthcoming. We suggest that mobile element predation played a formative role in emergence of the phytoplasma clade. Phytoplasmas are cell wall-less bacteria that cause numerous diseases in plants. Phylogenetic analyses indicate that these transkingdom parasites descended from Gram-positive walled bacteria, but events giving rise to the first phytoplasma have remained unknown. Previously we discovered a unique feature of phytoplasmal genome architecture, genes clustered in sequence-variable mosaics (SVMs), and suggested that such structures formed through recurrent, targeted attacks by mobile elements. In the present study, we discovered that cryptic prophage remnants, originating from phages in the order Caudovirales, formed SVMs and comprised exceptionally large percentages of the chromosomes of 'Candidatus Phytoplasma asteris'-related strains OYM and AYWB, occupying nearly all major nonsyntenic sections, and accounting for most of the size difference between the two genomes. The clustered phage remnants formed genomic islands exhibiting distinct DNA physical signatures, such as dinucleotide relative abundance and codon position GC values. Phytoplasma strain-specific genes identified as phage morons were located in hypervariable regions within individual SVMs, indicating that prophage remnants played important roles in generating phytoplasma genetic diversity. Because no SVM-like structures could be identified in genomes of ancestral relatives including Acholeplasma spp., we hypothesize that ancient phage attacks leading to SVM formation occurred after divergence of phytoplasmas from acholeplasmas, triggering evolution of the phytoplasma clade.
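One of the DNA signatures mentioned in the abstract above, dinucleotide relative abundance, is commonly defined as the observed dinucleotide frequency divided by the product of the two mononucleotide frequencies. The sketch below is a simplified single-strand version offered only as an illustration; the function name is hypothetical, and the cited study may have used a strand-symmetrized form.

```python
from collections import Counter
from itertools import product


def dinucleotide_relative_abundance(seq: str) -> dict[str, float]:
    """Return rho_xy = f_xy / (f_x * f_y) for each dinucleotide xy.

    Dinucleotides containing a base that never occurs in seq are omitted,
    because their expected frequency would be zero.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    rho = {}
    for x, y in product("ACGT", repeat=2):
        fx, fy = mono[x] / n, mono[y] / n
        if fx > 0 and fy > 0:
            rho[x + y] = (di[x + y] / (n - 1)) / (fx * fy)
    return rho
```

Values near 1 indicate a dinucleotide occurring about as often as its base composition predicts. Comparing the profile of a candidate genomic island with that of the rest of the chromosome is one way such regions can stand out.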
8
Navarre WW, Porwollik S, Wang Y, McClelland M, Rosen H, Libby SJ, Fang FC. Selective silencing of foreign DNA with low GC content by the H-NS protein in Salmonella. Science 2006; 313:236-8. [PMID: 16763111 DOI: 10.1126/science.1128794] [Citation(s) in RCA: 545] [Impact Index Per Article: 30.3] [Indexed: 11/02/2022]
Abstract
Horizontal gene transfer plays a major role in microbial evolution. However, newly acquired sequences can decrease fitness unless integrated into preexisting regulatory networks. We found that the histone-like nucleoid structuring protein (H-NS) selectively silences horizontally acquired genes by targeting sequences with GC content lower than the resident genome. Mutations in hns are lethal in Salmonella unless accompanied by compensatory mutations in other regulatory loci. Thus, H-NS provides a previously unrecognized mechanism of bacterial defense against foreign DNA, enabling the acquisition of DNA from exogenous sources while avoiding detrimental consequences from unregulated expression of newly acquired genes. Characteristic GC/AT ratios of bacterial genomes may facilitate discrimination between a cell's own DNA and foreign DNA.
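The idea in the abstract above, that sequences with GC content lower than the resident genome can be told apart from it, can be illustrated computationally with a sliding-window scan. This is only a sketch of the compositional comparison, not a model of H-NS binding; the function names and the `window`, `step` and `margin` parameters are assumptions chosen for illustration.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    return sum(b in "GC" for b in seq) / len(seq) if seq else 0.0


def low_gc_windows(genome: str, window: int = 1000, step: int = 500,
                   margin: float = 0.05) -> list[tuple[int, float]]:
    """Return (start, gc) for windows whose GC fraction falls more than
    `margin` below the genome-wide GC fraction."""
    baseline = gc_fraction(genome)
    hits = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_fraction(genome[start:start + window])
        if gc < baseline - margin:
            hits.append((start, gc))
    return hits
```

For example, a 40-base AT-only insert inside a GC-rich toy sequence is reported as two consecutive windows when `window=20` and `step=20`.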
9
Dalevi D, Dubhashi D, Hermansson M. Bayesian classifiers for detecting HGT using fixed and variable order markov models of genomic signatures. Bioinformatics 2006; 22:517-22. [PMID: 16403797 DOI: 10.1093/bioinformatics/btk029] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Indexed: 11/13/2022]
Abstract
MOTIVATION Analyses of genomic signatures are gaining attention as they allow studies of species-specific relationships without involving alignments of homologous sequences. A naïve Bayesian classifier was built to discriminate between different bacterial compositions of short oligomers, also known as DNA words. The classifier has proven successful in identifying foreign genes in Neisseria meningitidis. In this study we extend the classifier approach using either a fixed higher order Markov model (Mk) or a variable length Markov model (VLMk). RESULTS We propose a simple algorithm to lock a variable length Markov model to a certain number of parameters and show that the use of Markov models greatly increases the flexibility and accuracy of prediction compared with that of a naïve model. We also test the integrity of classifiers in terms of false-negatives and give estimates of the minimal sizes of training data. We end the report by proposing a method to reject a false hypothesis of horizontal gene transfer. AVAILABILITY Software and Supplementary information available at www.cs.chalmers.se/~dalevi/genetic_sign_classifiers/.
Affiliation(s)
- Daniel Dalevi
- Department of Computing Science, Chalmers University, SE 412 96 Göteborg, Sweden.
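The fixed-order Markov model classifier described in the abstract above can be sketched in a few lines. This is an illustrative toy, not the authors' software: the function names, the add-one (Laplace) smoothing, and the equal-prior maximum-likelihood decision rule are assumptions, and the variable-length model (VLMk) is not covered.

```python
from collections import defaultdict
from math import log

ALPHABET = "ACGT"


def train_markov(seq: str, order: int) -> dict:
    """Count how often each base follows each context of length `order`."""
    counts = defaultdict(lambda: defaultdict(int))
    seq = seq.upper()
    for i in range(order, len(seq)):
        counts[seq[i - order:i]][seq[i]] += 1
    return counts


def log_likelihood(seq: str, counts: dict, order: int) -> float:
    """Log-probability of seq under the model, with add-one smoothing."""
    seq = seq.upper()
    total = 0.0
    for i in range(order, len(seq)):
        ctx = counts.get(seq[i - order:i], {})
        num = ctx.get(seq[i], 0) + 1
        den = sum(ctx.values()) + len(ALPHABET)
        total += log(num / den)
    return total


def classify(query: str, models: dict, order: int) -> str:
    """Return the name of the model that gives query the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(query, models[name], order))
```

A gene whose best-scoring model is not that of its host genome would be a candidate for horizontal transfer under this kind of scheme, subject to the false-positive checks the abstract mentions.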
10
Forsdyke DR. Conflict Resolution. Evol Bioinform Online 2006. [DOI: 10.1007/978-0-387-33419-6_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
11
Forsdyke DR. Heredity as transmission of information: Butlerian 'Intelligent Design'. Centaurus; International Magazine of the History of Science and Medicine 2006; 48:133-148. [PMID: 18543449 DOI: 10.1111/j.1600-0498.2006.00045.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 05/26/2023]
Abstract
In the 1870s, Ewald Hering and Samuel Butler provided what was, for that time, a scientifically coherent foundation for the Lamarckist view that positive adaptations to the environment acquired during an individual's lifetime can be transmitted to the offspring. Observing that heredity was a form of memory (involving stored information), they distinguished what are now known as genotype and phenotype and proposed that cognitive abilities present in the most elementary organisms might mediate a transmission of acquired adaptations. While compatible with the then-available facts of evolution, this Butlerian version of 'intelligent design' was rendered less credible by subsequent appreciations of the discrete (discontinuous) inheritance of many phenotypic characters (Mendelism) and of the separation of germ line from soma (Weismannism). However, it can now be seen that 21st-century bioinformatics has 19th-century roots.
Affiliation(s)
- Donald R Forsdyke
- Department of Biochemistry, Queen's University, Kingston, Ontario, Canada
12
Rayment JH, Forsdyke DR. Amino acids as placeholders: base-composition pressures on protein length in malaria parasites and prokaryotes. Appl Bioinformatics 2005; 4:117-30. [PMID: 16128613 DOI: 10.2165/00822942-200504020-00005] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/02/2022]
Abstract
BACKGROUND The composition and sequence of amino acids in a protein may serve the underlying needs of the nucleic acids that encode the protein (the genome phenotype). In extreme form, amino acids become mere placeholders inserted between functional segments or domains, and--apart from increasing protein length--playing no role in the specific function or structure of a protein (the conventional phenotype). METHODS We studied the genomes of two malarial parasites and 521 prokaryotes (144 complete) that differ widely in GC% and optimum growth temperature, comparing the base compositions of the protein coding regions and corresponding lengths (kilobases). RESULTS Malarial parasites show distinctive responses to base-compositional pressures that increase as protein lengths increase. A low-GC% species (Plasmodium falciparum) is likely to have more placeholder amino acids than an intermediate-GC% species (P. vivax), so that homologous proteins are longer. In prokaryotes, GC% is generally greater and AG% is generally less in open reading frames (ORFs) encoding long proteins. The increased GC% in long ORFs increases as species' GC% increases, and decreases as species' AG% increases. In low- and intermediate-GC% prokaryotic species, increases in ORF GC% as encoded proteins increase in length are largely accounted for by the base compositions of first and second (amino acid-determining) codon positions. In high-GC% prokaryotic species, first and third (non-amino acid-determining) codon positions play this role. CONCLUSION In low- and intermediate-GC% prokaryotes, placeholder amino acids are likely to be well defined, corresponding to codons enriched in G and/or C at first and second positions. In high-GC% prokaryotes, placeholder amino acids are likely to be less well defined. 
Increases in ORF GC% as encoded proteins increase in length are greater in mesophiles than in thermophiles, which are constrained from increasing protein lengths in response to base-composition pressures.
Affiliation(s)
- Jonathan H Rayment
- Department of Biochemistry, Queen's University, Kingston, Ontario, Canada
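The abstract above compares GC% at the first, second and third codon positions of open reading frames. A minimal sketch of that calculation follows; the function name `gc_by_codon_position` is hypothetical, the ORF is assumed to start in frame, and any trailing partial codon is ignored.

```python
def gc_by_codon_position(orf: str) -> tuple[float, float, float]:
    """Return GC% at codon positions 1, 2 and 3 of an in-frame ORF."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3  # drop any incomplete final codon
    result = []
    for pos in range(3):
        bases = orf[pos:usable:3]  # every third base, starting at pos
        gc = sum(b in "GC" for b in bases)
        result.append(100 * gc / len(bases) if bases else 0.0)
    return tuple(result)
```

For the toy ORF `ATGGCCGCGTAA` (codons ATG, GCC, GCG, TAA) this gives 50% GC at positions 1 and 2 and 75% at position 3. Plotting such values against ORF length across a genome is the kind of comparison the abstract reports.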