1
Forsdyke DR. Genomic compliance with Chargaff's second parity rule may have originated non-adaptively, but stem-loops now function adaptively. J Theor Biol 2024; 595:111943. [PMID: 39277166 DOI: 10.1016/j.jtbi.2024.111943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2024] [Revised: 07/06/2024] [Accepted: 09/07/2024] [Indexed: 09/17/2024]
Abstract
Of Chargaff's four rules on DNA base quantity, his second parity rule (PR-2) is the most contentious. Various biometricians (e.g., Sueoka, Lobry) regarded PR-2 compliance as a non-adaptive feature of modern genomes that could be modeled through interrelations among mutation rates. However, PR-2 compliance with stem-loop potential was considered adaptively relevant by biochemists familiar with analyses of nucleic acid structure (e.g., of Crick) and of meiotic recombination (e.g., of Kleckner). Meanwhile, other biometricians had shown that PR-2 complementarity extended beyond individual bases (1-mers) to oligonucleotides (k-mers), possibly reflecting "advantageous DNA structure" (Nussinov). An "introns early" hypothesis (Reanney, Forsdyke) had suggested a primordial nucleic acid world with recombination-mediated error-correction requiring genome-wide stem-loop potential to have evolved prior to localized intrusions of protein-encoding potential (exons). Thus, a primordial genome was equivalent to one long intron. Indeed, when assessed as the base order-dependent component (correcting for local influences of GC%), modern genes, especially when evolving rapidly under positive Darwinian selection, display high intronic stem-loop potential. This suggests forced migration from neighboring exons by competing protein-encoding potential. PR-2 compliance may have first arisen non-adaptively. Primary prototypic structures were later strengthened by their adaptive contribution to recombination. Thus, contentious views may actually be in harmony.
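As a rough illustration of what PR-2 compliance means for 1-mers and k-mers, the sketch below compares the count of each k-mer on a single strand with the count of its reverse complement. The function names and the deviation score (mean relative difference per k-mer pair, where 0 indicates perfect compliance) are assumptions chosen for this example and are not taken from the paper.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_counts(seq, k):
    """Count overlapping k-mers on one strand, skipping any with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def pr2_deviation(seq, k=1):
    """Hypothetical score: mean of |a - b| / (a + b) over k-mer / reverse-complement pairs.

    Self-complementary k-mers (e.g. CG) are skipped because they trivially comply.
    """
    counts = kmer_counts(seq, k)
    seen = set()
    deviations = []
    for kmer in counts:
        rc = reverse_complement(kmer)
        pair = min(kmer, rc)
        if kmer == rc or pair in seen:
            continue
        seen.add(pair)
        a, b = counts[kmer], counts.get(rc, 0)
        deviations.append(abs(a - b) / (a + b))
    return sum(deviations) / len(deviations) if deviations else 0.0
```

A sequence that equals its own reverse complement, such as `AACCGGTT`, scores 0 at every k, whereas a homopolymer such as `AAAA` scores 1 at k = 1.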
Affiliation(s)
- Donald R Forsdyke
- Department of Biomedical and Molecular Sciences, Queen's University, Kingston, Ontario K7L3N6, Canada.

2
Forsdyke DR. Speciation, natural selection, and networks: three historians versus theoretical population geneticists. Theory Biosci 2024; 143:1-26. [PMID: 38282046 DOI: 10.1007/s12064-024-00412-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Accepted: 01/06/2024] [Indexed: 01/30/2024]
Abstract
In 1913, the geneticist William Bateson called for a halt in studies of genetic phenomena until evolutionary fundamentals had been sufficiently addressed at the molecular level. Nevertheless, in the 1960s, the theoretical population geneticists celebrated a "modern synthesis" of the teachings of Mendel and Darwin, with an exclusive role for natural selection in speciation. This was supported, albeit with minor reservations, by historians Mark Adams and William Provine, who taught it to generations of students. In subsequent decades, doubts were raised by molecular biologists and, despite the deep influence of various mentors, Adams and Provine noted serious anomalies and began to question traditional "just-so-stories." They were joined in challenging the genetic orthodoxy by a scientist-historian, Donald Forsdyke, who suggested that a "collective variation" postulated by Darwin's young research associate, George Romanes, and a mysterious "residue" postulated by Bateson, might relate to differences in short runs of DNA bases (oligonucleotides). The dispute between a small network of historians and a large network of geneticists can be understood in the context of national politics. Contrasts are drawn between democracies, where capturing the narrative makes reversal difficult, and dictatorships, where overthrow of a supportive dictator can result in rapid reversal.
Affiliation(s)
- Donald R Forsdyke
- Department of Biomedical and Molecular Sciences, Queen's University, Kingston, ON, K7L3N6, Canada.

3
Arias PM, Butler J, Randhawa GS, Soltysiak MPM, Hill KA, Kari L. Environment and taxonomy shape the genomic signature of prokaryotic extremophiles. Sci Rep 2023; 13:16105. [PMID: 37752120 PMCID: PMC10522608 DOI: 10.1038/s41598-023-42518-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Accepted: 09/11/2023] [Indexed: 09/28/2023] [Open Access]
Abstract
This study provides comprehensive quantitative evidence suggesting that adaptations to extreme temperatures and pH imprint a discernible environmental component in the genomic signature of microbial extremophiles. Both supervised and unsupervised machine learning algorithms were used to analyze genomic signatures, each computed as the k-mer frequency vector of a 500 kbp DNA fragment arbitrarily selected to represent a genome. Computational experiments classified/clustered genomic signatures extracted from a curated dataset of [Formula: see text] extremophile (temperature, pH) bacteria and archaea genomes, at multiple scales of analysis, [Formula: see text]. The supervised learning resulted in high accuracies for taxonomic classifications at [Formula: see text], and medium to medium-high accuracies for environment category classifications of the same datasets at [Formula: see text]. For [Formula: see text], our findings were largely consistent with amino acid compositional biases and codon usage patterns in coding regions, previously attributed to extreme environment adaptations. The unsupervised learning of unlabelled sequences identified several exemplars of hyperthermophilic organisms with large similarities in their genomic signatures, in spite of belonging to different domains in the Tree of Life.
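The abstract defines a genomic signature as the k-mer frequency vector of a DNA fragment. A minimal sketch of that idea follows; the function name, the lexicographic ordering of k-mers, and the choice to skip windows containing non-ACGT symbols are assumptions made for illustration, not details confirmed by the paper.

```python
from itertools import product


def kmer_frequency_vector(seq, k):
    """Return relative frequencies of all 4**k k-mers, in lexicographic ACGT order.

    Windows containing symbols other than A, C, G, T (e.g. N) are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]
```

The resulting vector has length 4^k, so it can be passed directly to a classifier or clustering routine as a fixed-size feature vector.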
Affiliation(s)
- Pablo Millán Arias
- School of Computer Science, University of Waterloo, Waterloo, ON, Canada.
- Joseph Butler
- Department of Biology, University of Western Ontario, London, ON, Canada
- Gurjit S Randhawa
- School of Mathematical and Computational Sciences, University of Prince Edward Island, Charlottetown, PE, Canada
- Kathleen A Hill
- Department of Biology, University of Western Ontario, London, ON, Canada
- Lila Kari
- School of Computer Science, University of Waterloo, Waterloo, ON, Canada

4
Khrustalev VV, Khrustaleva TA, Popinako AV. Germline mutations directions are different between introns of the same gene: case study of the gene coding for amyloid-beta precursor protein. Genetica 2023; 151:61-73. [PMID: 36129589 DOI: 10.1007/s10709-022-00166-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2022] [Accepted: 09/08/2022] [Indexed: 02/01/2023]
Abstract
Amyloid-beta precursor protein (APP) is highly conserved in mammals. This feature allowed us to compare nucleotide usage biases in fourfold degenerate sites along the length of its coding region for 146 species of mammals and birds in search of fragments with significant deviations. Even though cytosine usage has the highest value in fourfold degenerate sites in the APP coding region from all tested placental mammals, in contrast to marsupial mammals with their bias toward thymine usage, the most frequent germline and somatic mutations in the human APP coding region are C to T and G to A transitions. The same mutational AT-pressure is characteristic of germline mutations in introns of the human APP gene. However, surprisingly, there are several exceptional introns with deviations in germline mutation rates. Most of those introns surround exons with exceptional biases in nucleotide usage in fourfold degenerate sites. The existence of such fragments in exons 4 and 5, as well as in exon 14, can be connected with the presence of lncRNA genes in the complementary strand of DNA. The exceptional nucleotide usage bias in exons 16 and 17, which contain a sequence encoding amyloid-beta peptides, can be explained either by the presence of as yet unmapped lncRNA(s), or by the autonomous expression of a short mRNA that encodes just the C-terminal part of APP, providing an alternative source of amyloid-beta peptides. This hypothesis is supported by the increased rate of T to C transitions in introns 16-17 and 17-18 of the human APP gene relative to other introns.
Affiliation(s)
- Anna Vladimirovna Popinako
- Bach Institute of Biochemistry, Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russian Federation

5
Balaban M, Bristy NA, Faisal A, Bayzid MS, Mirarab S. Genome-wide alignment-free phylogenetic distance estimation under a no strand-bias model. Bioinform Adv 2022; 2:vbac055. [PMID: 35992043 PMCID: PMC9383262 DOI: 10.1093/bioadv/vbac055] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/21/2022] [Accepted: 08/09/2022] [Indexed: 01/27/2023]
Abstract
While alignment has been the dominant approach for determining homology prior to phylogenetic inference, alignment-free methods can simplify the analysis, especially when analyzing genome-wide data. Furthermore, alignment-free methods present the only option for emerging forms of data, such as genome skims, which do not permit assembly. Despite the appeal, alignment-free methods have not been competitive with alignment-based methods in terms of accuracy. One limitation of alignment-free methods is their reliance on simplified models of sequence evolution such as Jukes-Cantor. If we can estimate frequencies of base substitutions in an alignment-free setting, we can compute pairwise distances under more complex models. However, since the strand of DNA sequences is unknown for many forms of genome-wide data, which arguably present the best use case for alignment-free methods, the most complex models that one can use are the so-called no strand-bias models. We show how to calculate distances under a four-parameter no strand-bias model called TK4 without relying on alignments or assemblies. The main idea is to replace letters in the input sequences and recompute Jaccard indices between k-mer sets. However, on larger genomes, we also need to compute the number of k-mer mismatches after replacement due to random chance as opposed to homology. We show in simulation that alignment-free distances can be highly accurate when genomes evolve under the assumed models and study the accuracy on assembled and unassembled biological data. Availability and implementation: Our software is available open source at https://github.com/nishatbristy007/NSB. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
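The core operation named in the abstract, recomputing a Jaccard index between k-mer sets after replacing letters, can be sketched as follows. This is not the NSB software's implementation: the helper names are hypothetical, and the particular replacement shown (collapsing G onto A and T onto C) is chosen only to illustrate the mechanics, not taken from the paper.

```python
def kmer_set(seq, k):
    """Return the set of distinct overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def replace_letters(seq, mapping):
    """Apply a single-letter replacement, e.g. {"G": "A", "T": "C"} (illustrative only)."""
    return seq.upper().translate(str.maketrans(mapping))


# Example: the index before and after an illustrative two-letter collapse.
x, y = "ACGTAC", "ACGTTT"
before = jaccard(kmer_set(x, 3), kmer_set(y, 3))
merged = {"G": "A", "T": "C"}
after = jaccard(kmer_set(replace_letters(x, merged), 3),
                kmer_set(replace_letters(y, merged), 3))
```

Collapsing letters hides some substitutions, so the index after replacement is typically higher than before; comparing the two is what carries information about which substitution types occurred.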
Affiliation(s)
- Ahnaf Faisal
- Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
- Md Shamsuzzoha Bayzid
- Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh

6
Smith G, Manzano-Marín A, Reyes-Prieto M, Antunes CSR, Ashworth V, Goselle ON, Jan AAA, Moya A, Latorre A, Perotti MA, Braig HR. Human follicular mites: Ectoparasites becoming symbionts. Mol Biol Evol 2022; 39:msac125. [PMID: 35724423 PMCID: PMC9218549 DOI: 10.1093/molbev/msac125] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/11/2021] [Revised: 05/23/2022] [Accepted: 05/31/2022] [Indexed: 12/13/2022] [Open Access]
Abstract
Most humans carry mites in the hair follicles of their skin for their entire lives. Follicular mites are the only metazoans that continuously live on humans. We propose that Demodex folliculorum (Acari) represents a transitional stage from a host-injuring obligate parasite to an obligate symbiont. Here, we describe the profound impact of this transition on the genome and physiology of the mite. Genome sequencing revealed that the permanent host association of D. folliculorum led to an extensive genome reduction through relaxed selection and genetic drift, resulting in the smallest number of protein-coding genes yet identified among panarthropods. Confocal microscopy revealed that this gene loss coincided with an extreme reduction in the number of cells. Single uninucleate muscle cells are sufficient to operate each of the three segments that form each walking leg. While it has been assumed that the reduction of the cell number in parasites starts early in development, we identified a greater total number of cells in the last developmental stage (nymph) than in the terminal adult stage, suggesting that reduction starts at the adult or ultimate stage of development. This is the first evolutionary step in an arthropod species adopting a reductive, parasitic or endosymbiotic lifestyle. Somatic nuclei show underreplication at the diploid stage. Novel eye structures or photoreceptors as well as a unique human host melatonin-guided day/night rhythm are proposed for the first time. The loss of DNA repair genes coupled with extreme endogamy might have set this mite species on an evolutionary dead-end trajectory.
Affiliation(s)
- Gilbert Smith
- School of Natural Sciences, Bangor University, Bangor, Wales, United Kingdom
- Alejandro Manzano-Marín
- Centre for Microbiology and Environmental Systems Science (CMESS), University of Vienna, Vienna, Austria
- Mariana Reyes-Prieto
- Institute of Integrative Systems Biology (I2Sysbio), Universitat de València and Spanish Research Council (CSIC), València, Spain
- Foundation for the Promotion of Health and Biomedical Research of the Valencian Community (FISABIO), València, Spain
- Victoria Ashworth
- School of Natural Sciences, Bangor University, Bangor, Wales, United Kingdom
- Obed Nanjul Goselle
- School of Natural Sciences, Bangor University, Bangor, Wales, United Kingdom
- Andrés Moya
- Institute of Integrative Systems Biology (I2Sysbio), Universitat de València and Spanish Research Council (CSIC), València, Spain
- Foundation for the Promotion of Health and Biomedical Research of the Valencian Community (FISABIO), València, Spain
- Center for Networked Biomedical Research in Epidemiology and Public Health (CIBEResp), Madrid, Spain
- Amparo Latorre
- Institute of Integrative Systems Biology (I2Sysbio), Universitat de València and Spanish Research Council (CSIC), València, Spain
- Foundation for the Promotion of Health and Biomedical Research of the Valencian Community (FISABIO), València, Spain
- Center for Networked Biomedical Research in Epidemiology and Public Health (CIBEResp), Madrid, Spain
- M Alejandra Perotti
- School of Biological Sciences, University of Reading, Reading, United Kingdom
- Henk R Braig
- School of Natural Sciences, Bangor University, Bangor, Wales, United Kingdom
- Institute and Museum of Natural Sciences, National University of San Juan, San Juan, Argentina

7
Hu EZ, Lan XR, Liu ZL, Gao J, Niu DK. A positive correlation between GC content and growth temperature in prokaryotes. BMC Genomics 2022; 23:110. [PMID: 35139824 PMCID: PMC8827189 DOI: 10.1186/s12864-022-08353-7] [Citation(s) in RCA: 40] [Impact Index Per Article: 20.0] [Received: 08/03/2021] [Accepted: 01/31/2022] [Indexed: 01/27/2023] [Open Access]
Abstract
BACKGROUND: GC pairs are generally more stable than AT pairs; GC-rich genomes were proposed to be more adapted to high temperatures than AT-rich genomes. Previous studies consistently showed positive correlations between growth temperature and the GC contents of structural RNA genes. However, for whole genome sequences and the silent sites of codons in protein-coding genes, the relationship between GC content and growth temperature has been the subject of a long-lasting debate. RESULTS: With a dataset much larger than those of previous studies (681 bacteria and 155 archaea with completely assembled genomes), our phylogenetic comparative analyses showed positive correlations between optimal growth temperature (Topt) and GC content both in bacterial and archaeal structural RNA genes and in bacterial whole genome sequences, chromosomal sequences, plasmid sequences, core genes, and accessory genes. However, in the 155 archaea, we did not observe a significant positive correlation of Topt with whole-genome GC content (GCw) or GC content at four-fold degenerate sites. We randomly drew 155 samples from the 681 bacteria for 1000 rounds. In most cases (> 95%), the positive correlations between Topt and genomic GC contents became statistically nonsignificant (P > 0.05). This result suggested that the small sample sizes might account for the lack of positive correlations between growth temperature and genomic GC content in the 155 archaea and the bacterial samples of previous studies. Comparing the GC content among four categories (psychrophiles/psychrotrophiles, mesophiles, thermophiles, and hyperthermophiles) also revealed a positive correlation between GCw and growth temperature in bacteria. By including the GCw of incompletely assembled genomes, we expanded the sample size of archaea to 303. Positive correlations between GCw and Topt appear especially after excluding the halophilic archaea whose GC contents might be strongly shaped by intense UV radiation.
CONCLUSIONS: This study explains the previous contradictory observations and ends a long debate. Prokaryotes growing at high temperatures have higher GC contents. Thermal adaptation is one possible explanation for the positive association. Meanwhile, we propose that the elevated efficiency of DNA repair in response to heat mutagenesis might have the by-product of increasing GC content, as happens in intracellular symbionts and marine bacterioplankton.
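The subsampling experiment described in the RESULTS (repeatedly drawing a smaller sample and re-examining the Topt-GC relationship) can be sketched in a simplified form. The paper reports phylogenetic comparative analyses with significance tests; the sketch below swaps in a plain Pearson correlation with no phylogenetic correction and no P-values, and all function and parameter names are assumptions for illustration.

```python
import math
import random


def gc_content(seq):
    """Fraction of G and C among the A, C, G, T symbols of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def subsample_correlations(topt, gc, sample_size, rounds, seed=0):
    """Correlation of Topt with GC content in repeated random subsamples (without replacement)."""
    rng = random.Random(seed)
    indices = range(len(topt))
    results = []
    for _ in range(rounds):
        pick = rng.sample(indices, sample_size)
        results.append(pearson([topt[i] for i in pick], [gc[i] for i in pick]))
    return results
```

Calling `subsample_correlations(topt, gc, 155, 1000)` on per-genome values would mirror the sample size and number of rounds quoted in the abstract; the spread of the returned correlations gives a feel for how much a small sample can obscure a weak positive trend.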
Affiliation(s)
- En-Ze Hu
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Xin-Ran Lan
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Zhi-Ling Liu
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Jie Gao
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China
- Deng-Ke Niu
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing, 100875, China.

8
Zhang C, Forsdyke DR. Potential Achilles heels of SARS-CoV-2 are best displayed by the base order-dependent component of RNA folding energy. Comput Biol Chem 2021; 94:107570. [PMID: 34500325 PMCID: PMC8410225 DOI: 10.1016/j.compbiolchem.2021.107570] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/02/2021] [Revised: 08/29/2021] [Accepted: 08/30/2021] [Indexed: 11/29/2022]
Abstract
The base order-dependent component of folding energy has revealed a highly conserved region in HIV-1 genomes that associates with RNA structure. This corresponds to a packaging signal that is recognized by the nucleocapsid domain of the Gag polyprotein. Long viewed as a potential HIV-1 "Achilles heel," the signal can be targeted by a new antiviral compound. Although SARS-CoV-2 differs in many respects from HIV-1, the same technology displays regions with a high base order-dependent folding energy component, which are also highly conserved. This indicates structural invariance (SI) sustained by natural selection. While the regions are often also protein-encoding (e.g., NSP3, ORF3a), we suggest that their nucleic acid level functions can be considered potential "Achilles heels" for SARS-CoV-2, perhaps susceptible to therapies like those envisaged for AIDS. The ribosomal frameshifting element scored well, but higher SI scores were obtained in other regions, including those encoding NSP13 and the nucleocapsid (N) protein.
Affiliation(s)
- Chiyu Zhang
- Shanghai Public Health Clinical Center, Fudan University, Shanghai, China
- Donald R Forsdyke
- Department of Biomedical and Molecular Sciences, Queen's University, Kingston, Ontario K7L3N6, Canada.