1
Yu H, Yau SST. The optimal metric for viral genome space. Comput Struct Biotechnol J 2024; 23:2083-2096. [PMID: 38803517; PMCID: PMC11128839; DOI: 10.1016/j.csbj.2024.05.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 04/22/2024] [Accepted: 05/04/2024] [Indexed: 05/29/2024] Open
Abstract
Understanding the structural similarity between genomes is pivotal in classification and phylogenetic analysis. As the number of known genomes rockets, alignment-free methods have gained considerable attention. Among these methods, the natural vector method stands out as it represents sequences as vectors using statistical moments, enabling effective clustering based on families in biological taxonomy. However, determining an optimal metric that combines different elements in natural vectors remains challenging due to the absence of a rigorous theoretical framework for weighting different k-mers and orders. In this study, we address this challenge by transforming the determination of optimal weights into an optimization problem and resolving it through gradient-based techniques. Our experimental results underscore the substantial improvement in classification accuracy achieved by employing these optimal weights, reaching an impressive 92.73% on the testing set, surpassing other alignment-free methods. On one hand, our method offers an outstanding metric for virus classification, and on the other hand, it provides valuable insights into feature integration within alignment-free methods.
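As a rough illustration of the idea in this abstract, the sketch below builds a single-nucleotide natural vector (counts, mean positions, and second-order normalized central moments) and compares two vectors with a weighted Euclidean distance. The function names, the moment normalization, and the weights passed in are assumptions made for illustration; the optimized weights and the k-mer extension described in the paper are not reproduced here.

```python
import math

ALPHABET = "ACGT"

def natural_vector(seq):
    """Counts, mean positions, and 2nd-order normalized central moments per nucleotide."""
    n = len(seq)
    positions = {c: [] for c in ALPHABET}
    for i, c in enumerate(seq.upper(), start=1):  # 1-based positions
        if c in positions:
            positions[c].append(i)
    counts, means, moments = [], [], []
    for c in ALPHABET:
        pos = positions[c]
        n_c = len(pos)
        mu = sum(pos) / n_c if n_c else 0.0
        # Assumed normalization: 2nd central moment divided by n_c * n.
        d2 = sum((p - mu) ** 2 for p in pos) / (n_c * n) if n_c else 0.0
        counts.append(float(n_c))
        means.append(mu)
        moments.append(d2)
    return counts + means + moments

def weighted_distance(u, v, weights):
    """Weighted Euclidean distance; the weights here are placeholders, not learned."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, u, v)))
```

With uniform weights, `weighted_distance(natural_vector(a), natural_vector(b), [1.0] * 12)` reduces to the plain Euclidean distance between the two 12-dimensional vectors; a learned metric would replace the uniform list with fitted values.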
Affiliation(s)
- Hongyu Yu
  - Department of Mathematical Sciences, Tsinghua University, Beijing, 100084, People's Republic of China
- Stephen S.-T. Yau
  - Department of Mathematical Sciences, Tsinghua University, Beijing, 100084, People's Republic of China
  - Beijing Institute of Mathematical Sciences and Applications (BIMSA), Beijing, 101408, People's Republic of China
2
Wang T, Yu ZG, Li J. CGRWDL: alignment-free phylogeny reconstruction method for viruses based on chaos game representation weighted by dynamical language model. Front Microbiol 2024; 15:1339156. [PMID: 38572227; PMCID: PMC10987876; DOI: 10.3389/fmicb.2024.1339156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Accepted: 02/23/2024] [Indexed: 04/05/2024] Open
Abstract
Traditional alignment-based methods face serious challenges in genome sequence comparison and phylogeny reconstruction because of their high computational complexity. Here, we propose a new alignment-free method to analyze the phylogenetic relationships (classification) among species. In our method, the dynamical language (DL) model and the chaos game representation (CGR) method are used to characterize the frequency information and the context information of k-mers in a sequence, respectively. Then, for each DNA or protein sequence in a dataset, our method converts the sequence into a feature vector that represents the sequence information based on CGR weighted by the DL model, and uses these vectors to infer phylogenetic relationships. We name our method CGRWDL. Its performance was tested on both DNA and protein sequences from 8 virus datasets by constructing phylogenetic trees. For each dataset, we compared the Robinson-Foulds (RF) distance between the reference tree and the tree constructed by CGRWDL with the corresponding distances for other advanced methods. The results show that the phylogenetic trees constructed by CGRWDL can accurately classify the viruses, and that the RF distances between these trees and the reference trees are smaller than those obtained with other methods.
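For readers unfamiliar with CGR, the sketch below shows only the plain chaos game iteration for a DNA sequence: start at the center of the unit square and move halfway toward the corner assigned to each nucleotide. The corner layout and the function name are assumptions chosen for illustration, and the DL-model weighting that defines CGRWDL is not implemented here.

```python
# Assumed corner layout for the unit square; other conventions exist.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(seq):
    """Return the CGR coordinate reached after each nucleotide of seq."""
    x, y = 0.5, 0.5  # start at the center of the square
    points = []
    for c in seq.upper():
        if c not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[c]
        # Move halfway from the current point toward the nucleotide's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points
```

Because each step halves the distance to a corner, the point reached after a k-mer falls in a cell of a 2^k by 2^k grid determined by that k-mer, which is why CGR is said to capture the context of k-mers.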
Affiliation(s)
- Ting Wang
  - National Center for Applied Mathematics in Hunan, Xiangtan University, Xiangtan, Hunan, China
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan, China
- Zu-Guo Yu
  - National Center for Applied Mathematics in Hunan, Xiangtan University, Xiangtan, Hunan, China
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan, China
- Jinyan Li
  - School of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Shenzhen, Guangdong, China
3
Yu H, Yau SST. Automated recognition of chromosome fusion using an alignment-free natural vector method. Front Genet 2024; 15:1364951. [PMID: 38572414; PMCID: PMC10987741; DOI: 10.3389/fgene.2024.1364951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Accepted: 03/06/2024] [Indexed: 04/05/2024] Open
Abstract
Chromosomal fusion is a significant form of structural variation, but research into algorithms for its identification has been limited. Most existing methods rely on synteny analysis, which necessitates manual annotations and always involves inefficient sequence alignments. In this paper, we present a novel alignment-free algorithm for chromosomal fusion recognition. Our method transforms the problem into a series of assignment problems using natural vectors and efficiently solves them with the Kuhn-Munkres algorithm. When applied to the human/gorilla and swamp buffalo/river buffalo datasets, our algorithm successfully and efficiently identifies chromosomal fusion events. Notably, our approach offers several advantages, including higher processing speeds from eliminating time-consuming alignments and no need for manual annotations. From an alignment-free perspective, our algorithm initially considers entire chromosomes instead of fragments to identify chromosomal structural variations, offering substantial potential to advance research in this field.
Affiliation(s)
- Hongyu Yu
  - Department of Mathematical Sciences, Tsinghua University, Beijing, China
- Stephen S.-T. Yau
  - Department of Mathematical Sciences, Tsinghua University, Beijing, China
  - Yanqi Lake Beijing Institute of Mathematical Science and Applications (BIMSA), Beijing, China
4
Rachtman E, Sarmashghi S, Bafna V, Mirarab S. Quantifying the uncertainty of assembly-free genome-wide distance estimates and phylogenetic relationships using subsampling. Cell Syst 2022; 13:817-829.e3. [PMID: 36265468; PMCID: PMC9589918; DOI: 10.1016/j.cels.2022.06.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2021] [Revised: 03/14/2022] [Accepted: 06/28/2022] [Indexed: 01/26/2023]
Abstract
Computing the distance between two genomes without alignments or even access to assemblies enables many downstream analyses. However, alignment-free methods, including in the fast-growing field of genome skimming, are hampered by a significant methodological gap. While accurate methods (many k-mer-based) for assembly-free distance calculation exist, measuring the uncertainty of estimated distances has not been sufficiently studied. In this paper, we show that bootstrapping, the standard non-parametric method of measuring estimator uncertainty, is not accurate for k-mer-based methods that rely on k-mer frequency profiles. Instead, we propose using subsampling (with no replacement) in combination with a correction step to reduce the variance of the inferred distribution. We show that the distribution of distances using our procedure matches the true uncertainty of the estimator. The resulting phylogenetic support values effectively differentiate between correct and incorrect branches and identify controversial branches that change across alignment-free and alignment-based phylogenies reported in the literature.
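The distinction between bootstrapping and subsampling can be sketched in a few lines. The example below draws replicates without replacement and applies an arbitrary estimator to each one; the function names, the default fraction, and the generic `estimator` callback are hypothetical, and the correction step mentioned in the abstract is deliberately left out because its form is not given here.

```python
import random

def subsample_without_replacement(items, fraction, rng):
    """Draw a subsample in which no item appears twice (unlike a bootstrap)."""
    size = max(1, int(len(items) * fraction))
    return rng.sample(items, size)

def subsampled_estimates(items, estimator, fraction=0.5, replicates=100, seed=0):
    """Apply estimator to each subsample; the spread of results hints at uncertainty."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return [
        estimator(subsample_without_replacement(items, fraction, rng))
        for _ in range(replicates)
    ]
```

In a genome-skimming setting, `items` would be reads or k-mers and `estimator` a distance calculation. Because each subsample is smaller than the full data, the raw spread overstates the variance, which is presumably why a correction is needed.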
Affiliation(s)
- Eleonora Rachtman
  - Bioinformatics and Systems Biology Graduate Program, UC San Diego, San Diego, CA 92093, USA
- Shahab Sarmashghi
  - Department of Electrical and Computer Engineering, UC San Diego, San Diego, CA 92093, USA
- Vineet Bafna
  - Department of Computer Science and Engineering, UC San Diego, San Diego, CA 92093, USA
- Siavash Mirarab
  - Department of Electrical and Computer Engineering, UC San Diego, San Diego, CA 92093, USA
5
Balaban M, Bristy NA, Faisal A, Bayzid MS, Mirarab S. Genome-wide alignment-free phylogenetic distance estimation under a no strand-bias model. Bioinformatics Advances 2022; 2:vbac055. [PMID: 35992043; PMCID: PMC9383262; DOI: 10.1093/bioadv/vbac055] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/21/2022] [Accepted: 08/09/2022] [Indexed: 01/27/2023]
Abstract
While alignment has been the dominant approach for determining homology prior to phylogenetic inference, alignment-free methods can simplify the analysis, especially when analyzing genome-wide data. Furthermore, alignment-free methods present the only option for emerging forms of data, such as genome skims, which do not permit assembly. Despite the appeal, alignment-free methods have not been competitive with alignment-based methods in terms of accuracy. One limitation of alignment-free methods is their reliance on simplified models of sequence evolution such as Jukes-Cantor. If we can estimate frequencies of base substitutions in an alignment-free setting, we can compute pairwise distances under more complex models. However, since the strand of DNA sequences is unknown for many forms of genome-wide data, which arguably present the best use case for alignment-free methods, the most complex models that one can use are the so-called no strand-bias models. We show how to calculate distances under a four-parameter no strand-bias model called TK4 without relying on alignments or assemblies. The main idea is to replace letters in the input sequences and recompute Jaccard indices between k-mer sets. However, on larger genomes, we also need to compute the number of k-mer mismatches after replacement due to random chance as opposed to homology. We show in simulation that alignment-free distances can be highly accurate when genomes evolve under the assumed models and study the accuracy on assembled and unassembled biological data. Availability and implementation: Our software is available open source at https://github.com/nishatbristy007/NSB. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
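The core operation named in this abstract, a Jaccard index between k-mer sets recomputed after letters are replaced, can be sketched as follows. The helper names and the example replacement mapping are hypothetical and chosen only to show the mechanics; they are not the specific replacements or the TK4 distance formulas used by the NSB software.

```python
def kmer_set(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a, b, k):
    """Jaccard index between the k-mer sets of two sequences."""
    set_a, set_b = kmer_set(a, k), kmer_set(b, k)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0

def replace_letters(seq, mapping):
    """Collapse letters according to mapping, e.g. {"T": "A"} (illustrative only)."""
    return "".join(mapping.get(c, c) for c in seq)
```

For example, `jaccard("ACGT", "ACGA", 2)` is 0.5, while after applying `replace_letters` with `{"T": "A"}` to both sequences the index becomes 1.0, since the only differing letter has been merged. Comparing indices before and after such replacements is what carries information about which substitutions occurred.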
Affiliation(s)
- Ahnaf Faisal
  - Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
- Md Shamsuzzoha Bayzid
  - Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
6
Wang Z, Wen Z, Jiang M, Xia F, Wang M, Zhuge X, Dai J. Dissemination of virulence and resistance genes among Klebsiella pneumoniae via outer membrane vesicle: An important plasmid transfer mechanism to promote the emergence of carbapenem-resistant hypervirulent Klebsiella pneumoniae. Transbound Emerg Dis 2022; 69:e2661-e2676. [PMID: 35679514; DOI: 10.1111/tbed.14615] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 01/22/2022] [Revised: 05/15/2022] [Accepted: 06/07/2022] [Indexed: 12/01/2022]
Abstract
Klebsiella pneumoniae is a well-known opportunistic enterobacterium involved in complex clinical infections in humans and animals. Domestic animals might be a source of multidrug-resistant virulent K. pneumoniae for humans. K. pneumoniae infections in domestic animals are considered an emerging global concern. Horizontal gene transfer plays essential roles in bacterial genome evolution through the spread of virulence and resistance determinants. However, whether virulence genes can be transferred horizontally via K. pneumoniae-derived outer membrane vesicles (OMVs) has not been reported. In this study, we performed complete genome sequencing of two K. pneumoniae strains, HvK2115 and CRK3022, with hypervirulent or carbapenem-resistant traits. OMVs from K. pneumoniae HvK2115 and CRK3022 were purified and observed. The carriage of virulence or resistance genes in K. pneumoniae OMVs was identified. The influence of OMVs on the horizontal transfer of virulence-related or drug-resistant plasmids among K. pneumoniae strains was evaluated thoroughly. Plasmid transfer to recipient bacteria through OMVs was identified by polymerase chain reaction, pulsed field gel electrophoresis and Southern blot. This study revealed that OMVs could mediate the intraspecific and interspecific horizontal transfer of the virulence plasmid phvK2115. OMVs could simultaneously transfer two resistance plasmids into K. pneumoniae and Escherichia coli recipient strains. OMV-mediated horizontal transfer of virulence plasmid phvK2115 could significantly enhance the pathogenicity of human carbapenem-resistant K. pneumoniae CRK3022. After acquiring the virulence plasmid phvK2115, CRK3022 could become a CR-hvKp strain. Critically, OMV-mediated horizontal transfer of phvK2115 led to the coexistence of virulence and carbapenem-resistance genes in K. pneumoniae, resulting in the emergence of carbapenem-resistant hypervirulent K. pneumoniae.
Affiliation(s)
- Zhongxing Wang
  - Department of Nutrition and Food Hygiene, School of Public Health, Nantong University, Nantong, Jiangsu, China
  - MOE Joint International Research Laboratory of Animal Health and Food Safety, College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, China
- Zhe Wen
  - Department of Nutrition and Food Hygiene, School of Public Health, Nantong University, Nantong, Jiangsu, China
  - MOE Joint International Research Laboratory of Animal Health and Food Safety, College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, China
- Min Jiang
  - MOE Joint International Research Laboratory of Animal Health and Food Safety, College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, China
- Fufang Xia
  - MOE Joint International Research Laboratory of Animal Health and Food Safety, College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, China
- Min Wang
  - Department of Nutrition and Food Hygiene, School of Public Health, Nantong University, Nantong, Jiangsu, China
- Xiangkai Zhuge
  - Department of Nutrition and Food Hygiene, School of Public Health, Nantong University, Nantong, Jiangsu, China
  - MOE Joint International Research Laboratory of Animal Health and Food Safety, College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, China
- Jianjun Dai
  - MOE Joint International Research Laboratory of Animal Health and Food Safety, College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, China
  - College of Pharmacy, China Pharmaceutical University, Nanjing, China
7
Liyanapathiranage P, Wagner N, Avram O, Pupko T, Potnis N. Phylogenetic Distribution and Evolution of Type VI Secretion System in the Genus Xanthomonas. Front Microbiol 2022; 13:840308. [PMID: 35495725; PMCID: PMC9048695; DOI: 10.3389/fmicb.2022.840308] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/21/2021] [Accepted: 02/10/2022] [Indexed: 11/13/2022] Open
Abstract
The type VI secretion system (T6SS) present in many Gram-negative bacteria is a contact-dependent apparatus that can directly deliver secreted effectors or toxins into diverse neighboring cellular targets including both prokaryotic and eukaryotic organisms. Recent reverse genetics studies with T6 core gene loci have indicated the importance of functional T6SS toward overall competitive fitness in various pathogenic Xanthomonas spp. To understand the contribution of T6SS toward ecology and evolution of Xanthomonas spp., we explored the distribution of the three distinguishable T6SS clusters, i3*, i3***, and i4, in approximately 1,740 Xanthomonas genomes, along with their conservation, genetic organization, and their evolutionary patterns in this genus. Screening genomes for core genes of each T6 cluster indicated that 40% of the sequenced strains possess two T6 clusters, with combinations of i3*** and i3* or i3*** and i4. A few strains of Xanthomonas citri, Xanthomonas phaseoli, and Xanthomonas cissicola were the exception, possessing a unique combination of i3* and i4. The findings also indicated clade-specific distribution of T6SS clusters. Phylogenetic analysis demonstrated that T6SS clusters i3* and i3*** were probably acquired by the ancestor of the genus Xanthomonas, followed by gain or loss of individual clusters upon diversification into subsequent clades. T6 i4 cluster has been acquired in recent independent events by group 2 xanthomonads followed by its spread via horizontal dissemination across distinct clades across groups 1 and 2 xanthomonads. We also noted reshuffling of the entire core T6 loci, as well as T6SS spike complex components, hcp and vgrG, among different species. Our findings indicate that gain or loss events of specific T6SS clusters across Xanthomonas phylogeny have not been random.
Affiliation(s)
- Naama Wagner
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Oren Avram
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Tal Pupko
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Neha Potnis
  - Department of Entomology and Plant Pathology, Auburn University, Auburn, AL, United States
  - *Correspondence: Neha Potnis,
8
Microbial storage and its implications for soil ecology. The ISME Journal 2022; 16:617-629. [PMID: 34593996; PMCID: PMC8857262; DOI: 10.1038/s41396-021-01110-w] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 12/09/2020] [Revised: 08/31/2021] [Accepted: 09/07/2021] [Indexed: 02/08/2023]
Abstract
Organisms throughout the tree of life accumulate chemical resources, in particular forms or compartments, to secure their availability for future use. Here we review microbial storage and its ecological significance by assembling several rich but disconnected lines of research in microbiology, biogeochemistry, and the ecology of macroscopic organisms. Evidence is drawn from various systems, but we pay particular attention to soils, where microorganisms play crucial roles in global element cycles. An assembly of genus-level data demonstrates the likely prevalence of storage traits in soil. We provide a theoretical basis for microbial storage ecology by distinguishing a spectrum of storage strategies ranging from surplus storage (storage of abundant resources that are not immediately required) to reserve storage (storage of limited resources at the cost of other metabolic functions). This distinction highlights that microorganisms can invest in storage at times of surplus and under conditions of scarcity. We then align storage with trait-based microbial life-history strategies, leading to the hypothesis that ruderal species, which are adapted to disturbance, rely less on storage than microorganisms adapted to stress or high competition. We explore the implications of storage for soil biogeochemistry, microbial biomass, and element transformations and present a process-based model of intracellular carbon storage. Our model indicates that storage can mitigate against stoichiometric imbalances, thereby enhancing biomass growth and resource-use efficiency in the face of unbalanced resources. Given the central roles of microbes in biogeochemical cycles, we propose that microbial storage may be influential on macroscopic scales, from carbon cycling to ecosystem stability.
9
Dong R, Pei S, Guan M, Yau SC, Yin C, He RL, Yau SST. Full Chromosomal Relationships Between Populations and the Origin of Humans. Front Genet 2022; 12:828805. [PMID: 35186019; PMCID: PMC8847220; DOI: 10.3389/fgene.2021.828805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2021] [Accepted: 12/22/2021] [Indexed: 11/23/2022] Open
Abstract
A comprehensive description of human genomes is essential for understanding human evolution and relationships between modern populations. However, most published literature focuses on local alignment comparison of several genes rather than the complete evolutionary record of individual genomes. Using data from the 1,000 Genomes Project, we successfully reconstructed 2,504 individual genomes and propose the Divided Natural Vector method to analyze the distribution of nucleotides in the genomes. Comparisons based on autosomes, sex chromosomes and mitochondrial genomes reveal the genetic relationships between populations, and different inheritance patterns lead to different phylogenetic results. Results based on mitochondrial genomes confirm the “out-of-Africa” hypothesis and assert that humans, at least females, most likely originated in eastern Africa. The reconstructed genomes are stored on our server and can be further used for any genome-scale analysis of humans (http://yaulab.math.tsinghua.edu.cn/2022_1000genomesprojectdata/). This project provides the complete genomes of thousands of individuals and lays the groundwork for genome-level analyses of the genetic relationships between populations and the origin of humans.
Affiliation(s)
- Rui Dong
  - Yau Mathematical Sciences Center, Tsinghua University, Beijing, China
  - Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing, China
- Shaojun Pei
  - Department of Mathematical Sciences, Tsinghua University, Beijing, China
- Mengcen Guan
  - Department of Mathematical Sciences, Tsinghua University, Beijing, China
- Shek-Chung Yau
  - Information Technology Services Center, The Hong Kong University of Science and Technology, Kowloon, Hong Kong, China
- Changchuan Yin
  - Department of Mathematics, Statistics and Computer Science, University of Illinois at Chicago, Chicago, IL, United States
- Rong L He
  - Department of Biological Sciences, Chicago State University, Chicago, IL, United States
- Stephen S-T Yau
  - Department of Mathematical Sciences, Tsinghua University, Beijing, China
  - Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing, China
10
Giannakara M, Koumandou VL. Evolution of two-component quorum sensing systems. Access Microbiol 2022; 4:000303. [PMID: 35252749; PMCID: PMC8895600; DOI: 10.1099/acmi.0.000303] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/17/2021] [Accepted: 11/15/2021] [Indexed: 12/16/2022] Open
Abstract
Quorum sensing (QS) is a cell-to-cell communication system that enables bacteria to coordinate their gene expression depending on their population density, via the detection of small molecules called autoinducers. In this way bacteria can act collectively to initiate processes like bioluminescence, virulence and biofilm formation. Autoinducers are detected by receptors, some of which are part of two-component signal transduction systems (TCS), which comprise a (usually membrane-bound) sensor histidine kinase (HK) and a cognate response regulator (RR). Different QS systems are used by different bacterial taxa, and their relative evolutionary relationships have not been extensively studied. To address this, we used the Kyoto Encyclopedia of Genes and Genomes (KEGG) database to identify all the QS HKs and RRs that are part of TCSs and examined their conservation across microbial taxa. We compared the combinations of the highly conserved domains in the different families of receptors and response regulators using the Simple Modular Architecture Research Tool (SMART) and KEGG databases, and we also carried out phylogenetic analyses for each family, and all families together. The distribution of the different QS systems across taxa indicates flexibility in HK–RR pairing and highlights the need for further study of the most abundant systems. For both the QS receptors and the response regulators, our results indicate close evolutionary relationships between certain families, highlighting a common evolutionary history which can inform future applications, such as the design of novel inhibitors for pathogenic QS systems.
Affiliation(s)
- Marina Giannakara
  - Genetics Laboratory, Department of Biotechnology, Agricultural University of Athens, Athens, Greece
- Vassiliki Lila Koumandou
  - Genetics Laboratory, Department of Biotechnology, Agricultural University of Athens, Athens, Greece
11
Zhong H, Loukides G, Pissis SP. Clustering sequence graphs. Data Knowl Eng 2022. [DOI: 10.1016/j.datak.2022.101981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
12
Nakayinga R, Makumi A, Tumuhaise V, Tinzaara W. Xanthomonas bacteriophages: a review of their biology and biocontrol applications in agriculture. BMC Microbiol 2021; 21:291. [PMID: 34696726; PMCID: PMC8543423; DOI: 10.1186/s12866-021-02351-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 04/19/2021] [Accepted: 10/12/2021] [Indexed: 11/10/2022] Open
Abstract
Phytopathogenic bacteria are economically important because they affect crop yields and threaten the livelihoods of farmers worldwide. The genus Xanthomonas is particularly significant because it is associated with some plant diseases that cause tremendous loss in yields of globally essential crops. Current management practices are ineffective, unsustainable and harmful to natural ecosystems. Bacteriophage (phage) biocontrol for plant disease management has been of particular interest from the early twentieth century to date. Xanthomonas phage research for plant disease management continues to demonstrate promising results under laboratory and field conditions. AgriPhage has developed phage products for the control of Xanthomonas campestris pv. vesicatoria and Xanthomonas citri subsp. citri. These are causative agents for tomato, pepper spot and speck disease as well as citrus canker disease. Phage-mediated biocontrol is becoming a viable option because phages occur naturally and are safe for disease control and management. Thorough knowledge of biological characteristics of Xanthomonas phages is vital for developing effective biocontrol products. This review covers Xanthomonas phage research highlighting aspects of their ecology, biology and biocontrol applications.
Affiliation(s)
- Ritah Nakayinga
  - Department of Biological Sciences, Faculty of Science, Kyambogo University, P.O. Box 1, Kyambogo, Uganda
- Angela Makumi
  - Department of Animal and Human Health, General Biosciences, International Livestock Research Institute, P.O. Box 3070, Nairobi, 00100, Kenya
- Venansio Tumuhaise
  - Department of Agriculture, Faculty of Vocational Studies, Kyambogo University, P.O. Box 1, Kyambogo, Uganda
- William Tinzaara
  - Department of Agriculture, Faculty of Vocational Studies, Kyambogo University, P.O. Box 1, Kyambogo, Uganda
13
Kořený L, Oborník M, Horáková E, Waller RF, Lukeš J. The convoluted history of haem biosynthesis. Biol Rev Camb Philos Soc 2021; 97:141-162. [PMID: 34472688; DOI: 10.1111/brv.12794] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 12/16/2020] [Revised: 08/12/2021] [Accepted: 08/19/2021] [Indexed: 01/14/2023]
Abstract
The capacity of haem to transfer electrons, bind diatomic gases, and catalyse various biochemical reactions makes it one of the essential biomolecules on Earth and one that was likely used by the earliest forms of cellular life. Since the description of haem biosynthesis, our understanding of this multi-step pathway has been almost exclusively derived from a handful of model organisms from narrow taxonomic contexts. Recent advances in genome sequencing and functional studies of diverse and previously neglected groups have led to discoveries of alternative routes of haem biosynthesis that deviate from the 'classical' pathway. In this review, we take an evolutionarily broad approach to illuminate the remarkable diversity and adaptability of haem synthesis, from prokaryotes to eukaryotes, showing the range of strategies that organisms employ to obtain and utilise haem. In particular, the complex evolutionary histories of eukaryotes that involve multiple endosymbioses and horizontal gene transfers are reflected in the mosaic origin of numerous metabolic pathways with haem biosynthesis being a striking case. We show how different evolutionary trajectories and distinct life strategies resulted in pronounced tensions and differences in the spatial organisation of the haem biosynthesis pathway, in some cases leading to a complete loss of a haem-synthesis capacity and, rarely, even loss of a requirement for haem altogether.
Affiliation(s)
- Luděk Kořený
  - Department of Biochemistry, University of Cambridge, Hopkins Building, Tennis Court Road, Cambridge, CB2 1QW, U.K.
- Miroslav Oborník
  - Institute of Parasitology, Biology Centre of the Czech Academy of Sciences, Branišovská 31, České Budějovice (Budweis), 370 05, Czech Republic
  - Faculty of Sciences, University of South Bohemia, Branišovská, České Budějovice (Budweis), 31, Czech Republic
- Eva Horáková
  - Institute of Parasitology, Biology Centre of the Czech Academy of Sciences, Branišovská 31, České Budějovice (Budweis), 370 05, Czech Republic
- Ross F Waller
  - Department of Biochemistry, University of Cambridge, Hopkins Building, Tennis Court Road, Cambridge, CB2 1QW, U.K.
- Julius Lukeš
  - Institute of Parasitology, Biology Centre of the Czech Academy of Sciences, Branišovská 31, České Budějovice (Budweis), 370 05, Czech Republic
  - Faculty of Sciences, University of South Bohemia, Branišovská, České Budějovice (Budweis), 31, Czech Republic
14
Cofactor Specificity of Glucose-6-Phosphate Dehydrogenase Isozymes in Pseudomonas putida Reveals a General Principle Underlying Glycolytic Strategies in Bacteria. mSystems 2021; 6:6/2/e00014-21. [PMID: 33727391; PMCID: PMC8546961; DOI: 10.1128/msystems.00014-21] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 11/20/2022] Open
Abstract
Glucose-6-phosphate dehydrogenase (G6PDH) is widely distributed in nature and catalyzes the first committing step in the oxidative branch of the pentose phosphate (PP) pathway, feeding either the reductive PP or the Entner-Doudoroff pathway. Besides its role in central carbon metabolism, this dehydrogenase provides reduced cofactors, thereby affecting redox balance. Although G6PDH is typically considered to display specificity toward NADP+, some variants accept NAD+ similarly or even preferentially. Furthermore, the number of G6PDH isozymes encoded in bacterial genomes varies from none to more than four orthologues. On this background, we systematically analyzed the interplay of the three G6PDH isoforms of the soil bacterium Pseudomonas putida KT2440 from genomic, genetic, and biochemical perspectives. P. putida represents an ideal model to tackle this endeavor, as its genome harbors gene orthologues for most dehydrogenases in central carbon metabolism. We show that the three G6PDHs of strain KT2440 have different cofactor specificities and that the isoforms encoded by zwfA and zwfB carry most of the activity, acting as metabolic “gatekeepers” for carbon sources that enter at different nodes of the biochemical network. Moreover, we demonstrate how multiplication of G6PDH isoforms is a widespread strategy in bacteria, correlating with the presence of an incomplete Embden-Meyerhof-Parnas pathway. The abundance of G6PDH isoforms in these species goes hand in hand with low NADP+ affinity, at least in one isozyme. We propose that gene duplication and relaxation in cofactor specificity is an evolutionary strategy toward balancing the relative production of NADPH and NADH. IMPORTANCE Protein families have likely arisen during evolution by gene duplication and divergence followed by neofunctionalization. 
While this phenomenon is well documented for catabolic activities (typical of environmental bacteria that colonize highly polluted niches), the coexistence of multiple isozymes in central carbon catabolism remains relatively unexplored. We have adopted the metabolically versatile soil bacterium Pseudomonas putida KT2440 as a model to interrogate the physiological and evolutionary significance of coexisting glucose-6-phosphate dehydrogenase (G6PDH) isozymes. Our results show that each of the three G6PDHs in this bacterium displays distinct biochemical properties, especially at the level of cofactor preference, impacting bacterial physiology in a carbon source-dependent fashion. Furthermore, the presence of multiple G6PDHs differing in NAD+ or NADP+ specificity in bacterial species strongly correlates with their predominant metabolic lifestyle. Our findings support the notion that multiplication of genes encoding cofactor-dependent dehydrogenases is a general evolutionary strategy toward achieving redox balance according to the growth conditions.
|
15
|
McKinnon LM, Miller JB, Whiting MF, Kauwe JSK, Ridge PG. A comprehensive analysis of the phylogenetic signal in ramp sequences in 211 vertebrates. Sci Rep 2021; 11:622. [PMID: 33436653 PMCID: PMC7803996 DOI: 10.1038/s41598-020-78803-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/16/2020] [Accepted: 11/23/2020] [Indexed: 01/24/2023] Open
Abstract
Ramp sequences increase translational speed and accuracy when rare, slowly-translated codons are found at the beginnings of genes. Here, the results of the first analysis of ramp sequences in a phylogenetic construct are presented. Ramp sequences were compared from 247 vertebrates (114 mammalian and 133 non-mammalian), where the presence and absence of ramp sequences was analyzed as a binary character in a parsimony and maximum likelihood framework. Additionally, ramp sequences were mapped to the Open Tree of Life synthetic tree to determine the number of parallelisms and reversals that occurred, and those results were compared to random permutations. Parsimony and maximum likelihood analyses of the presence and absence of ramp sequences recovered phylogenies that are highly congruent with established phylogenies. Additionally, 81% of vertebrate mammalian ramps and 81.2% of other vertebrate ramps had fewer parallelisms and reversals than the mean from 1000 randomly permuted trees. A chi-square analysis of completely orthologous ramp sequences resulted in a p-value < 0.001 as compared to random chance. Ramp sequences recover phylogenies comparable to those of other phylogenomic methods. Although not all ramp sequences appear to have a phylogenetic signal, more ramp sequences track speciation than expected by random chance. Therefore, ramp sequences may be used in conjunction with other phylogenomic approaches if many orthologs are taken into account. However, phylogenomic methods utilizing few orthologs should be cautious in incorporating ramp sequences because individual ramp sequences may provide conflicting signals.
Affiliation(s)
- Lauren M McKinnon
- Department of Biology, Brigham Young University, Provo, UT, 84602, USA
| | - Justin B Miller
- Department of Biology, Brigham Young University, Provo, UT, 84602, USA
| | - Michael F Whiting
- Department of Biology, Brigham Young University, Provo, UT, 84602, USA
- Monte L. Bean Museum, Brigham Young University, Provo, UT, 84602, USA
| | - John S K Kauwe
- Department of Biology, Brigham Young University, Provo, UT, 84602, USA
| | - Perry G Ridge
- Department of Biology, Brigham Young University, Provo, UT, 84602, USA.
|
16
|
An SQ, Potnis N, Dow M, Vorhölter FJ, He YQ, Becker A, Teper D, Li Y, Wang N, Bleris L, Tang JL. Mechanistic insights into host adaptation, virulence and epidemiology of the phytopathogen Xanthomonas. FEMS Microbiol Rev 2020; 44:1-32. [PMID: 31578554 PMCID: PMC8042644 DOI: 10.1093/femsre/fuz024] [Citation(s) in RCA: 117] [Impact Index Per Article: 29.3] [Received: 05/07/2019] [Accepted: 09/29/2019] [Indexed: 01/15/2023] Open
Abstract
Xanthomonas is a well-studied genus of bacterial plant pathogens whose members cause a variety of diseases in economically important crops worldwide. Genomic and functional studies of these phytopathogens have provided significant understanding of microbial-host interactions, bacterial virulence and host adaptation mechanisms including microbial ecology and epidemiology. In addition, several strains of Xanthomonas are important as producers of the extracellular polysaccharide, xanthan, used in the food and pharmaceutical industries. This polymer has also been implicated in several phases of the bacterial disease cycle. In this review, we summarise the current knowledge on the infection strategies and regulatory networks controlling virulence and adaptation mechanisms from Xanthomonas species and discuss the novel opportunities that this body of work has provided for disease control and plant health.
Affiliation(s)
- Shi-Qi An
- National Biofilms Innovation Centre (NBIC), Biological Sciences, University of Southampton, University Road, Southampton SO17 1BJ, UK
| | - Neha Potnis
- Department of Entomology and Plant Pathology, Rouse Life Science Building, Auburn University, Auburn AL 36849, USA
| | - Max Dow
- School of Microbiology, Food Science & Technology Building, University College Cork, Cork T12 K8AF, Ireland
| | | | - Yong-Qiang He
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, 100 Daxue Road, Nanning 530004, Guangxi, China
| | - Anke Becker
- Loewe Center for Synthetic Microbiology and Department of Biology, Philipps-Universität Marburg, Hans-Meerwein-Straße 6, Marburg 35032, Germany
| | - Doron Teper
- Citrus Research and Education Center, Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, 700 Experiment Station Road, Lake Alfred 33850, USA
| | - Yi Li
- Bioengineering Department, University of Texas at Dallas, 2851 Rutford Ave, Richardson, TX 75080, USA.,Center for Systems Biology, University of Texas at Dallas, 800 W Campbell Road, Richardson, TX 75080, USA
| | - Nian Wang
- Citrus Research and Education Center, Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, 700 Experiment Station Road, Lake Alfred 33850, USA
| | - Leonidas Bleris
- Bioengineering Department, University of Texas at Dallas, 2851 Rutford Ave, Richardson, TX 75080, USA.,Center for Systems Biology, University of Texas at Dallas, 800 W Campbell Road, Richardson, TX 75080, USA.,Department of Biological Sciences, University of Texas at Dallas, 800 W Campbell Road, Richardson, TX 75080, USA
| | - Ji-Liang Tang
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, 100 Daxue Road, Nanning 530004, Guangxi, China
|
17
|
Delibaş E, Arslan A, Şeker A, Diri B. A novel alignment-free DNA sequence similarity analysis approach based on top-k n-gram match-up. J Mol Graph Model 2020; 100:107693. [PMID: 32805559 DOI: 10.1016/j.jmgm.2020.107693] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/12/2020] [Revised: 06/15/2020] [Accepted: 07/06/2020] [Indexed: 11/17/2022]
Abstract
DNA sequence similarity analysis is an essential task in computational biology and bioinformatics. In nearly all research that explores evolutionary relationships, gene function analysis, protein structure prediction and sequence retrieving, it is necessary to perform similarity calculations. As an alternative to alignment-based sequence comparison methods, which result in high computational cost, alignment-free methods have emerged that calculate similarity by digitizing the sequence in a different space. In this paper, we proposed an alignment-free DNA sequence similarity analysis method based on top-k n-gram matches, with the prediction that common repeating DNA subsections indicate high similarity between DNA sequences. In our method, we determined DNA sequence similarities by measuring similarity among feature vectors created according to top-k n-gram match-up scores without the use of similarity functions. We applied the similarity calculation for three different DNA data sets of different lengths. The phylogenetic relationships revealed by our method show that our trees coincide almost completely with the results of the MEGA software, which is based on sequence alignment. Our findings show that a certain number of frequently recurring common sequence patterns have the power to characterize DNA sequences.
Affiliation(s)
- Emre Delibaş
- Department of Computer Engineering, Faculty of Engineering, Sivas Cumhuriyet University, 58140, Sivas, Turkey.
| | - Ahmet Arslan
- Department of Computer Engineering, Faculty of Engineering, Selçuk University, 42250, Konya, Turkey.
| | - Abdulkadir Şeker
- Department of Computer Engineering, Faculty of Engineering, Sivas Cumhuriyet University, 58140, Sivas, Turkey.
| | - Banu Diri
- Department of Computer Engineering, Faculty of Electrical and Electronics, Yıldız Technical University, 34349, İstanbul, Turkey.
|
18
|
Mughal F, Nasir A, Caetano-Anollés G. The origin and evolution of viruses inferred from fold family structure. Arch Virol 2020; 165:2177-2191. [PMID: 32748179 PMCID: PMC7398281 DOI: 10.1007/s00705-020-04724-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/02/2020] [Accepted: 05/30/2020] [Indexed: 12/16/2022]
Abstract
The canonical frameworks of viral evolution describe viruses as cellular predecessors, reduced forms of cells, or entities that escaped cellular control. The discovery of giant viruses has changed these standard paradigms. Their genetic, proteomic and structural complexities resemble those of cells, prompting a redefinition and reclassification of viruses. In a previous genome-wide analysis of the evolution of structural domains in proteomes, with domains defined at the fold superfamily level, we found the origins of viruses intertwined with those of ancient cells. Here, we extend these data-driven analyses to the study of fold families confirming the co-evolution of viruses and ancient cells and the genetic ability of viruses to foster molecular innovation. The results support our suggestion that viruses arose by genomic reduction from ancient cells and validate a co-evolutionary ‘symbiogenic’ model of viral origins.
Affiliation(s)
- Fizza Mughal
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | - Arshan Nasir
- Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM, USA
- Department of Biosciences, COMSATS University Islamabad, Islamabad, Pakistan
| | - Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA.
- Illinois Informatics Institute, University of Illinois at Urbana-Champaign, Urbana, IL, USA.
|
19
|
Dong R, Pei S, Yin C, He RL, Yau SST. Analysis of the Hosts and Transmission Paths of SARS-CoV-2 in the COVID-19 Outbreak. Genes (Basel) 2020; 11:E637. [PMID: 32526937 PMCID: PMC7349679 DOI: 10.3390/genes11060637] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 03/29/2020] [Revised: 05/30/2020] [Accepted: 06/03/2020] [Indexed: 12/11/2022] Open
Abstract
The severe respiratory disease COVID-19 was initially reported in Wuhan, China, in December 2019, and spread into many provinces from Wuhan. The corresponding pathogen was soon identified as a novel coronavirus named SARS-CoV-2 (formerly, 2019-nCoV). As of 2 May 2020, over 3 million COVID-19 cases had been confirmed and 235,290 deaths had been reported globally, and the numbers are still increasing. It is important to understand the phylogenetic relationship between SARS-CoV-2 and known coronaviruses, and to identify its hosts in order to prevent the next emergency outbreak. In this study, we employ an effective alignment-free approach, the Natural Vector method, to analyze the phylogeny and classify the coronaviruses based on genomic and protein data. Our results show that SARS-CoV-2 is closely related to, but distinct from, the SARS-CoV branch. By analyzing the genetic distances from the SARS-CoV-2 strain to the coronaviruses residing in animal hosts, we establish that the most likely transmission path runs from bats to pangolins to humans.
Affiliation(s)
- Rui Dong
- Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China; (R.D.); (S.P.)
| | - Shaojun Pei
- Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China; (R.D.); (S.P.)
| | - Changchuan Yin
- Department of Mathematics, Statistics and Computer Science, University of Illinois at Chicago, Chicago, IL 60607, USA;
| | - Rong Lucy He
- Department of Biological Sciences, Chicago State University, Chicago, IL 60628, USA;
| | - Stephen S.-T. Yau
- Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China; (R.D.); (S.P.)
|
20
|
Miller JB, McKinnon LM, Whiting MF, Kauwe JSK, Ridge PG. Codon Pairs are Phylogenetically Conserved: A comprehensive analysis of codon pairing conservation across the Tree of Life. PLoS One 2020; 15:e0232260. [PMID: 32401752 PMCID: PMC7219770 DOI: 10.1371/journal.pone.0232260] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/28/2019] [Accepted: 04/10/2020] [Indexed: 11/27/2022] Open
Abstract
Identical codon pairing and co-tRNA codon pairing increase translational efficiency within genes when two codons that encode the same amino acid are translated by the same tRNA before it diffuses from the ribosome. We examine the phylogenetic signal in both identical and co-tRNA codon pairing across 23,428 species using alignment-free and parsimony methods. We determined that conserved codon pairing typically has a smaller window size than the length of a ribosome, and codon pairing tracks phylogenies across various taxonomic groups. We report a comprehensive analysis of codon pairing, including the extent to which each codon pairs. Our parsimony method generally recovers phylogenies that are more congruent with the established phylogenies than our alignment-free method. However, four of the ten taxonomic groups did not have sufficient orthologous codon pairings and were therefore analyzed using only the alignment-free methods. Since the recovered phylogenies using only codon pairing largely match phylogenies from the Open Tree of Life and the NCBI taxonomy, and are comparable to trees recovered by other algorithms, we propose that codon pairing biases are phylogenetically conserved and should be considered in conjunction with other phylogenomic techniques.
Affiliation(s)
- Justin B. Miller
- Department of Biology, Brigham Young University, Provo, UT, United States of America
| | - Lauren M. McKinnon
- Department of Biology, Brigham Young University, Provo, UT, United States of America
| | - Michael F. Whiting
- Department of Biology, Brigham Young University, Provo, UT, United States of America
- M.L. Bean Museum, Brigham Young University, Provo, UT, United States of America
| | - John S. K. Kauwe
- Department of Biology, Brigham Young University, Provo, UT, United States of America
| | - Perry G. Ridge
- Department of Biology, Brigham Young University, Provo, UT, United States of America
|
21
|
Abstract
Tree of life (ToL) is a metaphorical tree that captures a simplified narrative of the evolutionary course and kinship among all living organisms of today. We have reconstructed a whole-proteome ToL for over 4,000 different extant species for which complete or near-complete genome sequences are available in public databases. The ToL suggests that 1) all extant organisms of this study can be grouped into 2 “Supergroups,” 6 “Major Groups,” or 35+ “Groups”; 2) the order of emergence of the “founders” of all the groups may be assigned on an evolutionary progression scale; and 3) all of the founders of the groups have emerged in a “deep burst” near the root of the ToL—an explosive birth of life’s diversity. An organism tree of life (organism ToL) is a conceptual and metaphorical tree to capture a simplified narrative of the evolutionary course and kinship among the extant organisms. Such a tree cannot be experimentally validated but may be reconstructed based on characteristics associated with the organisms. Since the whole-genome sequence of an organism is, at present, the most comprehensive descriptor of the organism, a whole-genome sequence-based ToL can be an empirically derivable surrogate for the organism ToL. However, experimentally determining the whole-genome sequences of many diverse organisms was practically impossible until recently. We have constructed three types of ToLs for diversely sampled organisms using the sequences of whole genome, of whole transcriptome, and of whole proteome. Of the three, whole-proteome sequence-based ToL (whole-proteome ToL), constructed by applying information theory-based feature frequency profile method, an “alignment-free” method, gave the most topologically stable ToL. Here, we describe the main features of a whole-proteome ToL for 4,023 species with known complete or almost complete genome sequences on grouping and kinship among the groups at deep evolutionary levels. 
The ToL reveals 1) all extant organisms of this study can be grouped into 2 “Supergroups,” 6 “Major Groups,” or 35+ “Groups”; 2) the order of emergence of the “founders” of all of the groups may be assigned on an evolutionary progression scale; 3) all of the founders of the groups have emerged in a “deep burst” at the very beginning period near the root of the ToL—an explosive birth of life’s diversity.
|
22
|
De Pierri CR, Voyceik R, Santos de Mattos LGC, Kulik MG, Camargo JO, Repula de Oliveira AM, de Lima Nichio BT, Marchaukoski JN, da Silva Filho AC, Guizelini D, Ortega JM, Pedrosa FO, Raittz RT. SWeeP: representing large biological sequences datasets in compact vectors. Sci Rep 2020; 10:91. [PMID: 31919449 PMCID: PMC6952362 DOI: 10.1038/s41598-019-55627-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/21/2019] [Accepted: 12/02/2019] [Indexed: 12/25/2022] Open
Abstract
Vectoral and alignment-free approaches to biological sequence representation have been explored in bioinformatics to efficiently handle big data. Even so, most current methods involve sequence comparisons via alignment-based heuristics and fail when applied to the analysis of large data sets. Here, we present “Spaced Words Projection (SWeeP)”, a method for representing biological sequences using relatively small vectors while preserving intersequence comparability. SWeeP uses spaced-words by scanning the sequences and generating indices to create a higher-dimensional vector that is later projected onto a smaller randomly oriented orthonormal base. We constructed phylogenetic trees for all organisms with mitochondrial and bacterial protein data in the NCBI database. SWeeP quickly built complete and accurate trees for these organisms with low computational cost. We compared SWeeP to other alignment-free methods and SWeeP was 10 to 100 times quicker than the other techniques. A tool to build SWeeP vectors is available at https://sourceforge.net/projects/spacedwordsprojection/.
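The projection idea in this abstract (a high-dimensional count vector reduced onto a small, randomly oriented orthonormal base) can be illustrated with a minimal Python sketch. It is not the SWeeP tool: it swaps contiguous k-mers in for spaced words, and the function names, the DNA alphabet, k and the number of components are all assumptions chosen for illustration.

```python
import random
from itertools import product


def kmer_vector(seq, k=2, alphabet="ACGT"):
    """Count contiguous k-mers into a dense vector (a stand-in for spaced words)."""
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    vec = [0.0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing symbols outside the alphabet
            vec[j] += 1.0
    return vec


def random_orthonormal_basis(dim, n_components, seed=0):
    """Draw random Gaussian vectors and orthonormalize them (modified Gram-Schmidt)."""
    rng = random.Random(seed)
    basis = []
    while len(basis) < n_components:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = sum(x * x for x in v) ** 0.5
        if norm > 1e-9:  # discard a draw that is numerically dependent
            basis.append([x / norm for x in v])
    return basis


def project(vec, basis):
    """Coordinates of vec in the low-dimensional orthonormal base."""
    return [sum(x * y for x, y in zip(vec, b)) for b in basis]


basis = random_orthonormal_basis(dim=16, n_components=4, seed=42)
compact = project(kmer_vector("ACGTACGTTGCA", k=2), basis)
```

Sequences projected with the same basis (same seed) stay comparable with one another, which is the property the abstract describes as preserving intersequence comparability.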
Affiliation(s)
- Camilla Reginatto De Pierri
- Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil.,Federal University of Paraná, Department of Biochemistry and Molecular Biology, Curitiba, Paraná, Brazil
| | - Ricardo Voyceik
- Federal University of Minas Gerais, Institute of Biological Sciences (ICB), Belo Horizonte, Minas Gerais, Brazil
| | | | - Mariane Gonçalves Kulik
- Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil
| | - Josué Oliveira Camargo
- Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil.,Federal University of Paraná, Department of Biochemistry and Molecular Biology, Curitiba, Paraná, Brazil
| | - Aryel Marlus Repula de Oliveira
- Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil.,Federal University of Paraná, Department of Genetics, Curitiba, Paraná, Brazil
| | - Bruno Thiago de Lima Nichio
- Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil.,Federal University of Paraná, Department of Biochemistry and Molecular Biology, Curitiba, Paraná, Brazil
| | | | - Antonio Camilo da Silva Filho
- Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil.,Federal University of Paraná, Department of Pharmaceutical Sciences, Curitiba, Paraná, Brazil
| | - Dieval Guizelini
- Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil
| | - J Miguel Ortega
- Federal University of Minas Gerais, Institute of Biological Sciences (ICB), Belo Horizonte, Minas Gerais, Brazil
| | - Fabio O Pedrosa
- Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil.,Federal University of Paraná, Department of Biochemistry and Molecular Biology, Curitiba, Paraná, Brazil
| | - Roberto Tadeu Raittz
- Federal University of Paraná - SEPT, Graduate Program in Bioinformatics, Curitiba, Paraná, Brazil.,Federal University of Minas Gerais, Institute of Biological Sciences (ICB), Belo Horizonte, Minas Gerais, Brazil.,Federal University of Paraná, Department of Genetics, Curitiba, Paraná, Brazil.
|
23
|
Bernard G, Chan CX, Chan YB, Chua XY, Cong Y, Hogan JM, Maetschke SR, Ragan MA. Alignment-free inference of hierarchical and reticulate phylogenomic relationships. Brief Bioinform 2019; 20:426-435. [PMID: 28673025 PMCID: PMC6433738 DOI: 10.1093/bib/bbx067] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Received: 02/08/2017] [Revised: 05/04/2017] [Indexed: 11/22/2022] Open
Abstract
We are amidst an ongoing flood of sequence data arising from the application of high-throughput technologies, and a concomitant fundamental revision in our understanding of how genomes evolve individually and within the biosphere. Workflows for phylogenomic inference must accommodate data that are not only much larger than before, but often more error prone and perhaps misassembled, or not assembled in the first place. Moreover, genomes of microbes, viruses and plasmids evolve not only by tree-like descent with modification but also by incorporating stretches of exogenous DNA. Thus, next-generation phylogenomics must address computational scalability while rethinking the nature of orthogroups, the alignment of multiple sequences and the inference and comparison of trees. New phylogenomic workflows have begun to take shape based on so-called alignment-free (AF) approaches. Here, we review the conceptual foundations of AF phylogenetics for the hierarchical (vertical) and reticulate (lateral) components of genome evolution, focusing on methods based on k-mers. We reflect on what seems to be successful, and on where further development is needed.
|
24
|
Zielezinski A, Girgis HZ, Bernard G, Leimeister CA, Tang K, Dencker T, Lau AK, Röhling S, Choi JJ, Waterman MS, Comin M, Kim SH, Vinga S, Almeida JS, Chan CX, James BT, Sun F, Morgenstern B, Karlowski WM. Benchmarking of alignment-free sequence comparison methods. Genome Biol 2019; 20:144. [PMID: 31345254 PMCID: PMC6659240 DOI: 10.1186/s13059-019-1755-7] [Citation(s) in RCA: 101] [Impact Index Per Article: 20.2] [Received: 05/16/2019] [Accepted: 07/03/2019] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND Alignment-free (AF) sequence comparison is attracting persistent interest driven by data-intensive applications. Hence, many AF procedures have been proposed in recent years, but a lack of a clearly defined benchmarking consensus hampers their performance assessment. RESULTS Here, we present a community resource (http://afproject.org) to establish standards for comparing alignment-free approaches across different areas of sequence-based research. We characterize 74 AF methods available in 24 software tools for five research applications, namely, protein sequence classification, gene tree inference, regulatory element detection, genome-based phylogenetic inference, and reconstruction of species trees under horizontal gene transfer and recombination events. CONCLUSION The interactive web service allows researchers to explore the performance of alignment-free tools relevant to their data types and analytical goals. It also allows method developers to assess their own algorithms and compare them with current state-of-the-art tools, accelerating the development of new, more accurate AF solutions.
Affiliation(s)
- Andrzej Zielezinski
- Department of Computational Biology, Faculty of Biology, Adam Mickiewicz University Poznan, Uniwersytetu Poznańskiego 6, 61-614, Poznan, Poland
| | - Hani Z Girgis
- Tandy School of Computer Science, The University of Tulsa, 800 South Tucker Drive, Tulsa, OK, 74104, USA
| | | | - Chris-Andre Leimeister
- Department of Bioinformatics, Institute of Microbiology and Genetics, University of Göttingen, Goldschmidtstr. 1, 37077, Göttingen, Germany
| | - Kujin Tang
- Department of Biological Sciences, Quantitative and Computational Biology Program, University of Southern California, Los Angeles, CA, 90089, USA
| | - Thomas Dencker
- Department of Bioinformatics, Institute of Microbiology and Genetics, University of Göttingen, Goldschmidtstr. 1, 37077, Göttingen, Germany
| | - Anna Katharina Lau
- Department of Bioinformatics, Institute of Microbiology and Genetics, University of Göttingen, Goldschmidtstr. 1, 37077, Göttingen, Germany
| | - Sophie Röhling
- Department of Bioinformatics, Institute of Microbiology and Genetics, University of Göttingen, Goldschmidtstr. 1, 37077, Göttingen, Germany
| | - Jae Jin Choi
- Department of Chemistry, University of California, Berkeley, CA, 94720, USA
- Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Michael S Waterman
- Department of Biological Sciences, Quantitative and Computational Biology Program, University of Southern California, Los Angeles, CA, 90089, USA
- Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, Shanghai, 200433, China
| | - Matteo Comin
- Department of Information Engineering, University of Padova, Padova, Italy
| | - Sung-Hou Kim
- Department of Chemistry, University of California, Berkeley, CA, 94720, USA
- Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Susana Vinga
- INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1049-001, Lisbon, Portugal
- IDMEC, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1049-001, Lisbon, Portugal
| | - Jonas S Almeida
- Division of Cancer Epidemiology and Genetics (DCEG), National Cancer Institute (NIH/NCI), Bethesda, USA
| | - Cheong Xin Chan
- Institute for Molecular Bioscience, and School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, 4072, Australia
| | - Benjamin T James
- Tandy School of Computer Science, The University of Tulsa, 800 South Tucker Drive, Tulsa, OK, 74104, USA
| | - Fengzhu Sun
- Department of Biological Sciences, Quantitative and Computational Biology Program, University of Southern California, Los Angeles, CA, 90089, USA
- Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, Shanghai, 200433, China
| | - Burkhard Morgenstern
- Department of Bioinformatics, Institute of Microbiology and Genetics, University of Göttingen, Goldschmidtstr. 1, 37077, Göttingen, Germany
| | - Wojciech M Karlowski
- Department of Computational Biology, Faculty of Biology, Adam Mickiewicz University Poznan, Uniwersytetu Poznańskiego 6, 61-614, Poznan, Poland.
|
25
|
Lu YY, Tang K, Ren J, Fuhrman JA, Waterman MS, Sun F. CAFE: aCcelerated Alignment-FrEe sequence analysis. Nucleic Acids Res 2017; 45:W554-W559. [PMID: 28472388 PMCID: PMC5793812 DOI: 10.1093/nar/gkx351] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 02/27/2017] [Accepted: 04/20/2017] [Indexed: 12/13/2022] Open
Abstract
Alignment-free genome and metagenome comparisons are increasingly important with the development of next generation sequencing (NGS) technologies. Recently developed state-of-the-art k-mer based alignment-free dissimilarity measures including CVTree, $d_2^*$ and $d_2^S$ are more computationally expensive than measures based solely on the k-mer frequencies. Here, we report a standalone software, aCcelerated Alignment-FrEe sequence analysis (CAFE), for efficient calculation of 28 alignment-free dissimilarity measures. CAFE allows for both assembled genome sequences and unassembled NGS shotgun reads as input, and wraps the output in a standard PHYLIP format. In downstream analyses, CAFE can also be used to visualize the pairwise dissimilarity measures, including dendrograms, heatmap, principal coordinate analysis and network display. CAFE serves as a general k-mer based alignment-free analysis platform for studying the relationships among genomes and metagenomes, and is freely available at https://github.com/younglululu/CAFE.
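As a rough illustration of a measure "based solely on the k-mer frequencies", the Python sketch below counts overlapping k-mers in two sequences and takes the Euclidean distance between the count vectors. This is not CAFE's implementation and it does not compute CVTree, $d_2^*$ or $d_2^S$, which additionally require a background sequence model. The function names and the default k are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_euclidean_distance(seq_a, seq_b, k=3):
    """Euclidean distance between the raw k-mer count vectors of two sequences."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    words = set(counts_a) | set(counts_b)
    # Counter returns 0 for a k-mer that is absent from one of the sequences.
    return sqrt(sum((counts_a[w] - counts_b[w]) ** 2 for w in words))


same = kmer_euclidean_distance("ACGTACGT", "ACGTACGT", k=2)
different = kmer_euclidean_distance("AAAA", "CCCC", k=2)
```

Raw counts are used here for brevity; for sequences of very different lengths one would usually normalize counts to frequencies first.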
Affiliation(s)
- Yang Young Lu
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, CA 90089, USA
| | - Kujin Tang
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, CA 90089, USA
| | - Jie Ren
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, CA 90089, USA
| | - Jed A Fuhrman
- Department of Biological Sciences and Wrigley Institute for Environmental Studies, University of Southern California, Los Angeles, CA 90089, USA
| | - Michael S Waterman
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, CA 90089, USA
- Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, 200433 Shanghai, China
| | - Fengzhu Sun
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, CA 90089, USA
- Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, 200433 Shanghai, China
| |
|
26
|
Miller JB, McKinnon LM, Whiting MF, Ridge PG. CAM: an alignment-free method to recover phylogenies using codon aversion motifs. PeerJ 2019; 7:e6984. [PMID: 31198636 PMCID: PMC6555396 DOI: 10.7717/peerj.6984] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2018] [Accepted: 04/17/2019] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Common phylogenomic approaches for recovering phylogenies are often time-consuming and require annotations for orthologous gene relationships that are not always available. In contrast, alignment-free phylogenomic approaches typically use structure and oligomer frequencies to calculate pairwise distances between species. We have developed an approach to quickly calculate distances between species based on codon aversion. METHODS Utilizing a novel alignment-free character state, we present CAM, an alignment-free approach to recover phylogenies by comparing differences in codon aversion motifs (i.e., the set of unused codons within each gene) across all genes within a species. Synonymous codon usage is non-random and differs between organisms, between genes, and even within a single gene, and many genes do not use all possible codons. We report a comprehensive analysis of codon aversion within 229,742,339 genes from 23,428 species across all kingdoms of life, and we provide an alignment-free framework for its use in a phylogenetic construct. For each species, we first construct a set of codon aversion motifs spanning all genes within that species. We define the pairwise distance between two species, A and B, as one minus the number of shared codon aversion motifs divided by the total codon aversion motifs of the species, A or B, containing the fewest motifs. This approach allows us to calculate pairwise distances even when substantial differences in the number of genes or a high rate of divergence between species exists. Finally, we use neighbor-joining to recover phylogenies. RESULTS Using the Open Tree of Life and NCBI Taxonomy Database as expected phylogenies, our approach compares well, recovering phylogenies that largely match expected trees and are comparable to trees recovered using maximum likelihood and other alignment-free approaches. Our technique is much faster than maximum likelihood and similar in accuracy to other alignment-free approaches. Therefore, we propose that codon aversion be considered a phylogenetically conserved character that may be used in future phylogenomic studies. AVAILABILITY CAM, documentation, and test files are freely available on GitHub at https://github.com/ridgelab/cam.
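The distance defined in this abstract can be illustrated with a minimal Python sketch. It is not the CAM implementation: the function names `codon_aversion_motif` and `cam_distance` are hypothetical, and treating all 64 triplets as candidate codons and reading each gene in frame from its first base are assumptions made only for this example.

```python
from itertools import product

# All 64 nucleotide triplets (assumption: stop codons are not treated specially).
ALL_CODONS = frozenset("".join(p) for p in product("ACGT", repeat=3))


def codon_aversion_motif(cds):
    """Return the set of codons that a coding sequence never uses."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    used = {cds[i:i + 3] for i in range(0, usable, 3)}
    return frozenset(ALL_CODONS - used)


def cam_distance(motifs_a, motifs_b):
    """One minus shared motifs divided by the motif count of the smaller set."""
    a, b = set(motifs_a), set(motifs_b)
    smaller = min(len(a), len(b))
    if smaller == 0:
        raise ValueError("each species needs at least one motif")
    return 1.0 - len(a & b) / smaller


# Toy species: each gene contributes one motif (a frozenset of unused codons).
species_a = {codon_aversion_motif(g) for g in ("ATGAAATAA", "ATGCCCTAG", "ATGGGGTGA")}
species_b = {codon_aversion_motif(g) for g in ("ATGAAATAA", "ATGTTTTAA")}
print(cam_distance(species_a, species_b))  # one shared motif out of two
```

Dividing by the smaller motif set, as the abstract describes, keeps the distance between 0 and 1 even when one species has far more genes than the other.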
Affiliation(s)
- Justin B. Miller
- Department of Biology, Brigham Young University, Provo, UT, United States of America
| | - Lauren M. McKinnon
- Department of Biology, Brigham Young University, Provo, UT, United States of America
| | - Michael F. Whiting
- Department of Biology, Brigham Young University, Provo, UT, United States of America
- Brigham Young University, M.L. Bean Museum, Provo, UT, United States of America
| | - Perry G. Ridge
- Department of Biology, Brigham Young University, Provo, UT, United States of America
| |
|
27
|
Abstract
Sam Granick opened his seminal 1957 paper titled 'Speculations on the origins and evolution of photosynthesis' with the assertion that there is a constant urge in human beings to seek beginnings (I concur). This urge has led to an incessant stream of speculative ideas and debates on the evolution of photosynthesis that started in the first half of the twentieth century and shows no signs of abating. Some of these speculative ideas have become commonplace, are taken as fact, but find little support. Here, I review and scrutinize three widely accepted ideas that underpin the current study of the evolution of photosynthesis: first, that the photochemical reaction centres used in anoxygenic photosynthesis are more primitive than those in oxygenic photosynthesis; second, that the probability of acquiring photosynthesis via horizontal gene transfer is greater than the probability of losing photosynthesis; and third, and most important, that the origin of anoxygenic photosynthesis pre-dates the origin of oxygenic photosynthesis. I shall attempt to demonstrate that these three ideas are often grounded in incorrect assumptions built on more assumptions with no experimental or observational support. I hope that this brief review will not only serve as a cautionary tale but also that it will open new avenues of research aimed at disentangling the complex evolution of photosynthesis and its impact on the early history of life and the planet.
Affiliation(s)
- Tanai Cardona
- Department of Life Sciences, Imperial College London, London, UK
| |
|
28
|
Leimeister CA, Schellhorn J, Dörrer S, Gerth M, Bleidorn C, Morgenstern B. Prot-SpaM: fast alignment-free phylogeny reconstruction based on whole-proteome sequences. Gigascience 2019; 8:giy148. [PMID: 30535314 PMCID: PMC6436989 DOI: 10.1093/gigascience/giy148] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2018] [Revised: 09/10/2018] [Accepted: 11/20/2018] [Indexed: 11/20/2022] Open
Abstract
Word-based or 'alignment-free' sequence comparison has become an active research area in bioinformatics. While previous word-frequency approaches calculated rough measures of sequence similarity or dissimilarity, some new alignment-free methods are able to accurately estimate phylogenetic distances between genomic sequences. One of these approaches is Filtered Spaced Word Matches. Here, we extend this approach to estimate evolutionary distances between complete or incomplete proteomes; our implementation of this approach is called Prot-SpaM. We compare the performance of Prot-SpaM to other alignment-free methods on simulated sequences and on various groups of eukaryotic and prokaryotic taxa. Prot-SpaM can be used to calculate high-quality phylogenetic trees for dozens of whole-proteome sequences in a matter of seconds or minutes and often outperforms other alignment-free approaches. The source code of our software is available through Github: https://github.com/jschellh/ProtSpaM.
Affiliation(s)
- Chris-Andre Leimeister
- University of Göttingen, Department of Bioinformatics, Goldschmidtstr. 1, 37077 Göttingen, Germany
| | - Jendrik Schellhorn
- University of Göttingen, Department of Bioinformatics, Goldschmidtstr. 1, 37077 Göttingen, Germany
| | - Svenja Dörrer
- University of Göttingen, Department of Bioinformatics, Goldschmidtstr. 1, 37077 Göttingen, Germany
| | - Michael Gerth
- Institute for Integrative Biology, University of Liverpool, Biosciences Building, Crown Street, L69 7ZB Liverpool, UK
| | - Christoph Bleidorn
- University of Göttingen, Department of Animal Evolution and Biodiversity, Untere Karspüle 2, 37073 Göttingen, Germany
- Museo Nacional de Ciencias Naturales, Spanish National Research Council (CSIC), 28006 Madrid, Spain
| | - Burkhard Morgenstern
- University of Göttingen, Department of Bioinformatics, Goldschmidtstr. 1, 37077 Göttingen, Germany
- Göttingen Center of Molecular Biosciences (GZMB), Justus-von-Liebig-Weg 11, 37077 Göttingen
| |
|
29
|
Pornsukarom S, van Vliet AHM, Thakur S. Whole genome sequencing analysis of multiple Salmonella serovars provides insights into phylogenetic relatedness, antimicrobial resistance, and virulence markers across humans, food animals and agriculture environmental sources. BMC Genomics 2018; 19:801. [PMID: 30400810 PMCID: PMC6218967 DOI: 10.1186/s12864-018-5137-4] [Citation(s) in RCA: 82] [Impact Index Per Article: 13.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2018] [Accepted: 10/02/2018] [Indexed: 11/13/2022] Open
Abstract
Background Salmonella enterica is a significant foodborne pathogen, which can be transmitted via several distinct routes, and reports on acquisition of antimicrobial resistance (AMR) are increasing. To better understand the association between human Salmonella clinical isolates and the potential environmental/animal reservoirs, whole genome sequencing (WGS) was used to investigate the epidemiology and AMR patterns within Salmonella isolates from two adjacent US states. Results WGS data of 200 S. enterica isolates recovered from human (n = 44), swine (n = 32), poultry (n = 22), and farm environment (n = 102) were used for in silico prediction of serovar, distribution of virulence genes, and phylogenetically clustered using core genome single nucleotide polymorphism (SNP) and feature frequency profiling (FFP). Furthermore, AMR was studied both by genotypic prediction using five curated AMR databases, and compared to phenotypic AMR using broth microdilution. Core genome SNP-based and FFP-based phylogenetic trees showed consistent clustering of isolates into the respective serovars, and suggested clustering of isolates based on the source of isolation. The overall correlation of phenotypic and genotypic AMR was 87.61% and 97.13% for sensitivity and specificity, respectively. AMR and virulence genes clustered with the Salmonella serovars, while there were also associations between the presence of virulence genes in both animal/environmental isolates and human clinical samples. Conclusions WGS is a helpful tool for Salmonella phylogenetic analysis, AMR and virulence gene predictions. The clinical isolates clustered closely with animal and environmental isolates, suggesting that animals and environment are potential sources for dissemination of AMR and virulence genes between Salmonella serovars. Electronic supplementary material The online version of this article (10.1186/s12864-018-5137-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Suchawan Pornsukarom
- Faculty of Veterinary Medicine, Rajamangala University of Technology Tawan-ok, Chonburi, Thailand
| | - Arnoud H M van Vliet
- School of Veterinary Medicine, Faculty of Health and Medical Sciences, University of Surrey, Surrey, UK
| | - Siddhartha Thakur
- Department of Population Health and Pathobiology, College of Veterinary Medicine, North Carolina State University, Raleigh, NC, USA
- Comparative Medicine Institute, North Carolina State University, Raleigh, NC, USA
| |
|
30
|
Caetano-Anollés G, Nasir A, Kim KM, Caetano-Anollés D. Rooting Phylogenies and the Tree of Life While Minimizing Ad Hoc and Auxiliary Assumptions. Evol Bioinform Online 2018; 14:1176934318805101. [PMID: 30364468 PMCID: PMC6196624 DOI: 10.1177/1176934318805101] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2018] [Accepted: 09/05/2018] [Indexed: 12/25/2022] Open
Abstract
Phylogenetic methods unearth evolutionary history when supported by three starting points of reason: (1) the continuity axiom begs the existence of a "model" of evolutionary change, (2) the singularity axiom defines the historical ground plan (phylogeny) in which biological entities (taxa) evolve, and (3) the memory axiom demands identification of biological attributes (characters) with historical information. Axiom consequences are interlinked, making the retrodiction enterprise an endeavor of reciprocal fulfillment. In particular, establishing direction of evolutionary change (character polarization) roots phylogenies and enables testing the existence of historical memory (homology). Unfortunately, rooting phylogenies, especially the "tree of life," generally follows narratives instead of integrating empirical and theoretical knowledge of retrodictive exploration. This stems mostly from a focus on molecular sequence analysis and uncertainties about rooting methods. Here, we review available rooting criteria, highlighting the need to minimize both ad hoc and auxiliary assumptions, especially argumentative ad hocness. We show that while the outgroup comparison method has been widely adopted, the generality criterion of nesting and additive phylogenetic change embodied in Weston's rule offers the most powerful rooting approach. We also propose a change of focus, from phylogenies that describe the evolution of biological systems to those that describe the evolution of parts of those systems. This weakens violation of character independence, helps formalize the generality criterion of rooting, and provides new ways to study the problem of evolution.
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | - Arshan Nasir
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Department of Biosciences, COMSATS University Islamabad, Islamabad, Pakistan
| | - Kyung Mo Kim
- Division of Polar Life Sciences, Korea Polar Research Institute, Incheon, Republic of Korea
| | - Derek Caetano-Anollés
- Department of Evolutionary Genetics, Max-Planck-Institut für Evolutionsbiologie, Plön, Germany
| |
|
31
|
Wiegand S, Jogler M, Jogler C. On the maverick Planctomycetes. FEMS Microbiol Rev 2018; 42:739-760. [DOI: 10.1093/femsre/fuy029] [Citation(s) in RCA: 134] [Impact Index Per Article: 22.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2018] [Accepted: 07/22/2018] [Indexed: 01/01/2023] Open
Affiliation(s)
- Sandra Wiegand
- Department of Microbiology, Radboud University, Heyendaalseweg 135, Nijmegen, The Netherlands
| | - Mareike Jogler
- Leibniz Institute DSMZ, Inhoffenstraße 7b, 38124 Braunschweig, Germany
| | - Christian Jogler
- Department of Microbiology, Radboud University, Heyendaalseweg 135, Nijmegen, The Netherlands
| |
|
32
|
Ren J, Bai X, Lu YY, Tang K, Wang Y, Reinert G, Sun F. Alignment-Free Sequence Analysis and Applications. Annu Rev Biomed Data Sci 2018; 1:93-114. [PMID: 31828235 PMCID: PMC6905628 DOI: 10.1146/annurev-biodatasci-080917-013431] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2023]
Abstract
Genome and metagenome comparisons based on large amounts of next generation sequencing (NGS) data pose significant challenges for alignment-based approaches due to the huge data size and the relatively short length of the reads. Alignment-free approaches based on the counts of word patterns in NGS data do not depend on the complete genome and are generally computationally efficient. Thus, they contribute significantly to genome and metagenome comparison. Recently, novel statistical approaches have been developed for the comparison of both long and shotgun sequences. These approaches have been applied to many problems including the comparison of gene regulatory regions, genome sequences, metagenomes, binning contigs in metagenomic data, identification of virus-host interactions, and detection of horizontal gene transfers. We provide an updated review of these applications and other related developments of word-count based approaches for alignment-free sequence analysis.
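As a rough illustration of what "counts of word patterns" means in practice, the following Python sketch counts overlapping k-mers in two sequences and compares their relative frequencies with a plain Euclidean distance. The function names `kmer_counts` and `frequency_distance` are hypothetical, and the Euclidean distance is only a stand-in chosen for simplicity; it is not one of the statistical measures the review discusses.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def frequency_distance(seq_a, seq_b, k=3):
    """Euclidean distance between the relative k-mer frequencies of two sequences."""
    counts_a, counts_b = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both sequences must be at least k characters long")
    words = set(counts_a) | set(counts_b)
    # A Counter returns 0 for words that never occur in one of the sequences.
    return math.sqrt(sum(
        (counts_a[w] / total_a - counts_b[w] / total_b) ** 2 for w in words
    ))


print(kmer_counts("ACGTACG", 3))
print(frequency_distance("ACGTACGT", "TTTTACGT", k=2))
```

Because only word counts are needed, the same functions apply to a set of short reads by summing the per-read counters, with no assembly or alignment step.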
Affiliation(s)
- Jie Ren
- Molecular and Computational Biology Program, University of Southern California, Los Angeles, California, USA
| | - Xin Bai
- Molecular and Computational Biology Program, University of Southern California, Los Angeles, California, USA
- Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, Shanghai, China
| | - Yang Young Lu
- Molecular and Computational Biology Program, University of Southern California, Los Angeles, California, USA
| | - Kujin Tang
- Molecular and Computational Biology Program, University of Southern California, Los Angeles, California, USA
| | - Ying Wang
- Department of Automation, Xiamen University, Xiamen, Fujian, China
| | - Gesine Reinert
- Department of Statistics, University of Oxford, Oxford, United Kingdom
| | - Fengzhu Sun
- Molecular and Computational Biology Program, University of Southern California, Los Angeles, California, USA
- Centre for Computational Systems Biology, School of Mathematical Sciences, Fudan University, Shanghai, China
| |
|
33
|
Staley JT, Caetano-Anollés G. Archaea-First and the Co-Evolutionary Diversification of Domains of Life. Bioessays 2018; 40:e1800036. [PMID: 29944192 DOI: 10.1002/bies.201800036] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2018] [Revised: 05/12/2018] [Indexed: 12/13/2022]
Abstract
The origins and evolution of the Archaea, Bacteria, and Eukarya remain controversial. Phylogenomic-wide studies of molecular features that are evolutionarily conserved, such as protein structural domains, suggest Archaea is the first domain of life to diversify from a stem line of descent. This line embodies the last universal common ancestor of cellular life. Here, we propose that ancestors of Euryarchaeota co-evolved with those of Bacteria prior to the diversification of Eukarya. This co-evolutionary scenario is supported by comparative genomic and phylogenomic analyses of the distributions of fold families of domains in the proteomes of free-living organisms, which show horizontal gene recruitments and informational process homologies. It also benefits from the molecular study of cell physiologies responsible for membrane phospholipids, methanogenesis, methane oxidation, cell division, gas vesicles, and the cell wall. Our theory however challenges popular cell fusion and two-domain of life scenarios derived from sequence analysis, demanding phylogenetic reconciliation. Also see the video abstract here: https://youtu.be/9yVWn_Q9faY.
Affiliation(s)
- James T Staley
- Department of Microbiology and Astrobiology Program, University of Washington, Seattle, WA, 98195, USA
| | - Gustavo Caetano-Anollés
- Department of Crop Sciences, C. R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
| |
|
34
|
Barbieri M. What is code biology? Biosystems 2018; 164:1-10. [DOI: 10.1016/j.biosystems.2017.10.005] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2017] [Revised: 10/04/2017] [Accepted: 10/05/2017] [Indexed: 01/29/2023]
|
35
|
Jun SR, Wassenaar TM, Wanchai V, Patumcharoenpol P, Nookaew I, Ussery DW. Suggested mechanisms for Zika virus causing microcephaly: what do the genomes tell us? BMC Bioinformatics 2017; 18:471. [PMID: 29297281 PMCID: PMC5751795 DOI: 10.1186/s12859-017-1894-3] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
Background Zika virus (ZIKV) is an emerging human pathogen. Since its arrival in the Western hemisphere, from Africa via Asia, it has become a serious threat to pregnant women, causing microcephaly and other neuropathies in developing fetuses. The mechanisms behind these teratogenic effects are unknown, although epidemiological evidence suggests that microcephaly is not associated with the original, African lineage of ZIKV. The sequences of 196 published ZIKV genomes were used to assess whether recently proposed mechanistic explanations for microcephaly are supported by molecular level changes that may have increased its virulence since the virus left Africa. For this we performed phylogenetic, recombination, adaptive evolution and tetramer frequency analyses, and compared protein sequences for the presence of protease cleavage sites, Pfam domains, glycosylation sites, signal peptides, trans-membrane protein domains, and phosphorylation sites. Results Recombination events within or between Asian and Brazilian lineages were not observed, and likewise there were no differences in protease cleavage, glycosylation sites, signal peptides or trans-membrane domains between African and Brazilian strains. The frequency of Retinoic Acid Response Element (RARE) sequences was increased in Brazilian strains. Genetic adaptation was also apparent by tetramer signatures that had undergone major changes in the past but have stabilized in the Brazilian lineage despite subsequent geographic spread, suggesting the viral population presently propagates in the same host species in various regions. Evidence for selection pressure was recognized for several amino acid sites in the Brazilian lineage compared to the African lineage, mainly in nonstructural proteins, especially protein NS4B. A number of these positively selected mutations resulted in an increased potential to be phosphorylated in the Brazilian lineage compared to the African lineage, which may have increased their potential to interfere with neural fetal development. Conclusions ZIKV seems to have adapted to a limited number of hosts, including humans, during which its virulence increased. Its protein NS4B, together with NS4A, has recently been shown to inhibit Akt-mTOR signaling in human fetal neural stem cells, a key pathway for brain development. We hypothesize that positive selection of novel phosphorylation sites in the protein NS4B of the Brazilian lineage could interfere with phosphorylation of Akt and mTOR, impairing Akt-mTOR signaling and this may result in an increased risk for developmental neuropathies. Electronic supplementary material The online version of this article (10.1186/s12859-017-1894-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Se-Ran Jun
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
| | - Trudy M Wassenaar
- Molecular Microbiology and Genomics Consultants, Zotzenheim, Germany
| | - Visanu Wanchai
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
| | - Preecha Patumcharoenpol
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
| | - Intawat Nookaew
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
| | - David W Ussery
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
| |
|
36
|
Camiolo S, Porru C, Benítez-Cabello A, Rodríguez-Gómez F, Calero-Delgado B, Porceddu A, Budroni M, Mannazzu I, Jiménez-Díaz R, Arroyo-López FN. Genome overview of eight Candida boidinii strains isolated from human activities and wild environments. Stand Genomic Sci 2017; 12:70. [PMID: 29213357 PMCID: PMC5712119 DOI: 10.1186/s40793-017-0281-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2017] [Accepted: 11/21/2017] [Indexed: 11/10/2022] Open
Affiliation(s)
- Salvatore Camiolo
- Dipartimento di Agraria, Università degli Studi di Sassari, Viale Italia 39, Sassari, Italy
| | - Cinzia Porru
- Dipartimento di Agraria, Università degli Studi di Sassari, Viale Italia 39, Sassari, Italy
| | - Antonio Benítez-Cabello
- Food Biotechnology Department, Instituto de la Grasa (C.S.I.C.), University Campus Pablo de Olavide, Building 46, Crta. de Utrera km 1, 41013 Seville, Spain
| | - Francisco Rodríguez-Gómez
- Food Biotechnology Department, Instituto de la Grasa (C.S.I.C.), University Campus Pablo de Olavide, Building 46, Crta. de Utrera km 1, 41013 Seville, Spain
| | - Beatríz Calero-Delgado
- Food Biotechnology Department, Instituto de la Grasa (C.S.I.C.), University Campus Pablo de Olavide, Building 46, Crta. de Utrera km 1, 41013 Seville, Spain
| | - Andrea Porceddu
- Dipartimento di Agraria, Università degli Studi di Sassari, Viale Italia 39, Sassari, Italy
| | - Marilena Budroni
- Dipartimento di Agraria, Università degli Studi di Sassari, Viale Italia 39, Sassari, Italy
| | - Ilaria Mannazzu
- Dipartimento di Agraria, Università degli Studi di Sassari, Viale Italia 39, Sassari, Italy
| | - Rufino Jiménez-Díaz
- Food Biotechnology Department, Instituto de la Grasa (C.S.I.C.), University Campus Pablo de Olavide, Building 46, Crta. de Utrera km 1, 41013 Seville, Spain
| | - Francisco Noé Arroyo-López
- Food Biotechnology Department, Instituto de la Grasa (C.S.I.C.), University Campus Pablo de Olavide, Building 46, Crta. de Utrera km 1, 41013 Seville, Spain
| |
|
37
|
Zielezinski A, Vinga S, Almeida J, Karlowski WM. Alignment-free sequence comparison: benefits, applications, and tools. Genome Biol 2017; 18:186. [PMID: 28974235 PMCID: PMC5627421 DOI: 10.1186/s13059-017-1319-7] [Citation(s) in RCA: 244] [Impact Index Per Article: 34.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/30/2023] Open
Abstract
Alignment-free sequence analyses have been applied to problems ranging from whole-genome phylogeny to the classification of protein families, identification of horizontally transferred genes, and detection of recombined sequences. The strength of these methods makes them particularly useful for next-generation sequencing data processing and analysis. However, many researchers are unclear about how these methods work, how they compare to alignment-based methods, and what their potential is for use for their research. We address these questions and provide a guide to the currently available alignment-free sequence analysis tools.
Affiliation(s)
- Andrzej Zielezinski
- Department of Computational Biology, Faculty of Biology, Adam Mickiewicz University in Poznan, Umultowska 89, 61-614, Poznan, Poland
| | - Susana Vinga
- IDMEC, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1049-001, Lisbon, Portugal
| | - Jonas Almeida
- Stony Brook University (SUNY), 101 Nicolls Road, Stony Brook, NY, 11794, USA
| | - Wojciech M Karlowski
- Department of Computational Biology, Faculty of Biology, Adam Mickiewicz University in Poznan, Umultowska 89, 61-614, Poznan, Poland
| |
|
38
|
Abstract
Fungi belong to one of the largest and most diverse kingdoms of living organisms. The evolutionary kinship within a fungal population has so far been inferred mostly from the gene-information-based trees ("gene trees"), constructed commonly based on the degree of differences of proteins or DNA sequences of a small number of highly conserved genes common among the population by a multiple sequence alignment (MSA) method. Since each gene evolves under different evolutionary pressure and time scale, it has been known that one gene tree for a population may differ from other gene trees for the same population depending on the subjective selection of the genes. Within the last decade, a large number of whole-genome sequences of fungi have become publicly available, which represent, at present, the most fundamental and complete information about each fungal organism. This presents an opportunity to infer kinship among fungi using a whole-genome information-based tree ("genome tree"). The method we used allows comparison of whole-genome information without MSA, and is a variation of a computational algorithm developed to find semantic similarities or plagiarism in two books, where we represent whole-genomic information of an organism as a book of words without spaces. The genome tree reveals several significant and notable differences from the gene trees, and these differences invoke new discussions about alternative narratives for the evolution of some of the currently accepted fungal groups.
Affiliation(s)
- JaeJin Choi
- Department of Chemistry, University of California, Berkeley, CA 94720
- Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
- Department of Integrated Omics for Biomedical Sciences, Yonsei University, Seoul 03722, Republic of Korea
- Korea Research Institute of Bioscience and Biotechnology, Daejeon 34141, Republic of Korea
| | - Sung-Hou Kim
- Department of Chemistry, University of California, Berkeley, CA 94720
- Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
- Department of Integrated Omics for Biomedical Sciences, Yonsei University, Seoul 03722, Republic of Korea
- Center for Computational Biology, University of California, Berkeley, CA 94720
| |
|
39
|
He L, Li Y, He RL, Yau SST. A novel alignment-free vector method to cluster protein sequences. J Theor Biol 2017; 427:41-52. [DOI: 10.1016/j.jtbi.2017.06.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2017] [Revised: 05/04/2017] [Accepted: 06/02/2017] [Indexed: 11/29/2022]
|
40
|
Seo H, Cho DH. A new alignment free genome comparison algorithm based on statistically estimated feature frequency profile. Annu Int Conf IEEE Eng Med Biol Soc 2017; 2017:4265-4268. [PMID: 29060839 DOI: 10.1109/embc.2017.8037798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Sequence comparison is an important part of bioinformatics for understanding the biological properties of genomes. Although alignment-based sequence comparison is a traditional and reliable approach, alignment-free methods have been actively researched because of their advantage in terms of computational complexity. In this paper, we suggest a new alignment-free genome comparison scheme based on a statistical approach. From sequence components, word frequency information of the sequence is estimated. By investigating the relationship between the estimated frequency information and the actual word frequency, the characteristics of the sequence are numerically represented. A phylogenetic tree and a sequence classification of mammalian sequences are provided to show the remarkable performance of our statistical algorithm.
|
41
|
Marin J, Battistuzzi FU, Brown AC, Hedges SB. The Timetree of Prokaryotes: New Insights into Their Evolution and Speciation. Mol Biol Evol 2017; 34:437-446. [PMID: 27965376 DOI: 10.1093/molbev/msw245] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Indexed: 11/14/2022] Open
Abstract
The increasing size of timetrees in recent years has led to a focus on diversification analyses to better understand patterns of macroevolution. Thus far, nearly all studies have been conducted with eukaryotes primarily because phylogenies have been more difficult to reconstruct and calibrate to geologic time in prokaryotes. Here, we have estimated a timetree of 11,784 'species' of prokaryotes and explored their pattern of diversification. We used data from the small subunit ribosomal RNA along with an evolutionary framework from previous multi-gene studies to produce three alternative timetrees. For each timetree we surprisingly found a constant net diversification rate derived from an exponential increase of lineages and showing no evidence of saturation (rate decline), the same pattern found previously in eukaryotes. The implication is that prokaryote diversification as a whole is the result of the random splitting of lineages and is neither limited by existing diversity (filled niches) nor responsive in any major way to environmental changes.
Affiliation(s)
- Julie Marin
  - Center for Biodiversity, Temple University, SERC Suite 502, 1925 N 12th Street, Philadelphia, PA
  - Institut de Systématique, Evolution, Biodiversité UMR 7205, Département Systématique et Evolution, Muséum National d'Histoire Naturelle, Sorbonne-Universités, Paris, France
- Anais C Brown
  - Department of Biological Sciences, Oakland University, Rochester, MI
- S Blair Hedges
  - Center for Biodiversity, Temple University, SERC Suite 502, 1925 N 12th Street, Philadelphia, PA
|
42
|
Staley JT. Domain Cell Theory supports the independent evolution of the Eukarya, Bacteria and Archaea and the Nuclear Compartment Commonality hypothesis. Open Biol 2017; 7:170041. [PMID: 28659382 PMCID: PMC5493775 DOI: 10.1098/rsob.170041] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 02/20/2017] [Accepted: 05/26/2017] [Indexed: 01/15/2023] Open
Abstract
In 2015, the Royal Society of London held a meeting to discuss the various hypotheses regarding the origin of the Eukarya. Although not all participants supported a hypothesis, the proposals that did fit into two broad categories: one group favoured 'Prokaryotes First' hypotheses and another addressed 'Eukaryotes First' hypotheses. Those who proposed Prokaryotes First hypotheses advocated either a fusion event between a bacterium and an archaeon that produced the first eukaryote or the direct evolution of the Eukarya from the Archaea. The Eukaryotes First proponents posit that the eukaryotes evolved initially and then, by reductive evolution, produced the Bacteria and Archaea. No mention was made of another previously published hypothesis termed the Nuclear Compartment Commonality (NuCom) hypothesis, which proposed the evolution of the Eukarya and Bacteria from nucleated ancestors (Staley 2013 Astrobiol Outreach 1, 105 (doi:10.4172/2332-2519.1000105)). Evidence from two studies indicates that the nucleated Planctomycetes-Verrucomicrobia-Chlamydia superphylum members are the most ancient Bacteria known (Brochier & Philippe 2002 Nature 417, 244 (doi:10.1038/417244a); Jun et al. 2010 Proc. Natl Acad. Sci. USA 107, 133-138 (doi:10.1073/pnas.0913033107)). This review summarizes the evidence for the NuCom hypothesis and discusses how simple the NuCom hypothesis is in explaining eukaryote evolution relative to the other hypotheses. The philosophical importance of simplicity and its relationship to truth in hypotheses such as NuCom and Domain Cell Theory is presented. Domain Cell Theory is also proposed herein, which contends that each of the three cellular lineages of life, the Archaea, Bacteria and Eukarya domains, evolved independently, in support of the NuCom hypothesis. All other proposed hypotheses violate Domain Cell Theory because they posit the evolution of different cellular descendants from ancestral cellular types.
Affiliation(s)
- James T Staley
  - Department of Microbiology and Astrobiology Program, University of Washington, Seattle, WA 98195, USA
|
43
|
Sagulenko E, Nouwens A, Webb RI, Green K, Yee B, Morgan G, Leis A, Lee KC, Butler MK, Chia N, Pham UTP, Lindgreen S, Catchpole R, Poole AM, Fuerst JA. Nuclear Pore-Like Structures in a Compartmentalized Bacterium. PLoS One 2017; 12:e0169432. [PMID: 28146565 PMCID: PMC5287468 DOI: 10.1371/journal.pone.0169432] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 09/08/2016] [Accepted: 12/02/2016] [Indexed: 01/02/2023] Open
Abstract
Planctomycetes are distinguished from other Bacteria by compartmentalization of cells via internal membranes, interpretation of which has been subject to recent debate regarding potential relations to Gram-negative cell structure. In our interpretation of the available data, the planctomycete Gemmata obscuriglobus contains a nuclear body compartment, and thus possesses a type of cell organization with parallels to the eukaryote nucleus. Here we show that pore-like structures occur in internal membranes of G. obscuriglobus and that they have elements structurally similar to eukaryote nuclear pores, including a basket, ring-spoke structure, and eight-fold rotational symmetry. Bioinformatic analysis of proteomic data reveals that some of the G. obscuriglobus proteins associated with pore-containing membranes possess structural domains found in eukaryote nuclear pore complexes. Moreover, immunogold labelling demonstrates localization of one such protein, containing a β-propeller domain, specifically to the G. obscuriglobus pore-like structures. Finding bacterial pores within internal cell membranes and with structural similarities to eukaryote nuclear pore complexes raises the dual possibilities of either hitherto undetected homology or stunning evolutionary convergence.
Affiliation(s)
- Evgeny Sagulenko
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Queensland, Australia
- Amanda Nouwens
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Queensland, Australia
- Richard I. Webb
  - Centre for Microscopy and Microanalysis, The University of Queensland, Brisbane, Queensland, Australia
- Kathryn Green
  - Centre for Microscopy and Microanalysis, The University of Queensland, Brisbane, Queensland, Australia
- Benjamin Yee
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Queensland, Australia
- Garry Morgan
  - Centre for Microscopy and Microanalysis, The University of Queensland, Brisbane, Queensland, Australia
- Andrew Leis
  - CSIRO - Livestock Industries, Australian Animal Health Laboratory, Biosecurity Microscopy Facility (ABMF), Geelong, Victoria, Australia
- Kuo-Chang Lee
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Queensland, Australia
- Margaret K. Butler
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Queensland, Australia
- Nicholas Chia
  - Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Uyen Thi Phuong Pham
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Queensland, Australia
- Stinus Lindgreen
  - School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
- Ryan Catchpole
  - School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
  - Biomolecular Interaction Centre, University of Canterbury, Christchurch, New Zealand
- Anthony M. Poole
  - School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
  - Biomolecular Interaction Centre, University of Canterbury, Christchurch, New Zealand
  - Allan Wilson Centre, University of Canterbury, Christchurch, New Zealand
  - Bioinformatics Institute, School of Biological Sciences, University of Auckland, Auckland, New Zealand
- John A. Fuerst
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Queensland, Australia
|
44
|
Staley JT, Fuerst JA. Ancient, highly conserved proteins from a LUCA with complex cell biology provide evidence in support of the nuclear compartment commonality (NuCom) hypothesis. Res Microbiol 2017; 168:395-412. [PMID: 28111289 DOI: 10.1016/j.resmic.2017.01.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 08/24/2016] [Revised: 01/08/2017] [Accepted: 01/09/2017] [Indexed: 12/23/2022]
Abstract
The nuclear compartment commonality (NuCom) hypothesis posits a complex last common ancestor (LUCA) with membranous compartments including a nuclear membrane. Such a LUCA then evolved to produce two nucleated lineages of the tree of life: the Planctomycetes-Verrucomicrobia-Chlamydia superphylum (PVC) within the Bacteria, and the Eukarya. We propose that a group of ancient essential protokaryotic signature proteins (PSPs) originating in LUCA were incorporated into ancestors of PVC Bacteria and Eukarya. Tubulins, ubiquitin system enzymes and sterol-synthesizing enzymes are consistent with early origins of these features shared between the PVC superphylum and Eukarya.
Affiliation(s)
- James T Staley
  - Department of Microbiology and Astrobiology Program, University of Washington, Seattle 98195, USA
- John A Fuerst
  - School of Chemistry and Molecular Biosciences, University of Queensland, St. Lucia, Queensland 4072, Australia
|
45
|
Viral Phylogenomics Using an Alignment-Free Method: A Three-Step Approach to Determine Optimal Length of k-mer. Sci Rep 2017; 7:40712. [PMID: 28102365 PMCID: PMC5244389 DOI: 10.1038/srep40712] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 07/05/2016] [Accepted: 12/08/2016] [Indexed: 11/25/2022] Open
Abstract
The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4000 viral genome sequences that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral “tree of life”. However, due to the lack of evolutionary conservation amongst diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. The resulting dendrogram shows consistency with the viral taxonomy of the ICTV and the Baltimore classification of viruses.
|
46
|
|
47
|
Chen S, Deng LY, Bowman D, Shiau JJH, Wong TY, Madahian B, Lu HHS. Phylogenetic tree construction using trinucleotide usage profile (TUP). BMC Bioinformatics 2016; 17:381. [PMID: 27766939 PMCID: PMC5073869 DOI: 10.1186/s12859-016-1222-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND It has been a challenging task to build a genome-wide phylogenetic tree for a large group of species containing a large number of genes with long nucleotide sequences. The most popular method, called feature frequency profile (FFP-k), finds the frequency distribution for all words of certain length k over the whole genome sequence using (overlapping) windows of the same length. For a satisfactory result, the recommended word length (k) ranges from 6 to 15 and it may not be a multiple of 3 (codon length). The total number of possible words needed for FFP-k can range from 4^6 = 4096 to 4^15. RESULTS We propose a simple improvement over the popular FFP method using only a typical word length of 3. A new method, called Trinucleotide Usage Profile (TUP), is proposed based only on the (relative) frequency distribution using non-overlapping windows of length 3. The total number of possible words needed for TUP is 4^3 = 64, which is much less than the total count for the recommended optimal "resolution" for FFP. To build a phylogenetic tree, we propose first representing each of the species by a TUP vector and then using an appropriate distance measure between pairs of the TUP vectors for the tree construction. In particular, we propose summarizing a DNA sequence by a matrix of three rows corresponding to three reading frames, recording the frequency distribution of the non-overlapping words of length 3 in each reading frame. We also provide a numerical measure for comparing trees constructed with various methods. CONCLUSIONS Compared to the FFP method, our empirical study showed that the proposed TUP method is more capable of building phylogenetic trees with a stronger biological support. We further provide some justifications on this from the information theory viewpoint. Unlike the FFP method, the TUP method takes the advantage that the starting of the first reading frame is (usually) known. Without this information, the FFP method could only rely on the frequency distribution of overlapping words, which is the average (or mixture) of the frequency distributions of three possible reading frames. Consequently, we show (from the entropy viewpoint) that the FFP procedure could dilute important gene information and therefore provides less accurate classification.
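The contrast this abstract draws between non-overlapping, frame-specific counting (TUP) and overlapping counting (FFP) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `tup_matrix` and `ffp3_vector`, the choice to skip words containing non-ACGT characters, and the toy sequence are all assumptions made here for illustration.

```python
from collections import Counter
from itertools import product

# All 4**3 = 64 possible trinucleotides, in a fixed order.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]
TRIPLET_SET = set(TRIPLETS)


def relative_freqs(words):
    """Relative frequency of each trinucleotide; ambiguous words are skipped (assumption)."""
    counts = Counter(w for w in words if w in TRIPLET_SET)
    total = sum(counts.values())
    return [counts[t] / total if total else 0.0 for t in TRIPLETS]


def tup_matrix(seq):
    """Three rows, one per reading frame, counted with non-overlapping windows of length 3."""
    seq = seq.upper()
    return [
        relative_freqs(seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        for frame in range(3)
    ]


def ffp3_vector(seq):
    """One row counted with overlapping windows of length 3, which mixes the three frames."""
    seq = seq.upper()
    return relative_freqs(seq[i:i + 3] for i in range(len(seq) - 2))


seq = "ATGATGATG"  # hypothetical toy sequence
tup = tup_matrix(seq)
ffp = ffp3_vector(seq)
atg = TRIPLETS.index("ATG")
print(tup[0][atg], round(ffp[atg], 3))  # prints: 1.0 0.429
```

In the toy sequence, frame 0 consists only of ATG, so its TUP row puts all weight on that word, while the overlapping count spreads the weight across ATG, TGA and GAT. This is the frame mixing the abstract refers to.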
Affiliation(s)
- Si Chen
  - Key Laboratory of Combinatorial Biosynthesis and Drug Discovery Ministry of Education and School of Pharmaceutical Sciences Wuhan University, Wuhan, China
- Lih-Yuan Deng
  - Department of Mathematical Sciences, University of Memphis, Memphis, TN, USA
- Dale Bowman
  - Department of Mathematical Sciences, University of Memphis, Memphis, TN, USA
- Tit-Yee Wong
  - Department of Biological Sciences, University of Memphis, Memphis, TN, USA
- Behrouz Madahian
  - Department of Mathematical Sciences, University of Memphis, Memphis, TN, USA
|
48
|
Hahn L, Leimeister CA, Ounit R, Lonardi S, Morgenstern B. rasbhari: Optimizing Spaced Seeds for Database Searching, Read Mapping and Alignment-Free Sequence Comparison. PLoS Comput Biol 2016; 12:e1005107. [PMID: 27760124 PMCID: PMC5070788 DOI: 10.1371/journal.pcbi.1005107] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 06/16/2016] [Accepted: 08/11/2016] [Indexed: 12/05/2022] Open
Abstract
Many algorithms for sequence analysis rely on word matching or word statistics. Often, these approaches can be improved if binary patterns representing match and don't-care positions are used as a filter, such that only those positions of words are considered that correspond to the match positions of the patterns. The performance of these approaches, however, depends on the underlying patterns. Herein, we show that the overlap complexity of a pattern set that was introduced by Ilie and Ilie is closely related to the variance of the number of matches between two evolutionarily related sequences with respect to this pattern set. We propose a modified hill-climbing algorithm to optimize pattern sets for database searching, read mapping and alignment-free sequence comparison of nucleic-acid sequences; our implementation of this algorithm is called rasbhari. Depending on the application at hand, rasbhari can either minimize the overlap complexity of pattern sets, maximize their sensitivity in database searching or minimize the variance of the number of pattern-based matches in alignment-free sequence comparison. We show that, for database searching, rasbhari generates pattern sets with slightly higher sensitivity than existing approaches. In our Spaced Words approach to alignment-free sequence comparison, pattern sets calculated with rasbhari led to more accurate estimates of phylogenetic distances than the randomly generated pattern sets that we previously used. Finally, we used rasbhari to generate patterns for short read classification with CLARK-S. Here too, the sensitivity of the results could be improved, compared to the default patterns of the program. We integrated rasbhari into Spaced Words; the source code of rasbhari is freely available at http://rasbhari.gobics.de/.
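To make the idea of a binary pattern of match and don't-care positions concrete, the following Python sketch extracts spaced words from two sequences and compares them. It only illustrates how a pattern filters word positions; it does not reproduce rasbhari's hill-climbing optimization or the Spaced Words distance estimate. The helper names `spaced_words` and `shared_spaced_words`, the pattern `1101`, and the toy sequences are assumptions.

```python
def spaced_words(seq, pattern):
    """Yield the word at each position, keeping only characters under '1' in the pattern."""
    match_positions = [i for i, c in enumerate(pattern) if c == "1"]
    span = len(pattern)
    for start in range(len(seq) - span + 1):
        window = seq[start:start + span]
        yield "".join(window[i] for i in match_positions)


def shared_spaced_words(a, b, pattern):
    """Set of spaced words that occur in both sequences under the given pattern."""
    return set(spaced_words(a, pattern)) & set(spaced_words(b, pattern))


a = "ACGTAC"  # hypothetical toy sequences that differ at one position
b = "ACTTAC"

print(sorted(shared_spaced_words(a, b, "1101")))  # prints: ['ACT']
print(sorted(shared_spaced_words(a, b, "1111")))  # prints: []
```

With the contiguous pattern `1111` the single substitution destroys every shared word, while the pattern `1101` still finds a match because the mismatch falls on its don't-care position.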
Affiliation(s)
- Lars Hahn
  - University of Göttingen, Department of Bioinformatics, Göttingen, Germany
- Rachid Ounit
  - University of California, Riverside, Department of Computer Science and Engineering, Riverside, California, United States of America
- Stefano Lonardi
  - University of California, Riverside, Department of Computer Science and Engineering, Riverside, California, United States of America
- Burkhard Morgenstern
  - University of Göttingen, Department of Bioinformatics, Göttingen, Germany
  - University of Göttingen, Center for Computational Sciences, Göttingen, Germany
|
49
|
Pinos S, Pontarotti P, Raoult D, Baudoin JP, Pagnier I. Compartmentalization in PVC super-phylum: evolution and impact. Biol Direct 2016; 11:38. [PMID: 27507008 PMCID: PMC4977879 DOI: 10.1186/s13062-016-0144-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 05/13/2016] [Accepted: 08/02/2016] [Indexed: 11/27/2022] Open
Abstract
BACKGROUND The PVC super-phylum gathers bacteria from seven phyla (Planctomycetes, Verrucomicrobiae, Chlamydiae, Lentisphaera, Poribacteria, OP3, WWE2) presenting different lifestyles, cell plans and environments. Planctomyces and several Verrucomicrobiae exhibit a complex cell plan, with an intracytoplasmic membrane inducing the compartmentalization of the cytoplasm into two regions (pirellulosome and paryphoplasm). The evolution and function of this cell plan are still subject to debate. In this work, we hypothesized that it could play a role in protection of the bacterial DNA, especially against horizontal gene transfer (HGT). Therefore, 64 bacterial genomes belonging to seven different phyla (including four PVC phyla) were studied. We reconstructed the evolution of the cell plan as precisely as possible, thanks to information obtained by bibliographic study and electron microscopy. We used a strategy based on comparative phylogenomics to determine the share of horizontal transfers in each studied genome. RESULTS Our results show that the bacteria Simkania negevensis (Chlamydiae) and Coraliomargarita akajimensis (Verrucomicrobiae), whose cell plans were previously unknown, are compartmentalized, as can be seen on the micrographs. This is one of the first indications of the presence of an intracytoplasmic membrane in a member of the Chlamydiae. The proportion of HGT does not seem to be related to the cell plan of bacteria, suggesting that compartmentalization does not protect bacterial DNA against HGT. Conversely, the lifestyle of bacteria seems to affect their ability to exchange genes. CONCLUSIONS Our study allows a better reconstruction of the evolution of the intracytoplasmic membrane, but this structure seems to have no impact on HGT occurrences. REVIEWERS This article was reviewed by Mircea Podar and Olivier Tenaillon.
Affiliation(s)
- Sandrine Pinos
  - Aix Marseille Université, URMITE, UM63, CNRS 7278, IRD 198, INSERM 1095, 27 Bd Jean Moulin, 13385 Marseille Cedex 5, France
  - Aix Marseille Université, CNRS, Centrale Marseille, I2M UMR 7373, Evolution Biologique et Modélisation, 13385 Marseille, Cedex 5, France
- Pierre Pontarotti
  - Aix Marseille Université, CNRS, Centrale Marseille, I2M UMR 7373, Evolution Biologique et Modélisation, 13385 Marseille, Cedex 5, France
- Didier Raoult
  - Aix Marseille Université, URMITE, UM63, CNRS 7278, IRD 198, INSERM 1095, 27 Bd Jean Moulin, 13385 Marseille Cedex 5, France
- Jean Pierre Baudoin
  - Aix Marseille Université, URMITE, UM63, CNRS 7278, IRD 198, INSERM 1095, 27 Bd Jean Moulin, 13385 Marseille Cedex 5, France
- Isabelle Pagnier
  - Aix Marseille Université, URMITE, UM63, CNRS 7278, IRD 198, INSERM 1095, 27 Bd Jean Moulin, 13385 Marseille Cedex 5, France
|
50
|
Bernard G, Chan CX, Ragan MA. Alignment-free microbial phylogenomics under scenarios of sequence divergence, genome rearrangement and lateral genetic transfer. Sci Rep 2016; 6:28970. [PMID: 27363362 PMCID: PMC4929450 DOI: 10.1038/srep28970] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 03/16/2016] [Accepted: 06/13/2016] [Indexed: 12/22/2022] Open
Abstract
Alignment-free (AF) approaches have recently been highlighted as alternatives to methods based on multiple sequence alignment in phylogenetic inference. However, the sensitivity of AF methods to genome-scale evolutionary scenarios is little known. Here, using simulated microbial genome data we systematically assess the sensitivity of nine AF methods to three important evolutionary scenarios: sequence divergence, lateral genetic transfer (LGT) and genome rearrangement. Among these, AF methods are most sensitive to the extent of sequence divergence, less sensitive to low and moderate frequencies of LGT, and most robust against genome rearrangement. We describe the application of AF methods to three well-studied empirical genome datasets, and introduce a new application of the jackknife to assess node support. Our results demonstrate that AF phylogenomics is computationally scalable to multi-genome data and can generate biologically meaningful phylogenies and insights into microbial evolution.
Affiliation(s)
- Guillaume Bernard
  - Institute for Molecular Bioscience, and ARC Centre of Excellence in Bioinformatics, The University of Queensland, Brisbane, QLD 4072, Australia
- Cheong Xin Chan
  - Institute for Molecular Bioscience, and ARC Centre of Excellence in Bioinformatics, The University of Queensland, Brisbane, QLD 4072, Australia
- Mark A. Ragan
  - Institute for Molecular Bioscience, and ARC Centre of Excellence in Bioinformatics, The University of Queensland, Brisbane, QLD 4072, Australia
|