1. Teekas L, Sharma S, Vijay N. Terminal regions of a protein are a hotspot for low complexity regions and selection. Open Biol 2024; 14:230439. [PMID: 38862022 DOI: 10.1098/rsob.230439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 05/13/2024] [Indexed: 06/13/2024] Open
Abstract
Volatile low complexity regions (LCRs) are a novel source of adaptive variation, functional diversification and evolutionary novelty. An interplay of selection and mutation governs the composition and length of low complexity regions. High %GC and mutations provide length variability because of mechanisms like replication slippage. Owing to the complex dynamics between selection and mutation, we need a better understanding of their coexistence. Our findings underscore that positively selected sites (PSS) and low complexity regions prefer the terminal regions of genes, co-occurring in most Tetrapoda clades. We observed that positively selected sites within a gene have position-specific roles. Central-positively selected site genes primarily participate in defence responses, whereas terminal-positively selected site genes exhibit non-specific functions. Low complexity region-containing genes in the Tetrapoda clade exhibit a significantly higher %GC and lower ω (dN/dS: non-synonymous substitution rate/synonymous substitution rate) compared with genes without low complexity regions. This lower ω implies that despite providing rapid functional diversity, low complexity region-containing genes are subjected to intense purifying selection. Furthermore, we observe that low complexity regions consistently display ubiquitous prevalence at lower purity levels, but exhibit a preference for specific positions within a gene as the purity of the low complexity region stretch increases, implying a composition-dependent evolutionary role. Our findings collectively contribute to the understanding of how genetic diversity and adaptation are shaped by the interplay of selection and low complexity regions in the Tetrapoda clade.
Affiliation(s)
- Lokdeep Teekas
  - Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal, Bhauri, Madhya Pradesh, India
- Sandhya Sharma
  - Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal, Bhauri, Madhya Pradesh, India
- Nagarjun Vijay
  - Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal, Bhauri, Madhya Pradesh, India
2. Herrick J. DNA Damage, Genome Stability, and Adaptation: A Question of Chance or Necessity? Genes (Basel) 2024; 15:520. [PMID: 38674454 PMCID: PMC11049855 DOI: 10.3390/genes15040520] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2024] [Revised: 04/14/2024] [Accepted: 04/18/2024] [Indexed: 04/28/2024] Open
Abstract
DNA damage causes the mutations that are the principal source of genetic variation. DNA damage detection and repair mechanisms therefore play a determining role in generating the genetic diversity on which natural selection acts. Speciation, it is commonly assumed, occurs at a rate set by the level of standing allelic diversity in a population. The process of speciation is driven by a combination of two evolutionary forces: genetic drift and ecological selection. Genetic drift takes place under the conditions of relaxed selection, and results in a balance between the rates of mutation and the rates of genetic substitution. These two processes, drift and selection, are necessarily mediated by a variety of mechanisms guaranteeing genome stability in any given species. One of the outstanding questions in evolutionary biology concerns the origin of the widely varying phylogenetic distribution of biodiversity across the Tree of Life and how the forces of drift and selection contribute to shaping that distribution. The following examines some of the molecular mechanisms underlying genome stability and the adaptive radiations that are associated with biodiversity and the widely varying species richness and evenness in the different eukaryotic lineages.
Affiliation(s)
- John Herrick
  - Independent Researcher at 3, Rue des Jeûneurs, 75002 Paris, France
3. White LJ, Russell AJ, Pizzey AR, Dasmahapatra KK, Pownall ME. The Presence of Two MyoD Genes in a Subset of Acanthopterygii Fish Is Associated with a Polyserine Insert in MyoD1. J Dev Biol 2023; 11:jdb11020019. [PMID: 37218813 DOI: 10.3390/jdb11020019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 04/20/2023] [Accepted: 04/26/2023] [Indexed: 05/24/2023] Open
Abstract
The MyoD gene was duplicated during the teleost whole genome duplication and, while a second MyoD gene (MyoD2) was subsequently lost from the genomes of some lineages (including zebrafish), many fish lineages (including Alcolapia species) have retained both MyoD paralogues. Here we reveal the expression patterns of the two MyoD genes in Oreochromis (Alcolapia) alcalica using in situ hybridisation. We report our analysis of MyoD1 and MyoD2 protein sequences from 54 teleost species, and show that O. alcalica, along with some other teleosts, include a polyserine repeat between the amino terminal transactivation domains (TAD) and the cysteine-histidine rich region (H/C) in MyoD1. The evolutionary history of MyoD1 and MyoD2 is compared to the presence of this polyserine region using phylogenetics, and its functional relevance is tested using overexpression in a heterologous system to investigate subcellular localisation, stability, and activity of MyoD proteins that include and do not include the polyserine region.
Affiliation(s)
- Lewis J White
  - Biology Department, University of York, York YO10 5DD, UK
- Mary E Pownall
  - Biology Department, University of York, York YO10 5DD, UK
4. Persi E, Wolf YI, Karamycheva S, Makarova KS, Koonin EV. Compensatory relationship between low-complexity regions and gene paralogy in the evolution of prokaryotes. Proc Natl Acad Sci U S A 2023; 120:e2300154120. [PMID: 37036997 PMCID: PMC10120016 DOI: 10.1073/pnas.2300154120] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/04/2023] [Accepted: 03/17/2023] [Indexed: 04/12/2023] Open
Abstract
The evolution of genomes in all life forms involves two distinct, dynamic types of genomic changes: gene duplication (and loss) that shape families of paralogous genes and extension (and contraction) of low-complexity regions (LCR), which occurs through dynamics of short repeats in protein-coding genes. Although the roles of each of these types of events in genome evolution have been studied, their co-evolutionary dynamics is not thoroughly understood. Here, by analyzing a wide range of genomes from diverse bacteria and archaea, we show that LCR and paralogy represent two distinct routes of evolution that are inversely correlated. The emergence of LCR is a prominent evolutionary mechanism in fast evolving, young protein families, whereas paralogy dominates the comparatively slow evolution of old protein families. The analysis of multiple prokaryotic genomes shows that the formation of LCR is likely a widespread, transient evolutionary mechanism that temporally and locally affects also ancestral functions, but apparently, fades away with time, under mutational and selective pressures, yielding to gene paralogy. We propose that compensatory relationships between short-term and longer-term evolutionary mechanisms are universal in the evolution of life.
Affiliation(s)
- Erez Persi
  - National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894
- Yuri I. Wolf
  - National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894
- Svetlana Karamycheva
  - National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894
- Kira S. Makarova
  - National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894
- Eugene V. Koonin
  - National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894
5. Li S, Hannenhalli S, Ovcharenko I. De novo human brain enhancers created by single-nucleotide mutations. Sci Adv 2023; 9:eadd2911. [PMID: 36791193 PMCID: PMC9931207 DOI: 10.1126/sciadv.add2911] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/01/2022] [Accepted: 01/12/2023] [Indexed: 05/30/2023]
Abstract
Advanced human cognition is attributed to increased neocortex size and complexity, but the underlying evolutionary and regulatory mechanisms are largely unknown. Using human and macaque embryonic neocortical H3K27ac data coupled with a deep learning model of enhancers, we identified ~4000 enhancer gains in humans, which, per our model, can often be attributed to single-nucleotide essential mutations. Our analyses suggest that functional gains in embryonic brain development are associated with de novo enhancers whose putative target genes exhibit increased expression in progenitor cells and interneurons and partake in critical neural developmental processes. Essential mutations alter enhancer activity through altered binding of key transcription factors (TFs) of embryonic neocortex, including ISL1, POU3F2, PITX1/2, and several SOX TFs, and are associated with central nervous system disorders. Overall, our results suggest that essential mutations lead to gain of embryonic neocortex enhancers, which orchestrate expression of genes involved in critical developmental processes associated with human cognition.
Affiliation(s)
- Shan Li
  - Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Sridhar Hannenhalli
  - Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Ivan Ovcharenko
  - Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA
6. Rapid molecular diversification and homogenization of clustered major ampullate silk genes in Argiope garden spiders. PLoS Genet 2022; 18:e1010537. [PMID: 36508456 PMCID: PMC9779670 DOI: 10.1371/journal.pgen.1010537] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/14/2022] [Revised: 12/22/2022] [Accepted: 11/18/2022] [Indexed: 12/14/2022] Open
Abstract
The evolutionary diversification of orb-web weaving spiders is closely tied to the mechanical performance of dragline silk. This proteinaceous fiber provides the primary structural framework of orb web architecture, and its extraordinary toughness allows these structures to absorb the high energy of aerial prey impact. The dominant model of dragline silk molecular structure involves the combined function of two highly repetitive, spider-specific, silk genes (spidroins): MaSp1 and MaSp2. Recent genomic studies, however, have suggested this framework is overly simplistic, and our understanding of how MaSp genes evolve is limited. Here we present a comprehensive analysis of MaSp structural and evolutionary diversity across species of Argiope (garden spiders). This genomic analysis reveals the largest catalog of MaSp genes found in any spider, driven largely by an expansion of MaSp2 genes. The rapid diversification of Argiope MaSp genes, located primarily in a single genomic cluster, is associated with profound changes in silk gene structure. MaSp2 genes, in particular, have evolved complex hierarchically organized repeat units (ensemble repeats) delineated by novel introns that exhibit remarkable evolutionary dynamics. These repetitive introns have arisen independently within the genus, are highly homogenized within a gene, but diverge rapidly between genes. In some cases, these iterated introns are organized in an alternating structure in which every other intron is nearly identical in sequence. We hypothesize that this intron structure has evolved to facilitate homogenization of the coding sequence. We also find evidence of intergenic gene conversion and identify a more diverse array of stereotypical amino acid repeats than previously recognized.
Overall, the extreme diversification found among MaSp genes requires changes in the structure-function model of dragline silk performance that focuses on the differential use and interaction among various MaSp paralogs as well as the impact of ensemble repeat structure and different amino acid motifs on mechanical behavior.
7. Teekas L, Sharma S, Vijay N. Lineage-specific protein repeat expansions and contractions reveal malleable regions of immune genes. Genes Immun 2022; 23:218-234. [PMID: 36203090 DOI: 10.1038/s41435-022-00186-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Revised: 09/21/2022] [Accepted: 09/22/2022] [Indexed: 01/07/2023]
Abstract
Functional diversification, a higher evolutionary rate, and intense positive selection help a limited number of immune genes interact with many pathogens. Repeats in protein-coding regions are a well-known source of functional diversification, adaptive variation, and evolutionary novelty in a short time. Repeats play a crucial role in biochemical functions like functional diversification of transcription regulation, protein kinases, cell adhesion, signaling pathways, morphogenesis, DNA repair, recombination, and RNA processing. Repeat length variation can change the associated protein's interaction, efficacy, and overall protein network. Repeats have an intrinsic unstable nature and can potentially evolve rapidly and expedite the acquisition of complex phenotypic traits and functions. Because of their ability to generate rapid, adaptive variations over short evolutionary distances, repeats are considered "tuning knobs." Repeat length variation in specific genes, like RUNX2 and ALX4, is associated with morphological and physiological changes across vertebrates. Here we study repeat length variation as a potent source of species-specific immune diversification across several clades of tetrapods. Moreover, we provide a clade-wise comprehensive list of immune genes with repeat types for future studies of morphological/evolutionary changes within species groups. We observe significant repeat length variation of FASLG and C1QC in Rodentia and Primates' contrasting species groups, respectively.
Affiliation(s)
- Lokdeep Teekas
  - Department of Biological Sciences, Computational Evolutionary Genomics Lab, IISER Bhopal, Bhauri, Madhya Pradesh, India
- Sandhya Sharma
  - Department of Biological Sciences, Computational Evolutionary Genomics Lab, IISER Bhopal, Bhauri, Madhya Pradesh, India
- Nagarjun Vijay
  - Department of Biological Sciences, Computational Evolutionary Genomics Lab, IISER Bhopal, Bhauri, Madhya Pradesh, India
8. Karamycheva S, Wolf YI, Persi E, Koonin EV, Makarova KS. Analysis of lineage-specific protein family variability in prokaryotes combined with evolutionary reconstructions. Biol Direct 2022; 17:22. [PMID: 36042479 PMCID: PMC9425974 DOI: 10.1186/s13062-022-00337-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/18/2022] [Accepted: 08/13/2022] [Indexed: 12/24/2022] Open
Abstract
Background: Evolutionary rate is a key characteristic of gene families that is linked to the functional importance of the respective genes as well as specific biological functions of the proteins they encode. Accurate estimation of evolutionary rates is a challenging task that requires precise phylogenetic analysis. Here we present an easy-to-estimate protein family level measure of sequence variability based on alignment column homogeneity in multiple alignments of protein sequences from Clade-Specific Clusters of Orthologous Genes (csCOGs).
Results: We report genome-wide estimates of variability for 8 diverse groups of bacteria and archaea and investigate the connection between variability and various genomic and biological features. The variability estimates are based on homogeneity distributions across amino acid sequence alignments and can be obtained for multiple groups of genomes at minimal computational expense. About half of the variance in variability values can be explained by the analyzed features, with the greatest contribution coming from the extent of gene paralogy in the given csCOG. The correlation between variability and paralogy appears to originate, primarily, not from gene duplication, but from acquisition of distant paralogs and xenologs, introducing sequence variants that are more divergent than those that could have evolved in situ during the lifetime of the given group of organisms. Both high-variability and low-variability csCOGs were identified in all functional categories, but as expected, proteins encoded by integrated mobile elements as well as proteins involved in defense functions and cell motility are, on average, more variable than proteins with housekeeping functions. Additionally, using linear discriminant analysis, we found that variability and fraction of genomes carrying a given gene are the two variables that provide the best prediction of gene essentiality as compared to the results of transposon mutagenesis in Sulfolobus islandicus.
Conclusions: Variability, a measure of sequence diversity within an alignment relative to the overall diversity within a group of organisms, offers a convenient proxy for evolutionary rate estimates and is informative with respect to prediction of functional properties of proteins. In particular, variability is a strong predictor of gene essentiality for the respective organisms and indicative of sub- or neofunctionalization of paralogs.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13062-022-00337-7.
Affiliation(s)
- Svetlana Karamycheva
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
- Yuri I Wolf
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
- Erez Persi
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
- Eugene V Koonin
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
- Kira S Makarova
  - National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, 20894, USA
9. Zou X, Chen L, Li B, Xiao J, Xu P. The neuropeptide Y receptor gene repository, phylogeny and comparative expression in allotetraploid common carp. Sci Rep 2022; 12:9449. [PMID: 35676423 PMCID: PMC9177570 DOI: 10.1038/s41598-022-13587-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2022] [Accepted: 04/29/2022] [Indexed: 11/23/2022] Open
Abstract
NPY-family receptors belong to the G protein-coupled receptor (GPCR) family, which lays a physiological foundation for the transmembrane transport of the endogenous appetite-stimulating factor neuropeptide Y and related peptides. In this study, we investigated the npyr genes in ten representative species and identified twelve npyr genes in allotetraploid C. carpio, twice the number found in its subgenome B progenitor-like diploid Poropuntius huangchuchieni. Phylogenetic analysis showed that all npyr genes were divided into three subgroups, and selection pressure analysis indicated that they underwent strong purifying selection. Synteny analysis then showed that most npyr genes were evenly distributed on the homologous chromosomes of the two subgenomes in allotetraploid C. carpio, in which npy1r and npy2r were each tandemly duplicated. In addition, the global expression of npyr genes during embryonic development in allotetraploid C. carpio suggested a potential function of npyr genes in immunity and reproduction. In adult tissues, npyr genes were mainly expressed in the brain, gonad, and skin, with a similar expression pattern between the C. carpio B subgenome and P. huangchuchieni. In general, our research could provide reference information for future exploration of the NPY receptors and neuroendocrine system of allotetraploid C. carpio and vertebrates.
Affiliation(s)
- Xiaoqing Zou
  - State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen, China
  - Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen, China
- Lin Chen
  - State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen, China
  - Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen, China
- Bijun Li
  - State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen, China
  - Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen, China
- Junzhu Xiao
  - State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen, China
  - Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen, China
- Peng Xu
  - State Key Laboratory of Marine Environmental Science, College of Ocean and Earth Sciences, Xiamen University, Xiamen, China
  - Fujian Key Laboratory of Genetics and Breeding of Marine Organisms, College of Ocean and Earth Sciences, Xiamen University, Xiamen, China
  - State Key Laboratory of Large Yellow Croaker Breeding, Ningde Fufa Fisheries Company Limited, Ningde, China
10. The role of zinc in the adaptive evolution of polar phytoplankton. Nat Ecol Evol 2022; 6:965-978. [PMID: 35654896 DOI: 10.1038/s41559-022-01750-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/01/2021] [Accepted: 03/28/2022] [Indexed: 12/20/2022]
Abstract
Zinc is an essential trace metal for oceanic primary producers with the highest concentrations in polar oceans. However, its role in the biological functioning and adaptive evolution of polar phytoplankton remains enigmatic. Here, we have applied a combination of evolutionary genomics, quantitative proteomics, co-expression analyses and cellular physiology to suggest that model polar phytoplankton species have a higher demand for zinc because of elevated cellular levels of zinc-binding proteins. We propose that the adaptive expansion of regulatory zinc-finger protein families, together with co-expanded and co-expressed zinc-binding protein families involved in photosynthesis and growth in these microalgal species and their natural communities, is responsible for the higher zinc demand. The expression of their encoding genes in eukaryotic phytoplankton metatranscriptomes from pole to pole was found to correlate not only with dissolved zinc concentrations in the upper ocean but also with temperature, suggesting that environmental conditions of polar oceans are responsible for an increased demand for zinc. These results suggest that zinc plays an important role in supporting photosynthetic growth in eukaryotic polar phytoplankton and that this has been critical for algal colonization of low-temperature polar oceans.
11. Accurate contact-based modelling of repeat proteins predicts the structure of new repeats protein families. PLoS Comput Biol 2021; 17:e1008798. [PMID: 33857128 PMCID: PMC8078820 DOI: 10.1371/journal.pcbi.1008798] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/03/2020] [Revised: 04/27/2021] [Accepted: 02/15/2021] [Indexed: 12/18/2022] Open
Abstract
Repeat proteins are abundant in eukaryotic proteomes. They are involved in many eukaryotic specific functions, including signalling. For many of these proteins, the structure is not known, as they are difficult to crystallise. Today, using direct coupling analysis and deep learning it is often possible to predict a protein’s structure. However, the unique sequence features of repeat proteins have made it challenging to use direct coupling analysis for predicting contacts. Here, we show that deep learning-based methods (trRosetta, DeepMetaPsicov (DMP) and PconsC4) overcome this problem and can predict intra- and inter-unit contacts in repeat proteins. In a benchmark dataset of 815 repeat proteins, about 90% can be correctly modelled. Further, among 48 PFAM families lacking a protein structure, we produce models of forty-one families with estimated high accuracy. Repeat proteins are widespread among organisms and particularly abundant in eukaryotic proteomes. Their primary sequence contains amino acid repetitions that give rise to structures with repeated folds/domains. Although the repeated units can often be recognised from the sequence alone, structural information is often missing. Here, we used contact prediction to predict the structure of repeat proteins directly from their primary sequences. We benchmark the methods on a dataset comprising all the known repeat structures. We evaluate the contact predictions and the obtained models for different classes of repeat proteins. Further, we develop and benchmark a quality assessment (QA) method specific for repeat proteins. Finally, we used the prediction pipeline for all PFAM repeat families without resolved structures and found that forty-one of them could be modelled with high accuracy.
12. Persi E, Wolf YI, Horn D, Ruppin E, Demichelis F, Gatenby RA, Gillies RJ, Koonin EV. Mutation-selection balance and compensatory mechanisms in tumour evolution. Nat Rev Genet 2020; 22:251-262. [PMID: 33257848 DOI: 10.1038/s41576-020-00299-4] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Accepted: 10/16/2020] [Indexed: 12/11/2022]
Abstract
Intratumour heterogeneity and phenotypic plasticity, sustained by a range of somatic aberrations, as well as epigenetic and metabolic adaptations, are the principal mechanisms that enable cancers to resist treatment and survive under environmental stress. A comprehensive picture of the interplay between different somatic aberrations, from point mutations to whole-genome duplications, in tumour initiation and progression is lacking. We posit that different genomic aberrations generally exhibit a temporal order, shaped by a balance between the levels of mutations and selective pressures. Repeat instability emerges first, followed by larger aberrations, with compensatory effects leading to robust tumour fitness maintained throughout the tumour progression. A better understanding of the interplay between genetic aberrations, the microenvironment, and epigenetic and metabolic cellular states is essential for early detection and prevention of cancer as well as development of efficient therapeutic strategies.
Affiliation(s)
- Erez Persi
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Yuri I Wolf
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- David Horn
  - School of Physics and Astronomy, Raymond & Beverly Sackler Faculty of Exact Sciences, Tel-Aviv University, Tel-Aviv, Israel
- Eytan Ruppin
  - Cancer Data Science Lab, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Francesca Demichelis
  - Department for Cellular, Computational and Integrative Biology, University of Trento, Trento, Italy
  - Caryl and Israel Englander Institute for Precision Medicine, New York Presbyterian Hospital, Weill Cornell Medicine, New York, NY, USA
- Robert A Gatenby
  - Integrated Mathematical Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA
- Robert J Gillies
  - Department of Cancer Physiology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA
- Eugene V Koonin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
13. Galpern EA, Freiberger MI, Ferreiro DU. Large Ankyrin repeat proteins are formed with similar and energetically favorable units. PLoS One 2020; 15:e0233865. [PMID: 32579546 PMCID: PMC7314423 DOI: 10.1371/journal.pone.0233865] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/04/2020] [Accepted: 05/13/2020] [Indexed: 11/19/2022] Open
Abstract
Ankyrin containing proteins are one of the most abundant repeat protein families present in all extant organisms. They are made with tandem copies of similar amino acid stretches that fold into elongated architectures. Here, we built and curated a dataset of 200 thousand proteins that contain 1.2 million Ankyrin regions and characterize the abundance, structure and energetics of the repetitive regions in natural proteins. We found that there is a continuous roughly exponential variety of array lengths with an exceptional frequency at 24 repeats. We described that individual repeats are seldom interrupted with long insertions and accept few deletions, in line with the known tertiary structures. We found that longer arrays are made up of repeats that are more similar to each other than shorter arrays, and display more favourable folding energy, hinting at their evolutionary origin. The array distributions show that there is a physical upper limit to the size of an array of repeats of about 120 copies, consistent with the limit found in nature. The identity patterns within the arrays suggest that they may have originated by sequential copies of more than one Ankyrin unit.
Affiliation(s)
- Ezequiel A. Galpern
- Protein Physiology Lab, Departamento de Química Biológica, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN-CONICET), Universidad de Buenos Aires, Buenos Aires, Argentina
| | - María I. Freiberger
- Protein Physiology Lab, Departamento de Química Biológica, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN-CONICET), Universidad de Buenos Aires, Buenos Aires, Argentina
| | - Diego U. Ferreiro
- Protein Physiology Lab, Departamento de Química Biológica, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN-CONICET), Universidad de Buenos Aires, Buenos Aires, Argentina
| |
|
14
|
Guimaraes AMS, Zimpel CK. Mycobacterium bovis: From Genotyping to Genome Sequencing. Microorganisms 2020; 8:E667. [PMID: 32375210 PMCID: PMC7285088 DOI: 10.3390/microorganisms8050667] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 02/06/2020] [Revised: 04/17/2020] [Accepted: 04/21/2020] [Indexed: 12/15/2022] Open
Abstract
Mycobacterium bovis is the main pathogen of bovine, zoonotic, and wildlife tuberculosis. Despite the existence of programs for bovine tuberculosis (bTB) control in many regions, the disease remains a challenge for the veterinary and public health sectors, especially in developing countries and in high-income nations with wildlife reservoirs. Current bTB control programs are mostly based on test-and-slaughter, movement restrictions, and post-mortem inspection measures. In certain settings, contact tracing and surveillance has benefited from M. bovis genotyping techniques. More recently, whole-genome sequencing (WGS) has become the preferential technique to inform outbreak response through contact tracing and source identification for many infectious diseases. As the cost per genome decreases, the application of WGS to bTB control programs is inevitable moving forward. However, there are technical challenges in data analyses and interpretation that hinder the implementation of M. bovis WGS as a molecular epidemiology tool. Therefore, the aim of this review is to describe M. bovis genotyping techniques and discuss current standards and challenges of the use of M. bovis WGS for transmission investigation, surveillance, and global lineages distribution. We compiled a series of associated research gaps to be explored with the ultimate goal of implementing M. bovis WGS in a standardized manner in bTB control programs.
Affiliation(s)
- Ana M. S. Guimaraes
- Laboratory of Applied Research in Mycobacteria, Department of Microbiology, University of São Paulo, São Paulo 01246-904, Brazil
| | - Cristina K. Zimpel
- Laboratory of Applied Research in Mycobacteria, Department of Microbiology, University of São Paulo, São Paulo 01246-904, Brazil
- Department of Preventive Veterinary Medicine and Animal Health, University of São Paulo, São Paulo 01246-904, Brazil
| |
|
15
|
Khristich AN, Mirkin SM. On the wrong DNA track: Molecular mechanisms of repeat-mediated genome instability. J Biol Chem 2020; 295:4134-4170. [PMID: 32060097 PMCID: PMC7105313 DOI: 10.1074/jbc.rev119.007678] [Citation(s) in RCA: 161] [Impact Index Per Article: 40.3] [Indexed: 12/15/2022] Open
Abstract
Expansions of simple tandem repeats are responsible for almost 50 human diseases, the majority of which are severe, degenerative, and not currently treatable or preventable. In this review, we first describe the molecular mechanisms of repeat-induced toxicity, which is the connecting link between repeat expansions and pathology. We then survey alternative DNA structures that are formed by expandable repeats and review the evidence that formation of these structures is at the core of repeat instability. Next, we describe the consequences of the presence of long structure-forming repeats at the molecular level: somatic and intergenerational instability, fragility, and repeat-induced mutagenesis. We discuss the reasons for gender bias in intergenerational repeat instability and the tissue specificity of somatic repeat instability. We also review the known pathways in which DNA replication, transcription, DNA repair, and chromatin state interact and thereby promote repeat instability. We then discuss possible reasons for the persistence of disease-causing DNA repeats in the genome. We describe evidence suggesting that these repeats are a payoff for the advantages of having abundant simple-sequence repeats for eukaryotic genome function and evolvability. Finally, we discuss two unresolved fundamental questions: (i) why does repeat behavior differ between model systems and human pedigrees, and (ii) can we use current knowledge on repeat instability mechanisms to cure repeat expansion diseases?
Affiliation(s)
| | - Sergei M Mirkin
- Department of Biology, Tufts University, Medford, Massachusetts 02155.
| |
|
16
|
Northover DE, Shank SD, Liberles DA. Characterizing lineage-specific evolution and the processes driving genomic diversification in chordates. BMC Evol Biol 2020; 20:24. [PMID: 32046633 PMCID: PMC7011509 DOI: 10.1186/s12862-020-1585-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2019] [Accepted: 01/16/2020] [Indexed: 11/21/2022] Open
Abstract
Background: Understanding the origins of genome content has long been a goal of molecular evolution and comparative genomics. By examining genome evolution through the guise of lineage-specific evolution, it is possible to make inferences about the evolutionary events that have given rise to species-specific diversification. Here we characterize the evolutionary trends found in chordate species using The Adaptive Evolution Database (TAED). TAED is a database of phylogenetically indexed gene families designed to detect episodes of directional or diversifying selection across chordates. Gene families within the database have been assessed for lineage-specific estimates of dN/dS and have been reconciled to the chordate species to identify retained duplicates. Gene families have also been mapped to the functional pathways, and amino acid changes which occurred on high dN/dS lineages have been mapped to protein structures. Results: An analysis of this exhaustive database has enabled a characterization of the processes of lineage-specific diversification in chordates. A pathway-level enrichment analysis of TAED determined that pathways most commonly found to have elevated rates of evolution included those involved in metabolism, immunity, and cell signaling. An analysis of protein fold presence on proteins, after normalizing for frequency in the database, found that common folds such as Rossmann folds, Jelly Roll folds, and TIM barrels were overrepresented on proteins most likely to undergo directional selection. A set of gene families which experience increased numbers of duplications within short evolutionary times are associated with pathways involved in metabolism, olfactory reception, and signaling. An analysis of protein secondary structure indicated more relaxed constraint in β-sheets and stronger constraint on α-helices, amidst a general preference for substitutions at exposed sites. Lastly, a detailed analysis of the ornithine decarboxylase gene family, a key enzyme in the pathway for polyamine synthesis, revealed lineage-specific evolution along the lineage leading to Cetacea through rapid sequence evolution in a duplicate gene with amino acid substitutions causing active site rearrangement. Conclusion: Episodes of lineage-specific evolution are frequent throughout chordate species. Both duplication and directional selection have played large roles in the evolution of the phylum. TAED is a powerful tool for facilitating this understanding of lineage-specific evolution.
Affiliation(s)
- David E Northover
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA
| | - Stephen D Shank
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA
| | - David A Liberles
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, PA, 19122, USA; Department of Molecular Biology, University of Wyoming, Laramie, WY, 82071, USA.
| |
|
17
|
Barua A, Mikheyev AS. Many Options, Few Solutions: Over 60 My Snakes Converged on a Few Optimal Venom Formulations. Mol Biol Evol 2020; 36:1964-1974. [PMID: 31220860 PMCID: PMC6736290 DOI: 10.1093/molbev/msz125] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Indexed: 01/15/2023] Open
Abstract
Gene expression changes contribute to complex trait variations in both individuals and populations. However, the evolution of gene expression underlying complex traits over macroevolutionary timescales remains poorly understood. Snake venoms are proteinaceous cocktails where the expression of each toxin can be quantified and mapped to a distinct genomic locus and traced for millions of years. Using a phylogenetic generalized linear mixed model, we analyzed expression data of toxin genes from 52 snake species spanning the 3 venomous snake families and estimated phylogenetic covariance, which acts as a measure of evolutionary constraint. We find that evolution of toxin combinations is not constrained. However, although all combinations are in principle possible, the actual dimensionality of phylomorphic space is low, with envenomation strategies focused around only four major toxin families: metalloproteases, three-finger toxins, serine proteases, and phospholipases A2. Although most extant snakes prioritize either a single or a combination of major toxin families, they are repeatedly recruited and lost. We find that over macroevolutionary timescales, the venom phenotypes were not shaped by phylogenetic constraints, which include important microevolutionary constraints such as epistasis and pleiotropy, but more likely by ecological filtering that permits a small number of optimal solutions. As a result, phenotypic optima were repeatedly attained by distantly related species. These results indicate that venoms evolve by selection on biochemistry of prey envenomation, which permit diversity through parallelism, and impose strong limits, since only a few of the theoretically possible strategies seem to work well and are observed in extant snakes.
Affiliation(s)
- Agneesh Barua
- Okinawa Institute of Science and Technology Graduate University, Onna, Japan
| | - Alexander S Mikheyev
- Okinawa Institute of Science and Technology Graduate University, Onna, Japan; Evolutionary Genomics Research Group, Ecology and Evolution Unit, Australian National University, Canberra, Australia
| |
|
18
|
Ntountoumi C, Vlastaridis P, Mossialos D, Stathopoulos C, Iliopoulos I, Promponas V, Oliver SG, Amoutzias GD. Low complexity regions in the proteins of prokaryotes perform important functional roles and are highly conserved. Nucleic Acids Res 2019; 47:9998-10009. [PMID: 31504783 PMCID: PMC6821194 DOI: 10.1093/nar/gkz730] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 05/15/2019] [Revised: 07/16/2019] [Accepted: 08/15/2019] [Indexed: 01/27/2023] Open
Abstract
We provide the first high-throughput analysis of the properties and functional role of Low Complexity Regions (LCRs) in more than 1500 prokaryotic and phage proteomes. We observe that, contrary to a widespread belief based on older and sparse data, LCRs actually have a significant, persistent and highly conserved presence and role in many and diverse prokaryotes. Their specific amino acid content is linked to proteins with certain molecular functions, such as the binding of RNA, DNA, metal-ions and polysaccharides. In addition, LCRs have been repeatedly identified in very ancient, and usually highly expressed proteins of the translation machinery. At last, based on the amino acid content enriched in certain categories, we have developed a neural network web server to identify LCRs and accurately predict whether they can bind nucleic acids, metal-ions or are involved in chaperone functions. An evaluation of the tool showed that it is highly accurate for eukaryotic proteins as well.
Affiliation(s)
- Chrysa Ntountoumi
- Bioinformatics Laboratory, Department of Biochemistry and Biotechnology, University of Thessaly, 41500, Greece
| | - Panayotis Vlastaridis
- Bioinformatics Laboratory, Department of Biochemistry and Biotechnology, University of Thessaly, 41500, Greece
| | - Dimitris Mossialos
- Microbial Biotechnology-Molecular Bacteriology-Virology Laboratory, Department of Biochemistry and Biotechnology, University of Thessaly, 41500, Greece
| | | | | | - Vasilios Promponas
- Bioinformatics Research Laboratory, Department of Biological Sciences, New Campus, University of Cyprus, PO Box 20537, CY-1678 Nicosia, Cyprus
| | - Stephen G Oliver
- Cambridge Systems Biology Centre & Department of Biochemistry, University of Cambridge, CB2 1GA, UK
| | - Grigoris D Amoutzias
- Bioinformatics Laboratory, Department of Biochemistry and Biotechnology, University of Thessaly, 41500, Greece
| |
|
19
|
de Jonge PA, von Meijenfeldt FAB, van Rooijen LE, Brouns SJJ, Dutilh BE. Evolution of BACON Domain Tandem Repeats in crAssphage and Novel Gut Bacteriophage Lineages. Viruses 2019; 11:v11121085. [PMID: 31766550 PMCID: PMC6949934 DOI: 10.3390/v11121085] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 10/31/2019] [Revised: 11/17/2019] [Accepted: 11/19/2019] [Indexed: 12/12/2022] Open
Abstract
The human gut contains an expanse of largely unstudied bacteriophages. Among the most common are crAss-like phages, which were predicted to infect Bacteroidetes hosts. CrAssphage, the first crAss-like phage to be discovered, contains a protein encoding a Bacteroides-associated carbohydrate-binding often N-terminal (BACON) domain tandem repeat. Because protein domain tandem repeats are often hotspots of evolution, BACON domains may provide insight into the evolution of crAss-like phages. Here, we studied the biodiversity and evolution of BACON domains in bacteriophages by analysing over 2 million viral contigs. We found a high biodiversity of BACON in seven gut phage lineages, including five known crAss-like phage lineages and two novel gut phage lineages that are distantly related to crAss-like phages. In three BACON-containing phage lineages, we found that BACON domain tandem repeats were associated with phage tail proteins, suggestive of a possible role of these repeats in host binding. In contrast, individual BACON domains that did not occur in tandem were not found in the proximity of tail proteins. In two lineages, tail-associated BACON domain tandem repeats evolved largely through horizontal transfer of separate domains. In the third lineage that includes the prototypical crAssphage, the tandem repeats arose from several sequential domain duplications, resulting in a characteristic tandem array that is distinct from bacterial BACON domains. We conclude that phage tail-associated BACON domain tandem repeats have evolved in at least two independent cases in gut bacteriophages, including in the widespread gut phage crAssphage.
Affiliation(s)
- Patrick A. de Jonge
- Theoretical Biology and Bioinformatics, Science4 Life, Utrecht University, 3584 CH Utrecht, The Netherlands
- Department of Bionanoscience, Kavli Institute of Nanoscience, Delft University of Technology, 2629 HZ Delft, The Netherlands
| | - F. A. Bastiaan von Meijenfeldt
- Theoretical Biology and Bioinformatics, Science4 Life, Utrecht University, 3584 CH Utrecht, The Netherlands
| | - Laura E. van Rooijen
- Theoretical Biology and Bioinformatics, Science4 Life, Utrecht University, 3584 CH Utrecht, The Netherlands
| | - Stan J. J. Brouns
- Department of Bionanoscience, Kavli Institute of Nanoscience, Delft University of Technology, 2629 HZ Delft, The Netherlands
| | - Bas E. Dutilh
- Theoretical Biology and Bioinformatics, Science4 Life, Utrecht University, 3584 CH Utrecht, The Netherlands
- Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, 6525 GA Nijmegen, The Netherlands
| |
|
20
|
Proteomic and genomic signatures of repeat instability in cancer and adjacent normal tissues. Proc Natl Acad Sci U S A 2019; 116:16987-16996. [PMID: 31387980 DOI: 10.1073/pnas.1908790116] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/14/2022] Open
Abstract
Repetitive sequences are hotspots of evolution at multiple levels. However, due to difficulties involved in their assembly and analysis, the role of repeats in tumor evolution is poorly understood. We developed a rigorous motif-based methodology to quantify variations in the repeat content, beyond microsatellites, in proteomes and genomes directly from proteomic and genomic raw data. This method was applied to a wide range of tumors and normal tissues. We identify high similarity between repeat instability patterns in tumors and their patient-matched adjacent normal tissues. Nonetheless, tumor-specific signatures both in protein expression and in the genome strongly correlate with cancer progression and robustly predict the tumorigenic state. In a patient, the hierarchy of genomic repeat instability signatures accurately reconstructs tumor evolution, with primary tumors differentiated from metastases. We observe an inverse relationship between repeat instability and point mutation load within and across patients independent of other somatic aberrations. Thus, repeat instability is a distinct, transient, and compensatory adaptive mechanism in tumor evolution and a potential signal for early detection.
|
21
|
A Graph-Based Approach for Detecting Sequence Homology in Highly Diverged Repeat Protein Families. Methods Mol Biol 2019. [PMID: 30298401 DOI: 10.1007/978-1-4939-8736-8_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Reconstructing evolutionary relationships in repeat proteins is notoriously difficult due to the high degree of sequence divergence that typically occurs between duplicated repeats. This is complicated further by the fact that proteins with a large number of similar repeats are more likely to produce significant local sequence alignments than proteins with fewer copies of the repeat motif. Furthermore, biologically correct sequence alignments are sometimes impossible to achieve in cases where insertion or translocation events disrupt the order of repeats in one of the sequences being aligned. Combined, these attributes make traditional phylogenetic methods for studying protein families unreliable for repeat proteins, due to the dependence of such methods on accurate sequence alignment. We present here a practical solution to this problem, making use of graph clustering combined with the open-source software package HH-suite, which enables highly sensitive detection of sequence relationships. Carrying out multiple rounds of homology searches via alignment of profile hidden Markov models, large sets of related proteins are generated. By representing the relationships between proteins in these sets as graphs, subsequent clustering with the Markov cluster algorithm enables robust detection of repeat protein subfamilies.
|
22
|
Rahbar MR, Zarei M, Jahangiri A, Khalili S, Nezafat N, Negahdaripour M, Fattahian Y, Ghasemi Y. Trimeric autotransporter adhesins in Acinetobacter baumannii, coincidental evolution at work. Infect Genet Evol 2019; 71:116-127. [PMID: 30922803 DOI: 10.1016/j.meegid.2019.03.023] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 01/25/2019] [Revised: 02/27/2019] [Accepted: 03/23/2019] [Indexed: 12/20/2022]
Abstract
Trimeric autotransporter adhesins (TAAs), also known as the type Vc secretion system, are expressed by many strains of Acinetobacter baumannii, an opportunistic pathogen responsible for nosocomial infections worldwide. TAAs are modular homotrimeric virulence factors containing a signal peptide, a complex stalk, and a conserved membrane-anchoring domain. The evolutionary mechanisms underlying the emergence of these adhesins are not clear. Here, we showed that TAA genes were laterally acquired and underwent gene duplication and recombination. The heterogeneity of TAA nucleotide sequences, GC content, codon usage, and the probability of recombination and duplication events were assessed with MEGA7. Given the heterogeneity of sequences, we used all-against-all BLAST for clustering the TAAs. The distribution of TAAs is highly scattered; GC content and codon usage for these genes are variable. Multiple events of lateral gene transfer from the early history of Acinetobacter and the occurrence of gene duplication, gene loss, and recombination after acquiring the alien genes may explain the scattered pattern of distribution of TAAs. Additionally, this gene is not present in many clinical isolates of A. baumannii and is thus not a single virulence factor responsible for the infection. The advantage of harboring such genes might be adapting to different environments by developing biofilm communities. We suggested that TAA genes were laterally acquired in the environmental context and incidentally provided some benefits at the infection site. Thus, coincidental evolution theory may be better suited for describing the evolution of TAA genes in A. baumannii genomes.
Affiliation(s)
- Mohammad Reza Rahbar
- Pharmaceutical Sciences Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
| | - Mahboubeh Zarei
- Pharmaceutical Sciences Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
| | - Abolfazl Jahangiri
- Applied Microbiology Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
| | - Saeed Khalili
- Department of Biology Sciences, Shahid Rajaee Teacher Training University, Tehran, Iran
| | - Navid Nezafat
- Pharmaceutical Sciences Research Center, Shiraz University of Medical Sciences, Shiraz, Iran; Department of Pharmaceutical Biotechnology, School of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran
| | - Manica Negahdaripour
- Pharmaceutical Sciences Research Center, Shiraz University of Medical Sciences, Shiraz, Iran; Department of Pharmaceutical Biotechnology, School of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran
| | - Yaser Fattahian
- Department of Biotechnology, Institute of Science and High Technology and Environmental Sciences, Graduate University of Advanced Technology, Kerman, Iran
| | - Younes Ghasemi
- Pharmaceutical Sciences Research Center, Shiraz University of Medical Sciences, Shiraz, Iran; Department of Pharmaceutical Biotechnology, School of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran.
| |
|
23
|
Chatterjee A, Sicheritz-Pontén T, Yadav R, Kondabagil K. Genomic and metagenomic signatures of giant viruses are ubiquitous in water samples from sewage, inland lake, waste water treatment plant, and municipal water supply in Mumbai, India. Sci Rep 2019; 9:3690. [PMID: 30842490 PMCID: PMC6403294 DOI: 10.1038/s41598-019-40171-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 07/26/2018] [Accepted: 02/04/2019] [Indexed: 11/09/2022] Open
Abstract
We report the detection of genomic signatures of giant viruses (GVs) in the metagenomes of three environmental samples from Mumbai, India, namely, a pre-filter of a household water purifier, a sludge sample from a wastewater treatment plant (WWTP), and a drying bed sample of the same WWTP. The de novo assembled contigs of each sample yielded 700 to 2000 maximum unique matches with the GV genomic database. In all three samples, the maximum number of reads aligned to Pandoraviridae, followed by Phycodnaviridae, Mimiviridae, Iridoviridae, and other Megaviruses. We also isolated GVs from every environmental sample (n = 20) we tested using co-culture of the sample with Acanthamoeba castellanii. From this, four randomly selected GVs were subjected to genomic characterization that showed remarkable cladistic homology with the three GV families, viz., Mimiviridae (Mimivirus Bombay [MVB]), Megaviruses (Powai lake megavirus [PLMV] and Bandra megavirus [BAV]), and Marseilleviridae (Kurlavirus [KV]). All 4 isolates exhibited remarkable genomic identity with respective GV families. Functionally, the genomes were indistinguishable from other previously reported GVs, encoding nearly all COGs across extant family members. Further, the uncanny genomic homogeneity exhibited by individual GV families across distant geographies indicates their yet to be ascertained ecological significance.
Affiliation(s)
- Anirvan Chatterjee
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai, India
| | - Thomas Sicheritz-Pontén
- Centre of Excellence for Omics-Driven Computational Biodiscovery (COMBio), Faculty of Applied Sciences, AIMST University, Kedah, Malaysia
| | - Rajesh Yadav
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai, India
| | - Kiran Kondabagil
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai, India.
| |
|
24
|
Chaudhry SR, Lwin N, Phelan D, Escalante AA, Battistuzzi FU. Comparative analysis of low complexity regions in Plasmodia. Sci Rep 2018; 8:335. [PMID: 29321589 PMCID: PMC5762703 DOI: 10.1038/s41598-017-18695-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 08/23/2017] [Accepted: 12/14/2017] [Indexed: 12/20/2022] Open
Abstract
Low complexity regions (LCRs) are a common feature shared by many genomes, but their evolutionary and functional significance remains mostly unknown. At the core of the uncertainty is a poor understanding of the mechanisms that regulate their retention in genomes, whether driven by natural selection or neutral evolution. Applying a comparative approach of LCRs to multiple strains and species is a powerful approach to identify patterns of conservation in these regions. Using this method, we investigate the evolutionary history of LCRs in the genus Plasmodium based on orthologous protein coding genes shared by 11 species and strains from primate and rodent-infecting pathogens. We find multiple lines of evidence in support of natural selection as a major evolutionary force shaping the composition and conservation of LCRs through time and signatures that their evolutionary paths are species specific. Our findings add a comparative analysis perspective to the debate on the evolution of LCRs and harness the power of sequence comparisons to identify potential functionally important LCR candidates.
Affiliation(s)
- S R Chaudhry
- Department of Biological Sciences, Oakland University, Rochester, MI, USA; Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI, USA
| | - N Lwin
- Department of Biological Sciences, Oakland University, Rochester, MI, USA
| | - D Phelan
- Department of Biological Sciences, Oakland University, Rochester, MI, USA
| | - A A Escalante
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
| | - F U Battistuzzi
- Department of Biological Sciences, Oakland University, Rochester, MI, USA; Center for Data Science and Big Data Analytics, Oakland University, Rochester, MI, USA.
| |
|
25
|
Shukla A, Chatterjee A, Kondabagil K. The number of genes encoding repeat domain-containing proteins positively correlates with genome size in amoebal giant viruses. Virus Evol 2018; 4:vex039. [PMID: 29308275 PMCID: PMC5753266 DOI: 10.1093/ve/vex039] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Indexed: 12/29/2022] Open
Abstract
Curiously, in viruses, the virion volume appears to be predominantly driven by genome length rather than the number of proteins it encodes or geometric constraints. With their large genome and giant particle size, amoebal viruses (AVs) are ideally suited to study the relationship between genome and virion size and explore the role of genome plasticity in their evolutionary success. Different genomic regions of AVs exhibit distinct genealogies. Although the vertically transferred core genes and their functions are universally conserved across the nucleocytoplasmic large DNA virus (NCLDV) families and are essential for their replication, the horizontally acquired genes are variable across families and are lineage-specific. When compared with other giant virus families, we observed a near-linear increase in the number of genes encoding repeat domain-containing proteins (RDCPs) with the increase in the genome size of AVs. From what is known about the functions of RDCPs in bacteria and eukaryotes and their prevalence in the AV genomes, we envisage important roles for RDCPs in the life cycle of AVs, their genome expansion, and plasticity. This observation also supports the evolution of AVs from a smaller viral ancestor by the acquisition of diverse gene families from the environment, including RDCPs that might have helped in host adaptation.
Affiliation(s)
- Avi Shukla
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai, Maharashtra 400076, India
| | - Anirvan Chatterjee
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai, Maharashtra 400076, India
| | - Kiran Kondabagil
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai, Maharashtra 400076, India
| |
|