1
Di Giulio M. The absence of the evolutionary state of the Prokaryote would imply a polyphyletic origin of proteins and that LUCA, the ancestor of bacteria and that of archaea were progenotes. Biosystems 2023; 233:105014. [PMID: 37652180 DOI: 10.1016/j.biosystems.2023.105014] [Received: 04/11/2023] [Revised: 08/25/2023] [Accepted: 08/26/2023] [Indexed: 09/02/2023]
Abstract
I analysed the similarity gradient observed in bacterial and archaeal protein families involved in phylogenetically deep, fundamental traits. This gradient ranges from cases such as the core of the DNA replication apparatus, where there is no sequence similarity between the proteins involved, through cases such as the translation initiation factors, where only some of the proteins involved would be homologs, to cases such as the aminoacyl-tRNA synthetases, where most of the proteins involved would be homologs. This pattern of similarity between bacteria and archaea would seem to be a very clear indication of a transitional evolutionary stage that preceded both the Last Bacterial Common Ancestor and the Last Archaeal Common Ancestor, i.e. progenotic stages. Indeed, this similarity pattern would seem to exemplify an ongoing transition, as all the evolutionary phases would be represented in it. In the cellular stage, by contrast, these evolutionary phases would be expected to have already been overcome, i.e. completed, and therefore to be no longer detectable. In fact, had the prokaryotic stage really been reached, we should not have observed this similarity pattern in the proteins that define the ancestral characters of bacteria and archaea, because completing the different cellular structures would have required very few proteins to evolve late in the lineages leading to bacteria and archaea. Indeed, an already attained prokaryotic state would have entailed complete cellular structures, and therefore no proteins would have needed to evolve independently in the two main phyletic lineages to complete the evolution of a character already in its definitive state; this, however, does not appear to have been the case. All this would have prevented the formation of this similarity pattern, which instead appears to be real.
In conclusion, the existence of this similarity pattern in the families of homologous proteins of bacteria and archaea would imply the absence of the evolutionary stage of the Prokaryote and, consequently, that a progenotic status should be assigned to the LUCA. Indeed, the LUCA stage would have been a stage of evolutionary transition, because it is belatedly marked by the presence of all the different evolutionary phases; these are more easily interpreted within the definition of the progenote than within that of the genote, precisely because they belong to an evolutionary transition rather than to an evolution already achieved. Finally, I discuss the importance of these arguments for the polyphyletic origin of proteins.
Affiliation(s)
- Massimo Di Giulio
- The Ionian School, Early Evolution of Life Department, Genetic Code and tRNA Origin Laboratory, Via Roma 19, 67030, Alfedena, L'Aquila, Italy.
2
The origins of the cell membrane, the progenote, and the universal ancestor (LUCA). Biosystems 2022; 222:104799. [DOI: 10.1016/j.biosystems.2022.104799] [Received: 07/27/2022] [Revised: 10/21/2022] [Accepted: 10/22/2022] [Indexed: 11/18/2022]
3
Kondratyeva LG, Dyachkova MS, Galchenko AV. The Origin of Genetic Code and Translation in the Framework of Current Concepts on the Origin of Life. BIOCHEMISTRY. BIOKHIMIIA 2022; 87:150-169. [PMID: 35508902 DOI: 10.1134/s0006297922020079] [Indexed: 11/23/2022]
Abstract
The origin of the genetic code and the translation system is probably the central and most difficult problem in research on the origin of life, and one of the most complex problems in evolutionary biology in general. Multiple hypotheses on the emergence and development of existing genetic systems propose mechanisms for the origin and early evolution of the genetic code, as well as for the emergence of replication and translation. Here, we discuss the best known of these hypotheses, although none of them describes the early evolution of genetic systems without gaps and assumptions. The RNA world hypothesis is the currently prevailing scientific idea on the early evolution of biological and pre-biological structures; its main advantage is the assumption that RNAs, as the first living systems, were self-sufficient, i.e., capable of functioning as both catalysts and templates. However, this hypothesis also has significant limitations. In particular, no ribozymes with processive polymerase activity have yet been discovered or synthesized. Given the mutual dependence of proteins and nucleic acids in the current world, many authors propose early evolution scenarios based on the co-evolution of these two classes of organic molecules. They postulate that the emergence of translation was necessary for the replication of nucleic acids, in contrast to the RNA world hypothesis, according to which the emergence of translation was preceded by an era of self-replicating RNAs. Although such scenarios are less parsimonious from the evolutionary point of view, since they require the simultaneous emergence and evolution of two classes of organic molecules, as well as the emergence of synchronized replication and translation, their major advantage is that they explain the development of processive and much more accurate protein-dependent replication.
Affiliation(s)
- Liya G Kondratyeva
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, 117997, Russia
- Alexey V Galchenko
- Peoples' Friendship University of Russia (RUDN University), Moscow, 117198, Russia.
4
Di Giulio M. The RNase P, LUCA, the ancestors of the life domains, the progenote, and the tree of life. Biosystems 2021; 212:104604. [PMID: 34979158 DOI: 10.1016/j.biosystems.2021.104604] [Received: 11/11/2021] [Revised: 12/24/2021] [Accepted: 12/29/2021] [Indexed: 11/15/2022]
Abstract
I have tried to interpret the phylogenetic distribution of RNase P with the aim of helping to clarify the stage reached by the evolution of cellularity in the Last Universal Common Ancestor (LUCA); that is to say, whether the LUCA was a protocell (progenote) or a complete cell (genote). Since several arguments suggest that only the RNA moiety of RNase P was present in the LUCA, this might imply that this evolutionary stage was actually the RNA world. If true, this would imply that the LUCA was a progenote, because the RNA world was subject to multiple evolutionary transitions involving high noise at many of its levels, which would fall within the definition of the progenote. Furthermore, since RNA-mediated catalysis is much less efficient than protein-mediated catalysis, the presence of only the RNA moiety in the LUCA could per se imply, without invoking the RNA world, that the LUCA was a progenote, because inefficient catalysis might have characterized this evolutionary stage. This evolutionary stage would still fall under the definition of the progenote. In addition, the observation that the protein moieties of bacterial and archaeal RNase P are not homologous would imply that they originated independently in the two main phyletic lineages. In turn, this would imply the progenotic nature of the ancestors of both archaea and bacteria. Indeed, it is plausible that such a late origin of the RNase P protein moieties in the main phyletic lineages is evidence of an evolutionary transition towards more efficient catalysis, made possible precisely by the evolution of the protein moieties, which would have helped the RNase P RNA to catalyse more efficiently.
Hence, this would date that evolutionary moment as a transition to much more efficient catalysis and, consequently, would imply that the actual transition from progenotic to genotic status took place at that evolutionary stage. Finally, this late origin of the RNase P protein moieties in the bacterial and archaeal domains could per se imply a progenotic stage for their ancestors, or at least make a cellular stage much less likely. It is true that genes can originate both in a cellular and in a progenotic stage, but gene origin mainly typifies the latter, because the progenote is, by definition, still in formation. It is therefore expected that, in the evolutionary stage in which the main phyletic lineages formed (that is, at a time when the formation of genes might be expected), the origin of proteins is to be related to the rapid and progressive evolution typical of the progenote, precisely because at such a stage the origin of genes is more easily and simply explained as reflecting a progenotic rather than a genotic stage. Indeed, had the ancestors of bacteria and archaea instead been at the cellular stage, observing the origin of the RNase P protein moieties would have been somewhat anomalous, because this completion should already have occurred: the transformation of a ribozyme into an enzyme falls within the very definition of cellular status. The conclusion is that the LUCA, the ancestor of archaea and the ancestor of bacteria may all have been progenotes. If these arguments were true, then either the tree of life as commonly understood would not exist, and the main phyletic lineages would have originated directly from the LUCA, or there would have been at least two different populations of progenotes that eventually defined the domain of bacteria and that of archaea.
Affiliation(s)
- Massimo Di Giulio
- The Ionian School, Genetic Code and tRNA Origin Laboratory, Via Roma 19, 67030, Alfedena (L'Aquila), Italy.
5
Di Giulio M. The phylogenetic distribution of the cell division system would not imply a cellular LUCA but a progenotic LUCA. Biosystems 2021; 210:104563. [PMID: 34653531 DOI: 10.1016/j.biosystems.2021.104563] [Received: 08/05/2021] [Revised: 10/08/2021] [Accepted: 10/08/2021] [Indexed: 12/28/2022]
Abstract
The stage reached by the evolution of cellularity in the Last Universal Common Ancestor (LUCA) has not yet been identified; it is still unclear whether the LUCA was a cell (genote) or a protocell (progenote). Recently, Pende et al. (2021) analysed the phylogenetic distribution of the cell division system in bacteria and archaea and concluded that LUCA was a cell and not a progenote. I find this conclusion unreasonable with respect to the observations they presented. One point is that the presence in the domains of life of many genes, some of them paralogs, that define the membrane-remodeling superfamily would seem to imply a tempo and mode of evolution for the LUCA more typical of the progenote than of the genote. Indeed, the simultaneous presence of different genes, at a given evolutionary stage and with partially correlated functions, would seem to define a heterogeneity that appears to express rapid and progressive evolution, precisely because this evolution would have taken place through the diversification of all these genes. Furthermore, the presence of different genes coding for cell division and related functions could reflect a progenotic status in LUCA, precisely because these functions might have originated from a single ancestral gene coding instead for a multifunctional protein (or proteins), and would therefore express the rapid and progressive evolution typical of the progenote. I also criticize other aspects of the considerations made by Pende et al. (2021). The arguments presented here, together with those already in the literature, make the hypothesis of a cellular LUCA favoured by Pende et al. (2021) unlikely.
Affiliation(s)
- Massimo Di Giulio
- The Ionian School, Genetic Code and tRNA Origin Laboratory, Via Roma 19, 67030, Alfedena (L'Aquila), Italy.
6
Psomopoulos FE, van Helden J, Médigue C, Chasapi A, Ouzounis CA. Ancestral state reconstruction of metabolic pathways across pangenome ensembles. Microb Genom 2021; 6. [PMID: 32924924 PMCID: PMC7725326 DOI: 10.1099/mgen.0.000429] [Indexed: 11/18/2022]
Abstract
As genome sequencing efforts unveil the genetic diversity of the biosphere at unprecedented speed, there is a need to accurately describe the structural and functional properties of groups of extant species whose genomes have been sequenced, as well as their inferred ancestors, at any given taxonomic level of their phylogeny. Elaborate approaches for the reconstruction of ancestral states at the sequence level have been developed, subsequently augmented by methods based on gene content. While these approaches of sequence or gene-content reconstruction have been successfully deployed, there has been less progress on the explicit inference of functional properties of ancestral genomes, in terms of metabolic pathways and other cellular processes. Herein, we describe PathTrace, an efficient algorithm for parsimony-based reconstructions of the evolutionary history of individual metabolic pathways, which are pivotal representations of the key functional modules of the cell. The algorithm is implemented as a five-step process through which pathways are represented as fuzzy vectors, where each enzyme is associated with a taxonomic conservation value derived from the phylogenetic profile of its protein sequence. The method is evaluated with a selected benchmark set of pathways against collections of genome sequences from key data resources. By deploying a pangenome-driven approach for pathway sets, we demonstrate that the inferred patterns are largely insensitive to noise, as opposed to gene-content reconstruction methods. In addition, the resulting reconstructions are closely correlated with the evolutionary distance of the taxa under study, suggesting that a diligent selection of target pangenomes is essential for maintaining cohesiveness of the method and consistency of the inference, serving as an internal control for an arbitrary selection of queries.
The PathTrace method is a first step towards the large-scale analysis of metabolic pathway evolution and our deeper understanding of functional relationships reflected in emerging pangenome collections.
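The abstract names two ingredients, per-enzyme taxonomic conservation values derived from phylogenetic profiles and parsimony-based reconstruction, without spelling out PathTrace's five steps. A minimal sketch of those two ideas, assuming toy presence/absence data: the conservation value is taken here as the fraction of pangenome genomes in which an enzyme is detected, and ancestral presence is inferred with Fitch parsimony on a rooted binary tree. The functions `conservation_values`, `pathway_vector` and `fitch`, the genome names and the tree are all hypothetical; this is not the published PathTrace code.

```python
from typing import Dict, List, Set, Tuple, Union

# A rooted binary tree: a leaf is a genome name, an internal node is a (left, right) pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def conservation_values(profiles: Dict[str, Set[str]], pangenome: List[str]) -> Dict[str, float]:
    """Fraction of pangenome genomes in which each enzyme was detected (assumed definition)."""
    members = set(pangenome)
    return {enzyme: len(hits & members) / len(members) for enzyme, hits in profiles.items()}


def pathway_vector(pathway: List[str], conservation: Dict[str, float]) -> List[float]:
    """Fuzzy vector for one pathway: one conservation value per enzyme, 0.0 if never detected."""
    return [conservation.get(enzyme, 0.0) for enzyme in pathway]


def fitch(tree: Tree, present: Set[str]) -> Tuple[Set[int], int]:
    """Fitch small parsimony for a binary trait (1 = enzyme present, 0 = absent).

    Returns the candidate state set at the root of `tree` and the minimum number of gains/losses.
    """
    if isinstance(tree, str):
        return ({1} if tree in present else {0}), 0
    left_states, left_cost = fitch(tree[0], present)
    right_states, right_cost = fitch(tree[1], present)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    # Children disagree: take the union and charge one state change.
    return left_states | right_states, left_cost + right_cost + 1


# Toy data (hypothetical).
pangenome = ["g1", "g2", "g3", "g4"]
profiles = {"E1": {"g1", "g2", "g3", "g4"}, "E2": {"g1", "g2", "g3"}, "E3": {"g1"}}
tree: Tree = (("g1", "g2"), ("g3", "g4"))

cons = conservation_values(profiles, pangenome)
print(pathway_vector(["E1", "E2", "E3", "E4"], cons))  # [1.0, 0.75, 0.25, 0.0]
print(fitch(tree, profiles["E2"]))                       # ({1}, 1): present at the root, one loss (g4)
```

Fitch parsimony is used here only because it is the textbook parsimony method for discrete characters; how PathTrace combines the fuzzy vectors with its parsimony step is described in the paper itself.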
Affiliation(s)
- Fotis E Psomopoulos
- Institute of Applied Biosciences (INAB), Center for Research & Technology Hellas (CERTH), GR-57001 Thessalonica, Greece
- Jacques van Helden
- Lab. Technological Advances for Genomics & Clinics (TAGC), Université d'Aix-Marseille (AMU), INSERM Unit U1090, 163, Avenue de Luminy, 13288 Marseille cedex 09, France
- Claudine Médigue
- UMR 8030, CNRS, Université Evry-Val-d'Essonne, CEA, Institut de Biologie François Jacob - Genoscope, Laboratoire d'Analyses Bioinformatiques pour la Génomique et le Métabolisme, Evry, France
- Anastasia Chasapi
- Biological Computation & Process Laboratory (BCPL), Chemical Process & Energy Resources Institute (CPERI), Center for Research & Technology Hellas (CERTH), GR-57001 Thessalonica, Greece
- Christos A Ouzounis
- Biological Computation & Process Laboratory (BCPL), Chemical Process & Energy Resources Institute (CPERI), Center for Research & Technology Hellas (CERTH), GR-57001 Thessalonica, Greece
7
Tang WT, Hao TW, Chen GH. Comparative metabolic modeling of multiple sulfate-reducing prokaryotes reveals versatile energy conservation mechanisms. Biotechnol Bioeng 2021; 118:2676-2693. [PMID: 33844295 DOI: 10.1002/bit.27787] [Received: 08/28/2020] [Revised: 01/21/2021] [Accepted: 03/11/2021] [Indexed: 11/07/2022]
Abstract
Sulfate-reducing prokaryotes (SRPs) are crucial participants in the cycling of sulfur, carbon, and various metals in the natural environment and in engineered systems. Although recent advances in genetics and molecular biology have provided a great deal of information about the energy metabolism of SRPs, little effort has been made to link this information with biotechnological studies of these organisms. This study aims to construct multiple metabolic models of SRPs that systematically compile genomic, genetic, biochemical, and molecular information to study their energy metabolism. Pan-genome analysis was conducted to compare the genomes of SRPs, yielding a list of orthologous genes related to central and energy metabolism. Twenty-four SRP metabolic models were then constructed efficiently by inference from the pan-genome analysis. The metabolic model of the well-studied model SRP Desulfovibrio vulgaris Hildenborough (DvH) was validated via flux balance analysis (FBA). The DvH model predictions matched reported experimental growth and energy yields, demonstrating that the core metabolic model worked successfully. Further, steady-state simulation of SRP metabolic models under different growth conditions showed how the use of different electron transfer pathways leads to energy generation. Three energy conservation mechanisms were identified: a menaquinone-based redox loop, hydrogen cycling, and proton pumping. Flavin-based electron bifurcation (FBEB) was also shown to be an essential mechanism supporting energy conservation. The developed models can be easily extended to other SRP species not examined in this study.
More importantly, the present work develops an accurate and efficient approach for constructing metabolic models of multiple organisms, which can be applied to other critical microbes in environmental and industrial systems, thereby enabling the quantitative prediction of their metabolic behaviors to benefit relevant applications.
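Flux balance analysis, used above to validate the DvH model, treats the network at steady state (S·v = 0, where S is the stoichiometric matrix and v the flux vector) and maximizes an objective flux, typically biomass, within flux bounds, which makes it a linear program. A minimal sketch of that idea, assuming SciPy ≥ 1.6 (for `linprog(..., method="highs")`) and a hypothetical three-metabolite toy network; it is not one of the study's 24 SRP models, and `run_fba` is an illustrative helper, not part of any named toolbox.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network. Rows = metabolites A, B, C; columns = reactions:
#   v0: uptake -> A     v1: A -> B     v2: A -> C     v3 (biomass): B + C ->
S = np.array([
    [1, -1, -1,  0],  # A
    [0,  1,  0, -1],  # B
    [0,  0,  1, -1],  # C
], dtype=float)
bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000)]  # uptake capped at 10 flux units


def run_fba(S: np.ndarray, bounds, objective: int):
    """Maximize flux `objective` subject to S @ v = 0 and the given bounds."""
    c = np.zeros(S.shape[1])
    c[objective] = -1.0  # linprog minimizes, so negate the objective to maximize it
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x, -res.fun


fluxes, growth = run_fba(S, bounds, objective=3)
print(round(growth, 6))  # 5.0: each biomass unit consumes 2 units of A, and uptake is capped at 10
```

Re-solving with different bounds (for example, a different uptake cap) is a miniature version of the steady-state simulation under different growth conditions that the abstract describes.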
Affiliation(s)
- Wen-Tao Tang
- Department of Civil and Environmental Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
- Tian-Wei Hao
- Department of Civil and Environmental Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China; Department of Civil and Environmental Engineering, Faculty of Science and Technology, University of Macau, Macau, China
- Guang-Hao Chen
- Department of Civil and Environmental Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
8
Di Giulio M. Errors of the ancestral translation, LUCA, and nature of its direct descendants. Biosystems 2021; 206:104433. [PMID: 33915233 DOI: 10.1016/j.biosystems.2021.104433] [Received: 02/22/2021] [Revised: 04/20/2021] [Accepted: 04/20/2021] [Indexed: 10/21/2022]
Abstract
I analyzed the implications of the observation that the methyltransferases TrmD and Trm5, which methylate base 37 (m1G37) in the tRNAs of bacteria and archaea, respectively, are not homologous proteins. The first implication is that these methyltransferases originated very late, only after the fundamental lineages leading to bacteria and archaea had separated; otherwise the two methyltransferases would have been homologous enzymes, which they are not. The conclusion that Trm5 and TrmD originated only when the main lineages were defined would imply that at least some aspects of translation, such as +1 frameshifting, were still in rapid and progressive evolution, that is, they were still originating. This would in itself imply a high rate of translation errors, because the absence of m1G37 from tRNAs could have caused a high rate of +1 translational frameshifting in the reading of mRNAs, identifying this stage as a phase of the origin of the genetic code. Furthermore, the observation that the frameshifting mechanism was still in rapid and progressive evolution at such an advanced evolutionary stage would imply that other translation mechanisms were also still rapidly evolving, simply because it would be very unusual if frameshifting were the only mechanism still originating. Importantly, the observation that in archaea m1G37 also acts as an identity determinant of tRNACys(GCA) would in itself imply that some aspects of the genetic code were still originating, greatly strengthening the hypothesis that other aspects of the translation apparatus were still in rapid and progressive evolution. All this would then imply a progenote status for LUCA and for the ancestors of archaea and bacteria, because a high rate of translation errors falls within the definition of the progenote.
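The argument turns on +1 frameshifting: if the ribosome slips forward by one nucleotide, every downstream codon is read in a different frame. A toy illustration with a hypothetical five-codon mRNA, assuming a single slip at a chosen codon boundary; real frameshifting depends on the tRNA, the sequence context and, as the abstract notes, on modifications such as m1G37. The helpers `codons` and `read_with_plus1_slip` are illustrative only.

```python
from typing import List


def codons(mrna: str) -> List[str]:
    """Split an mRNA into consecutive triplets from position 0, dropping a trailing partial codon."""
    return [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]


def read_with_plus1_slip(mrna: str, slip_after: int) -> List[str]:
    """Read `slip_after` codons in frame 0, skip one nucleotide (+1 shift), then continue."""
    cut = 3 * slip_after
    return codons(mrna[:cut]) + codons(mrna[cut + 1:])


mrna = "AUGGCUUCAGGAUAA"  # hypothetical: AUG GCU UCA GGA UAA (stop)
print(codons(mrna))                   # ['AUG', 'GCU', 'UCA', 'GGA', 'UAA']
print(read_with_plus1_slip(mrna, 2))  # ['AUG', 'GCU', 'CAG', 'GAU']: downstream codons change, the stop is missed
```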
Affiliation(s)
- Massimo Di Giulio
- The Ionian School, Genetic Code and tRNA Origin Laboratory, Via Roma 19, 67030, Alfedena, L'Aquila, Italy; Institute of Biosciences and Bioresources, National Research Council, Via P. Castellino, 111, 80131, Naples, Italy.
9
Di Giulio M. The late appearance of DNA, the nature of the LUCA and ancestors of the domains of life. Biosystems 2020; 202:104330. [PMID: 33352234 DOI: 10.1016/j.biosystems.2020.104330] [Received: 10/20/2020] [Revised: 12/15/2020] [Accepted: 12/15/2020] [Indexed: 01/27/2023]
Abstract
It has been firmly established that the replicative DNA polymerases of bacteria, archaea and eukaryotes are not homologous proteins. This lack of homology in the replication apparatus among the domains of life is not only compatible with, but would seem to imply, the view that DNA emerged in the fundamental cellular lineages. Consequently, this diversity of DNA polymerases would go back to the level of the ancestors of the domains of life and to the evolutionary time in which DNA emerged. Therefore, the presumed evolutionary stage linked to the RNA-to-DNA transition would have occurred only at the level of the ancestors of the main lineages of the tree of life. Thus, the high noise associated with this major evolutionary transition, and the impossibility for a cellular stage to generate different, genetically profound fundamental traits (such as the different replication apparatuses of bacteria, archaea and eukaryotes), would imply not only that the last universal common ancestor (LUCA) was a progenote but that the ancestors of the domains of life were also at this evolutionary stage. Accordingly, I criticize the hypotheses which hold, instead, that completely different cells, such as bacteria and archaea, could have originated from a cellular LUCA.
Affiliation(s)
- Massimo Di Giulio
- The Ionian School, Genetic Code and tRNA Origin Laboratory, Via Roma 19, 67030, Alfedena (L'Aquila), Italy; Institute of Biosciences and Bioresources, National Research Council, Via P. Castellino, 111, 80131, Naples, Italy.
10
Di Giulio M. LUCA as well as the ancestors of archaea, bacteria and eukaryotes were progenotes: Inference from the distribution and diversity of the reading mechanism of the AUA and AUG codons in the domains of life. Biosystems 2020; 198:104239. [PMID: 32919036 DOI: 10.1016/j.biosystems.2020.104239] [Received: 06/25/2020] [Revised: 09/01/2020] [Accepted: 09/01/2020] [Indexed: 11/25/2022]
Abstract
Here I use the following rationale. If, for a certain trait that functions in some aspect of the genetic code or, more generally, in protein synthesis, the evolutionary stage of its origin can be identified, then that evolutionary moment would be characterized by high translational noise, because the trait would be originating for the first time during that stage. That is to say, if this trait had a non-marginal role in the realization of the genetic code, or in protein synthesis, then its origin would imply that, more generally, the genetic code itself was still originating. But if the genetic code were still originating at that precise evolutionary stage, there would have been high translational noise, which in turn would imply the presence of a protocell, i.e. a progenote, which is by definition characterized by high translational noise. I apply this rationale to the mechanism that modifies base 34 of the anticodon of an isoleucine tRNA and that governs the reading of AUA and AUG codons in archaea, bacteria and eukaryotes. The phylogenetic distribution of this mechanism in these phyletic lineages indicates that it originated only after the evolutionary stage of the last universal common ancestor (LUCA), namely during the formation of the cellular domains, i.e. at the stage of the ancestors of these main phyletic lineages. Furthermore, given that this modification mechanism would have emerged at a stage of the origin of the genetic code, albeit in its terminal phases, all this would imply that the ancestors of bacteria, archaea and eukaryotes were progenotes. If so, the LUCA would all the more have been a progenote, since it temporally preceded these ancestors.
A consequence of all this reasoning might be that, since these three ancestors were progenotes that differed from each other, if at least one of them had evolved into at least two real cells, fundamentally different from each other, then the number of cellular domains would be greater than three.
Affiliation(s)
- Massimo Di Giulio
- The Ionian School, Genetic Code and tRNA Origin Laboratory, Via Roma 19, 67030, Alfedena (L'Aquila), Italy; Institute of Biosciences and Bioresources, National Research Council, Via P. Castellino, 111, 80131, Naples, Italy.
11
Dar KB, Bhat AH, Amin S, Anjum S, Reshi BA, Zargar MA, Masood A, Ganie SA. Exploring Proteomic Drug Targets, Therapeutic Strategies and Protein-Protein Interactions in Cancer: Mechanistic View. Curr Cancer Drug Targets 2020; 19:430-448. [PMID: 30073927 DOI: 10.2174/1568009618666180803104631] [Received: 01/21/2018] [Revised: 07/17/2018] [Accepted: 07/19/2018] [Indexed: 12/31/2022]
Abstract
Protein-Protein Interactions (PPIs) drive major signalling cascades and play a critical role in cell proliferation, apoptosis, angiogenesis and trafficking. Deregulated PPIs are implicated in multiple malignancies and represent critical targets for treating cancer. Herein, we discuss the key protein-protein interacting domains implicated in cancer, notably PDZ, SH2, SH3, LIM, PTB, SAM and PH. These domains are present in numerous enzymes/kinases, growth factors, transcription factors, adaptor proteins, receptors and scaffolding proteins, and thus represent essential sites for targeting cancer. This review explores the candidacy of various proteins involved in cellular trafficking (small GTPases, molecular motors, matrix-degrading enzymes, integrin), transcription (p53, cMyc), signalling (membrane receptor proteins), angiogenesis (VEGFs) and apoptosis (BCL-2 family), which could serve as targets for developing effective anti-cancer regimens. Interactions between Ras/Raf, X-linked inhibitor of apoptosis protein (XIAP)/second mitochondria-derived activator of caspases (Smac/DIABLO), Frizzled (FRZ)/Dishevelled (DVL) and beta-catenin/T Cell Factor (TCF) have also been studied as prospective anticancer targets. Although the efficacy of diverse molecules/drugs targeting such PPIs has been evaluated in various animal models and cell lines, human-based clinical trials are essential. Therapeutic strategies such as the use of biologicals, high-throughput screening (HTS) and fragment-based technology could play an important role in designing cancer therapeutics. Moreover, bioinformatic/computational strategies based on genome sequence, protein sequence/structure and domain data could serve as competent tools for predicting PPIs. Exploring hot spots in proteomic networks represents another approach for developing target-specific therapeutics.
Overall, this review lays emphasis on a productive amalgamation of proteomics, genomics, biochemistry, and molecular dynamics for successful treatment of cancer.
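The review mentions genome-based computational strategies for predicting PPIs without specifying one. A well-known example is phylogenetic profiling: proteins whose presence/absence patterns across genomes are very similar are flagged as candidate interaction partners. A minimal sketch, assuming binary profiles, Jaccard similarity and an arbitrary threshold; the protein names, profiles and the `predict_pairs` helper are hypothetical, and practical pipelines also correct for phylogenetic relatedness among the genomes.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def jaccard(a: List[int], b: List[int]) -> float:
    """Jaccard similarity of two binary presence/absence profiles (0.0 if both are all-absent)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def predict_pairs(profiles: Dict[str, List[int]], threshold: float) -> List[Tuple[str, str, float]]:
    """Return protein pairs whose profile similarity is at least `threshold`."""
    hits = []
    for p, q in combinations(sorted(profiles), 2):
        score = jaccard(profiles[p], profiles[q])
        if score >= threshold:
            hits.append((p, q, score))
    return hits


# Hypothetical profiles over five genomes (1 = homolog detected).
profiles = {
    "P1": [1, 1, 0, 1, 0],
    "P2": [1, 1, 0, 1, 0],
    "P3": [0, 0, 1, 0, 1],
    "P4": [1, 1, 0, 0, 0],
}
print(predict_pairs(profiles, threshold=0.8))  # [('P1', 'P2', 1.0)]
```

Lowering the threshold to 0.6 would also admit P4 paired with P1 and P2 (similarity 2/3), which shows how strongly the predictions depend on that cut-off.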
Affiliation(s)
- Khalid Bashir Dar
- Department of Clinical Biochemistry, School of Biological Sciences, University of Kashmir, Srinagar, India; Department of Biochemistry, School of Biological Sciences, University of Kashmir, Srinagar, India
- Aashiq Hussain Bhat
- Department of Clinical Biochemistry, School of Biological Sciences, University of Kashmir, Srinagar, India; Department of Biochemistry, School of Biological Sciences, University of Kashmir, Srinagar, India
- Shajrul Amin
- Department of Biochemistry, School of Biological Sciences, University of Kashmir, Srinagar, India
- Syed Anjum
- Amity Institute of Biotechnology, Amity University, Rajasthan, India
- Bilal Ahmad Reshi
- Department of Biotechnology, School of Biological Sciences, University of Kashmir, Srinagar, India
- Mohammad Afzal Zargar
- Department of Clinical Biochemistry, School of Biological Sciences, University of Kashmir, Srinagar, India
- Akbar Masood
- Department of Clinical Biochemistry, School of Biological Sciences, University of Kashmir, Srinagar, India
- Showkat Ahmad Ganie
- Department of Clinical Biochemistry, School of Biological Sciences, University of Kashmir, Srinagar, India
12
The phylogenetic distribution of the glutaminyl-tRNA synthetase and Glu-tRNAGln amidotransferase in the fundamental lineages would imply that the ancestor of archaea, that of eukaryotes and LUCA were progenotes. Biosystems 2020; 196:104174. [PMID: 32535177 DOI: 10.1016/j.biosystems.2020.104174] [Received: 03/27/2020] [Revised: 05/25/2020] [Accepted: 05/25/2020] [Indexed: 12/21/2022]
Abstract
The function of glutaminyl-tRNA synthetase and Glu-tRNAGln amidotransferase might be related to the origin of the genetic code because, for example, glutaminyl-tRNA synthetase catalyses the fundamental reaction that makes the genetic code. If the evolutionary stage at which these two enzymes originated could be unambiguously identified, then the genetic code should still have been originating at that stage, because the very reaction that makes the code was evidently still evolving. That evolutionary moment would therefore be assigned to the progenote stage, in which the relationship between genotype and phenotype was not yet fully realized because the genetic code was still originating. I then analyzed the distribution of glutaminyl-tRNA synthetase and Glu-tRNAGln amidotransferase in the main phyletic lineages. In some cases the origin of these two enzymes can be related to the evolutionary stages of the ancestors of archaea and eukaryotes. This would indicate that these ancestors were progenotes, because at that evolutionary moment the genetic code was evidently still evolving, which satisfies the definition of a progenote. The conclusion that the ancestor of archaea and that of eukaryotes were progenotes would imply that the last universal common ancestor (LUCA) was also a progenote, because it appears earlier on the tree of life than these ancestors.

13
Tamana S, Promponas VJ. An updated view of the oligosaccharyltransferase complex in Plasmodium. Glycobiology 2019; 29:385-396. [PMID: 30835280 DOI: 10.1093/glycob/cwz011] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/14/2018] [Revised: 01/27/2019] [Accepted: 03/04/2019] [Indexed: 12/18/2022]
Abstract
Despite the controversy regarding the importance of protein N-linked glycosylation in species of the genus Plasmodium, genes potentially encoding core subunits of the oligosaccharyltransferase (OST) complex have already been characterized in completely sequenced genomes of malaria parasites. Nevertheless, the currently established notion is that only four of the eight subunits of the OST complex, which is considered conserved across eukaryotes, are present in Plasmodium species. In this study, we conduct a careful computational analysis to provide unequivocal evidence that all components of the OST complex, with the exception of Swp1/Ribophorin II, can be reliably identified within completely sequenced plasmodial genomes. In fact, most of the subunits currently considered absent from Plasmodium correspond to uncharacterized protein sequences already present in sequence databases. Interestingly, the main reason why the unusually short Ost4 subunit (36 residues long in yeast) has not so far been identified in plasmodia (and possibly other species) is the failure of gene-prediction pipelines to detect such a short coding sequence. We further identify elusive OST subunits in selected protist species with completely sequenced genomes. Thus, our work highlights the need for a systematic approach to characterizing OST subunits across eukaryotes. This is necessary both for obtaining a concrete picture of the evolution of the OST complex and for elucidating its possible role in eukaryotic pathogens.
Affiliation(s)
- Stella Tamana
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, CY, Nicosia, Cyprus
- Vasilis J Promponas
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, CY, Nicosia, Cyprus

14
Danchin A. [Revisiting the origins of life: from atoms to molecules, reproduction, then replication]. Med Sci (Paris) 2018; 34:857-864. [PMID: 30451680 DOI: 10.1051/medsci/2018212] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/14/2022]
Abstract
Looking back more than three billion years is difficult, and reconstructing evolutionary trees from present-day DNA relies on hidden assumptions that do not allow its true roots to be recovered. Seeking to free itself from our anthropocentrism, the scenario proposed in the two texts to be published in succession begins by setting aside the idea of a single origin. It replaces that idea with an evolutionary scenario in which a replicative process (the formation of an exact copy) emerges within a chemical system that merely reproduces, forming copies similar to itself. The first cells would form a population of predators that gradually assimilated various compartments in which the subsequent ancestral steps took place. Escaping these predatory cells, two new, weakly compartmentalized types, bacteria and archaea, would then have appeared and invaded the Earth. They would also have formed organelles within the ancestral predators, giving rise to life as we know it today.
Affiliation(s)
- Antoine Danchin
- Institut de Cardiométabolisme et Nutrition, Hôpital de la Pitié-Salpêtrière, 47, boulevard de l'Hôpital 75013 Paris, France

15
Roachford OSE, Nelson KE, Mohapatra BR. Comparative genomics of four Mycoplasma species of the human urogenital tract: Analysis of their core genomes and virulence genes. Int J Med Microbiol 2017; 307:508-520. [PMID: 28927691 DOI: 10.1016/j.ijmm.2017.09.006] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 05/20/2017] [Revised: 08/29/2017] [Accepted: 09/04/2017] [Indexed: 12/23/2022]
Abstract
The variation in Mycoplasma lipoproteins attributed to genome rearrangements and genetic insertions leads to phenotypic plasticity that allows evasion of the host's defence system and pathogenesis. This paper compared, for the first time, the genomes of four human urogenital Mycoplasma species (M. penetrans HF-2, M. fermentans JER, M. genitalium G37 and M. hominis PG21). The aims were to categorise the metabolic functions of the core genes and to assess the effects of tandem repeats, phage-like genetic elements and prophages on the virulence genes. The results of this comparative in silico genomic analysis revealed that the genes constituting their core genomes can be separated into three distinct categories: nuclear metabolism, protein metabolism and energy generation, making up 52%, 31% and 23%, respectively. Repeat sequences range from 3.7% of the genome in M. hominis PG21 to 9.5% in M. fermentans JER. Tandem repeats (mostly minisatellites) and phage-like proteins (including DNA gyrases/topoisomerases) were randomly distributed in the Mycoplasma genomes. Here, we identified a coiled-coil-containing protein in M. penetrans HF-2 that is significantly similar to the Mem protein of M. fermentans ɸMFV1. Therefore, a Mycoplasma prophage seems to be embedded within the unannotated genome of M. penetrans HF-2. To the best of our knowledge, no Mycoplasma phages or prophages have previously been detected in M. penetrans. This study is important for understanding the complex genetic factors involved in phenotypic plasticity and virulence in the relatively understudied Mycoplasma species. It also helps to elucidate the effective arrangement of their redundant minimal genomes.
Affiliation(s)
- Orville St E Roachford
- Department of Biological and Chemical Sciences, The University of the West Indies, Cave Hill Campus, Bridgetown BB 11000, Barbados
- Karen E Nelson
- J. Craig Venter Institute, 9714 Medical Center Drive, Rockville, MD 20850, USA
- Bidyut R Mohapatra
- Department of Biological and Chemical Sciences, The University of the West Indies, Cave Hill Campus, Bridgetown BB 11000, Barbados

16

17
Booth A, Mariscal C, Doolittle WF. The Modern Synthesis in the Light of Microbial Genomics. Annu Rev Microbiol 2016; 70:279-97. [DOI: 10.1146/annurev-micro-102215-095456] [Citation(s) in RCA: 53] [Impact Index Per Article: 6.6] [Indexed: 11/09/2022]
Affiliation(s)
- Austin Booth
- Department of Philosophy, Dalhousie University, Halifax B3H 4R2, Nova Scotia, Canada
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax B3H 4R2, Nova Scotia, Canada
- Carlos Mariscal
- Department of Philosophy, Dalhousie University, Halifax B3H 4R2, Nova Scotia, Canada
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax B3H 4R2, Nova Scotia, Canada
- Department of Philosophy, University of Nevada, Reno, Nevada 89557
- W. Ford Doolittle
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax B3H 4R2, Nova Scotia, Canada

18
Ślesak I, Ślesak H, Zimak-Piekarczyk P, Rozpądek P. Enzymatic Antioxidant Systems in Early Anaerobes: Theoretical Considerations. Astrobiology 2016; 16:348-58. [PMID: 27176812 PMCID: PMC4876498 DOI: 10.1089/ast.2015.1328] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 03/27/2015] [Accepted: 12/01/2015] [Indexed: 05/14/2023]
Abstract
It is widely accepted that the oxygen released into Earth's atmosphere by cyanobacteria ca. 2.5 billion years ago sparked the evolution of aerobic metabolism and the antioxidant system. In modern aerobes, enzymes such as superoxide dismutases (SODs), peroxiredoxins (PXs), and catalases (CATs) constitute the core of the enzymatic antioxidant system (EAS) directed against reactive oxygen species (ROS). In many anaerobic prokaryotes, superoxide reductases (SORs) have been identified as the main force counteracting ROS toxicity. We found that 93% of the analyzed strict anaerobes possess at least one antioxidant enzyme, and 50% have a functional EAS. A functional EAS is one consisting of at least two antioxidant enzymes: one for superoxide anion radical detoxification and another for hydrogen peroxide decomposition. The results presented here suggest that the last universal common ancestor (LUCA) was not a strict anaerobe. O2 could, however, have been available to the first microorganisms before oxygenic photosynthesis evolved, coming not solely from abiotic sources but also from the intrinsic activity of the EAS. Key Words: Archaea; Atmospheric gases; Evolution; H2O2 resistance; Oxygenic photosynthesis. Astrobiology 16, 348-358.
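The abstract defines a functional EAS by a simple rule: at least one superoxide-detoxifying enzyme plus at least one hydrogen-peroxide-decomposing enzyme. The sketch below expresses that rule as code and should be read as illustrative only. It assumes each genome has already been annotated with the enzyme classes named in the abstract (SOD, SOR, PX, CAT) and ignores any other antioxidant enzymes. The genome names and annotations are hypothetical, not the study's data.

```python
# Enzyme classes named in the abstract, grouped by the ROS they act on.
SUPEROXIDE_DETOX = {"SOD", "SOR"}  # superoxide dismutases, superoxide reductases
PEROXIDE_DECOMP = {"CAT", "PX"}    # catalases, peroxiredoxins


def has_antioxidant_enzyme(enzymes: set[str]) -> bool:
    """True if the genome encodes at least one of the listed enzymes."""
    return bool(enzymes & (SUPEROXIDE_DETOX | PEROXIDE_DECOMP))


def has_functional_eas(enzymes: set[str]) -> bool:
    """At least one superoxide-detoxifying and one H2O2-decomposing enzyme."""
    return bool(enzymes & SUPEROXIDE_DETOX) and bool(enzymes & PEROXIDE_DECOMP)


def eas_fractions(genomes: dict[str, set[str]]) -> tuple[float, float]:
    """Fractions of genomes with >= 1 antioxidant enzyme and with a functional EAS."""
    n = len(genomes)
    with_enzyme = sum(has_antioxidant_enzyme(e) for e in genomes.values())
    functional = sum(has_functional_eas(e) for e in genomes.values())
    return with_enzyme / n, functional / n


# Hypothetical annotations for four strict anaerobes.
genomes = {
    "anaerobe_1": {"SOR", "PX"},
    "anaerobe_2": {"SOR"},
    "anaerobe_3": {"CAT"},
    "anaerobe_4": set(),
}
print(eas_fractions(genomes))  # (0.75, 0.25)
```

Applied to a real annotation table, these two fractions correspond to the 93% and 50% figures quoted above.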
Affiliation(s)
- Ireneusz Ślesak
- The Franciszek Górski Institute of Plant Physiology, Polish Academy of Sciences, Kraków, Poland
- Halina Ślesak
- Institute of Botany, Jagiellonian University, Kraków, Poland
- Piotr Rozpądek
- The Franciszek Górski Institute of Plant Physiology, Polish Academy of Sciences, Kraków, Poland
- Institute of Environmental Sciences, Jagiellonian University, Kraków, Poland

19
Abstract
Native proteins perform an amazing variety of biochemical functions, including enzymatic catalysis, and can engage in protein-protein and protein-DNA interactions that are essential for life. A key question is how special these functional properties of proteins are. Are they extremely rare, or are they an intrinsic feature of proteins? The artificial (ART) protein library is a set of artificially generated compact protein structures selected for thermodynamic stability but not for any type of function. Comparison with this library demonstrates that a remarkable number of the properties of native-like proteins are recapitulated. These include the complete set of small-molecule ligand-binding pockets and most protein-protein interfaces. ART structures are predicted to be capable of weakly binding metabolites and to cover a significant fraction of metabolic pathways, with the most enriched pathways including ancient ones such as glycolysis. Native-like active sites are also found in ART proteins. A small fraction of ART proteins are predicted to have strong protein-protein and protein-DNA interactions. Overall, it appears that biochemical function is an intrinsic feature of proteins, which nature has significantly optimized during evolution. These studies raise questions that need investigation about the relative roles of specificity and promiscuity in the biochemical function and control of cells.
Affiliation(s)
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, GA, USA
- Mu Gao
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, GA, USA
- Hongyi Zhou
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, GA, USA

20
Choi KY. Non-enzymatic PLP-dependent oxidative deamination of amino acids induces higher alcohol synthesis. Biotechnol Bioprocess Eng 2016. [DOI: 10.1007/s12257-015-0434-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 01/25/2023]

21
Affiliation(s)
- Kristin Hagen
- EA European Academy of Technology and Innovation Assessment GmbH, Bad Neuenahr-Ahrweiler, Germany
- Margret Engelhard
- EA European Academy of Technology and Innovation Assessment GmbH, Bad Neuenahr-Ahrweiler, Germany
- Georg Toepfer
- Center for Literary and Cultural Research Berlin, Berlin, Germany

22
Di Giulio M. The Non-Biological Meaning of the Term “Prokaryote” and Its Implications. J Mol Evol 2014; 80:98-101. [DOI: 10.1007/s00239-014-9662-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 11/12/2014] [Accepted: 12/01/2014] [Indexed: 12/01/2022]

23
Coelho ED, Arrais JP, Matos S, Pereira C, Rosa N, Correia MJ, Barros M, Oliveira JL. Computational prediction of the human-microbial oral interactome. BMC Syst Biol 2014; 8:24. [PMID: 24576332 PMCID: PMC3975954 DOI: 10.1186/1752-0509-8-24] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 08/27/2013] [Accepted: 02/17/2014] [Indexed: 11/12/2022]
Abstract
BACKGROUND The oral cavity is a complex ecosystem where human chemical compounds coexist with a particular microbiota. However, shifts in the normal composition of this microbiota may result in the onset of oral ailments, such as periodontitis and dental caries. In addition, it is known that microbial colonization of the oral cavity is mediated by protein-protein interactions (PPIs) between the host and microorganisms. Nevertheless, these PPIs remain largely uncharacterized. To elucidate these interactions, we created a computational prediction method that allows us to obtain a first model of the human-microbial oral interactome. RESULTS We collected high-quality experimental PPIs from five major human databases. The obtained PPIs were used to create our positive dataset and, indirectly, our negative dataset. The positive and negative datasets were merged and used for training and validation of a naïve Bayes classifier. For the final prediction model, we used an ensemble methodology combining five distinct PPI prediction techniques, namely: literature mining, primary protein sequences, orthologous profiles, biological process similarity, and domain interactions. Performance evaluation of our method revealed an area under the ROC curve (AUC) greater than 0.926. This supports our primary hypothesis, as no single set of features reached an AUC greater than 0.877. After applying the prediction model to our dataset, we filtered the classified results for very high-confidence PPIs (probability ≥ 1 − 10⁻⁷), leading to a set of 46,579 PPIs to be further explored. CONCLUSIONS We believe this dataset holds not only important pathways involved in the onset of infectious oral diseases, but also potential drug targets and biomarkers. The dataset used for training and validation, the predictions obtained and the final network are available at http://bioinformatics.ua.pt/software/oralint.
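The abstract combines five evidence types with a naïve Bayes classifier and keeps only predictions with probability ≥ 1 − 10⁻⁷. The sketch below shows, in general terms, how a naïve Bayes model merges independent evidence sources and how such a cut-off is applied. It assumes each evidence source has been reduced to a likelihood ratio, P(evidence | interacting) / P(evidence | not interacting), which may differ from how the paper models its features. The function names, the prior and the ratios are hypothetical.

```python
import math

# Cut-off quoted in the abstract: keep predictions with probability >= 1 - 10^-7.
HIGH_CONFIDENCE = 1.0 - 1e-7


def posterior_probability(prior: float, likelihood_ratios: list[float]) -> float:
    """P(interaction | evidence) under the naive Bayes independence assumption.

    Each ratio is P(evidence_i | interacting) / P(evidence_i | not interacting);
    the posterior odds are the prior odds multiplied by every ratio.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    # Numerically stable conversion from log-odds back to a probability.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


def is_high_confidence(probability: float) -> bool:
    return probability >= HIGH_CONFIDENCE


# Hypothetical prior and likelihood ratios for five evidence types
# (literature, sequence, orthology, biological process, domains).
p = posterior_probability(0.001, [50.0, 20.0, 10.0, 5.0, 100.0])
print(round(p, 4), is_high_confidence(p))  # 0.9998 False
```

Working in log-odds keeps the product of many ratios from overflowing. Even strong combined evidence (a posterior near 0.9998 here) can still fall short of a threshold as strict as 1 − 10⁻⁷.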
Affiliation(s)
- Edgar D Coelho
- Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Telematics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal
- Joel P Arrais
- Department of Informatics Engineering (DEI), University of Coimbra, Coimbra, Portugal
- Centre for Informatics and Systems of the University at Coimbra (CISUC), University of Coimbra, Coimbra, Portugal
- Sérgio Matos
- Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Telematics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal
- Carlos Pereira
- Centre for Informatics and Systems of the University at Coimbra (CISUC), University of Coimbra, Coimbra, Portugal
- Department of Informatics Engineering and Systems, Polytechnic Institute of Coimbra, Engineering Institute of Coimbra (IPC-ISEC), Coimbra, Portugal
- Nuno Rosa
- Department of Health Sciences, Institute of Health Sciences, The Catholic University of Portugal, Viseu, Portugal
- Maria José Correia
- Department of Health Sciences, Institute of Health Sciences, The Catholic University of Portugal, Viseu, Portugal
- Marlene Barros
- Department of Health Sciences, Institute of Health Sciences, The Catholic University of Portugal, Viseu, Portugal
- Centre for Neurosciences and Cell Biology, University of Coimbra, Coimbra, Portugal
- José Luís Oliveira
- Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Telematics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal

24
Hernández-Alcántara G, Torres-Larios A, Enríquez-Flores S, García-Torres I, Castillo-Villanueva A, Méndez ST, de la Mora-de la Mora I, Gómez-Manzo S, Torres-Arroyo A, López-Velázquez G, Reyes-Vivas H, Oria-Hernández J. Structural and functional perturbation of Giardia lamblia triosephosphate isomerase by modification of a non-catalytic, non-conserved region. PLoS One 2013; 8:e69031. [PMID: 23894402 PMCID: PMC3718800 DOI: 10.1371/journal.pone.0069031] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 04/15/2013] [Accepted: 06/04/2013] [Indexed: 11/18/2022]
Abstract
BACKGROUND We have previously proposed triosephosphate isomerase of Giardia lamblia (GlTIM) as a target for rational drug design against giardiasis, one of the most common parasitic infections in humans. Since the enzyme exists in both the parasite and the host, selective inhibition is a major challenge, because the essential regions that could be considered molecular targets are highly conserved. Previous biochemical evidence showed that chemical modification of the non-conserved, non-catalytic cysteine 222 (C222) specifically inactivates GlTIM. The inactivation correlates with the physicochemical properties of the modifying agent: addition of a small, non-polar chemical group at C222 reduces the enzyme activity by half, whereas large, negatively charged chemical groups cause full inactivation. RESULTS In this work we used mutagenesis to extend our understanding of the functional and structural effects triggered by modification of C222. To this end, we characterized six GlTIM C222 mutants with side chains of diverse physicochemical characteristics. We found that the polarity, charge and volume of the mutant side chain differentially alter the activity, affinity, stability and structure of the enzyme. The data show that mutagenesis of C222 mimics the effects of chemical modification. The crystal structure of C222D GlTIM shows the disruptive effects of introducing a negative charge at position 222. The mutation perturbs loop 7, a region of the enzyme whose interactions with the catalytic loop 6 are essential for TIM stability, ligand binding and catalysis. Comparison of TIM amino acid sequences from phylogenetically diverse groups indicates that C222 and its surrounding residues are poorly conserved, supporting the proposal that this region is a good target for specific drug design.
CONCLUSIONS The results demonstrate that a ubiquitous, structurally highly conserved enzyme can be inhibited in a species-specific manner by modifying a non-conserved, non-catalytic residue, through long-range perturbation of essential regions.
Affiliation(s)
- Gloria Hernández-Alcántara
- Laboratorio de Bioquímica-Genética, Instituto Nacional de Pediatría, Secretaría de Salud, Mexico City, Mexico
- Alfredo Torres-Larios
- Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Sergio Enríquez-Flores
- Laboratorio de Bioquímica-Genética, Instituto Nacional de Pediatría, Secretaría de Salud, Mexico City, Mexico
- Itzhel García-Torres
- Laboratorio de Bioquímica-Genética, Instituto Nacional de Pediatría, Secretaría de Salud, Mexico City, Mexico
- Adriana Castillo-Villanueva
- Laboratorio de Bioquímica-Genética, Instituto Nacional de Pediatría, Secretaría de Salud, Mexico City, Mexico
- Sara T. Méndez
- Laboratorio de Bioquímica-Genética, Instituto Nacional de Pediatría, Secretaría de Salud, Mexico City, Mexico
- Saúl Gómez-Manzo
- Laboratorio de Bioquímica-Genética, Instituto Nacional de Pediatría, Secretaría de Salud, Mexico City, Mexico
- Angélica Torres-Arroyo
- Laboratorio de Bioquímica-Genética, Instituto Nacional de Pediatría, Secretaría de Salud, Mexico City, Mexico
- Gabriel López-Velázquez
- Laboratorio de Bioquímica-Genética, Instituto Nacional de Pediatría, Secretaría de Salud, Mexico City, Mexico
- Horacio Reyes-Vivas
- Laboratorio de Bioquímica-Genética, Instituto Nacional de Pediatría, Secretaría de Salud, Mexico City, Mexico
- Jesús Oria-Hernández
- Laboratorio de Bioquímica-Genética, Instituto Nacional de Pediatría, Secretaría de Salud, Mexico City, Mexico

25
Psomopoulos FE, Mitkas PA, Ouzounis CA. Detection of genomic idiosyncrasies using fuzzy phylogenetic profiles. PLoS One 2013; 8:e52854. [PMID: 23341912 PMCID: PMC3544837 DOI: 10.1371/journal.pone.0052854] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 02/14/2012] [Accepted: 11/22/2012] [Indexed: 11/18/2022]
Abstract
Phylogenetic profiles express the presence or absence of genes and their homologs across a number of reference genomes. They have emerged as an elegant representation framework for comparative genomics and have been used for the genome-wide inference and discovery of functionally linked genes or metabolic pathways. As the number of reference genomes grows, there is an acute need for faster and more accurate methods of phylogenetic profile analysis. We propose a novel, efficient method for the detection of genomic idiosyncrasies, i.e. sets of genes found in a specific genome with peculiar phylogenetic properties, such as intra-genome correlations or inter-genome relationships. Our algorithm is a four-step process. Genome profiles are first defined as fuzzy vectors, then discretized to binary vectors, then de-noised, and finally compared to generate intra- and inter-genome distances for each gene profile. The method is validated on a carefully selected benchmark set of five reference genomes, using a range of approaches regarding similarity metrics and pre-processing stages for noise reduction. We demonstrate that the fuzzy profile method consistently identifies the actual phylogenetic relationship and origin of the genes under consideration in the majority of cases, while the detected outliers are found to be particular genes with peculiar phylogenetic patterns. The proposed method provides a time-efficient and highly scalable approach for phylogenetic stratification. The detected groups of genes are either similar to their own genome profile or different from it, thus revealing atypical evolutionary histories.
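The four-step process described above (fuzzy vectors, discretization, de-noising, comparison) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. The fuzzy membership values are taken as given inputs in [0, 1]. De-noising is assumed to mean dropping uninformative profiles (absent everywhere or present everywhere). The "genome profile" is assumed to be a column-wise majority vote, and only intra-genome Hamming distances are computed. All function names, thresholds and gene names are hypothetical.

```python
Profile = list[int]


def discretize(fuzzy: list[float], threshold: float = 0.5) -> Profile:
    """Step 2: turn a fuzzy vector into a binary presence/absence vector."""
    return [1 if value >= threshold else 0 for value in fuzzy]


def denoise(profiles: dict[str, Profile]) -> dict[str, Profile]:
    """Step 3 (assumed form): drop profiles absent or present in every genome."""
    return {gene: p for gene, p in profiles.items() if 0 < sum(p) < len(p)}


def genome_profile(profiles: dict[str, Profile]) -> Profile:
    """Column-wise majority vote over the kept gene profiles (ties count as present)."""
    n_genes = len(profiles)
    n_genomes = len(next(iter(profiles.values())))
    return [
        1 if 2 * sum(p[i] for p in profiles.values()) >= n_genes else 0
        for i in range(n_genomes)
    ]


def hamming_fraction(a: Profile, b: Profile) -> float:
    """Fraction of reference genomes where two binary profiles disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def idiosyncratic_genes(
    fuzzy_profiles: dict[str, list[float]], threshold: float = 0.5, cutoff: float = 0.5
) -> list[str]:
    """Step 4: flag genes whose profile departs from their genome's profile."""
    binary = {gene: discretize(v, threshold) for gene, v in fuzzy_profiles.items()}
    kept = denoise(binary)
    if not kept:
        return []
    reference = genome_profile(kept)
    return sorted(g for g, p in kept.items() if hamming_fraction(p, reference) > cutoff)


# Step 1 input: hypothetical fuzzy profiles over five reference genomes.
example = {
    "geneA": [0.9, 0.8, 0.1, 0.2, 0.0],
    "geneB": [0.95, 0.7, 0.3, 0.1, 0.2],
    "geneC": [0.9, 0.6, 0.4, 0.0, 0.1],
    "geneD": [0.9, 0.1, 0.2, 0.9, 0.8],
    "geneE": [0.9, 0.9, 0.9, 0.9, 0.9],  # present everywhere: removed by denoise
}
print(idiosyncratic_genes(example))  # ['geneD']
```

Here geneD disagrees with the majority profile in three of five genomes, so it is flagged as an outlier with an atypical phylogenetic pattern.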
Affiliation(s)
- Fotis E. Psomopoulos
- Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Pericles A. Mitkas
- Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Christos A. Ouzounis
- Centre for Bioinformatics, Department of Informatics, School of Natural and Mathematical Sciences, King’s College London, Strand, London, United Kingdom

26
Promponas VJ, Ouzounis CA, Iliopoulos I. Experimental evidence validating the computational inference of functional associations from gene fusion events: a critical survey. Brief Bioinform 2012; 15:443-54. [PMID: 23220349 PMCID: PMC4017328 DOI: 10.1093/bib/bbs072] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 11/16/2022]
Abstract
More than a decade ago, a number of methods were proposed for the inference of protein interactions using whole-genome information from gene clusters, gene fusions and phylogenetic profiles. This structural and evolutionary view of entire genomes has provided a valuable approach to the functional characterization of proteins, especially those without sequence similarity to proteins of known function. Furthermore, this view has raised the real possibility of detecting functional associations of genes and their corresponding proteins for any entire genome sequence. Yet, despite these exciting developments, there have been relatively few cases of real use of these methods outside the computational biology field, as reflected in citation analysis. These methods have the potential to be used in high-throughput experimental settings in functional genomics and proteomics to validate results with very high accuracy and good coverage. In this critical survey, we provide a comprehensive overview of the 30 most prominent examples of single pairwise protein interactions in small-scale studies. In each example, protein interactions have either been detected by gene fusion or have yielded additional, corroborating evidence from biochemical observations. We conclude that, with the derivation of a validated gold-standard corpus and better data integration with large-scale experiments, gene fusion detection can truly become a valuable tool for large-scale experimental biology.
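Gene-fusion inference rests on a simple pattern. Two proteins that are separate in one genome, but align to different, non-overlapping regions of a single fused protein in another genome, are predicted to be functionally associated. The sketch below illustrates that pattern, assuming alignment hits with target coordinates have already been computed (for example, from a sequence search). The `Hit` record, the `max_overlap` parameter and the gene names are hypothetical. A real pipeline would also filter hits by significance and exclude query pairs that are homologous to each other.

```python
from itertools import combinations
from typing import NamedTuple


class Hit(NamedTuple):
    """One alignment of a query protein onto a region of a target protein."""
    query: str   # protein from the genome under study
    target: str  # protein from another genome (candidate fusion)
    start: int   # first aligned residue on the target (1-based, inclusive)
    end: int     # last aligned residue on the target (inclusive)


def fusion_links(hits: list[Hit], max_overlap: int = 0) -> set[tuple[str, str]]:
    """Pairs of distinct queries that hit the same target in (nearly) disjoint regions."""
    by_target: dict[str, list[Hit]] = {}
    for hit in hits:
        by_target.setdefault(hit.target, []).append(hit)

    links: set[tuple[str, str]] = set()
    for target_hits in by_target.values():
        for a, b in combinations(target_hits, 2):
            if a.query == b.query:
                continue
            # Residues shared by the two aligned regions; <= 0 means disjoint.
            overlap = min(a.end, b.end) - max(a.start, b.start) + 1
            if overlap <= max_overlap:
                links.add(tuple(sorted((a.query, b.query))))
    return links


hits = [
    Hit("geneA", "fusedX", 1, 250),
    Hit("geneB", "fusedX", 260, 650),
    Hit("geneC", "fusedX", 100, 300),  # overlaps both other regions
]
print(fusion_links(hits))  # {('geneA', 'geneB')}
```

The `max_overlap` parameter allows a small tolerance at domain boundaries, where neighbouring alignments often overlap by a few residues.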
Affiliation(s)
- Vasilis J Promponas
- Institute of Agrobiotechnology, Centre for Research & Technology Hellas (CERTH), 57001 Thessaloniki, Greece.

27
Liu W, Fang L, Li M, Li S, Guo S, Luo R, Feng Z, Li B, Zhou Z, Shao G, Chen H, Xiao S. Comparative genomics of Mycoplasma: analysis of conserved essential genes and diversity of the pan-genome. PLoS One 2012; 7:e35698. [PMID: 22536428 PMCID: PMC3335003 DOI: 10.1371/journal.pone.0035698] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.5] [Received: 10/11/2011] [Accepted: 03/20/2012] [Indexed: 12/04/2022]
Abstract
Mycoplasma, the smallest self-replicating organism, with a minimal metabolism and little genomic redundancy, is expected to be a close approximation of the minimal set of genes needed to sustain bacterial life. This study employs comparative evolutionary analysis of twenty Mycoplasma genomes to gain an improved understanding of essential genes. By analyzing the core genome of mycoplasmas, we identified the set of conserved essential genes for mycoplasma survival. Further analysis showed that the core genome set has many characteristics in common with experimentally identified essential genes. Several key genes are related to DNA replication and repair and can be disrupted in transposon mutagenesis studies. These genes may nonetheless be critical for bacterial survival, especially over long periods of natural selection. Phylogenomic reconstructions based on 3,355 homologous groups allowed robust estimation of phylogenetic relatedness among mycoplasma strains. To obtain deeper insight into the relative roles of molecular evolution in the adaptation of pathogens to their hosts, we also analyzed positive selection pressures on particular sites and lineages. There appears to be an approximate correlation between the divergence of species and the level of positive selection detected in the corresponding lineages.
Affiliation(s)
- Wei Liu
- Division of Animal Infectious Diseases, State Key Laboratory of Agricultural Microbiology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, People's Republic of China
- Liurong Fang
- Division of Animal Infectious Diseases, State Key Laboratory of Agricultural Microbiology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, People's Republic of China
- Mao Li
- Division of Animal Infectious Diseases, State Key Laboratory of Agricultural Microbiology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, People's Republic of China
- Sha Li
- Division of Animal Infectious Diseases, State Key Laboratory of Agricultural Microbiology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, People's Republic of China
- Shaohua Guo
- Division of Animal Infectious Diseases, State Key Laboratory of Agricultural Microbiology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, People's Republic of China
- Rui Luo
- Division of Animal Infectious Diseases, State Key Laboratory of Agricultural Microbiology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, People's Republic of China
- Zhixin Feng
- Institute of Veterinary Medicine, Jiangsu Academy of Agricultural Sciences, Nanjing, People's Republic of China
- Bin Li
- Institute of Veterinary Medicine, Jiangsu Academy of Agricultural Sciences, Nanjing, People's Republic of China
- Zhemin Zhou
- Environmental Research Institute, University College Cork, Cork, Ireland
- Guoqing Shao
- Institute of Veterinary Medicine, Jiangsu Academy of Agricultural Sciences, Nanjing, People's Republic of China
- Huanchun Chen
- Division of Animal Infectious Diseases, State Key Laboratory of Agricultural Microbiology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, People's Republic of China
- Shaobo Xiao
- Division of Animal Infectious Diseases, State Key Laboratory of Agricultural Microbiology, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan, People's Republic of China

28
The Last Universal Common Ancestor (LUCA) and the Ancestors of Archaea and Bacteria were Progenotes. J Mol Evol 2010; 72:119-26. [DOI: 10.1007/s00239-010-9407-2] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Received: 09/06/2010] [Accepted: 10/27/2010] [Indexed: 10/18/2022]
|
29
Freilich S, Goldovsky L, Gottlieb A, Blanc E, Tsoka S, Ouzounis CA. Stratification of co-evolving genomic groups using ranked phylogenetic profiles. BMC Bioinformatics 2009; 10:355. [PMID: 19860884 PMCID: PMC2775751 DOI: 10.1186/1471-2105-10-355] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 05/18/2009] [Accepted: 10/27/2009] [Indexed: 01/12/2023]
Abstract
BACKGROUND Previous methods of detecting the taxonomic origins of arbitrary sequence collections, with significant impact on genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for the assignment of protein sequences into genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database. RESULTS The rank-BLAST approach is validated by computing the phylogenetic profiles of all sequences for five distinct microbial species of varying degrees of phylogenetic proximity, against a reference database of 243 fully sequenced genomes. The approach - a combination of sequence searches, statistical estimation and clustering - analyses the degree of sequence divergence between sets of protein sequences and allows the classification of protein sequences according to the species of origin with high accuracy, allowing taxonomic classification of 64% of the proteins studied. In most cases, a main cluster is detected, representing the corresponding species. Secondary, functionally distinct and species-specific clusters exhibit different patterns of phylogenetic distribution, thus flagging gene groups of interest. Detailed analyses of such cases are provided as examples. CONCLUSION Our results indicate that the rank-BLAST approach can capture the taxonomic origins of sequence collections in an accurate and efficient manner. The approach can be useful both for the analysis of genome evolution and the detection of species groups in metagenomics samples.
Affiliation(s)
- Shiri Freilich
- The Blavatnik School of Computer Sciences and School of Medicine, Tel-Aviv University, Tel-Aviv 69978, Israel.
30
Karimpour-Fard A, Leach SM, Hunter LE, Gill RT. The topology of the bacterial co-conserved protein network and its implications for predicting protein function. BMC Genomics 2008; 9:313. [PMID: 18590549 PMCID: PMC2488357 DOI: 10.1186/1471-2164-9-313] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 04/23/2008] [Accepted: 06/30/2008] [Indexed: 11/12/2022]
Abstract
Background Protein-protein interaction networks are most often generated from physical protein-protein interaction data. Co-conservation, also known as phylogenetic profiles, is an alternative source of information for generating protein interaction networks. Co-conservation methods generate interaction networks among proteins that are gained or lost together through evolution. Co-conservation is a particularly useful technique in compact bacterial genomes. Prior studies in yeast suggest that the topology of protein-protein interaction networks generated from physical interaction assays can offer important insight into protein function. Here, we hypothesize that in bacteria, the topology of protein interaction networks derived via co-conservation information could similarly improve methods for predicting protein function. Since the topology of bacterial co-conservation protein-protein interaction networks has not previously been studied in depth, we first perform such an analysis for co-conservation networks in E. coli K12. Next, we demonstrate one way in which network connectivity measures and global and local function distribution can be exploited to predict protein function for previously uncharacterized proteins. Results Our results showed that, like most biological networks, our bacterial co-conserved protein-protein interaction networks had scale-free topologies. Our results indicated that some properties of the physical yeast interaction network hold in our bacterial co-conservation networks, such as high connectivity for essential proteins. However, the high connectivity among protein complexes in the yeast physical network was not seen in the co-conservation network which uses all bacteria as the reference set. We found that the distribution of node connectivity varied by functional category and could be informative for function prediction. 
By integrating functional information from different annotation sources and using the network topology, we were able to infer function for uncharacterized proteins. Conclusion Interaction networks based on co-conservation can contain information distinct from networks based on physical or other interaction types. Our study has shown co-conservation based networks to exhibit a scale-free topology, as expected for biological networks. We also revealed ways that connectivity in our networks can be informative for the functional characterization of proteins.
Affiliation(s)
- Anis Karimpour-Fard
- Center for Computational Pharmacology, University of Colorado School of Medicine, Aurora, Colorado 80045, USA.
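In the abstract above, co-conservation links proteins that are gained or lost together, that is, proteins whose phylogenetic profiles (presence or absence across reference genomes) match. The abstract does not state the similarity criterion. The minimal sketch below therefore makes an assumption: it links two proteins when their binary profiles differ in at most `max_mismatch` genomes (Hamming distance). This is only one of several possible criteria. The function names, protein names and profiles are hypothetical. Node degree, the connectivity measure the abstract relates to essentiality and function, is then simply the number of neighbours.

```python
from __future__ import annotations

from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count the genomes in which two presence/absence profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


def co_conservation_network(profiles: dict[str, str], max_mismatch: int = 0) -> dict[str, set[str]]:
    """Link every pair of proteins whose profiles differ in at most max_mismatch genomes."""
    graph: dict[str, set[str]] = {name: set() for name in profiles}
    for p, q in combinations(profiles, 2):
        if hamming(profiles[p], profiles[q]) <= max_mismatch:
            graph[p].add(q)
            graph[q].add(p)
    return graph


def degrees(graph: dict[str, set[str]]) -> dict[str, int]:
    """Node connectivity: the number of co-conserved partners of each protein."""
    return {node: len(partners) for node, partners in graph.items()}


# Hypothetical profiles over six reference genomes ('1' = family present, '0' = absent).
PROFILES = {
    "protA": "111111",
    "protB": "111111",
    "protC": "111110",
    "protD": "100100",
    "protE": "100101",
}

graph = co_conservation_network(PROFILES, max_mismatch=1)
print(degrees(graph))  # {'protA': 2, 'protB': 2, 'protC': 2, 'protD': 1, 'protE': 1}
```

Requiring identical profiles (`max_mismatch=0`) gives the strictest network; in this toy case it leaves `protC` isolated. Relaxing the threshold trades precision for coverage. The all-pairs loop is quadratic and is meant only for illustration.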
31
Computational prediction of protein-protein interactions. Mol Biotechnol 2007; 38:1-17. [PMID: 18095187 DOI: 10.1007/s12033-007-0069-2] [Citation(s) in RCA: 126] [Impact Index Per Article: 7.4] [Received: 07/04/2007] [Accepted: 07/16/2007] [Indexed: 01/19/2023]
Abstract
Recently a number of computational approaches have been developed for the prediction of protein-protein interactions. Complete genome sequencing projects have provided the vast amount of information needed for these analyses. These methods utilize the structural, genomic, and biological context of proteins and genes in complete genomes to predict protein interaction networks and functional linkages between proteins. Given that experimental techniques remain expensive, time-consuming, and labor-intensive, these methods represent an important advance in proteomics. Some of these approaches utilize sequence data alone to predict interactions, while others combine multiple computational and experimental datasets to accurately build protein interaction maps for complete genomes. These methods represent a complementary approach to current high-throughput projects whose aim is to delineate protein interaction maps in complete genomes. We will describe a number of computational protocols for protein interaction prediction based on the structural, genomic, and biological context of proteins in complete genomes, and detail methods for protein interaction network visualization and analysis.
32
33
Di Giulio M. The non-monophyletic origin of the tRNA molecule and the origin of genes only after the evolutionary stage of the last universal common ancestor (LUCA). J Theor Biol 2006; 240:343-52. [PMID: 16289209 DOI: 10.1016/j.jtbi.2005.09.023] [Citation(s) in RCA: 74] [Impact Index Per Article: 4.1] [Received: 05/26/2005] [Revised: 09/15/2005] [Accepted: 09/23/2005] [Indexed: 11/17/2022]
Abstract
A model has been proposed suggesting that the tRNA molecule must have originated by direct duplication of an RNA hairpin structure [Di Giulio, M., 1992. On the origin of the transfer RNA molecule. J. Theor. Biol. 159, 199-214]. A non-monophyletic origin of this molecule has also been theorized [Di Giulio, M., 1999. The non-monophyletic origin of tRNA molecule. J. Theor. Biol. 197, 403-414]. In other words, the tRNA genes evolved only after the evolutionary stage of the last universal common ancestor (LUCA) through the assembly of two minigenes encoding different RNA hairpin structures, which is what the exon theory of genes suggests when it is applied to the model of tRNA origin. Recent observations strongly corroborate this theorization because it has been found that some tRNA genes are completely split into two minigenes encoding the 5' and 3' halves of this molecule [Randau, L., et al., 2005a. Nanoarchaeum equitans creates functional tRNAs from separate genes for their 5'- and 3'-halves. Nature 433, 537-541]. In this paper it is shown that these tRNA genes encoding the 5' and 3' halves of this molecule are the ancestral form from which the tRNA genes continuously encoding the complete tRNA molecule are thought to have evolved. This, together with the very existence of completely separate tRNA genes encoding their 5' and 3' halves, proves a non-monophyletic origin for tRNA genes, as a monophyletic origin would exclude the existence of these genes which have, on the contrary, been observed. Here the polyphyletic origin of genes encoding proteins is also suggested and discussed. Moreover, a hypothesis is advanced to suggest that the LUCA might have had a fragmented genome made up of RNA and the possibility that 'Paleokaryotes' may exist is outlined. Finally, the characteristic of the indivisibility of homology that these polyphyletic origins seem to remove at the sequence level is discussed.
Affiliation(s)
- Massimo Di Giulio
- Institute of Genetics and Biophysics Adriano Buzzati Traverso, CNR, Via P. Castellino 111, 80131 Naples, Napoli, Italy.
34
Ouzounis CA, Kunin V, Darzentas N, Goldovsky L. A minimal estimate for the gene content of the last universal common ancestor--exobiology from a terrestrial perspective. Res Microbiol 2005; 157:57-68. [PMID: 16431085 DOI: 10.1016/j.resmic.2005.06.015] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.4] [Received: 03/17/2005] [Revised: 06/15/2005] [Accepted: 06/30/2005] [Indexed: 10/25/2022]
Abstract
Using an algorithm for ancestral state inference of gene content, given a large number of extant genome sequences and a phylogenetic tree, we aim to reconstruct the gene content of the last universal common ancestor (LUCA), a hypothetical life form that presumably was the progenitor of the three domains of life. The method allows for gene loss, previously found to be a major factor in shaping gene content, and thus the estimate of LUCA's gene content appears to be substantially higher than that proposed previously, with a typical number of over 1000 gene families, of which more than 90% are also functionally characterized. More precisely, when only prokaryotes are considered, the number varies between 1006 and 1189 gene families while when eukaryotes are also included, this number increases to between 1344 and 1529 families depending on the underlying phylogenetic tree. Therefore, the common belief that the hypothetical genome of LUCA should resemble those of the smallest extant genomes of obligate parasites is not supported by recent advances in computational genomics. Instead, a fairly complex genome similar to those of free-living prokaryotes, with a variety of functional capabilities including metabolic transformation, information processing, membrane/transport proteins and complex regulation, shared between the three domains of life, emerges as the most likely progenitor of life on Earth, with profound repercussions for planetary exploration and exobiology.
Affiliation(s)
- Christos A Ouzounis
- Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge CB10 1SD, UK.
35
Audit B, Ouzounis CA. From genes to genomes: universal scale-invariant properties of microbial chromosome organisation. J Mol Biol 2003; 332:617-33. [PMID: 12963371 DOI: 10.1016/s0022-2836(03)00811-8] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.6] [Indexed: 10/27/2022]
Abstract
The availability of complete genome sequences for a large variety of organisms is a major advance in understanding genome structure and function. One attribute of genome structure is chromosome organisation in terms of gene localisation and orientation. For example, bacterial operons, i.e. clusters of co-oriented genes that form transcription units, enable functionally related genes to be expressed simultaneously. The description of genome organisation was pioneered with the study of the distribution of genes of the Escherichia coli partial genetic map before the full genome sequence was known. Deploying powerful techniques from circular statistics and signal processing, we revisit the issue of gene localisation and orientation using 89 complete microbial chromosomes from the eubacterial and archaeal domains. We demonstrate that there is no characteristic size pertinent to the description of chromosome structure, e.g. there does not exist any single length appropriate to describe gene clustering. Our results show that, for all 89 chromosomes, gene positions and gene orientations share a common form of scale-invariant correlations known as "long-range correlations" that we can reveal for distances from the gene length up to the chromosome size. This observation indicates that genes tend to assemble and to co-orient over any scale of observation greater than a few kilobases. This unexpected property of chromosome structure can be portrayed as an operon-like organisation at all scales and implies that a complete scale range extending over more than three orders of magnitude of chromosome segment lengths is necessary to properly describe prokaryotic genome organisation. We propose that this pattern results from the effects of the superhelical context on gene expression coupled with the structure and dynamics of the nucleoid, possibly accommodating the diverse gene expression profiles needed during the different stages of cellular life.
Affiliation(s)
- Benjamin Audit
- Wellcome Trust Genome Campus, Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge, CB10 1SD, UK
36
Kunin V, Ouzounis CA. The balance of driving forces during genome evolution in prokaryotes. Genome Res 2003; 13:1589-94. [PMID: 12840037 PMCID: PMC403731 DOI: 10.1101/gr.1092603] [Citation(s) in RCA: 153] [Impact Index Per Article: 7.3] [Received: 01/02/2003] [Accepted: 04/22/2003] [Indexed: 11/24/2022]
Abstract
Genomes are shaped by evolutionary processes such as gene genesis, horizontal gene transfer (HGT), and gene loss. To quantify the relative contributions of these processes, we analyze the distribution of 12,762 protein families on a phylogenetic tree, derived from entire genomes of 41 Bacteria and 10 Archaea. We show that gene loss is the most important factor in shaping genome content, being up to three times more frequent than HGT, followed by gene genesis, which may contribute up to twice as many genes as HGT. We suggest that gene gain and gene loss in prokaryotes are balanced; thus, on average, prokaryotic genome size is kept constant. Despite the importance of HGT, our results indicate that the majority of protein families have only been transmitted by vertical inheritance. To test our method, we present a study of strain-specific genes of Helicobacter pylori, and demonstrate correct predictions of gene loss and HGT for at least 81% of validated cases. This approach indicates that it is possible to trace genome content history and quantify the factors that shape contemporary prokaryotic genomes.
Affiliation(s)
- Victor Kunin
- Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge CB10 1SD, UK
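The abstract above quantifies gene loss, gain and horizontal transfer from the distribution of protein families on a phylogenetic tree, but it does not describe the reconstruction step. The sketch below is a swapped-in illustration and not necessarily the authors' algorithm. It applies weighted (Sankoff) parsimony to presence/absence data, with separate gain and loss costs that are assumptions. It does not model horizontal transfer explicitly. The tree, genome names and function names are hypothetical.

```python
import math


def step(parent: int, child: int, gain: float, loss: float) -> float:
    """Cost of one branch: 0 -> 1 is a gain, 1 -> 0 is a loss, no change is free."""
    if parent == 0 and child == 1:
        return gain
    if parent == 1 and child == 0:
        return loss
    return 0.0


def subtree_costs(node, present, gain, loss, memo):
    """Sankoff pass: minimal cost of the subtree under `node` for node state 0 and 1.

    A leaf is a genome name (str); an internal node is a tuple of child subtrees.
    """
    key = id(node)
    if key not in memo:
        if isinstance(node, str):
            # The observed state at a leaf costs nothing; the other state is impossible.
            memo[key] = [0.0 if present[node] == s else math.inf for s in (0, 1)]
        else:
            memo[key] = [
                sum(
                    min(subtree_costs(child, present, gain, loss, memo)[t] + step(s, t, gain, loss)
                        for t in (0, 1))
                    for child in node
                )
                for s in (0, 1)
            ]
    return memo[key]


def count_events(tree, present, gain=1.0, loss=1.0):
    """Return (root_state, gains, losses) for one most-parsimonious reconstruction."""
    memo = {}
    root_costs = subtree_costs(tree, present, gain, loss, memo)
    root_state = min((0, 1), key=lambda s: root_costs[s])  # ties resolve to absence
    gains = losses = 0
    stack = [(tree, root_state)]
    while stack:
        node, s = stack.pop()
        if isinstance(node, str):
            continue
        for child in node:
            costs = memo[id(child)]
            # Pick the child state that is cheapest given the parent's chosen state.
            t = min((0, 1), key=lambda x: costs[x] + step(s, x, gain, loss))
            gains += (s, t) == (0, 1)
            losses += (s, t) == (1, 0)
            stack.append((child, t))
    return root_state, gains, losses


# Hypothetical five-genome tree and the presence/absence of one protein family.
TREE = ((("A", "B"), "C"), ("D", "E"))
PRESENT = {"A": 1, "B": 1, "C": 0, "D": 1, "E": 0}

print(count_events(TREE, PRESENT, gain=2.0, loss=1.0))  # (1, 0, 2)
```

With gains weighted above losses, the family is inferred to be present at the root and lost twice. With equal weights (`gain=1.0, loss=1.0`) the same data give `(0, 2, 0)`, that is, two independent gains. The choice of cost ratio, an assumption here, therefore decides between loss-dominated and gain-dominated scenarios.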
37
Abstract
Molecular analysis of conserved sequences in the ribosomal RNAs of modern organisms reveals a three-domain phylogeny that converges in a universal ancestor for all life. We used the Clusters of Orthologous Groups database and information from published genomes to search for other universally conserved genes that have the same phylogenetic pattern as ribosomal RNA, and therefore constitute the ancestral genetic core of cells. Our analyses identified a small set of genes that can be traced back to the universal ancestor and have coevolved since that time. As indicated by earlier studies, almost all of these genes are involved with the transfer of genetic information, and most of them directly interact with the ribosome. Other universal genes have either undergone lateral transfer in the past, or have diverged so much in sequence that their distant past could not be resolved. The nature of the conserved genes suggests innovations that may have been essential to the divergence of the three domains of life. The analysis also identified several genes of unknown function with phylogenies that track with the ribosomal RNA genes. The products of these genes are likely to play fundamental roles in cellular processes.
Affiliation(s)
- J Kirk Harris
- Department of Molecular, Cellular and Developmental Biology, University of Colorado, Boulder, Colorado 80309-0347, USA
38
Deppenmeier U. The unique biochemistry of methanogenesis. Prog Nucleic Acid Res Mol Biol 2003; 71:223-83. [PMID: 12102556 DOI: 10.1016/s0079-6603(02)71045-3] [Citation(s) in RCA: 181] [Impact Index Per Article: 8.6] [Indexed: 01/02/2023]
Abstract
Methanogenic archaea have an unusual type of metabolism because they use H2 + CO2, formate, methylated C1 compounds, or acetate as energy and carbon sources for growth. The methanogens produce methane as the major end product of their metabolism in a unique energy-generating process. These organisms have received much attention because they catalyze the terminal step in the anaerobic breakdown of organic matter under sulfate-limiting conditions and are essential for both the recycling of carbon compounds and the maintenance of the global carbon flux on Earth. Furthermore, methane is an important greenhouse gas that directly contributes to climate change and global warming. Hence, understanding the biochemical processes leading to methane formation is of major interest. This review focuses on the metabolic pathways of methanogenesis that are rather unique and involve a number of unusual enzymes and coenzymes. It will be shown how the previously mentioned substrates are converted to CH4 via the CO2-reducing, methylotrophic, or aceticlastic pathway. All catabolic processes finally lead to the formation of a mixed disulfide from coenzyme M and coenzyme B that functions as an electron acceptor of certain anaerobic respiratory chains. Molecular hydrogen, reduced coenzyme F420, or reduced ferredoxin are used as electron donors. The redox reactions as catalyzed by the membrane-bound electron transport chains are coupled to proton translocation across the cytoplasmic membrane. The resulting electrochemical proton gradient is the driving force for ATP synthesis as catalyzed by an A1A0-type ATP synthase. Other energy-transducing enzymes involved in methanogenesis are the membrane-integral methyltransferase and the formylmethanofuran dehydrogenase complex. The former enzyme is a unique, reversible sodium ion pump that couples methyl-group transfer with the transport of Na+ across the membrane. 
The formylmethanofuran dehydrogenase is a reversible ion pump that catalyzes formylation and deformylation of methanofuran. Furthermore, the review addresses questions related to the biochemical and genetic characteristics of the energy-transducing enzymes and to the mechanisms of ion translocation.
Affiliation(s)
- Uwe Deppenmeier
- Department of Microbiology and Genetics, Universität Göttingen, Germany
39
Ganoza MC, Kiel MC, Aoki H. Evolutionary conservation of reactions in translation. Microbiol Mol Biol Rev 2002; 66:460-85. [PMID: 12209000 PMCID: PMC120792 DOI: 10.1128/mmbr.66.3.460-485.2002] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.1] [Indexed: 11/20/2022]
Abstract
Current X-ray diffraction and cryoelectron microscopic data of ribosomes of eubacteria have shed considerable light on the molecular mechanisms of translation. Structural studies of the protein factors that activate ribosomes also point to many common features in the primary sequence and tertiary structure of these proteins. The reconstitution of the complex apparatus of translation has also revealed new information important to the mechanisms. Surprisingly, the latter approach has uncovered a number of proteins whose sequence and/or structure and function are conserved in all cells, indicating that the mechanisms are indeed conserved. The possible mechanisms of a new initiation factor and two elongation factors are discussed in this context.
Affiliation(s)
- M Clelia Ganoza
- C. H. Best Institute, Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada M5G 1L6.
40
Das R, Junker J, Greenbaum D, Gerstein MB. Global perspectives on proteins: comparing genomes in terms of folds, pathways and beyond. Pharmacogenomics J 2002; 1:115-25. [PMID: 11911438 DOI: 10.1038/sj.tpj.6500021] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 11/09/2022]
Abstract
The sequencing of complete genomes provides us with a global view of all the proteins in an organism. Proteomic analysis can be done on a purely sequence-based level, with a focus on finding homologues and grouping them into families and clusters of orthologs. However, incorporating protein structure into this analysis provides valuable simplification; it allows one to collect together very distantly related sequences, thus condensing the proteome into a minimal number of 'parts.' We describe issues related to surveying proteomes in terms of structural parts, including methods for fold assignment and formats for comparisons (eg top-10 lists and whole-genome trees), and show how biases in the databases and in sampling can affect these surveys. We illustrate our main points through a case study on the unique protein properties evident in many thermophile genomes (eg more salt bridges). Finally, we discuss metabolic pathways as an even greater simplification of genomes. In comparison to folds these allow the organization of many more genes into coherent systems, yet can nevertheless be understood in many of the same terms.
Affiliation(s)
- R Das
- Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, CT 06520, USA
41
Christen P, Mehta PK. From cofactor to enzymes. The molecular evolution of pyridoxal-5'-phosphate-dependent enzymes. Chem Rec 2002; 1:436-47. [PMID: 11933250 DOI: 10.1002/tcr.10005] [Citation(s) in RCA: 144] [Impact Index Per Article: 6.5] [Indexed: 11/12/2022]
Abstract
The pyridoxal-5'-phosphate (vitamin B(6))-dependent enzymes that act on amino acid substrates have multiple evolutionary origins. Thus, the common mechanistic features of B(6) enzymes are not accidental historical traits but reflect evolutionary or chemical necessities. The B(6) enzymes belong to four independent evolutionary lineages of paralogous proteins, of which the alpha family (with aspartate aminotransferase as the prototype enzyme) is by far the largest and most diverse. The considerably smaller beta family (tryptophan synthase beta as the prototype enzyme) is structurally and functionally more homogeneous. Both the D-alanine aminotransferase family and the alanine racemase family consist of only a few enzymes. The primordial pyridoxal-5'-phosphate-dependent protein catalysts apparently first diverged into reaction-specific protoenzymes, which then diverged further by specializing for substrate specificity. Aminotransferases as well as amino acid decarboxylases are found in two different evolutionary lineages, providing examples of convergent enzyme evolution. The functional specialization of most B(6) enzymes seems to have already occurred in the universal ancestor cell before the divergence of eukaryotes, archebacteria, and eubacteria 1500 million years ago. Pyridoxal-5'-phosphate must have emerged very early in biological evolution; conceivably, metal ions and organic cofactors were the first biological catalysts. To simulate particular steps of molecular evolution, both the substrate and reaction specificity of existent B(6) enzymes were changed by substitution of active-site residues, and monoclonal pyridoxal-5'-phosphate-dependent catalytic antibodies were produced with selection criteria that might have been operative in the evolution of protein-assisted pyridoxal catalysis.
Affiliation(s)
- P Christen
- Biochemisches Institut, Universität Zürich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland.
42
Enright AJ, Van Dongen S, Ouzounis CA. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res 2002; 30:1575-84. [PMID: 11917018 PMCID: PMC101833 DOI: 10.1093/nar/30.7.1575] [Citation(s) in RCA: 2339] [Impact Index Per Article: 106.3] [Indexed: 01/18/2023]
Abstract
Detection of protein families in large databases is one of the principal research objectives in structural and functional genomics. Protein family classification can significantly contribute to the delineation of functional diversity of homologous proteins, the prediction of function based on domain architecture or the presence of sequence motifs as well as comparative genomics, providing valuable evolutionary insights. We present a novel approach called TRIBE-MCL for rapid and accurate clustering of protein sequences into families. The method relies on the Markov cluster (MCL) algorithm for the assignment of proteins into families based on precomputed sequence similarity information. This novel approach does not suffer from the problems that normally hinder other protein sequence clustering algorithms, such as the presence of multi-domain proteins, promiscuous domains and fragmented proteins. The method has been rigorously tested and validated on a number of very large databases, including SwissProt, InterPro, SCOP and the draft human genome. Our results indicate that the method is ideally suited to the rapid and accurate detection of protein families on a large scale. The method has been used to detect and categorise protein families within the draft human genome and the resulting families have been used to annotate a large proportion of human proteins.
Affiliation(s)
- A J Enright
- Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge CB10 1SD, UK.
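The TRIBE-MCL abstract above names the Markov cluster (MCL) algorithm as the step that assigns proteins to families from precomputed sequence similarity. What follows is a minimal pure-Python sketch of generic MCL, not the published tool's code or settings. It uses an unweighted, symmetric adjacency matrix in place of real similarity scores. Several choices are illustrative assumptions:

- expansion by matrix squaring;
- the `inflation` exponent;
- the convergence tolerance;
- the 1e-6 cut-off used to read off clusters.

The toy graph and function names are hypothetical.

```python
from __future__ import annotations


def normalize_columns(m: list[list[float]]) -> list[list[float]]:
    """Rescale each column to sum to 1, turning weights into transition probabilities."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
    return m


def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Dense square matrix product (adequate only for toy inputs)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adjacency, inflation: float = 2.0, max_iter: int = 100, tol: float = 1e-9) -> list[set[int]]:
    """Markov clustering: alternate expansion and inflation until the matrix stops changing."""
    n = len(adjacency)
    # Self-loops give every node some probability of staying where it is.
    m = normalize_columns(
        [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    )
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow spreads along longer paths
        # Inflation: raising entries to a power strengthens strong flows and weakens weak ones.
        inflated = normalize_columns([[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that keep mass are attractors; the columns they hold form one cluster.
    clusters: list[set[int]] = []
    for i in range(n):
        members = {j for j in range(n) if m[i][j] > 1e-6}
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Hypothetical similarity graph: two triangles (0-1-2 and 3-4-5) joined by a single edge 2-3.
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
ADJ = [[0] * 6 for _ in range(6)]
for u, v in EDGES:
    ADJ[u][v] = ADJ[v][u] = 1

print(mcl(ADJ))
```

On this toy graph the weak bridge loses flow at every inflation step, so the two triangles should come out as separate clusters. Raising `inflation` generally gives more, smaller clusters. The dense cubic-time matrix product is suitable only for small examples; protein-family-scale graphs need sparse matrices and pruning.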
43
Sekowska A, Danchin A, Risler JL. Phylogeny of related functions: the case of polyamine biosynthetic enzymes. Microbiology (Reading) 2000; 146 (Pt 8):1815-1828. [PMID: 10931887 DOI: 10.1099/00221287-146-8-1815] [Citation(s) in RCA: 55] [Impact Index Per Article: 2.3] [Indexed: 11/18/2022]
Abstract
Genome annotation requires explicit identification of gene function. This task frequently uses protein sequence alignments with examples having a known function. Genetic drift, co-evolution of subunits in protein complexes and a variety of other constraints interfere with the relevance of alignments. Using a specific class of proteins, it is shown that a simple data analysis approach can help solve some of the problems posed. The origin of ureohydrolases has been explored by comparing sequence similarity trees, maximizing amino acid alignment conservation. The trees separate agmatinases from arginases but suggest the presence of unknown biases responsible for unexpected positions of some enzymes. Using factorial correspondence analysis, a distance tree between sequences was established, comparing regions with gaps in the alignments. The gap tree gives a consistent picture of functional kinship, perhaps reflecting some aspects of phylogeny, with a clear domain of enzymes encoding two types of ureohydrolases (agmatinases and arginases) and activities related to, but different from ureohydrolases. Several annotated genes appeared to correspond to a wrong assignment if the trees were significant. They were cloned and their products expressed and identified biochemically. This substantiated the validity of the gap tree. Its organization suggests a very ancient origin of ureohydrolases. Some enzymes of eukaryotic origin are spread throughout the arginase part of the trees: they might have been derived from the genes found in the early symbiotic bacteria that became the organelles. They were transferred to the nucleus when symbiotic genes had to escape Muller's ratchet. This work also shows that arginases and agmatinases share the same two manganese-ion-binding sites and exhibit only subtle differences that can be accounted for knowing the three-dimensional structure of arginases. 
In the absence of explicit biochemical data, extreme caution is needed when annotating genes having similarities to ureohydrolases.
Affiliation(s)
- Agnieszka Sekowska
- Hong Kong University Pasteur Research Centre, Dexter HC Man Building, 8 Sassoon Road, Pokfulam, Hong Kong
- Regulation of Gene Expression, Institut Pasteur, 28 rue du Docteur Roux, 75724 Paris Cedex 15, France
- Antoine Danchin
- Hong Kong University Pasteur Research Centre, Dexter HC Man Building, 8 Sassoon Road, Pokfulam, Hong Kong
- Regulation of Gene Expression, Institut Pasteur, 28 rue du Docteur Roux, 75724 Paris Cedex 15, France
- Jean-Loup Risler
- Genome and Informatics, Université de Versailles-Saint-Quentin, 45 Avenue des Etats Unis, 78035 Versailles Cedex, France
44
Mehta PK, Christen P. The molecular evolution of pyridoxal-5'-phosphate-dependent enzymes. Adv Enzymol Relat Areas Mol Biol 2000; 74:129-84. [PMID: 10800595 DOI: 10.1002/9780470123201.ch4] [Citation(s) in RCA: 69] [Impact Index Per Article: 2.9] [Indexed: 12/23/2022]
Abstract
The pyridoxal-5-phosphate-dependent enzymes (B6 enzymes) that act on amino acid substrates are of multiple evolutionary origin. The numerous common mechanistic features of B6 enzymes thus are not historical traits passed on from a common ancestor enzyme but rather reflect evolutionary or chemical necessities. Family profile analysis of amino acid sequences supported by comparison of the available three-dimensional (3-D) crystal structures indicates that the B6 enzymes known to date belong to four independent evolutionary lineages of homologous (or more precisely paralogous) proteins, of which the alpha family is by far the largest. The alpha family (with aspartate aminotransferase as the prototype enzyme) includes enzymes that catalyze, with several exceptions, transformations of amino acids in which the covalency changes are limited to the same carbon atom that carries the amino group forming the imine linkage with the coenzyme (i.e., Calpha in most cases). Enzymes of the beta family (tryptophan synthase beta as the prototype enzyme) mainly catalyze replacement and elimination reactions at Cbeta. The D-alanine aminotransferase family and the alanine racemase family are the two other independent lineages, both with relatively few member enzymes. The primordial pyridoxal-5-phosphate-dependent enzymes apparently were regio-specific catalysts that first diverged into reaction-specific enzymes and then specialized for substrate specificity. Aminotransferases as well as amino acid decarboxylases are found in two different evolutionary lineages. Comparison of sequences from eukaryotic, archebacterial, and eubacterial species indicates that the functional specialization of most B6 enzymes has occurred already in the universal ancestor cell. The cofactor pyridoxal-5-phosphate must have emerged very early in biological evolution; conceivably, organic cofactors and metal ions were the first biological catalysts. 
In attempts to simulate particular steps of molecular evolution, oligonucleotide-directed mutagenesis of active-site residues and directed molecular evolution have been applied to change both the substrate and reaction specificity of existing B6 enzymes. Pyridoxal-5'-phosphate-dependent catalytic antibodies were elicited with a screening protocol that applied functional selection criteria as they might have been operative in the evolution of protein-assisted pyridoxal catalysis.
Affiliation(s)
- P K Mehta
- Biochemisches Institut, Universität Zürich, Switzerland
45
Abstract
Faced with the avalanche of genomic sequences and data on messenger RNA expression, biological scientists are confronting a frightening prospect: piles of information but only flakes of knowledge. How can the thousands of sequences being determined and deposited, and the thousands of expression profiles being generated by the new array methods, be synthesized into useful knowledge? What form will this knowledge take? These are questions being addressed by scientists in the field known as 'functional genomics'.
Affiliation(s)
- D Eisenberg
- Molecular Biology Institute and UCLA-DOE Laboratory of Structural Biology and Molecular Medicine, University of California at Los Angeles, 90095-1570, USA.
46
Abstract
The past year has seen several attempts to reconstruct the proteome of the universal ancestor of all life on the basis of comparisons of contemporary genomes. However, increasing evidence for lateral gene transfer could mean that such attempts are based on an incorrect understanding of evolution.
Affiliation(s)
- W F Doolittle
- Department of Biochemistry and Molecular Biology, Canadian Institute for Advanced Research, Dalhousie University, Halifax, B3H 4H7, Canada.
47
Rigoutsos I, Floratos A, Ouzounis C, Gao Y, Parida L. Dictionary building via unsupervised hierarchical motif discovery in the sequence space of natural proteins. Proteins 1999; 37:264-77. [PMID: 10584071 DOI: 10.1002/(sici)1097-0134(19991101)37:2<264::aid-prot11>3.0.co;2-c] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.4] [Indexed: 11/07/2022]
Abstract
Using Teiresias, a pattern discovery method that identifies all motifs present in any given set of protein sequences without requiring alignment or explicit enumeration of the solution space, we have explored the GenPept sequence database and built a dictionary of all sequence patterns with two or more instances. The entries of this dictionary, henceforth named seqlets, cover 98.12% of all amino acid positions in the input database and in essence provide a comprehensive finite set of descriptors for protein sequence space. As such, seqlets can be effectively used to describe almost every naturally occurring protein. In fact, seqlets can be thought of as building blocks of protein molecules that are a necessary (but not sufficient) condition for membership in a functional or family equivalence class. Thus, seqlets can either define conserved family signatures or cut across molecular families, capturing previously undetected sequence signals that derive from functional convergence. Moreover, we show that seqlets can also capture structurally conserved motifs. The availability of a dictionary of seqlets derived in such an unsupervised, hierarchical manner is generating new opportunities for addressing problems that range from reliable classification and the correlation of sequence fragments with functional categories to faster and more sensitive engines for homology searches, evolutionary studies, and protein structure prediction.
Affiliation(s)
- I Rigoutsos
- Computational Biology Center, Thomas J. Watson Research Center, Yorktown Heights, New York 10598, USA.
48
Abstract
Using the sequences of all the known transcription-associated proteins from Bacteria and Eucarya (a total of 4,147), we have identified their homologous counterparts in the four complete archaeal genomes. Through extensive sequence comparisons, we establish the presence of 280 predicted transcription factors or transcription-associated proteins in the four archaeal genomes, of which 168 have homologs only in Bacteria, 51 have homologs only in Eucarya, and the remaining 61 have homologs in both phylogenetic domains. Although bacterial and eukaryotic transcription have very few factors in common, each exclusively shares a significantly greater number with the Archaea, especially the Bacteria. This last fact contrasts with the obvious close relationship between the archaeal and eukaryotic transcription mechanisms per se, and in particular, basic transcription initiation. We interpret these results to mean that the archaeal transcription system has retained more ancestral characteristics than have the transcription mechanisms in either of the other two domains.
Affiliation(s)
- N C Kyrpides
- Department of Microbiology, University of Illinois at Urbana-Champaign, B103 Chemistry and Life Sciences, MC 110, 407 South Goodwin Avenue, Urbana, IL 61801, USA
49
Jarrell KF, Bayley DP, Correia JD, Thomas NA. Recent Excitement about the Archaea. BioScience 1999. [DOI: 10.2307/1313474] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
50
Abstract
Eight microbial genomes are compared in terms of protein structure. Specifically, yeast, H. influenzae, M. genitalium, M. jannaschii, Synechocystis, M. pneumoniae, H. pylori, and E. coli are compared in terms of patterns of fold usage, i.e., whether a given fold occurs in a particular organism. Of the approximately 340 soluble protein folds currently in the structure databank (PDB), 240 occur in at least one of the eight genomes, and 30 are shared amongst all eight. The shared folds are depleted in all-helical structures and enriched in mixed helix-sheet structures compared to the folds in the PDB. The 10 most common of the 30 shared folds are enriched in superfolds, which unite many non-homologous sequence families, and are especially similar in overall architecture, eight of them having helices packed onto a central sheet. They are also very different from the common folds in the PDB, highlighting databank biases. Folds can be ranked in terms of expression as well as genome duplication. In yeast, the 10 most highly expressed folds are considerably different from the most highly duplicated folds. A tree can be constructed grouping genomes in terms of their shared folds. This tree has a remarkably similar topology to more conventional classifications, based on very different measures of relatedness. Finally, folds of membrane proteins can be analyzed through transmembrane-helix (TM) prediction. All the genomes appear to have similar usage patterns for these folds, with the occurrence of a particular fold falling off rapidly with increasing numbers of TM elements, according to a "Zipf-like" law. This implies there are no marked preferences for proteins with particular numbers of TM helices (e.g., 7-TM) in microbial genomes.
Affiliation(s)
- M Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA.