1
Petersen SD, Zhang J, Lee JS, Jakociunas T, Grav LM, Kildegaard HF, Keasling JD, Jensen MK. Modular 5'-UTR hexamers for context-independent tuning of protein expression in eukaryotes. Nucleic Acids Res 2018; 46:e127. [PMID: 30124898 PMCID: PMC6265478 DOI: 10.1093/nar/gky734] [Received: 03/09/2018] [Accepted: 08/01/2018]
Abstract
Functional characterization of regulatory DNA elements in broad genetic contexts is a prerequisite for forward engineering of biological systems. Translation initiation site (TIS) sequences are attractive for regulating gene activity and metabolic pathway fluxes because the genetic changes required are minimal. However, little is known about how gene outputs can be tuned by varying TISs in different genetic and environmental contexts. Here, we created TIS hexamer libraries in baker’s yeast Saccharomyces cerevisiae immediately 5′ of a reporter gene in various promoter contexts and measured the gene activity distribution of each library. Next, selected TIS sequences, which resulted in almost 10-fold changes in reporter output, were experimentally characterized in various environmental and genetic contexts in both yeast and mammalian cells. From our analyses, we observed strong linear correlations (R² = 0.75–0.98) between all pairwise combinations of TIS order and gene activity. Finally, our analysis enabled the identification of a TIS with almost 50% stronger output than a commonly used TIS for protein expression in mammalian cells, and selected TISs were also used to tune gene activities in yeast at a metabolic branch point in order to prototype fitness and carotenoid production landscapes. Taken together, the characterized TISs support reliable context-independent forward engineering of translation initiation in eukaryotes.
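One plausible reading of the pairwise R² values above is that the same set of TISs, kept in a fixed order, is measured in several contexts, and each pair of contexts is compared with a linear fit. The sketch below illustrates that computation only; the function name `r_squared`, the context labels, and all output values are hypothetical and are not data from the study.

```python
import math
from itertools import combinations


def r_squared(xs, ys):
    """R^2 of a simple least-squares line through (xs, ys).

    For a one-predictor linear fit, R^2 equals the squared Pearson
    correlation coefficient, so we compute r and square it.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    r = sxy / math.sqrt(sxx * syy)
    return r * r


# Hypothetical reporter outputs (arbitrary units) for the same five TISs,
# listed in the same order, in three made-up contexts. Illustrative only.
contexts = {
    "context_1": [1.0, 2.1, 3.9, 6.2, 9.8],
    "context_2": [0.9, 2.4, 3.5, 6.8, 9.1],
    "context_3": [1.2, 1.8, 4.4, 5.9, 8.7],
}

# Compare every pair of contexts, mirroring "all pairwise combinations".
for (name_a, a), (name_b, b) in combinations(contexts.items(), 2):
    print(f"{name_a} vs {name_b}: R^2 = {r_squared(a, b):.2f}")
```

A high R² between two contexts means the TISs keep roughly the same relative strengths when the context changes, which is the sense in which a TIS can be called context-independent.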
Affiliation(s)
- Søren D Petersen
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Jie Zhang
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Jae S Lee
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Tadas Jakociunas
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Lise M Grav
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Helene F Kildegaard
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Jay D Keasling
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark; Joint BioEnergy Institute, Emeryville, CA 94608, USA; Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Department of Chemical and Biomolecular Engineering, University of California, Berkeley, CA 94720, USA; Department of Bioengineering, University of California, Berkeley, CA 94720, USA; Center for Synthetic Biochemistry, Institute for Synthetic Biology, Shenzhen Institutes of Advanced Technologies, Shenzhen 518055, China
- Michael K Jensen
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
2
Gsponer J, Babu MM. Cellular strategies for regulating functional and nonfunctional protein aggregation. Cell Rep 2012; 2:1425-37. [PMID: 23168257 PMCID: PMC3607227 DOI: 10.1016/j.celrep.2012.09.036] [Received: 03/15/2012] [Revised: 07/23/2012] [Accepted: 09/27/2012]
Abstract
Growing evidence suggests that aggregation-prone proteins are both harmful and functional for a cell. How do cellular systems balance the detrimental and beneficial effects of protein aggregation? We show that aggregation-prone proteins are subject to differential transcriptional, translational, and degradation control compared to nonaggregation-prone proteins, which leads to their decreased synthesis, low abundance, and high turnover. Genetic modulators that enhance the aggregation phenotype are enriched in genes that influence expression homeostasis. Moreover, genes encoding aggregation-prone proteins are more likely to be harmful when overexpressed. The trends are evolutionarily conserved and suggest a strategy whereby cellular mechanisms specifically modulate the availability of aggregation-prone proteins to (1) keep concentrations below the critical ones required for aggregation and (2) shift the equilibrium between the monomeric and oligomeric/aggregate forms, as explained by Le Chatelier’s principle. This strategy may prevent the formation of undesirable aggregates and keep functional assemblies/aggregates under control.
Affiliation(s)
- Jörg Gsponer
- Centre for High-Throughput Biology, Department of Biochemistry and Molecular Biology, University of British Columbia, East Mall, Vancouver V6T 1Z4, Canada
- Corresponding author
- M. Madan Babu
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, UK
- Corresponding author
3
Tiwary BK. The coordinated expression, interaction and evolution of the neuroendocrine genes. Integr Biol (Camb) 2012; 4:1377-85. [PMID: 22990097 DOI: 10.1039/c2ib20081c]
Abstract
The neuroendocrine system is a complex biological system controlled by various neuropeptides and hormones. The evolution and network properties of neuroendocrine genes are analyzed along with their expression profiles. The neuroendocrine genes show very similar expression profiles and local network properties across a wide range of tissues consistent with the physiological roles of their proteins. Moreover, the coordinated evolution of 10 neuroendocrine genes involved in mammalian reproduction and homeostasis is demonstrated using several methods, such as correlated evolution, relative-rate test, relative-ratio test and codon usage bias. The neuroendocrine genes seem to evolve predominantly under similar selective strengths and regimes of purifying selection, which is well reflected in their evolutionary fingerprints. This result demonstrates for the first time a key role of natural selection in creating and maintaining a well-designed neuroendocrine system at the genomic level. It also indicates that component properties of a complex system at a higher physiological scale may determine component properties at a lower genomic scale and/or vice versa.
Affiliation(s)
- Basant K Tiwary
- Centre for Bioinformatics, School of Life Sciences, Pondicherry University, Pondicherry-605 014, India.
4
Tuller T, Mossel E. Co-evolution is incompatible with the Markov assumption in phylogenetics. IEEE/ACM Trans Comput Biol Bioinform 2011; 8:1667-70. [PMID: 21116038 DOI: 10.1109/tcbb.2010.124]
Abstract
Markov models are extensively used in the analysis of molecular evolution. A recent line of research suggests that pairs of proteins with functional and physical interactions co-evolve with each other. Here, by analyzing hundreds of orthologous sets from three fungi and their co-evolutionary relations, we demonstrate that co-evolution may violate the Markov assumption. Our results encourage the development of alternative probabilistic models for cases of extreme co-evolution.
Affiliation(s)
- Tamir Tuller
- Faculty of Mathematics and Computer Science, Weizmann Institute of Science, PO Box 26, Rehovot 76100, Israel.
5
Abstract
BACKGROUND In a previous study we demonstrated that co-evolutionary information can be used to improve the accuracy of ancestral gene content reconstruction. To this end, we defined a new computational problem, the Ancestral Co-Evolutionary (ACE) problem, and developed algorithms for solving it. RESULTS In the current paper we generalize our previous study in several ways. First, we describe new efficient computational approaches for solving the ACE problem. The new approaches are based on reductions to classical methods such as linear programming relaxation, quadratic programming, and min-cut. Second, we report new computational hardness results related to the ACE problem, including practical cases where it can be solved in polynomial time. Third, we generalize the ACE problem and demonstrate how our approach can be used for inferring parts of the genomes of non-ancestral organisms. To this end, we describe a heuristic for finding the portion of the genome ('dominant set') that can be used to reconstruct the rest of the genome with the lowest error rate. This heuristic uses both evolutionary and co-evolutionary information. We applied these algorithms to a large instance of the ACE problem (95 unicellular organisms, 4,873 protein families, and 10,576 co-evolutionary relations), demonstrating that some of them can outperform the algorithm used in our previous study. In addition, we show that, with our approach, a 'dominant set' can be used to reconstruct a major fraction of a genome (up to 79%) with a relatively low error rate (e.g. 0.11). We find that the 'dominant set' tends to include metabolic and regulatory genes with high evolutionary rates, low protein abundance, and few protein-protein interactions. CONCLUSIONS The ACE problem can be efficiently extended for inferring the genomes of organisms that exist today. In addition, it may be solved in polynomial time in many practical cases.
Metabolic and regulatory genes were found to be the most important groups of genes for reconstructing the gene content of an organism from related genomes.
Affiliation(s)
- Hadas Birin
- School of Computer Science, Tel Aviv University, Israel
- Tamir Tuller
- Department of Biomedical Engineering, Faculty of Engineering, Tel Aviv University, Tel Aviv, Israel
6
Zhang X, Kupiec M, Gophna U, Tuller T. Analysis of coevolving gene families using mutually exclusive orthologous modules. Genome Biol Evol 2011; 3:413-23. [PMID: 21498882 PMCID: PMC5654409 DOI: 10.1093/gbe/evr030]
Abstract
Coevolutionary networks can encapsulate information about the dynamics of presence and absence of gene families in organisms. Analysis of such networks should reveal fundamental principles underlying the evolution of cellular systems and the functionality of sets of genes. In this study, we describe a new approach for analyzing coevolutionary networks. Our method detects Mutually Exclusive Orthologous Modules (MEOMs). A MEOM is composed of two sets of gene families, each including gene families that tend to appear in the same organisms, such that the two sets tend to mutually exclude each other (if one set appears in a certain organism, the second set does not). Thus, a MEOM reflects the evolutionary replacement of one set of genes by another for reasons such as lineage/environmental specificity, incompatibility, or functional redundancy. We use our method to analyze a coevolutionary network based on 383 microorganisms from the three domains of life. As we demonstrate, our method is useful for detecting meaningful evolutionary clades of organisms as well as sets of proteins that interact with each other. Among our results, we report that: 1) MEOMs tend to include gene families whose cellular functions involve transport, energy production, metabolism, and translation, suggesting that changes in the metabolic environment that require adaptation to new sources of energy are central triggers of complex/pathway replacement in evolution. 2) Many MEOMs are related to outer membrane proteins; such proteins are involved in interaction with the environment and could thus be replaced as a result of adaptation. 3) MEOMs tend to separate organisms with large phylogenetic distances, but they also separate organisms that live in different ecological niches. 4) Strikingly, although many MEOMs can be identified, there are far fewer cases where the two cliques in the MEOM completely exclude each other, demonstrating the flexibility of protein evolution.
5) CO dehydrogenase and thymidylate synthase and the glycine cleavage genes mutually exclude each other in archaea; this may represent an alternative route for generation of methyl donors for thymidine synthesis.
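The mutual-exclusion property that defines a MEOM can be made concrete with a toy score over a presence/absence matrix. This is a minimal sketch, not the MEOM detection method of the study: the rule that a set counts as present only when all of its families are present, the score itself, the function names, and the data are all hypothetical assumptions.

```python
def set_present(profile, families, organism):
    """Treat a set of gene families as present in an organism only if every
    family in the set is present there (a simplifying assumption)."""
    return all(profile[family][organism] for family in families)


def exclusion_score(profile, set_a, set_b):
    """Among organisms carrying at least one of the two sets, return the
    fraction that carry exactly one of them (1.0 = complete exclusion)."""
    n_organisms = len(next(iter(profile.values())))
    carriers = exclusive = 0
    for org in range(n_organisms):
        has_a = set_present(profile, set_a, org)
        has_b = set_present(profile, set_b, org)
        if has_a or has_b:
            carriers += 1
            if has_a != has_b:
                exclusive += 1
    return exclusive / carriers if carriers else 0.0


# Hypothetical presence (1) / absence (0) of four gene families in six organisms.
profile = {
    "fam_A1": [1, 1, 1, 0, 0, 0],
    "fam_A2": [1, 1, 1, 0, 0, 1],
    "fam_B1": [0, 0, 0, 1, 1, 1],
    "fam_B2": [0, 0, 0, 1, 1, 1],
}
print(exclusion_score(profile, ["fam_A1", "fam_A2"], ["fam_B1", "fam_B2"]))  # 1.0
```

A score below 1.0 corresponds to the partial exclusion that point 4) reports as far more common than complete mutual exclusion.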
Affiliation(s)
- Xiuwei Zhang
- Laboratory for Computational Biology and Bioinformatics, School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
7
Hudson CM, Conant GC. Expression level, cellular compartment and metabolic network position all influence the average selective constraint on mammalian enzymes. BMC Evol Biol 2011; 11:89. [PMID: 21470417 PMCID: PMC3082228 DOI: 10.1186/1471-2148-11-89] [Received: 11/12/2010] [Accepted: 04/06/2011]
Abstract
BACKGROUND A gene's position in regulatory, protein interaction or metabolic networks can be predictive of the strength of purifying selection acting on it, but these relationships are neither universal nor invariably strong. Following work in bacteria, fungi and invertebrate animals, we explore the relationship between selective constraint and metabolic function in mammals. RESULTS We measure the association between selective constraint, estimated by the ratio of nonsynonymous (Ka) to synonymous (Ks) substitutions, and several, primarily metabolic, measures of gene function. We find significant differences between the selective constraints acting on enzyme-coding genes from different cellular compartments, with genes for nuclear enzymes showing higher constraint than those for cytoplasmic or mitochondrial enzymes. Among metabolic genes, the centrality of an enzyme in the metabolic network is significantly correlated with Ka/Ks. In contrast to yeasts, gene expression magnitude does not appear to be the primary predictor of selective constraint in mammals. CONCLUSIONS Our results imply that the relationship between selective constraint and enzyme centrality is complex: the strength of selective constraint acting on mammalian genes is quite variable and does not appear to follow exclusively the patterns seen in other organisms.
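The Ka/Ks ratio used above as the measure of selective constraint is conventionally written and read as follows. This is the standard interpretation, not a description of the specific estimation method used in the study.

```latex
\omega = \frac{K_a}{K_s}, \qquad
\begin{cases}
\omega < 1 & \text{purifying selection (stronger constraint as } \omega \to 0\text{)}\\
\omega \approx 1 & \text{neutral evolution}\\
\omega > 1 & \text{positive selection}
\end{cases}
```

A correlation between enzyme centrality and Ka/Ks therefore means that more central enzymes tend to have lower ω, that is, stronger constraint, if the correlation is negative.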
Affiliation(s)
- Corey M Hudson
- Informatics Institute, University of Missouri, Columbia, MO, USA.
8
Tuller T, Girshovich Y, Sella Y, Kreimer A, Freilich S, Kupiec M, Gophna U, Ruppin E. Association between translation efficiency and horizontal gene transfer within microbial communities. Nucleic Acids Res 2011; 39:4743-55. [PMID: 21343180 PMCID: PMC3113575 DOI: 10.1093/nar/gkr054]
Abstract
Horizontal gene transfer (HGT) is a major force in microbial evolution. Previous studies have suggested that a variety of factors, including restricted recombination and toxicity of foreign gene products, may act as barriers to the successful integration of horizontally transferred genes. This study identifies an additional central barrier to HGT: the lack of co-adaptation between the codon usage of the transferred gene and the tRNA pool of the recipient organism. Analyzing the genomic sequences of more than 190 microorganisms and the HGT events that have occurred between them, we show that the number of genes horizontally transferred between organisms is positively correlated with the similarity between their tRNA pools. Genes that are better adapted to the tRNA pools of the target genomes tend to undergo more frequent HGT. At the community (or environment) level, organisms that share a common ecological niche tend to have similar tRNA pools. These results remain significant after controlling for diverse ecological and evolutionary parameters. Our analysis demonstrates that there are bidirectional associations between the similarity in the tRNA pools of organisms and the number of HGT events occurring between them. Similar tRNA pools between a donor and a host tend to increase the probability that a horizontally acquired gene will become fixed in its new genome. Our results also suggest that frequent HGT may be a homogenizing force that increases the similarity in the tRNA pools of organisms within the same community.
Affiliation(s)
- Tamir Tuller
- Faculty of Mathematics and Computer Science, Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel; Blavatnik School of Computer Science, School of Medicine, Tel Aviv University, Ramat Aviv 69978, Israel
9
Tuller T, Felder Y, Kupiec M. Discovering local patterns of co-evolution: computational aspects and biological examples. BMC Bioinformatics 2010; 11:43. [PMID: 20096103 PMCID: PMC3224649 DOI: 10.1186/1471-2105-11-43] [Received: 03/02/2009] [Accepted: 01/22/2010]
Abstract
Background Co-evolution is the process in which two (or more) sets of orthologs exhibit a similar or correlated pattern of evolution. Co-evolution is a powerful way to learn about the functional interdependencies between sets of genes and cellular functions and to predict physical interactions. More generally, it can be used to answer fundamental questions about the evolution of biological systems. Orthologs that exhibit a strong signal of co-evolution in one part of the evolutionary tree may show only a weak signal of co-evolution in other branches. The major reasons for this phenomenon are noise in the biological input, genes that gain or lose functions, and the fact that some measures of co-evolution relate to rare events such as positive selection. Previous publications in the field dealt with the problem of finding sets of genes that co-evolved along an entire underlying phylogenetic tree, without considering that co-evolution is often local. Results In this work, we describe a new set of biological problems related to finding patterns of local co-evolution. We discuss their computational complexity and design algorithms for solving them. These algorithms outperform other bi-clustering methods because they are designed specifically for solving this set of problems. We use our approach to trace the co-evolution of fungal, eukaryotic, and mammalian genes at high resolution across the different parts of the corresponding phylogenetic trees. Specifically, we discover regions in the fungal tree that are enriched with positive evolution. We show that metabolic genes exhibit a remarkable level of co-evolution and different patterns of co-evolution in various biological datasets. In addition, we find that protein complexes related to gene expression exhibit non-homogeneous levels of co-evolution across different parts of the fungal evolutionary lineage.
In the case of mammalian evolution, signaling pathways related to neurotransmission exhibit a relatively higher level of co-evolution along the primate subtree. Conclusions We show that finding local patterns of co-evolution is a computationally challenging task, and we offer novel algorithms that solve this problem, thus opening a new approach for analyzing the evolution of biological systems.
Affiliation(s)
- Tamir Tuller
- School of Computer Science, Tel Aviv University, Tel Aviv, Israel.
10
Mano A, Tuller T, Béjà O, Pinter RY. Comparative classification of species and the study of pathway evolution based on the alignment of metabolic pathways. BMC Bioinformatics 2010; 11 Suppl 1:S38. [PMID: 20122211 PMCID: PMC3009510 DOI: 10.1186/1471-2105-11-s1-s38]
Abstract
BACKGROUND Pathways provide topical descriptions of cellular circuitry. Comparing analogous pathways reveals detailed insights into functional differences among species. While previous works in the field performed genomic comparisons and evolutionary studies based on specific genes or proteins, whole genomic sequences, or even single pathways, none of them described a genome-wide, system-level comparative analysis of metabolic pathways. To implement such an analysis properly, one must overcome two specific challenges: how to combine the effect of many pathways in a unified framework, and how to analyze the co-evolution of pathways appropriately. Here we present a computational approach for addressing these two challenges. First, we describe a comprehensive, scalable, information-theory-based computational pipeline that calculates pathway alignment information and then compiles it in a novel manner that allows further analysis. This approach can be used for building phylogenies and for pointing out specific differences that can then be analyzed in depth. Second, we describe a new approach for comparing the evolution of metabolic pathways. This approach can be used for detecting co-evolutionary relationships between metabolic pathways. RESULTS We demonstrate the advantages of our approach by applying our pipeline to data from the MetaCyc repository (which includes a total of 205 organisms and 660 metabolic pathways). Our analysis revealed several surprising biological observations. For example, we show that the different habitats in which Archaea reside are reflected by a pathway-based phylogeny. In addition, we discover two striking clusters of metabolic pathways, each including pathways with very similar evolutionary histories. CONCLUSION We demonstrate that distance measures based on the topology and content of metabolic networks are useful for studying evolution and co-evolution.
Affiliation(s)
- Adi Mano
- Dept. of Computer Science, Technion - Israel Institute of Technology, Haifa 32000, Israel.
11
Tuller T, Birin H, Gophna U, Kupiec M, Ruppin E. Reconstructing ancestral gene content by coevolution. Genome Res 2010; 20:122-32. [PMID: 19948819 DOI: 10.1101/gr.096115.109]
Abstract
Inferring the gene content of ancestral genomes is a fundamental challenge in molecular evolution. Due to the statistical nature of this problem, ancestral genomes inferred by the maximum likelihood (ML) or maximum parsimony (MP) methods are prone to considerable error rates. In general, these errors are difficult to eliminate by using longer genomic sequences or by analyzing more taxa. This study describes a new approach for improving ancestral genome reconstruction, the ancestral coevolver (ACE), which uses coevolutionary information to improve the accuracy of such reconstructions over previous approaches. The principal idea is to reduce the potentially large solution space by choosing a single optimal (or near-optimal) solution that is in accord with the coevolutionary relationships between protein families. Simulation experiments, on both artificial and real biological data, show that ACE yields a marked decrease in error rate compared with ML or MP. Applied to a large data set (95 organisms, 4873 protein families, and 10,000 coevolutionary relationships), some of the ancestral genomes reconstructed by ACE differed remarkably in gene content from those reconstructed by ML or MP alone (by more than 10% at some nodes). These reconstructions, while having likelihood/parsimony scores nearly identical to those obtained with ML/MP, had markedly higher concordance with the coevolutionary information. Specifically, when ACE was used to improve the results of ML, it added a large number of proteins to those encoded by LUCA (the last universal common ancestor), most of them ribosomal proteins and components of the F(0)F(1)-type ATP synthase/ATPases, complexes that are vital in most living organisms. Our analysis suggests that LUCA was bacterial-like and had a genome size similar to those of many extant organisms.
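The principal idea above, preferring a near-optimal reconstruction that agrees with coevolutionary relationships over the strictly most likely one, can be illustrated with a toy objective. This sketch is not the ACE algorithm: the scoring function (log-likelihood minus a fixed penalty per discordant coevolving pair), the function name `reconstruct`, the probabilities, and the exhaustive search are hypothetical assumptions chosen only because the example is tiny.

```python
import math
from itertools import product


def reconstruct(probs, coevolving_pairs, penalty):
    """Choose presence (1) / absence (0) for each protein family at one
    ancestral node by exhaustive search.

    Score = log-likelihood of the states under independent per-family
    probabilities, minus `penalty` for each coevolving pair whose two
    families are assigned different states. Probabilities must lie strictly
    between 0 and 1. Runtime is O(2^n), so this only suits tiny inputs.
    """
    best_state, best_score = None, -math.inf
    for state in product((0, 1), repeat=len(probs)):
        score = sum(math.log(p) if s else math.log(1 - p)
                    for p, s in zip(probs, state))
        score -= penalty * sum(state[i] != state[j] for i, j in coevolving_pairs)
        if score > best_score:
            best_state, best_score = state, score
    return best_state, best_score


# Hypothetical posterior probabilities that families 0..3 were present.
probs = [0.9, 0.45, 0.8, 0.2]
pairs = [(0, 1)]  # hypothetical: families 0 and 1 coevolve

print(reconstruct(probs, pairs, penalty=0.0)[0])  # (1, 0, 1, 0): likelihood alone
print(reconstruct(probs, pairs, penalty=1.0)[0])  # (1, 1, 1, 0): coevolution-aware
```

With `penalty=0.0` the family with probability 0.45 is called absent; with `penalty=1.0` it is called present because it coevolves with a family that is very likely present, at a small cost in likelihood. This mirrors the trade-off described above, where reconstructions with nearly identical likelihood scores agreed much better with the coevolutionary information.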
Affiliation(s)
- Tamir Tuller
- School of Computer Sciences, Tel Aviv University, Ramat Aviv, Israel.
12
Ruano-Rubio V, Poch O, Thompson JD. Comparison of eukaryotic phylogenetic profiling approaches using species tree aware methods. BMC Bioinformatics 2009; 10:383. [PMID: 19930674 PMCID: PMC2787529 DOI: 10.1186/1471-2105-10-383] [Received: 07/28/2009] [Accepted: 11/24/2009]
Abstract
Background Phylogenetic profiling encompasses an important set of methodologies for in silico high-throughput inference of functional relationships between genes. The simplest profiles represent the distribution of gene presence-absence in a set of species as a sequence of 0s and 1s, and it is assumed that functionally related genes will have more similar profiles. The methodology has been used successfully in numerous studies of prokaryotic genomes, although its application in eukaryotes appears problematic, with reported low accuracy due to the complex genomic organization within this domain of life. Recently some groups have proposed an alternative approach based on the correlation of homologous gene group sizes, taking into account all potentially informative genetic events leading to a change in group size, regardless of whether they result in a de novo group gain or a total gene group loss. Results We have compared the performance of classical presence-absence and group size based approaches using a large, diverse set of eukaryotic species. In contrast to most previous comparisons in Eukarya, we take into account the species phylogeny. We also compare the approaches using two different group categories, based on orthology and on domain-sharing. Our results confirm the limited overall performance of phylogenetic profiling in eukaryotes. Although group size based approaches initially showed an increase in performance for the domain-sharing based groups, this seems to be an overestimation due to a simplistic negative control dataset and the choice of null hypothesis rejection criteria. Conclusion Presence-absence profiling is a more accurate classifier of related versus non-related profile pairs when the profiles under consideration have enough information content. Group size based approaches provide a complementary means of detecting domain- or family-level co-evolution between groups that may be elusive to presence-absence profiling.
Moreover, the positive correlation between co-evolution scores and functional links implies that these methods could be used to estimate functional distances between gene groups and to cluster them based on their functional relatedness. This study should have important implications for the future development and application of phylogenetic profiling methods, not only in eukaryotic but also in prokaryotic datasets.
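The two kinds of profile comparison described above can be sketched as follows. This is an illustrative assumption, not the species-tree-aware methods evaluated in the study: presence-absence profiles are compared by the fraction of species in which they agree, and group-size profiles by Pearson correlation; the function names and all data are hypothetical. Neither measure corrects for shared ancestry among species, which is the issue that taking the species phylogeny into account is meant to address.

```python
import math


def presence_absence_similarity(p, q):
    """Fraction of species in which two 0/1 profiles agree
    (1 minus the normalized Hamming distance)."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same species")
    return sum(a == b for a, b in zip(p, q)) / len(p)


def group_size_correlation(x, y):
    """Pearson correlation between two group-size profiles (counts per species)."""
    if len(x) != len(y):
        raise ValueError("profiles must cover the same species")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant profiles carry no information")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical profiles over the same eight species (species order fixed).
gene_x = [1, 1, 0, 1, 0, 1, 1, 0]
gene_y = [1, 1, 0, 1, 0, 1, 0, 0]
gene_z = [0, 0, 1, 0, 1, 0, 1, 1]
print(presence_absence_similarity(gene_x, gene_y))  # 0.875
print(presence_absence_similarity(gene_x, gene_z))  # 0.125

sizes_a = [2, 3, 0, 1, 4, 2, 1, 0]
sizes_b = [1, 3, 0, 1, 5, 2, 1, 0]
print(round(group_size_correlation(sizes_a, sizes_b), 3))
```

The group-size version can register a change from two copies to four as signal, which a 0/1 profile cannot; the constant-profile check reflects the point above that profiles need enough information content to be useful.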
Affiliation(s)
- Valentín Ruano-Rubio
- Laboratoire de Biologie et Génomique Intégrative, Département de Biologie et Génomique Structurales, Institut de Génétique et de Biologie Moléculaire et Cellulaire, CNRS/INSERM/UDS, Illkirch, France.
13
Ciccarelli FD, Miklós I. Co-evolutionary Models for Reconstructing Ancestral Genomic Sequences: Computational Issues and Biological Examples. Comparative Genomics 2009. [PMCID: PMC7120581 DOI: 10.1007/978-3-642-04744-2_14]
Abstract
The inference of ancestral genomes is a fundamental problem in molecular evolution. Due to the statistical nature of this problem, the most likely or most parsimonious ancestral genomes usually contain considerable errors. In general, these errors cannot be eliminated by using more exhaustive computational approaches, longer genomic sequences, or more taxa. In recent studies we showed that co-evolution is an important force that can be used to significantly improve the inference of ancestral genome content. In this work we formally define a computational problem for the inference of ancestral genome content by co-evolution. We show that this problem is NP-hard and present both a Fixed Parameter Tractable (FPT) algorithm and heuristic approximation algorithms for solving it. On simulated inputs with hundreds of protein families and hundreds of co-evolutionary relations, these algorithms ran quickly (up to four minutes) and achieved an approximation ratio below 1.3. We use our approach to study the ancestral genome content of the Fungi. To this end, we apply our approach to a dataset of 33,931 protein families and 20,317 co-evolutionary relations. Our algorithm added and removed hundreds of proteins from the ancestral genomes inferred by maximum likelihood (ML) or maximum parsimony (MP) while only slightly affecting the likelihood/parsimony scores of the results. A biological analysis revealed various pieces of evidence that support the biological plausibility of the new solutions.
Affiliation(s)
- István Miklós
- Rényi Institute, Hungarian Academy of Sciences, Reáltanoda utca 13-15, 1053 Budapest, Hungary