1
Langschied F, Leisegang MS, Brandes RP, Ebersberger I. ncOrtho: efficient and reliable identification of miRNA orthologs. Nucleic Acids Res 2023; 51:e71. [PMID: 37260093 PMCID: PMC10359484 DOI: 10.1093/nar/gkad467] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/06/2022] [Revised: 05/04/2023] [Accepted: 05/30/2023] [Indexed: 06/02/2023]
Abstract
MicroRNAs (miRNAs) are post-transcriptional regulators that fine-tune gene expression via translational repression or degradation of their target mRNAs. Despite their functional relevance, frameworks for the scalable and accurate detection of miRNA orthologs are missing. Consequently, there is still no comprehensive picture of how miRNAs and their associated regulatory networks have evolved. Here we present ncOrtho, a synteny-informed pipeline for the targeted search of miRNA orthologs in unannotated genome sequences. ncOrtho matches miRNA annotations from multi-tissue transcriptomes in precision, while scaling to the analysis of hundreds of custom-selected species. The presence-absence pattern of orthologs to 266 human miRNA families across 402 vertebrate species reveals four bursts of miRNA acquisition, of which the most recent event occurred in the last common ancestor of higher primates. miRNA families are rarely modified or lost, but notable exceptions for both events exist. miRNA co-ortholog numbers faithfully indicate lineage-specific whole genome duplications, and miRNAs are powerful markers for phylogenomic analyses. Their exceptionally low genetic diversity makes them suitable to resolve clades where the phylogenetic signal is blurred by incomplete lineage sorting of ancestral alleles. In summary, ncOrtho makes it possible to routinely consider miRNAs in evolutionary analyses that were thus far reserved for protein-coding genes.
Affiliation(s)
- Felix Langschied
  - Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
- Matthias S Leisegang
  - Institute for Cardiovascular Physiology, Goethe University, Frankfurt, Germany
  - German Center of Cardiovascular Research (DZHK), Partner site RheinMain, Frankfurt, Germany
- Ralf P Brandes
  - Institute for Cardiovascular Physiology, Goethe University, Frankfurt, Germany
  - German Center of Cardiovascular Research (DZHK), Partner site RheinMain, Frankfurt, Germany
- Ingo Ebersberger
  - Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
  - Senckenberg Biodiversity and Climate Research Centre (S-BIK-F), Frankfurt am Main, Germany
  - LOEWE Centre for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
2
Elhabashy H, Merino F, Alva V, Kohlbacher O, Lupas AN. Exploring protein-protein interactions at the proteome level. Structure 2022; 30:462-475. [DOI: 10.1016/j.str.2022.02.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/09/2021] [Revised: 10/26/2021] [Accepted: 02/02/2022] [Indexed: 02/08/2023]
3
Fukunaga T, Iwasaki W. Inverse Potts model improves accuracy of phylogenetic profiling. Bioinformatics 2022; 38:1794-1800. [PMID: 35060594 PMCID: PMC8963296 DOI: 10.1093/bioinformatics/btac034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2021] [Revised: 01/11/2022] [Accepted: 01/13/2022] [Indexed: 02/03/2023]
Abstract
Motivation: Phylogenetic profiling is a powerful computational method for revealing the functions of function-unknown genes. Although conventional similarity metrics in phylogenetic profiling achieved high prediction accuracy, they have two estimation biases: an evolutionary bias and a spurious correlation bias. While previous studies reduced the evolutionary bias by considering a phylogenetic tree, few studies have analyzed the spurious correlation bias.
Results: To reduce the spurious correlation bias, we developed metrics based on the inverse Potts model (IPM) for phylogenetic profiling. We also developed a metric based on both the IPM and a phylogenetic tree. In an empirical dataset analysis, we demonstrated that these IPM-based metrics improved the prediction performance of phylogenetic profiling. In addition, we found that the integration of several metrics, including the IPM-based metrics, had superior performance to a single metric.
Availability and implementation: The source code is freely available at https://github.com/fukunagatsu/Ipm.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Wataru Iwasaki
  - Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 2770882, Japan
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo 1130032, Japan
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 2770882, Japan
  - Atmosphere and Ocean Research Institute, The University of Tokyo, Chiba 2770882, Japan
  - Institute for Quantitative Biosciences, The University of Tokyo, Tokyo 1130032, Japan
  - Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Tokyo 1130032, Japan
4
Stupp D, Sharon E, Bloch I, Zitnik M, Zuk O, Tabach Y. Co-evolution based machine-learning for predicting functional interactions between human genes. Nat Commun 2021; 12:6454. [PMID: 34753957 PMCID: PMC8578642 DOI: 10.1038/s41467-021-26792-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/26/2020] [Accepted: 10/09/2021] [Indexed: 12/20/2022]
Abstract
Over the next decade, more than a million eukaryotic species are expected to be fully sequenced. This has the potential to improve our understanding of genotype and phenotype crosstalk, gene function and interactions, and answer evolutionary questions. Here, we develop a machine-learning approach for utilizing phylogenetic profiles across 1154 eukaryotic species. This method integrates co-evolution across eukaryotic clades to predict functional interactions between human genes and the context for these interactions. We benchmark our approach showing a 14% performance increase (auROC) compared to previous methods. Using this approach, we predict functional annotations for less studied genes. We focus on DNA repair and verify that 9 of the top 50 predicted genes have been identified elsewhere, with others previously prioritized by high-throughput screens. Overall, our approach enables better annotation of function and functional interactions and facilitates the understanding of evolutionary processes underlying co-evolution. The manuscript is accompanied by a webserver available at: https://mlpp.cs.huji.ac.il.
With the rise in the number of eukaryotic species being fully sequenced, large-scale phylogenetic profiling can give insights into gene function. Here, the authors describe a machine-learning approach that integrates co-evolution across eukaryotic clades to predict gene function and functional interactions among human genes.
Affiliation(s)
- Doron Stupp
  - Department of Developmental Biology and Cancer Research, The Institute for Medical Research Israel-Canada, The Hebrew University of Jerusalem, 9112001, Jerusalem, Israel
- Elad Sharon
  - Department of Developmental Biology and Cancer Research, The Institute for Medical Research Israel-Canada, The Hebrew University of Jerusalem, 9112001, Jerusalem, Israel
- Idit Bloch
  - Department of Developmental Biology and Cancer Research, The Institute for Medical Research Israel-Canada, The Hebrew University of Jerusalem, 9112001, Jerusalem, Israel
- Marinka Zitnik
  - Department of Biomedical Informatics, Harvard University, Boston, MA, 02115, USA
- Or Zuk
  - Department of Statistics and Data Science, The Hebrew University of Jerusalem, Jerusalem, 9190501, Israel
- Yuval Tabach
  - Department of Developmental Biology and Cancer Research, The Institute for Medical Research Israel-Canada, The Hebrew University of Jerusalem, 9112001, Jerusalem, Israel
5
Fukunaga T, Iwasaki W. Mirage: estimation of ancestral gene-copy numbers by considering different evolutionary patterns among gene families. Bioinformatics Advances 2021; 1:vbab014. [PMID: 36700099 PMCID: PMC9710636 DOI: 10.1093/bioadv/vbab014] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/28/2021] [Revised: 07/22/2021] [Accepted: 07/28/2021] [Indexed: 01/28/2023]
Abstract
Motivation: Reconstruction of gene copy number evolution is an essential approach for understanding how complex biological systems have been organized. Although various models have been proposed for gene copy number evolution, existing evolutionary models have not appropriately addressed the fact that different gene families can have very different gene gain/loss rates.
Results: In this study, we developed Mirage (MIxtuRe model for Ancestral Genome Estimation), which allows different gene families to have flexible gene gain/loss rates. Mirage can use three models for formulating heterogeneous evolution among gene families: the discretized Γ model, probability distribution-free model and pattern mixture (PM) model. Simulation analysis showed that Mirage can accurately estimate heterogeneous gene gain/loss rates and reconstruct gene-content evolutionary history. Application to empirical datasets demonstrated that the PM model fits genome data from various taxonomic groups better than the other heterogeneous models. Using Mirage, we revealed that metabolic function-related gene families displayed frequent gene gains and losses in all taxa investigated.
Availability and implementation: The source code of Mirage is freely available at https://github.com/fukunagatsu/Mirage.
Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Tsukasa Fukunaga
  - Waseda Institute for Advanced Study, Waseda University, Tokyo 1690051, Japan
  - Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo 1130032, Japan
- Wataru Iwasaki
  - Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 2770882, Japan
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo 1130032, Japan
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba 2770882, Japan
  - Atmosphere and Ocean Research Institute, The University of Tokyo, Chiba 2770882, Japan
  - Institute for Quantitative Biosciences, The University of Tokyo, Tokyo 1130032, Japan
  - Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Tokyo 1130032, Japan
6
Tsaban T, Stupp D, Sherill-Rofe D, Bloch I, Sharon E, Schueler-Furman O, Wiener R, Tabach Y. CladeOScope: functional interactions through the prism of clade-wise co-evolution. NAR Genom Bioinform 2021; 3:lqab024. [PMID: 33928243 PMCID: PMC8057497 DOI: 10.1093/nargab/lqab024] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 09/10/2020] [Revised: 03/12/2021] [Accepted: 03/18/2021] [Indexed: 12/11/2022]
Abstract
Mapping co-evolved genes via phylogenetic profiling (PP) is a powerful approach to uncover functional interactions between genes and to associate them with pathways. Despite many successful endeavors, the understanding of co-evolutionary signals in eukaryotes remains partial. Our hypothesis is that 'Clades', branches of the tree of life (e.g. primates and mammals), encompass signals that cannot be detected by PP using all eukaryotes. As such, integrating information from different clades should reveal local co-evolution signals and improve function prediction. Accordingly, we analyzed 1028 genomes in 66 clades and demonstrated that the co-evolutionary signal was scattered across clades. We showed that functionally related genes are frequently co-evolved in only parts of the eukaryotic tree and that clades are complementary in detecting functional interactions within pathways. We examined the non-homologous end joining pathway and the UFM1 ubiquitin-like protein pathway and showed that both demonstrated distinct co-evolution patterns in specific clades. Our research offers a different way to look at co-evolution across eukaryotes and points to the importance of modular co-evolution analysis. We developed the 'CladeOScope' PP method to integrate information from 16 clades across over 1000 eukaryotic genomes; it is accessible via an easy-to-use web server at http://cladeoscope.cs.huji.ac.il.
Affiliation(s)
- Tomer Tsaban
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada and Hadassah Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Doron Stupp
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada and Hadassah Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Dana Sherill-Rofe
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada and Hadassah Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Idit Bloch
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada and Hadassah Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Elad Sharon
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada and Hadassah Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Ora Schueler-Furman
  - Department of Microbiology and Molecular Genetics, Institute for Medical Research Israel-Canada and Hadassah Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Reuven Wiener
  - Department of Biochemistry and Molecular Biology, Institute for Medical Research Israel-Canada and Hadassah Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Yuval Tabach
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada and Hadassah Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
7
Matsumoto H, Mimori T, Fukunaga T. Novel metric for hyperbolic phylogenetic tree embeddings. Biol Methods Protoc 2021; 6:bpab006. [PMID: 33928190 PMCID: PMC8058397 DOI: 10.1093/biomethods/bpab006] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/11/2020] [Revised: 03/19/2021] [Accepted: 03/23/2021] [Indexed: 01/09/2023]
Abstract
Advances in experimental technologies, such as DNA sequencing, have opened up new avenues for the applications of phylogenetic methods to various fields beyond their traditional application in evolutionary investigations, extending to the fields of development, differentiation, cancer genomics, and immunogenomics. Thus, the importance of phylogenetic methods is increasingly being recognized, and the development of a novel phylogenetic approach can contribute to several areas of research. Recently, the use of hyperbolic geometry has attracted attention in artificial intelligence research. Hyperbolic space can better represent a hierarchical structure compared to Euclidean space, and can therefore be useful for describing and analyzing a phylogenetic tree. In this study, we developed a novel metric that considers the characteristics of a phylogenetic tree for representation in hyperbolic space. We compared the performance of the proposed hyperbolic embeddings, general hyperbolic embeddings, and Euclidean embeddings, and confirmed that our method could be used to more precisely reconstruct evolutionary distance. We also demonstrate that our approach is useful for predicting the nearest-neighbor node in a partial phylogenetic tree with missing nodes. Furthermore, we proposed a novel approach based on our metric to integrate multiple trees for analyzing tree nodes or imputing missing distances. This study highlights the utility of adopting a geometric approach for further advancing the applications of phylogenetic methods.
Affiliation(s)
- Hirotaka Matsumoto
  - School of Information and Data Sciences, Nagasaki University, Nagasaki, Japan
  - Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research, Saitama, Japan
- Takahiro Mimori
  - Medical Image Analysis Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
- Tsukasa Fukunaga
  - Department of Computer Science, Graduate School of Information Science and Engineering, The University of Tokyo, Tokyo, Japan
8
Niu Y, Moghimyfiroozabad S, Moghimyfiroozabad A, Tierney TS, Alavian KN. The factors for the early and late development of midbrain dopaminergic neurons segregate into two distinct evolutionary clusters. Brain Disorders 2021. [DOI: 10.1016/j.dscb.2021.100002] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/27/2022]
9
Bloch I, Sherill-Rofe D, Stupp D, Unterman I, Beer H, Sharon E, Tabach Y. Optimization of co-evolution analysis through phylogenetic profiling reveals pathway-specific signals. Bioinformatics 2021; 36:4116-4125. [PMID: 32353123 DOI: 10.1093/bioinformatics/btaa281] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 11/04/2019] [Revised: 04/17/2020] [Accepted: 04/23/2020] [Indexed: 12/11/2022]
Abstract
Summary: The exponential growth in available genomic data is expected to reach full sequencing of a million genomes in the coming decade. Improving and developing methods to analyze these genomes and to reveal their utility is of major interest in a wide variety of fields, such as comparative and functional genomics, evolution and bioinformatics. Phylogenetic profiling is an established method for predicting functional interactions between proteins based on similarities in their evolutionary patterns across species. Proteins that function together (i.e. generate complexes, interact in the same pathways or improve adaptation to environmental niches) tend to show coordinated evolution across the tree of life. The normalized phylogenetic profiling (NPP) method takes into account minute changes in proteins across species to identify protein co-evolution. Despite the success of this method, it is still not clear what set of parameters is required for optimal use of co-evolution in predicting functional interactions. Moreover, it is not clear if pathway evolution or function should direct parameter choice. Here, we create a reliable and usable NPP construction pipeline. We explore the effect of parameter selection on functional interaction prediction using NPP from 1028 genomes, both separately and in various value combinations. We identify several parameter sets that optimize performance for pathways with certain biological annotation. This work reveals the importance of choosing the right parameters for optimized function prediction based on a biological context.
Availability and implementation: Source code and documentation are available on GitHub: https://github.com/iditam/CompareNPPs.
Contact: yuvaltab@ekmd.huji.ac.il.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Idit Bloch
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem, Jerusalem 9112102, Israel
- Dana Sherill-Rofe
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem, Jerusalem 9112102, Israel
- Doron Stupp
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem, Jerusalem 9112102, Israel
- Irene Unterman
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem, Jerusalem 9112102, Israel
- Hodaya Beer
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem, Jerusalem 9112102, Israel
- Elad Sharon
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem, Jerusalem 9112102, Israel
- Yuval Tabach
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem, Jerusalem 9112102, Israel
10
Tremblay BJM, Lobb B, Doxey AC. PhyloCorrelate: inferring bacterial gene-gene functional associations through large-scale phylogenetic profiling. Bioinformatics 2021; 37:17-22. [PMID: 33416870 DOI: 10.1093/bioinformatics/btaa1105] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/23/2020] [Revised: 12/26/2020] [Accepted: 12/29/2020] [Indexed: 11/12/2022]
Abstract
Motivation: Statistical detection of co-occurring genes across genomes, known as "phylogenetic profiling", is a powerful bioinformatic technique for inferring gene-gene functional associations. However, this can be a challenging task given the size and complexity of phylogenomic databases, difficulty in accounting for phylogenetic structure, inconsistencies in genome annotation, and substantial computational requirements.
Results: We introduce PhyloCorrelate, a computational framework for gene co-occurrence analysis across large phylogenomic datasets. PhyloCorrelate implements a variety of co-occurrence metrics including standard correlation metrics and model-based metrics that account for phylogenetic history. By combining multiple metrics, we developed an optimized score that exhibits a superior ability to link genes with overlapping GO terms and KEGG pathways, enabling gene function prediction. Using genomic and functional annotation data from the Genome Taxonomy Database and AnnoTree, we performed all-by-all comparisons of gene occurrence profiles across the bacterial tree of life, totaling 154,217,052 comparisons for 28,315 genes across 27,372 bacterial genomes. All predictions are available in an online database, which instantaneously returns the top correlated genes for any PFAM, TIGRFAM, or KEGG query. In total, PhyloCorrelate detected 29,762 high confidence associations between bacterial gene/protein pairs, and generated functional predictions for 834 DUFs and proteins of unknown function.
Availability: PhyloCorrelate is available as a web-server at phylocorrelate.uwaterloo.ca as well as an R package for analysis of custom datasets. We anticipate that PhyloCorrelate will be broadly useful as a tool for predicting function and interactions for gene families.
Supplementary information: Supplementary data are available at Bioinformatics online.
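As a rough illustration of the "standard correlation metrics" this abstract mentions, the sketch below scores two genes by the Pearson correlation of their binary presence/absence profiles across genomes. The function name `profile_correlation` and the toy profiles are hypothetical and are not taken from PhyloCorrelate. The sketch also ignores phylogenetic structure, which is exactly what the model-based metrics in the abstract are meant to account for.

```python
from math import sqrt


def profile_correlation(profile_a, profile_b):
    """Pearson correlation between two equal-length presence/absence profiles.

    Each profile is a list of 0/1 values, one entry per genome (hypothetical layout).
    Returns 0.0 when either profile is constant, since the correlation is undefined.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    n = len(profile_a)
    mean_a = sum(profile_a) / n
    mean_b = sum(profile_b) / n
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in zip(profile_a, profile_b))
    var_a = sum((a - mean_a) ** 2 for a in profile_a)
    var_b = sum((b - mean_b) ** 2 for b in profile_b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return covariance / sqrt(var_a * var_b)


# Toy profiles over six genomes (illustrative values only).
gene_a = [1, 1, 0, 0, 1, 0]
gene_b = [1, 1, 0, 0, 1, 0]  # always co-occurs with gene_a
gene_c = [0, 0, 1, 1, 0, 1]  # present exactly where gene_a is absent
```

Genes with identical profiles score close to 1 and perfectly complementary profiles score close to -1; a naive score like this treats closely related genomes as independent observations, which is one reason tree-aware metrics exist.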
Affiliation(s)
- Briallen Lobb
  - Department of Biology, 200 University Ave. West, Waterloo, ON, N2L 3G1, Canada
- Andrew C Doxey
  - Department of Biology, 200 University Ave. West, Waterloo, ON, N2L 3G1, Canada
11
Han Y, Cheng L, Sun W. Analysis of Protein-Protein Interaction Networks through Computational Approaches. Protein Pept Lett 2020; 27:265-278. [PMID: 31692419 DOI: 10.2174/0929866526666191105142034] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/29/2019] [Revised: 05/08/2019] [Accepted: 09/26/2019] [Indexed: 01/02/2023]
Abstract
The interactions among proteins and genes are extremely important for cellular functions. Molecular interactions at protein or gene levels can be used to construct interaction networks in which the interacting species are categorized based on direct interactions or functional similarities. Compared with the limited experimental techniques, various computational tools make it possible to analyze, filter, and combine the interaction data to get comprehensive information about the biological pathways. By efficiently integrating experimental findings on protein-protein interactions (PPIs) with computational prediction techniques, researchers have gained a wealth of valuable PPI data, including some advanced databases. Moreover, many useful tools and visualization programs enable researchers to establish, annotate, and analyze biological networks. We here review and list the computational methods, databases, and tools for protein-protein interaction prediction.
Affiliation(s)
- Ying Han
  - Cardiovascular Department, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
- Liang Cheng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Weiju Sun
  - Cardiovascular Department, The First Affiliated Hospital of Harbin Medical University, Harbin, China
12
Moi D, Kilchoer L, Aguilar PS, Dessimoz C. Scalable phylogenetic profiling using MinHash uncovers likely eukaryotic sexual reproduction genes. PLoS Comput Biol 2020; 16:e1007553. [PMID: 32697802 PMCID: PMC7423146 DOI: 10.1371/journal.pcbi.1007553] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 11/20/2019] [Revised: 08/12/2020] [Accepted: 05/18/2020] [Indexed: 01/09/2023]
Abstract
Phylogenetic profiling is a computational method to predict genes involved in the same biological process by identifying protein families which tend to be jointly lost or retained across the tree of life. Phylogenetic profiling has customarily been more widely used with prokaryotes than eukaryotes, because the method is thought to require many diverse genomes. There are now many eukaryotic genomes available, but these are considerably larger, and typical phylogenetic profiling methods require at least quadratic time as a function of the number of genes. We introduce a fast, scalable phylogenetic profiling approach entitled HogProf, which leverages hierarchical orthologous groups for the construction of large profiles and locality-sensitive hashing for efficient retrieval of similar profiles. We show that the approach outperforms Enhanced Phylogenetic Tree, a phylogeny-based method, and use the tool to reconstruct networks and query for interactors of the kinetochore complex as well as conserved proteins involved in sexual reproduction: Hap2, Spo11 and Gex1. HogProf enables large-scale phylogenetic profiling across the three domains of life, and will be useful to predict biological pathways among the hundreds of thousands of eukaryotic species that will become available in the coming few years. HogProf is available at https://github.com/DessimozLab/HogProf.
Genes that are involved in the same biological process tend to co-evolve. This property is exploited by the technique of phylogenetic profiling, which identifies co-evolving (and therefore likely functionally related) genes through patterns of correlated gene retention and loss in evolution and across species. However, conventional methods for computing and clustering these correlated genes do not scale with increasing numbers of genomes. HogProf is a novel phylogenetic profiling tool built on probabilistic data structures. It allows the user to construct searchable databases containing the evolutionary history of hundreds of thousands of protein families. Such fast detection of coevolution takes advantage of the rapidly increasing amount of genomic data publicly available, and can uncover unknown biological networks and guide in-vivo research and experimentation. We have applied our tool to describe the biological networks underpinning sexual reproduction in eukaryotes.
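The MinHash idea named in this entry's title can be sketched in a few lines. The example below is a generic, minimal MinHash in plain Python, not HogProf's implementation: it treats a profile as the set of taxa in which a family is present, builds a fixed-length signature, and estimates the Jaccard similarity of two profiles from the fraction of matching signature slots. The function names, the use of salted `hashlib.blake2b` as the hash family, and the taxon labels are all assumptions made for illustration.

```python
import hashlib


def minhash_signature(present_in, num_hashes=128):
    """MinHash signature of a profile given as a non-empty set of taxon names.

    Slot i holds the minimum, over all taxa, of the i-th salted 64-bit hash.
    """
    if not present_in:
        raise ValueError("profile must contain at least one taxon")
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # each salt acts as a separate hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(taxon.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for taxon in present_in
        ))
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(set_a, set_b):
    """Exact Jaccard similarity, used here only to check the estimate."""
    return len(set_a & set_b) / len(set_a | set_b)


# Two hypothetical profiles sharing 30 of 90 taxa (true Jaccard = 1/3).
family_a = {f"taxon{i}" for i in range(60)}
family_b = {f"taxon{i}" for i in range(30, 90)}
```

Because every signature has the same length regardless of how many taxa a profile covers, comparing two families costs a fixed number of integer comparisons, and signatures of this kind are what a locality-sensitive hashing index buckets in order to retrieve similar profiles without all-against-all comparison.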
Affiliation(s)
- David Moi
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Laurent Kilchoer
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Pablo S. Aguilar
  - Instituto de Investigaciones Biotecnologicas (IIBIO), Universidad Nacional de San Martín, Buenos Aires, Argentina
  - Instituto de Fisiología, Biología Molecular y Neurociencias (IFIBYNE-CONICET), Buenos Aires, Argentina
- Christophe Dessimoz
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
  - Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Department of Genetics, Evolution, and Environment, University College London, London, United Kingdom
  - Department of Computer Science, University College London, London, United Kingdom
13
Fukunaga T, Iwasaki W. Logicome Profiler: Exhaustive detection of statistically significant logic relationships from comparative omics data. PLoS One 2020; 15:e0232106. [PMID: 32357172 PMCID: PMC7194410 DOI: 10.1371/journal.pone.0232106] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/07/2019] [Accepted: 04/07/2020] [Indexed: 02/01/2023]
Abstract
Logic relationship analysis is a data mining method that comprehensively detects item triplets that satisfy logic relationships from a binary matrix dataset, such as an ortholog table in comparative genomics. Thanks to recent technological advancements, many binary matrix datasets are now being produced in genomics, transcriptomics, epigenomics, metagenomics, and many other fields for comparative purposes. However, regardless of presumed interpretability and importance of logic relationships, existing data mining methods are not based on the framework of statistical hypothesis testing. That means, the type-1 and type-2 error rates are neither controlled nor estimated. Here, we developed Logicome Profiler, which exhaustively detects statistically significant triplet logic relationships from a binary matrix dataset (Logicome means ome of logics). To test all item triplets in a dataset while avoiding false positives, Logicome Profiler adjusts a significance level by the Bonferroni or Benjamini-Yekutieli method for the multiple testing correction. Its application to an ocean metagenomic dataset showed that Logicome Profiler can effectively detect statistically significant triplet logic relationships among environmental microbes and genes, which include those among urea transporter, urease, and photosynthesis-related genes. Beyond omics data analysis, Logicome Profiler is applicable to various binary matrix datasets in general for finding significant triplet logic relationships. The source code is available at https://github.com/fukunagatsu/LogicomeProfiler.
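To see why testing every item triplet calls for multiple testing correction, the sketch below counts the triplets in a dataset and applies the two corrections named in the abstract. It is a generic illustration and not Logicome Profiler's code: the function names are hypothetical, and it assumes unordered triplets with one test each, whereas the actual number of tests per triplet in the tool may differ.

```python
from math import comb


def count_triplets(num_items):
    """Number of unordered item triplets (assumes one test per triplet)."""
    return comb(num_items, 3)


def bonferroni_threshold(alpha, num_tests):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / num_tests


def benjamini_yekutieli(p_values, alpha):
    """Benjamini-Yekutieli step-up procedure.

    Returns a list of booleans, in input order, marking rejected hypotheses.
    """
    m = len(p_values)
    harmonic = sum(1.0 / j for j in range(1, m + 1))  # c(m) = 1 + 1/2 + ... + 1/m
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its rank-specific threshold.
        if p_values[idx] <= rank * alpha / (m * harmonic):
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

With only 10 items there are already 120 triplets, so a nominal level of 0.05 shrinks to 0.05 / 120 under Bonferroni. The count grows cubically with the number of items, which is why the correction dominates the analysis for realistic ortholog tables.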
Affiliation(s)
- Tsukasa Fukunaga
  - Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
- Wataru Iwasaki
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
  - Department of Computational Biology and Medical Science, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
  - Atmosphere and Ocean Research Institute, The University of Tokyo, Chiba, Japan
  - Institute for Quantitative Biosciences, The University of Tokyo, Tokyo, Japan
  - Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Chiba, Japan

14
Sánchez-Caballero L, Elurbe DM, Baertling F, Guerrero-Castillo S, van den Brand M, van Strien J, van Dam TJP, Rodenburg R, Brandt U, Huynen MA, Nijtmans LGJ. TMEM70 functions in the assembly of complexes I and V. Biochimica et Biophysica Acta - Bioenergetics 2020; 1861:148202. [PMID: 32275929 DOI: 10.1016/j.bbabio.2020.148202] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 01/29/2020] [Revised: 03/19/2020] [Accepted: 04/02/2020] [Indexed: 10/24/2022]
Abstract
Protein complexes from the oxidative phosphorylation (OXPHOS) system are assembled with the help of proteins called assembly factors. We here delineate the function of the inner mitochondrial membrane protein TMEM70, in which mutations have been linked to OXPHOS deficiencies, using a combination of BioID, complexome profiling and coevolution analyses. TMEM70 interacts with complex I and V and for both complexes the loss of TMEM70 results in the accumulation of an assembly intermediate followed by a reduction of the next assembly intermediate in the pathway. This indicates that TMEM70 has a role in the stability of membrane-bound subassemblies or in the membrane recruitment of subunits into the forming complex. Independent evidence for a role of TMEM70 in OXPHOS assembly comes from evolutionary analyses. The TMEM70/TMEM186/TMEM223 protein family, of which we show that TMEM186 and TMEM223 are mitochondrial in human as well, only occurs in species with OXPHOS complexes. Our results validate the use of combining complexome profiling with BioID and evolutionary analyses in elucidating congenital defects in protein complex assembly.
Affiliation(s)
- Laura Sánchez-Caballero
  - Department of Paediatrics, Radboud Centre for Mitochondrial Medicine, Radboud University Medical Centre, Nijmegen, the Netherlands
- Dei M Elurbe
  - Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Nijmegen, the Netherlands
- Fabian Baertling
  - Department of Paediatrics, Radboud Centre for Mitochondrial Medicine, Radboud University Medical Centre, Nijmegen, the Netherlands
  - Department of General Paediatrics, Neonatology and Paediatric Cardiology, University Children's Hospital Düsseldorf, Heinrich Heine University, Düsseldorf, Germany
- Sergio Guerrero-Castillo
  - Department of Paediatrics, Radboud Centre for Mitochondrial Medicine, Radboud University Medical Centre, Nijmegen, the Netherlands
- Mariel van den Brand
  - Department of Paediatrics, Radboud Centre for Mitochondrial Medicine, Radboud University Medical Centre, Nijmegen, the Netherlands
- Joeri van Strien
  - Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Nijmegen, the Netherlands
- Teunis J P van Dam
  - Theoretical Biology and Bioinformatics, Department of Biology, Utrecht University, Utrecht, the Netherlands
- Richard Rodenburg
  - Department of Paediatrics, Radboud Centre for Mitochondrial Medicine, Radboud University Medical Centre, Nijmegen, the Netherlands
- Ulrich Brandt
  - Department of Paediatrics, Radboud Centre for Mitochondrial Medicine, Radboud University Medical Centre, Nijmegen, the Netherlands
- Martijn A Huynen
  - Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Nijmegen, the Netherlands
- Leo G J Nijtmans
  - Department of Paediatrics, Radboud Centre for Mitochondrial Medicine, Radboud University Medical Centre, Nijmegen, the Netherlands

15
Deutekom ES, Vosseberg J, van Dam TJP, Snel B. Measuring the impact of gene prediction on gene loss estimates in Eukaryotes by quantifying falsely inferred absences. PLoS Comput Biol 2019; 15:e1007301. [PMID: 31461468 PMCID: PMC6736253 DOI: 10.1371/journal.pcbi.1007301] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 12/14/2018] [Revised: 09/10/2019] [Accepted: 08/01/2019] [Indexed: 12/25/2022]
Abstract
In recent years it became clear that in eukaryotic genome evolution gene loss is prevalent over gene gain. However, the absence of genes in an annotated genome is not always equivalent to the loss of genes. Due to sequencing issues, or incorrect gene prediction, genes can be falsely inferred as absent. This implies that loss estimates are overestimated and, more generally, that falsely inferred absences impact genomic comparative studies. However, reliable estimates of how prevalent this issue is are lacking. Here we quantified the impact of gene prediction on gene loss estimates in eukaryotes by analysing 209 phylogenetically diverse eukaryotic organisms and comparing their predicted proteomes to that of their respective six-frame translated genomes. We observe that 4.61% of domains per species were falsely inferred to be absent for Pfam domains predicted to have been present in the last eukaryotic common ancestor. Between phylogenetically different categories this estimate varies substantially: for clade-specific loss (ancestral loss) we found 1.30% and for species-specific loss 16.88% to be falsely inferred as absent. For BUSCO 1-to-1 orthologous families, 18.30% were falsely inferred to be absent. Finally, we showed that falsely inferred absences indeed impact loss estimates, with the number of losses decreasing by 11.78%. Our work strengthens the increasing number of studies showing that gene loss is an important factor in eukaryotic genome evolution. However, while we demonstrate that on average inferring gene absences from predicted proteomes is reliable, caution is warranted when inferring species-specific absences.
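The comparison described in this abstract, predicted proteome versus six-frame translated genome, comes down to set arithmetic per species. The sketch below is only an illustration under stated assumptions: the function name and the domain identifiers are hypothetical, and the choice of denominator (all absences inferred from the proteome) is an assumption, since the abstract does not spell out how its percentages are normalized.

```python
def falsely_inferred_absences(expected, in_proteome, in_six_frame):
    """Return (falsely_absent, fraction) for one species.

    expected      -- domains assumed to be ancestrally present
    in_proteome   -- domains detected in the predicted proteome
    in_six_frame  -- domains detected in the six-frame translated genome
    """
    # Absences as they would be inferred from the predicted proteome alone.
    inferred_absent = expected - in_proteome
    # An absence is "false" if the domain is still found in the raw genome.
    falsely_absent = inferred_absent & in_six_frame
    # Assumed denominator: all absences inferred from the proteome.
    fraction = len(falsely_absent) / len(inferred_absent) if inferred_absent else 0.0
    return falsely_absent, fraction


# Hypothetical domain identifiers for a single species.
expected = {"PF_A", "PF_B", "PF_C", "PF_D"}
proteome = {"PF_A"}
six_frame = {"PF_A", "PF_B"}
print(falsely_inferred_absences(expected, proteome, six_frame))
```

Under these assumptions, a domain missing from both the proteome and the six-frame translation remains a candidate true loss, while one recovered from the six-frame translation points to a gene prediction problem.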
Affiliation(s)
- Eva S. Deutekom
  - Theoretical Biology and Bioinformatics, Department of Biology, Science faculty, Utrecht University, Utrecht, The Netherlands
- Julian Vosseberg
  - Theoretical Biology and Bioinformatics, Department of Biology, Science faculty, Utrecht University, Utrecht, The Netherlands
- Teunis J. P. van Dam
  - Theoretical Biology and Bioinformatics, Department of Biology, Science faculty, Utrecht University, Utrecht, The Netherlands
- Berend Snel
  - Theoretical Biology and Bioinformatics, Department of Biology, Science faculty, Utrecht University, Utrecht, The Netherlands

16
CiliaCarta: An integrated and validated compendium of ciliary genes. PLoS One 2019; 14:e0216705. [PMID: 31095607 PMCID: PMC6522010 DOI: 10.1371/journal.pone.0216705] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Received: 10/17/2018] [Accepted: 04/26/2019] [Indexed: 12/25/2022]
Abstract
The cilium is an essential organelle at the surface of mammalian cells whose dysfunction causes a wide range of genetic diseases collectively called ciliopathies. The current rate at which new ciliopathy genes are identified suggests that many ciliary components remain undiscovered. We generated and rigorously analyzed genomic, proteomic, transcriptomic and evolutionary data and systematically integrated these using Bayesian statistics into a predictive score for ciliary function. This resulted in 285 candidate ciliary genes. We generated independent experimental evidence of ciliary associations for 24 out of 36 analyzed candidate proteins using multiple cell and animal model systems (mouse, zebrafish and nematode) and techniques. For example, we show that OSCP1, which has previously been implicated in two distinct non-ciliary processes, causes ciliogenic and ciliopathy-associated tissue phenotypes when depleted in zebrafish. The candidate list forms the basis of CiliaCarta, a comprehensive ciliary compendium covering 956 genes. The resource can be used to objectively prioritize candidate genes in whole exome or genome sequencing of ciliopathy patients and can be accessed at http://bioinformatics.bio.uu.nl/john/syscilia/ciliacarta/.

17
Li Y, Ning S, Calvo SE, Mootha VK, Liu JS. Bayesian hidden Markov tree models for clustering genes with shared evolutionary history. Ann Appl Stat 2019. [DOI: 10.1214/18-aoas1208] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]

18
Kim H, Joe A, Lee M, Yang S, Ma X, Ronald PC, Lee I. A Genome-Scale Co-Functional Network of Xanthomonas Genes Can Accurately Reconstruct Regulatory Circuits Controlled by Two-Component Signaling Systems. Mol Cells 2019; 42:166-174. [PMID: 30759970 PMCID: PMC6399010 DOI: 10.14348/molcells.2018.0403] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/02/2018] [Revised: 12/09/2018] [Accepted: 12/19/2018] [Indexed: 01/24/2023]
Abstract
Bacterial species in the genus Xanthomonas infect virtually all crop plants. Although many genes involved in Xanthomonas virulence have been identified through molecular and cellular studies, the elucidation of virulence-associated regulatory circuits is still far from complete. Functional gene networks have proven useful in generating hypotheses for genetic factors of biological processes in various species. Here, we present a genome-scale co-functional network of Xanthomonas oryzae pv. oryzae (Xoo) genes, XooNet (www.inetbio.org/xoonet/), constructed by integrating heterogeneous types of genomics data derived from Xoo and other bacterial species. XooNet contains 106,000 functional links, which cover approximately 83% of the coding genome. XooNet is highly predictive for diverse biological processes in Xoo and can accurately reconstruct cellular pathways regulated by two-component signaling transduction systems (TCS). XooNet will be a useful in silico research platform for genetic dissection of virulence pathways in Xoo.
Affiliation(s)
- Hanhae Kim
  - Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
  - Bio and Basic Science R&D Coordination Division, Korea Institute of S&T Evaluation and Planning, Seoul, Korea
- Anna Joe
  - Department of Plant Pathology and the Genome Center, University of California, CA 95616, USA
  - Feedstocks Division, Joint Bioenergy Institute, CA 94608, USA
- Muyoung Lee
  - Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Sunmo Yang
  - Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Xiaozhi Ma
  - Rice Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, China
- Pamela C. Ronald
  - Department of Plant Pathology and the Genome Center, University of California, CA 95616, USA
  - Feedstocks Division, Joint Bioenergy Institute, CA 94608, USA
- Insuk Lee
  - Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea

19
Vidulin V, Šmuc T, Džeroski S, Supek F. The evolutionary signal in metagenome phyletic profiles predicts many gene functions. Microbiome 2018; 6:129. [PMID: 29991352 PMCID: PMC6040064 DOI: 10.1186/s40168-018-0506-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/07/2017] [Accepted: 06/19/2018] [Indexed: 06/08/2023]
Abstract
BACKGROUND: The function of many genes is still not known even in model organisms. An increasing availability of microbiome DNA sequencing data provides an opportunity to infer gene function in a systematic manner.
RESULTS: We evaluated if the evolutionary signal contained in metagenome phyletic profiles (MPP) is predictive of a broad array of gene functions. The MPPs are an encoding of environmental DNA sequencing data that consists of relative abundances of gene families across metagenomes. We find that such MPPs can accurately predict 826 Gene Ontology functional categories, while drawing on human gut microbiomes, ocean metagenomes, and DNA sequences from various other engineered and natural environments. Overall, in this task, the MPPs are highly accurate, and moreover they provide coverage for a set of Gene Ontology terms largely complementary to standard phylogenetic profiles, derived from fully sequenced genomes. We also find that metagenomes approximated from taxon relative abundance obtained via 16S rRNA gene sequencing may provide surprisingly useful predictive models. Crucially, the MPPs derived from different types of environments can infer distinct, non-overlapping sets of gene functions and therefore complement each other. Consistently, simulations on > 5000 metagenomes indicate that the amount of data is not in itself critical for maximizing predictive accuracy, while the diversity of sampled environments appears to be the critical factor for obtaining robust models.
CONCLUSIONS: In past work, metagenomics has provided invaluable insight into ecology of various habitats, into diversity of microbial life and also into human health and disease mechanisms. We propose that environmental DNA sequencing additionally constitutes a useful tool to predict biological roles of genes, yielding inferences out of reach for existing comparative genomics approaches.
Affiliation(s)
- Vedrana Vidulin
  - Faculty of Information Studies, 8000 Novo Mesto, Slovenia
  - Division of Electronics, Rudjer Boskovic Institute, 10000 Zagreb, Croatia
  - Department of Knowledge Technologies, Jozef Stefan Institute, 1000 Ljubljana, Slovenia
- Tomislav Šmuc
  - Division of Electronics, Rudjer Boskovic Institute, 10000 Zagreb, Croatia
- Sašo Džeroski
  - Department of Knowledge Technologies, Jozef Stefan Institute, 1000 Ljubljana, Slovenia
- Fran Supek
  - Genome Data Science, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain

20
Beck C, Knoop H, Steuer R. Modules of co-occurrence in the cyanobacterial pan-genome reveal functional associations between groups of ortholog genes. PLoS Genet 2018. [PMID: 29522508 PMCID: PMC5862535 DOI: 10.1371/journal.pgen.1007239] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 12/11/2022]
Abstract
Cyanobacteria are a monophyletic phylogenetic group of global importance and have received considerable attention as potential host organisms for the renewable synthesis of chemical bulk products from atmospheric CO2. The cyanobacterial phylum exhibits enormous metabolic diversity with respect to morphology, lifestyle and habitat. As yet, however, research has mostly focused on a few model strains and cyanobacterial diversity is insufficiently understood. In this respect, the increasing availability of fully sequenced bacterial genomes opens new and unprecedented opportunities to investigate the genetic inventory of organisms in the context of their pan-genome. Here, we seek to understand cyanobacterial diversity using a comparative genome analysis of 77 fully sequenced and assembled cyanobacterial genomes. We use phylogenetic profiling to analyze the co-occurrence of clusters of likely ortholog genes (CLOGs) and reveal novel functional associations between CLOGs that are not captured by co-localization of genes. Going beyond pair-wise co-occurrences, we propose a network approach that allows us to identify modules of co-occurring CLOGs. The extracted modules exhibit a high degree of functional coherence and reveal known as well as previously unknown functional associations. We argue that the high functional coherence observed for the modules is a consequence of the similar-yet-diverse nature of cyanobacteria. Our approach highlights the importance of a multi-strain analysis to understand gene functions and environmental adaptations, with implications beyond the cyanobacterial phylum. The analysis is augmented with a simple toolbox that facilitates further analysis to investigate the co-occurrence neighborhood of specific CLOGs of interest.
Cyanobacteria are photoautotrophic prokaryotes of global importance and offer great potential as host organisms for the renewable synthesis of chemical bulk products, including biofuels, from atmospheric CO2. As yet, however, research has mostly focussed on a small number of model strains and the genetic inventory of the cyanobacterial phylum is still insufficiently understood. The rapidly increasing availability of fully sequenced cyanobacterial genomes opens new and unprecedented possibilities to study the diversity of cyanobacterial strains in the context of the cyanobacterial pan-genome. Here, we seek to understand the genetic inventory of individual cyanobacterial strains based on the hypothesis that genes that are functionally related also co-occur within the genomes of different strains. We confirm this hypothesis by in-depth analysis of co-occurrence that goes beyond pair-wise co-occurrences. We show that co-occurrence does not imply co-localization on the genome. Our work provides a novel approach to infer gene function and highlights the importance of a multi-strain analysis, with implications beyond the analysis of the cyanobacterial phylum.
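Pair-wise co-occurrence of ortholog clusters across genomes can be sketched with binary presence/absence profiles. The example below is a minimal illustration, assuming the Jaccard index as the similarity measure; the abstract does not state which measure the authors used, and the cluster names and profiles are hypothetical.

```python
from itertools import combinations


def jaccard(profile_a, profile_b):
    """Jaccard index of two equal-length presence/absence profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


def pairwise_cooccurrence(profiles):
    """Map each unordered pair of cluster names to its Jaccard index."""
    return {
        (name_a, name_b): jaccard(profiles[name_a], profiles[name_b])
        for name_a, name_b in combinations(sorted(profiles), 2)
    }


# Hypothetical clusters profiled across five genomes (1 = present).
profiles = {
    "clog_1": [1, 1, 0, 0, 1],
    "clog_2": [1, 1, 0, 0, 1],
    "clog_3": [0, 0, 1, 1, 0],
}
print(pairwise_cooccurrence(profiles))
```

Pairs with a high score could then serve as edges of a co-occurrence network, from which modules are extracted in a separate step.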
Affiliation(s)
- Christian Beck
  - Humboldt-Universität zu Berlin, Institut für Theoretische Biologie (ITB), Berlin, Germany
- Henning Knoop
  - Humboldt-Universität zu Berlin, Institut für Theoretische Biologie (ITB), Berlin, Germany
- Ralf Steuer
  - Humboldt-Universität zu Berlin, Institut für Theoretische Biologie (ITB), Berlin, Germany

21
Niu Y, Moghimyfiroozabad S, Safaie S, Yang Y, Jonas EA, Alavian KN. Phylogenetic Profiling of Mitochondrial Proteins and Integration Analysis of Bacterial Transcription Units Suggest Evolution of F1Fo ATP Synthase from Multiple Modules. J Mol Evol 2017; 85:219-233. [PMID: 29177973 PMCID: PMC5709465 DOI: 10.1007/s00239-017-9819-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 08/11/2017] [Accepted: 11/11/2017] [Indexed: 11/26/2022]
Abstract
ATP synthase is a complex universal enzyme responsible for ATP synthesis across all kingdoms of life. The F-type ATP synthase has been suggested to have evolved from two functionally independent, catalytic (F1) and membrane bound (Fo), ancestral modules. While the modular evolution of the synthase is supported by studies indicating independent assembly of the two subunits, the presence of intermediate assembly products suggests a more complex evolutionary process. We analyzed the phylogenetic profiles of the human mitochondrial proteins and bacterial transcription units to gain additional insight into the evolution of the F-type ATP synthase complex. In this study, we report the presence of intermediary modules based on the phylogenetic profiles of the human mitochondrial proteins. The two main intermediary modules comprise the α3β3 hexamer in the F1 and the c-subunit ring in the Fo. A comprehensive analysis of bacterial transcription units of F1Fo ATP synthase revealed that while a long and constant order of F1Fo ATP synthase genes exists in a majority of bacterial genomes, highly conserved combinations of separate transcription units are present among certain bacterial classes and phyla. Based on our findings, we propose a model that includes the involvement of multiple modules in the evolution of F1Fo ATP synthase. The central and peripheral stalk subunits provide a link for the integration of the F1/Fo modules.
Affiliation(s)
- Yulong Niu
  - Division of Brain Sciences, Department of Medicine, Imperial College London, E508, Burlington Danes Hammersmith Hospital, DuCane Road, London, W12 0NN, UK
  - Key Lab of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, People's Republic of China
  - Department of Internal Medicine, Endocrinology, Yale University, New Haven, CT, USA
- Sepehr Safaie
  - Department of Mathematics and Computer Science, The Bahá'í Institute for Higher Education (BIHE), Tehran, Iran
- Yi Yang
  - Key Lab of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, People's Republic of China
- Elizabeth A Jonas
  - Department of Internal Medicine, Endocrinology, Yale University, New Haven, CT, USA
- Kambiz N Alavian
  - Division of Brain Sciences, Department of Medicine, Imperial College London, E508, Burlington Danes Hammersmith Hospital, DuCane Road, London, W12 0NN, UK
  - Department of Biology, The Bahá'í Institute for Higher Education (BIHE), Tehran, Iran
  - Department of Internal Medicine, Endocrinology, Yale University, New Haven, CT, USA

22
Sferra G, Fratini F, Ponzi M, Pizzi E. Phylo_dCor: distance correlation as a novel metric for phylogenetic profiling. BMC Bioinformatics 2017; 18:396. [PMID: 28870256 PMCID: PMC5584357 DOI: 10.1186/s12859-017-1815-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/18/2017] [Accepted: 08/29/2017] [Indexed: 12/20/2022]
Abstract
Background: Elaboration of powerful methods to predict functional and/or physical protein-protein interactions from genome sequence is one of the main tasks in the post-genomic era. Phylogenetic profiling allows the prediction of protein-protein interactions at a whole genome level in both Prokaryotes and Eukaryotes. For this reason it is considered one of the most promising methods.
Results: Here, we propose an improvement of phylogenetic profiling that enables the handling of large genomic datasets and the inference of global protein-protein interactions. This method uses the distance correlation as a new measure of phylogenetic profile similarity. We constructed robust reference sets and developed Phylo-dCor, a parallelized version of the algorithm for calculating the distance correlation that makes it applicable to large genomic data. Using Saccharomyces cerevisiae and Escherichia coli genome datasets, we showed that Phylo-dCor outperforms phylogenetic profiling methods previously described based on the mutual information and Pearson’s correlation as measures of profile similarity.
Conclusions: In this work, we constructed and assessed robust reference sets and propose the distance correlation as a measure for comparing phylogenetic profiles. To make it applicable to large genomic data, we developed Phylo-dCor, a parallelized version of the algorithm for calculating the distance correlation. Two R scripts that can be run on a wide range of machines are available upon request.
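The distance correlation that this abstract proposes as a profile similarity measure can be written down compactly. The sketch below is a plain, unparallelized Python version of the standard sample distance correlation for two one-dimensional profiles; it is not the authors' R implementation, and the function names are hypothetical.

```python
from math import sqrt


def _double_centered_distances(values):
    """Pairwise absolute distances, centered by row, column and grand mean."""
    n = len(values)
    dist = [[abs(values[i] - values[j]) for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in dist]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [dist[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length numeric profiles."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(x)
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2 = mean_product(a, b)
    dvar_x = mean_product(a, a)
    dvar_y = mean_product(b, b)
    denom = sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0  # at least one profile is constant
    return sqrt(max(dcov2, 0.0) / denom)


# Hypothetical profiles across five genomes.
print(distance_correlation([1, 0, 1, 1, 0], [1, 0, 1, 0, 0]))
```

This direct version needs quadratic time and memory in the number of genomes per profile pair, which is one reason a parallelized implementation matters at whole-genome scale.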
Affiliation(s)
- Gabriella Sferra
  - Dipartimento di Malattie Infettive, Parassitarie e Immunomediate, Istituto Superiore di Sanità, Viale Regina Elena 299, 00161, Rome, Italy
- Federica Fratini
  - Dipartimento di Malattie Infettive, Parassitarie e Immunomediate, Istituto Superiore di Sanità, Viale Regina Elena 299, 00161, Rome, Italy
- Marta Ponzi
  - Dipartimento di Malattie Infettive, Parassitarie e Immunomediate, Istituto Superiore di Sanità, Viale Regina Elena 299, 00161, Rome, Italy
- Elisabetta Pizzi
  - Dipartimento di Malattie Infettive, Parassitarie e Immunomediate, Istituto Superiore di Sanità, Viale Regina Elena 299, 00161, Rome, Italy

23
Niu Y, Liu C, Moghimyfiroozabad S, Yang Y, Alavian KN. PrePhyloPro: phylogenetic profile-based prediction of whole proteome linkages. PeerJ 2017; 5:e3712. [PMID: 28875072 PMCID: PMC5578374 DOI: 10.7717/peerj.3712] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 04/13/2017] [Accepted: 07/28/2017] [Indexed: 02/05/2023]
Abstract
Direct and indirect functional links between proteins as well as their interactions as part of larger protein complexes or common signaling pathways may be predicted by analyzing the correlation of their evolutionary patterns. Based on phylogenetic profiling, here we present a highly scalable and time-efficient computational framework for predicting linkages within the whole human proteome. We have validated this method through analysis of 3,697 human pathways and molecular complexes and a comparison of our results with the prediction outcomes of previously published co-occurrency model-based and normalization methods. Here we also introduce PrePhyloPro, a web-based software that uses our method for accurately predicting proteome-wide linkages. We present data on interactions of human mitochondrial proteins, verifying the performance of this software. PrePhyloPro is freely available at http://prephylopro.org/phyloprofile/.
Affiliation(s)
- Yulong Niu
  - Department of Medicine, Division of Brain Sciences, Imperial College London, London, United Kingdom
  - Key Lab of Bio-resources and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
  - School of Medicine, Department of Internal Medicine, Endocrinology, Yale University, New Haven, CT, United States of America
- Chengcheng Liu
  - Department of Periodontics, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Yi Yang
  - Key Lab of Bio-resources and Eco-environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan, China
- Kambiz N Alavian
  - Department of Medicine, Division of Brain Sciences, Imperial College London, London, United Kingdom
  - School of Medicine, Department of Internal Medicine, Endocrinology, Yale University, New Haven, CT, United States of America
  - Department of Biology, The Bahá'í Institute for Higher Education (BIHE), Tehran, Iran

24
van Hooff JJ, Tromer E, van Wijk LM, Snel B, Kops GJ. Evolutionary dynamics of the kinetochore network in eukaryotes as revealed by comparative genomics. EMBO Rep 2017. [PMID: 28642229 PMCID: PMC5579357 DOI: 10.15252/embr.201744102] [Citation(s) in RCA: 141] [Impact Index Per Article: 20.1] [Indexed: 12/14/2022]
Abstract
During eukaryotic cell division, the sister chromatids of duplicated chromosomes are pulled apart by microtubules, which connect via kinetochores. The kinetochore is a multiprotein structure that links centromeres to microtubules, and that emits molecular signals in order to safeguard the equal distribution of duplicated chromosomes over daughter cells. Although microtubule‐mediated chromosome segregation is evolutionarily conserved, kinetochore compositions seem to have diverged. To systematically inventory kinetochore diversity and to reconstruct its evolution, we determined orthologs of 70 kinetochore proteins in 90 phylogenetically diverse eukaryotes. The resulting ortholog sets imply that the last eukaryotic common ancestor (LECA) possessed a complex kinetochore and highlight that current‐day kinetochores differ substantially. These kinetochores diverged through gene loss, duplication, and, less frequently, invention and displacement. Various kinetochore components co‐evolved with one another, albeit in different manners. These co‐evolutionary patterns improve our understanding of kinetochore function and evolution, which we illustrated with the RZZ complex, TRIP13, the MCC, and some nuclear pore proteins. The extensive diversity of kinetochore compositions in eukaryotes poses numerous questions regarding evolutionary flexibility of essential cellular functions.
Affiliation(s)
- Jolien Je van Hooff
  - Hubrecht Institute - KNAW (Royal Netherlands Academy of Arts and Sciences), Utrecht, The Netherlands
  - Theoretical Biology and Bioinformatics, Department of Biology, Science Faculty, Utrecht University, Utrecht, The Netherlands
  - Molecular Cancer Research, University Medical Center Utrecht, Utrecht, The Netherlands
- Eelco Tromer
  - Hubrecht Institute - KNAW (Royal Netherlands Academy of Arts and Sciences), Utrecht, The Netherlands
  - Theoretical Biology and Bioinformatics, Department of Biology, Science Faculty, Utrecht University, Utrecht, The Netherlands
- Leny M van Wijk
  - Theoretical Biology and Bioinformatics, Department of Biology, Science Faculty, Utrecht University, Utrecht, The Netherlands
- Berend Snel
  - Theoretical Biology and Bioinformatics, Department of Biology, Science Faculty, Utrecht University, Utrecht, The Netherlands
- Geert Jpl Kops
  - Hubrecht Institute - KNAW (Royal Netherlands Academy of Arts and Sciences), Utrecht, The Netherlands
  - Molecular Cancer Research, University Medical Center Utrecht, Utrecht, The Netherlands
  - Cancer Genomics Netherlands, University Medical Center Utrecht, Utrecht, The Netherlands

25
van Hooff JJE, Snel B, Kops GJPL. Unique Phylogenetic Distributions of the Ska and Dam1 Complexes Support Functional Analogy and Suggest Multiple Parallel Displacements of Ska by Dam1. Genome Biol Evol 2017; 9:1295-1303. [PMID: 28472331 PMCID: PMC5439489 DOI: 10.1093/gbe/evx088] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Accepted: 05/03/2017] [Indexed: 12/27/2022]
Abstract
Faithful chromosome segregation relies on kinetochores, the large protein complexes that connect chromatin to spindle microtubules. Although human and yeast kinetochores are largely homologous, they track microtubules with the unrelated protein complexes Ska (Ska-C, human) and Dam1 (Dam1-C, yeast). We here uncovered that Ska-C and Dam1-C are both widespread among eukaryotes, but in an exceptionally inverse manner, supporting their functional analogy. Within the complexes, all Ska-C and various Dam1-C subunits are ancient paralogs, showing that gene duplication shaped these complexes. We examined various evolutionary scenarios to explain the nearly mutually exclusive patterns of Ska-C and Dam1-C in present-day species. We propose that Ska-C was present in the last eukaryotic common ancestor, that subsequently Dam1-C displaced Ska-C in an early fungus and was horizontally transferred to diverse non-fungal lineages, displacing Ska-C in these lineages too.
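An inverse, nearly mutually exclusive distribution of two complexes across species can be quantified with the phi coefficient of their presence/absence profiles, where values near -1 indicate that one complex tends to be present exactly where the other is absent. This is only an illustrative sketch; the abstract does not say which statistic the authors used, and the profiles below are hypothetical.

```python
from math import sqrt


def phi_coefficient(profile_a, profile_b):
    """Phi coefficient of two equal-length binary presence/absence profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same species")
    # Cells of the 2x2 contingency table.
    n11 = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    n10 = sum(1 for a, b in zip(profile_a, profile_b) if a and not b)
    n01 = sum(1 for a, b in zip(profile_a, profile_b) if not a and b)
    n00 = sum(1 for a, b in zip(profile_a, profile_b) if not a and not b)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0  # one profile is constant, so phi is undefined
    return (n11 * n00 - n10 * n01) / denom


# Hypothetical presence (1) / absence (0) of two complexes in six species.
complex_one = [1, 1, 1, 0, 0, 0]
complex_two = [0, 0, 0, 1, 1, 1]
print(phi_coefficient(complex_one, complex_two))
```

Because species share ancestry, such a coefficient treats related species as independent observations; an evolutionary interpretation still requires mapping gains and losses onto a species tree.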
Affiliation(s)
- Jolien J. E. van Hooff
- Hubrecht Institute – KNAW (Royal Netherlands Academy of Arts and Sciences), Utrecht, The Netherlands
- Theoretical Biology and Bioinformatics, Department of Biology, Science Faculty, Utrecht University, The Netherlands
- Molecular Cancer Research, University Medical Center Utrecht, The Netherlands
- Berend Snel
- Theoretical Biology and Bioinformatics, Department of Biology, Science Faculty, Utrecht University, The Netherlands
- Geert J. P. L. Kops
- Hubrecht Institute – KNAW (Royal Netherlands Academy of Arts and Sciences), Utrecht, The Netherlands
- Molecular Cancer Research, University Medical Center Utrecht, The Netherlands
- Cancer Genomics Netherlands, University Medical Center Utrecht, The Netherlands

26
Wittouck S, van Noort V. Correlated duplications and losses in the evolution of palmitoylation writer and eraser families. BMC Evol Biol 2017; 17:83. [PMID: 28320309 PMCID: PMC5359973 DOI: 10.1186/s12862-017-0932-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/13/2016] [Accepted: 03/09/2017] [Indexed: 12/27/2022] Open
Abstract
Background: Protein post-translational modifications (PTMs) change protein properties. Each PTM type is associated with domain families that apply the modification (writers), remove the modification (erasers) and bind to the modified sites (readers), together called toolkit domains. The evolutionary origin and diversification of these domains remain largely understudied, except for tyrosine phosphorylation. Protein palmitoylation entails the addition of a palmitoyl fatty acid to a cysteine residue. This PTM functions as a membrane anchor and is involved in a range of cellular processes. One writer family and two eraser families are known for protein palmitoylation. Results: In this work we unravel the evolutionary history of these writer and eraser families. We constructed a high-quality profile hidden Markov model (HMM) of each family, searched for protein family members in fully sequenced genomes and subsequently constructed phylogenetic distributions of the families. We constructed Maximum Likelihood phylogenetic trees and, using gene tree rearrangement and tree reconciliation, inferred their evolutionary histories in terms of duplication and loss events. We identified lineages where the families expanded or contracted and found that the evolutionary histories of the families are correlated. The results show that the erasers were invented first, before the origin of the eukaryotes. The writers first arose in the eukaryotic ancestor. The writers and erasers show co-expansions in several eukaryotic ancestral lineages. These expansions often seem to be followed by contractions in some or all of the lineages further in evolution. Conclusions: A general pattern of correlated evolution appears between writer and eraser domains. These co-evolution patterns could be used in new methods for interaction prediction based on phylogenies.
Electronic supplementary material: The online version of this article (doi:10.1186/s12862-017-0932-0) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Stijn Wittouck
- Centre of Microbial and Plant Genetics, KU Leuven, Leuven, Belgium
- Department of Bioscience Engineering, University of Antwerp, Antwerp, Belgium
- Vera van Noort
- Centre of Microbial and Plant Genetics, KU Leuven, Leuven, Belgium

27
Shim JE, Lee T, Lee I. From sequencing data to gene functions: co-functional network approaches. Anim Cells Syst (Seoul) 2017; 21:77-83. [PMID: 30460054 PMCID: PMC6138336 DOI: 10.1080/19768354.2017.1284156] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 01/14/2017] [Accepted: 01/15/2017] [Indexed: 01/04/2023] Open
Abstract
Advances in high-throughput sequencing technology have led to the accumulation of massive amounts of genomics and transcriptomics data in public databases. Owing to their high technical accessibility, DNA and RNA sequencing have huge potential for the study of gene functions in most species, including animals and crops. A proven analytic platform for converting sequencing data into gene functional information is the co-functional network. Because all genes exert their functions through interactions with others, network analysis is a legitimate way to study gene functions. The workflow of a network-based functional study is composed of three steps: (i) inferring co-functional links, (ii) evaluating and integrating the links into genome-scale networks, and (iii) generating functional hypotheses from the networks. Co-functional links can be inferred from DNA sequencing data by using phylogenetic profiling, gene neighborhood, domain profiling, and associalogs, and from RNA sequencing data by using co-expression analysis. The inferred links are then evaluated and integrated into a genome-scale network with the aid of gold-standard co-functional links. Functional hypotheses can be generated from the network based on (i) network connectivity, (ii) network propagation, and (iii) subnetwork analysis. The functional analysis pipeline described here requires only sequencing data, which are readily available for most species through next-generation sequencing technology. Therefore, co-functional networks will greatly potentiate the use of sequencing data for the study of genetics in any cellular organism.
Affiliation(s)
- Jung Eun Shim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Tak Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea

28
Abstract
Functional constraints between genes display similar patterns of gain or loss during speciation. Similar phylogenetic profiles, therefore, can be an indication of a functional association between genes. The phylogenetic profiling method has been applied successfully to the reconstruction of gene pathways and the inference of unknown gene functions. This method requires only sequence data to generate phylogenetic profiles. This method therefore has the potential to take advantage of the recent explosion in available sequence data to reveal a significant number of functional associations between genes. Since the initial development of phylogenetic profiling, many modifications to improve this method have been proposed, including improvements in the measurement of profile similarity and the selection of reference species. Here, we describe the existing methods of phylogenetic profiling for the inference of functional associations and discuss their technical limitations and caveats.
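The profile comparison described in this abstract can be illustrated with a minimal Python sketch. It assumes binary presence/absence profiles over a fixed list of reference species and compares them with two common similarity measures (Jaccard index and mutual information); the gene names and profile values are hypothetical and are not taken from the reviewed work.

```python
from math import log2


def jaccard(profile_a, profile_b):
    """Jaccard similarity of two binary presence/absence profiles."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


def mutual_information(profile_a, profile_b):
    """Mutual information (in bits) between two binary profiles."""
    n = len(profile_a)
    mi = 0.0
    for x in (0, 1):
        for y in (0, 1):
            p_xy = sum(1 for a, b in zip(profile_a, profile_b)
                       if a == x and b == y) / n
            p_x = sum(1 for a in profile_a if a == x) / n
            p_y = sum(1 for b in profile_b if b == y) / n
            if p_xy > 0:  # empty cells contribute nothing
                mi += p_xy * log2(p_xy / (p_x * p_y))
    return mi


# Hypothetical profiles: one column per reference species (1 = homolog present).
profiles = {
    "geneA": [1, 1, 0, 0, 1, 0],
    "geneB": [1, 1, 0, 0, 1, 0],  # same gain/loss pattern as geneA
    "geneC": [0, 1, 1, 0, 0, 1],  # largely unrelated pattern
}

for other in ("geneB", "geneC"):
    print(other,
          round(jaccard(profiles["geneA"], profiles[other]), 3),
          round(mutual_information(profiles["geneA"], profiles[other]), 3))
```

Genes with matching profiles score high on both measures, which is the signal interpreted as a candidate functional association. The choice of similarity measure and of reference species is among the caveats the review discusses.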
Affiliation(s)
- Junha Shin
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul, 120-749, South Korea.
- Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul, 120-749, South Korea.

29
Abstract
Protein function is a concept that can have different interpretations in different biological contexts, and the number and diversity of novel proteins identified by large-scale "omics" technologies pose new challenges. In this review we explore current strategies used to predict protein function, focusing on high-throughput sequence analysis, for example inference based on sequence similarity, sequence composition, structure, and protein-protein interaction. Various prediction strategies are discussed together with illustrative workflows highlighting the use of some benchmark tools and knowledge bases in the field.
Affiliation(s)
- Leonardo Magalhães Cruz
- Department of Biochemistry and Molecular Biology, Federal University of Paraná (UFPR), Curitiba, PR, Brazil.
- Sector of Professional and Technological Education, Federal University of Paraná (UFPR), Curitiba, PR, Brazil.
- Sheyla Trefflich
- Sector of Professional and Technological Education, Federal University of Paraná (UFPR), Curitiba, PR, Brazil
- Vinícius Almir Weiss
- Sector of Professional and Technological Education, Federal University of Paraná (UFPR), Curitiba, PR, Brazil
- Mauro Antônio Alves Castro
- Sector of Professional and Technological Education, Federal University of Paraná (UFPR), Curitiba, PR, Brazil

30
Adebali O, Zhulin IB. Aquerium: A web application for comparative exploration of domain-based protein occurrences on the taxonomically clustered genome tree. Proteins 2016; 85:72-77. [PMID: 27802571 DOI: 10.1002/prot.25199] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 07/26/2016] [Accepted: 10/20/2016] [Indexed: 01/27/2023]
Abstract
Gene duplication and loss are major driving forces in evolution. While many important genomic resources provide information on gene presence, there is a lack of tools giving equal importance to presence and absence information as well as web platforms enabling easy visual comparison of multiple domain-based protein occurrences at once. Here, we present Aquerium, a platform for visualizing genomic presence and absence of biomolecules with a focus on protein domain architectures. The web server offers advanced domain organization querying against the database of pre-computed domains for ∼26,000 organisms and it can be utilized for identification of evolutionary events, such as fusion, disassociation, duplication, and shuffling of protein domains. The tool also allows alternative inputs of custom entries or BLASTP results for visualization. Aquerium will be a useful tool for biologists who perform comparative genomic and evolutionary analyses. The web server is freely accessible at http://aquerium.utk.edu. Proteins 2016; 85:72-77. © 2016 Wiley Periodicals, Inc.
Affiliation(s)
- Ogun Adebali
- UT-ORNL Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, Tennessee, 37996
- Department of Microbiology, University of Tennessee, Knoxville, Tennessee, 37996
- Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, 37961
- Igor B Zhulin
- UT-ORNL Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, Tennessee, 37996
- Department of Microbiology, University of Tennessee, Knoxville, Tennessee, 37996
- Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, 37961

31
Vidulin V, Šmuc T, Supek F. Extensive complementarity between gene function prediction methods. Bioinformatics 2016; 32:3645-3653. [PMID: 27522084 DOI: 10.1093/bioinformatics/btw532] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 04/26/2016] [Revised: 07/11/2016] [Accepted: 08/09/2016] [Indexed: 12/22/2022] Open
Abstract
MOTIVATION: The number of sequenced genomes rises steadily, but we still lack knowledge about the biological roles of many genes. Automated function prediction (AFP) is thus a necessity. We hypothesized that AFP approaches that draw on distinct genome features may be useful for predicting different types of gene functions, motivating a systematic analysis of the benefits gained by obtaining and integrating such predictions. RESULTS: Our pipeline amalgamates 5 133 543 genes from 2071 genomes in a single massive analysis that evaluates five established genomic AFP methodologies. While 1227 Gene Ontology (GO) terms yielded reliable predictions, the majority of these functions were accessible to only one or two of the methods. Moreover, different methods tend to assign a GO term to non-overlapping sets of genes. Thus, inferences made by diverse genomic AFP methods display a striking complementarity, both gene-wise and function-wise. Because of this, a viable integration strategy is to rely on a single most-confident prediction per gene/function, rather than enforcing agreement across multiple AFP methods. Using an information-theoretic approach, we estimate that current databases contain 29.2 bits/gene of known Escherichia coli gene functions. This can be increased by up to 5.5 bits/gene using individual AFP methods or by 11 additional bits/gene upon integration, thereby providing a highly-ranking predictor on the Critical Assessment of Function Annotation 2 community benchmark. Availability of more sequenced genomes boosts the predictive accuracy of AFP approaches and also the benefit from integrating them. AVAILABILITY AND IMPLEMENTATION: The individual and integrated GO predictions for the complete set of genes are available from http://gorbi.irb.hr/ CONTACT: fran.supek@irb.hr SUPPLEMENTARY INFORMATION: Supplementary materials are available at Bioinformatics online.
Affiliation(s)
- Vedrana Vidulin
- Division of Electronics, Ruđer Bošković Institute, Bijenička cesta 54, Zagreb 10000, Croatia
- Tomislav Šmuc
- Division of Electronics, Ruđer Bošković Institute, Bijenička cesta 54, Zagreb 10000, Croatia
- Fran Supek
- Division of Electronics, Ruđer Bošković Institute, Bijenička cesta 54, Zagreb 10000, Croatia
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology and UPF, Dr. Aiguader 88, Barcelona 08003, Spain

32
Developing of the Computer Method for Annotation of Bacterial Genes. Adv Bioinformatics 2016; 2015:635437. [PMID: 26770195 PMCID: PMC4684837 DOI: 10.1155/2015/635437] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/15/2015] [Revised: 11/16/2015] [Accepted: 11/18/2015] [Indexed: 02/07/2023] Open
Abstract
Over recent years, a great number of bacterial genomes have been sequenced. One of the most important challenges of computational genomics is now the functional annotation of nucleic acid sequences. In this study we present a computational method and an annotation system for predicting biological functions using phylogenetic profiles. The phylogenetic profile of a gene was created by searching for similarities between the nucleotide sequence of the gene and 1204 reference genomes, followed by estimation of the statistical significance of the similarities found. The profiles of genes with known functions were used to predict possible functions and functional groups for new genes. We conducted the functional annotation for genes from 104 bacterial genomes and compared the functions predicted by our system with the already known functions. For genes that had already been annotated, the known function matched the function we predicted 63% of the time, and 86% of the time the known function was found within the top five predicted functions. In addition, our system increased the share of annotated genes by 19%. The developed system may be used as an alternative or complementary system to current annotation systems.

33
TMEM107 recruits ciliopathy proteins to subdomains of the ciliary transition zone and causes Joubert syndrome. Nat Cell Biol 2015; 18:122-31. [PMID: 26595381 DOI: 10.1038/ncb3273] [Citation(s) in RCA: 99] [Impact Index Per Article: 11.0] [Received: 08/11/2015] [Accepted: 10/20/2015] [Indexed: 01/10/2023]
Abstract
The transition zone (TZ) ciliary subcompartment is thought to control cilium composition and signalling by facilitating a protein diffusion barrier at the ciliary base. TZ defects cause ciliopathies such as Meckel-Gruber syndrome (MKS), nephronophthisis (NPHP) and Joubert syndrome (JBTS). However, the molecular composition and mechanisms underpinning TZ organization and barrier regulation are poorly understood. To uncover candidate TZ genes, we employed bioinformatics (coexpression and co-evolution) and identified TMEM107 as a TZ protein mutated in oral-facial-digital syndrome and JBTS patients. Mechanistic studies in Caenorhabditis elegans showed that TMEM-107 controls ciliary composition and functions redundantly with NPHP-4 to regulate cilium integrity, TZ docking and assembly of membrane to microtubule Y-link connectors. Furthermore, nematode TMEM-107 occupies an intermediate layer of the TZ-localized MKS module by organizing recruitment of the ciliopathy proteins MKS-1, TMEM-231 (JBTS20) and JBTS-14 (TMEM237). Finally, MKS module membrane proteins are immobile and super-resolution microscopy in worms and mammalian cells reveals periodic localizations within the TZ. This work expands the MKS module of ciliopathy-causing TZ proteins associated with diffusion barrier formation and provides insight into TZ subdomain architecture.

34
Supek F. The Code of Silence: Widespread Associations Between Synonymous Codon Biases and Gene Function. J Mol Evol 2015; 82:65-73. [PMID: 26538122 DOI: 10.1007/s00239-015-9714-8] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Received: 08/15/2015] [Accepted: 10/30/2015] [Indexed: 02/07/2023]
Abstract
Some mutations in gene coding regions exchange one synonymous codon for another, and thus do not alter the amino acid sequence of the encoded protein. Even though they are often called 'silent,' these mutations may exhibit a plethora of effects on the living cell. Therefore, they are often selected during evolution, causing synonymous codon usage biases in genomes. Comparative analyses of bacterial, archaeal, fungal, and human cancer genomes have found many links between a gene's biological role and the accrual of synonymous mutations during evolution. In particular, highly expressed genes in certain functional categories are enriched with optimal codons, which are decoded by the abundant tRNAs, thus enhancing the speed and accuracy of the translating ribosome. The set of genes exhibiting codon adaptation differs between genomes, and these differences show robust associations to organismal phenotypes. In addition to selection for translation efficiency, other distinct codon bias patterns have been found in: amino acid starvation genes, cyclically expressed genes, tissue-specific genes in animals and plants, oxidative stress response genes, cellular differentiation genes, and oncogenes. In addition, genomes of organisms harboring tRNA modifications exhibit particular codon preferences. The evolutionary trace of codon bias patterns across orthologous genes may be examined to learn about a gene's relevance to various phenotypes, or, more generally, its function in the cell.
Affiliation(s)
- Fran Supek
- Division of electronics, Rudjer Boskovic Institute, 10000, Zagreb, Croatia.
- EMBL-CRG Systems Biology Unit, Centre for Genomic Regulation (CRG), 08003, Barcelona, Spain.
- Universitat Pompeu Fabra (UPF), 08003, Barcelona, Spain.

35
Shin J, Lee I. Co-Inheritance Analysis within the Domains of Life Substantially Improves Network Inference by Phylogenetic Profiling. PLoS One 2015; 10:e0139006. [PMID: 26394049 PMCID: PMC4578931 DOI: 10.1371/journal.pone.0139006] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 05/05/2015] [Accepted: 09/07/2015] [Indexed: 01/23/2023] Open
Abstract
Phylogenetic profiling, a network inference method based on gene inheritance profiles, has been widely used to construct functional gene networks in microbes. However, its utility for network inference in higher eukaryotes has been limited. An improved algorithm with an in-depth understanding of pathway evolution may overcome this limitation. In this study, we investigated the effects of taxonomic structures on co-inheritance analysis using 2,144 reference species in four query species: Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana, and Homo sapiens. We observed three clusters of reference species based on a principal component analysis of the phylogenetic profiles, which correspond to the three domains of life (Archaea, Bacteria, and Eukaryota), suggesting that pathways inherit primarily within specific domains or lower-ranked taxonomic groups during speciation. Hence, the co-inheritance pattern within a taxonomic group may be eroded by confounding inheritance patterns from irrelevant taxonomic groups. We demonstrated that co-inheritance analysis within domains substantially improved network inference not only in microbe species but also in the higher eukaryotes, including humans. Although we observed two sub-domain clusters of reference species within Eukaryota, co-inheritance analysis within these sub-domain taxonomic groups only marginally improved network inference. Therefore, we conclude that co-inheritance analysis within domains is the optimal approach to network inference with the given reference species. The construction of a series of human gene networks with increasing sample sizes of the reference species for each domain revealed that the size of the high-accuracy networks increased as additional reference species genomes were included, suggesting that within-domain co-inheritance analysis will continue to expand human gene networks as genomes of additional species are sequenced. Taken together, we propose that co-inheritance analysis within the domains of life will greatly potentiate the use of the expected onslaught of sequenced genomes in the study of molecular pathways in higher eukaryotes.
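The idea of scoring co-inheritance separately within each domain of life can be sketched as follows. This is an illustrative Python example under stated assumptions, not the authors' algorithm: it uses a Jaccard index as the similarity measure, and the species-to-domain assignment and the two gene profiles are hypothetical.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity of two binary presence/absence profiles."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


def within_domain_scores(profile_a, profile_b, species_domain):
    """Return {domain: similarity}, each computed only over that domain's species."""
    scores = {}
    for domain in sorted(set(species_domain)):
        idx = [i for i, d in enumerate(species_domain) if d == domain]
        sub_a = [profile_a[i] for i in idx]
        sub_b = [profile_b[i] for i in idx]
        scores[domain] = jaccard(sub_a, sub_b)
    return scores


# Hypothetical reference species, labelled by domain (one label per profile column).
species_domain = ["Bacteria", "Bacteria", "Bacteria",
                  "Archaea", "Archaea",
                  "Eukaryota", "Eukaryota", "Eukaryota"]

# Hypothetical profiles: co-inherited within Eukaryota, unrelated within Bacteria.
gene_x = [1, 0, 1, 0, 0, 1, 1, 0]
gene_y = [0, 1, 0, 0, 0, 1, 1, 0]

print("all species:", jaccard(gene_x, gene_y))
print("per domain: ", within_domain_scores(gene_x, gene_y, species_domain))
```

In this toy case the all-species score is diluted by the bacterial columns, while the Eukaryota-only score shows perfect co-inheritance. This mirrors the abstract's point that patterns from irrelevant taxonomic groups can erode a within-group signal.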
Affiliation(s)
- Junha Shin
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea

36
Dey G, Meyer T. Phylogenetic Profiling for Probing the Modular Architecture of the Human Genome. Cell Syst 2015; 1:106-15. [PMID: 27135799 DOI: 10.1016/j.cels.2015.08.006] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 05/06/2015] [Revised: 08/03/2015] [Accepted: 08/10/2015] [Indexed: 12/22/2022]
Abstract
Information about functional connections between genes can be derived from patterns of coupled loss of their homologs across multiple species. This comparative approach, termed phylogenetic profiling, has been successfully used to infer genetic interactions in bacteria and eukaryotes. Rapid progress in sequencing eukaryotic species has enabled the recent phylogenetic profiling of the human genome, resulting in systematic functional predictions for uncharacterized human genes. Importantly, groups of co-evolving genes reveal widespread modularity in the underlying genetic network, facilitating experimental analyses in human cells as well as comparative studies of conserved functional modules across species. This strategy is particularly successful in identifying novel metabolic proteins and components of multi-protein complexes. The targeted sequencing of additional key eukaryotes and the incorporation of improved methods to generate and compare phylogenetic profiles will further boost the predictive power and utility of this evolutionary approach to the functional analysis of gene interaction networks.
Affiliation(s)
- Gautam Dey
- Chemical and Systems Biology, Stanford University, Stanford CA 94305, USA.
- Tobias Meyer
- Chemical and Systems Biology, Stanford University, Stanford CA 94305, USA.

37
Lee T, Kim H, Lee I. Network-assisted crop systems genetics: network inference and integrative analysis. Curr Opin Plant Biol 2015; 24:61-70. [PMID: 25698380 DOI: 10.1016/j.pbi.2015.02.001] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 11/08/2014] [Revised: 01/15/2015] [Accepted: 02/02/2015] [Indexed: 05/24/2023]
Abstract
Although next-generation sequencing (NGS) technology has enabled the decoding of many crop species genomes, most of the underlying genetic components for economically important crop traits remain to be determined. Network approaches have proven useful for the study of the reference plant, Arabidopsis thaliana, and the success of network-based crop genetics will also require the availability of genome-scale functional networks for crop species. In this review, we discuss how to construct functional networks and elucidate the holistic view of a crop system. The crop gene network can then be used for gene prioritization and the analysis of resequencing-based genome-wide association study (GWAS) data, the amount of which will grow rapidly in the field of crop science in the coming years.
Affiliation(s)
- Tak Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea
- Hyojin Kim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea
- Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea.

38
Škunca N, Dessimoz C. Phylogenetic profiling: how much input data is enough? PLoS One 2015; 10:e0114701. [PMID: 25679783 PMCID: PMC4332489 DOI: 10.1371/journal.pone.0114701] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 07/01/2014] [Accepted: 11/10/2014] [Indexed: 12/04/2022] Open
Abstract
Phylogenetic profiling is a well-established approach for predicting gene function based on patterns of gene presence and absence across species. Many recent developments have focused on methodological improvements, but relatively little is known about the effect of input data size on the quality of predictions. In this work, we ask: how many genomes and functional annotations need to be considered for phylogenetic profiling to be effective? Phylogenetic profiling generally benefits from an increased amount of input data. However, by decomposing this improvement in predictive accuracy in terms of the contribution of additional genomes and of additional annotations, we observed diminishing returns in adding more than ∼100 genomes, whereas increasing the number of annotations remained strongly beneficial throughout. We also observed that maximising phylogenetic diversity within a clade of interest improves predictive accuracy, but the effect is small compared to changes in the number of genomes under comparison. Finally, we show that these findings are supported in light of the Open World Assumption, which posits that functional annotation databases are inherently incomplete. All the tools and data used in this work are available for reuse from http://lab.dessimoz.org/14_phylprof. Scripts used to analyse the data are available on request from the authors.
Affiliation(s)
- Nives Škunca
- ETH Zürich, Department of Computer Science, Universitätstr. 19, 8092 Zürich, Switzerland
- Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zürich, Switzerland
- University College London, Gower St, London WC1E 6BT, UK
- Christophe Dessimoz
- Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zürich, Switzerland
- University College London, Gower St, London WC1E 6BT, UK

39
Dey G, Jaimovich A, Collins SR, Seki A, Meyer T. Systematic Discovery of Human Gene Function and Principles of Modular Organization through Phylogenetic Profiling. Cell Rep 2015; 10:993-1006. [PMID: 25683721 DOI: 10.1016/j.celrep.2015.01.025] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Received: 09/05/2014] [Revised: 12/17/2014] [Accepted: 01/09/2015] [Indexed: 01/17/2023] Open
Abstract
Functional links between genes can be predicted using phylogenetic profiling, by correlating the appearance and loss of homologs in subsets of species. However, effective genome-wide phylogenetic profiling has been hindered by the large fraction of human genes related to each other through historical duplication events. Here, we overcame this challenge by automatically profiling over 30,000 groups of homologous human genes (orthogroups) representing the entire protein-coding genome across 177 eukaryotic species (hOP profiles). By generating a full pairwise orthogroup phylogenetic co-occurrence matrix, we derive unbiased genome-wide predictions of functional modules (hOP modules). Our approach predicts functions for hundreds of poorly characterized genes. The results suggest evolutionary constraints that lead components of protein complexes and metabolic pathways to co-evolve while genes in signaling and transcriptional networks do not. As a proof of principle, we validated two subsets of candidates experimentally for their predicted link to the actin-nucleating WASH complex and cilia/basal body function.
Affiliation(s)
- Gautam Dey
- Department of Chemical and Systems Biology, Stanford University, Stanford, CA 94305, USA
- Ariel Jaimovich
- Department of Chemical and Systems Biology, Stanford University, Stanford, CA 94305, USA
- Sean R Collins
- Department of Chemical and Systems Biology, Stanford University, Stanford, CA 94305, USA
- Akiko Seki
- Department of Chemical and Systems Biology, Stanford University, Stanford, CA 94305, USA
- Tobias Meyer
- Department of Chemical and Systems Biology, Stanford University, Stanford, CA 94305, USA.

40
Haft DH. Using comparative genomics to drive new discoveries in microbiology. Curr Opin Microbiol 2015; 23:189-96. [PMID: 25617609 DOI: 10.1016/j.mib.2014.11.017] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 09/27/2014] [Revised: 11/19/2014] [Accepted: 11/20/2014] [Indexed: 01/17/2023]
Abstract
Bioinformatics looks to many microbiologists like a service industry. In this view, annotation starts with what is known from experiments in the lab, makes reasonable inferences of which genes match other genes in function, builds databases to make all that we know accessible, but creates nothing truly new. Experiments lead, then biocuration and computational biology follow. But the astounding success of genome sequencing is changing the annotation paradigm. Every genome sequenced is an intercepted coded message from the microbial world, and as all cryptographers know, it is easier to decode a thousand messages than a single message. Some biology is best discovered not by phenomenology, but by decoding genome content, forming hypotheses, and doing the first few rounds of validation computationally. Through such reasoning, a role and function may be assigned to a protein with no sequence similarity to any protein yet studied. Experimentation can follow after the discovery to cement and to extend the findings. Unfortunately, this approach remains so unfamiliar to most bench scientists that lab work and comparative genomics typically segregate to different teams working on unconnected projects. This review will discuss several themes in comparative genomics as a discovery method, including highly derived data, use of patterns of design to reason by analogy, and in silico testing of computationally generated hypotheses.
|
41
|
Zahiri J, Mohammad-Noori M, Ebrahimpour R, Saadat S, Bozorgmehr JH, Goldberg T, Masoudi-Nejad A. LocFuse: human protein-protein interaction prediction via classifier fusion using protein localization information. Genomics 2014; 104:496-503. [PMID: 25458812 DOI: 10.1016/j.ygeno.2014.10.006] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 04/19/2014] [Revised: 09/28/2014] [Accepted: 10/02/2014] [Indexed: 12/20/2022]
Abstract
Protein-protein interaction (PPI) detection is one of the central goals of functional genomics and systems biology. Knowledge about the nature of PPIs can help fill the widening gap between sequence information and functional annotations. Although experimental methods have produced valuable PPI data, they also suffer from significant limitations. Computational PPI prediction methods have attracted tremendous attention. Despite considerable efforts, PPI prediction is still in its infancy in complex multicellular organisms such as humans. Here, we propose a novel ensemble learning method, LocFuse, which is useful in human PPI prediction. This method uses eight different genomic and proteomic features along with four different types of classifiers. The prediction performance of this classifier selection method was found to be considerably better than that of methods employed hitherto. This confirms the complex nature of the PPI prediction problem and also the necessity of using biological information for classifier fusion. LocFuse is available at: http://lbb.ut.ac.ir/Download/LBBsoft/LocFuse. BIOLOGICAL SIGNIFICANCE: The results revealed that if we divide proteome space according to the cellular localization of proteins, then the utility of some classifiers in PPI prediction can be improved. Therefore, to predict the interaction for any given protein pair, we can select the most accurate classifier with regard to the cellular localization information. Based on the results, we can say that the importance of different features for PPI prediction varies between differently localized proteins; however, in general, our novel features, which were extracted from position-specific scoring matrices (PSSMs), are the most important ones, and the Random Forest (RF) classifier performs best in most cases. LocFuse was developed with a user-friendly graphic interface and is freely available for Linux, Mac OSX and MS Windows operating systems.
Affiliation(s)
- Javad Zahiri
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran; Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Morteza Mohammad-Noori
- School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran
- Reza Ebrahimpour
- Brain and Intelligent Systems Research Lab, Department of Electrical and Computer Engineering, Shahid Rajaee Teacher Training University, Tehran, Iran
- Samaneh Saadat
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Joseph H Bozorgmehr
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Tatyana Goldberg
- Department for Bioinformatics and Computational Biology, Faculty of Informatics, TUM, Garching 85748, Germany
- Ali Masoudi-Nejad
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran.
|
42
|
Li Y, Calvo SE, Gutman R, Liu JS, Mootha VK. Expansion of biological pathways based on evolutionary inference. Cell 2014; 158:213-25. [PMID: 24995987 DOI: 10.1016/j.cell.2014.05.034] [Citation(s) in RCA: 81] [Impact Index Per Article: 8.1] [Received: 10/22/2013] [Revised: 02/06/2014] [Accepted: 05/12/2014] [Indexed: 01/24/2023]
Abstract
The availability of diverse genomes makes it possible to predict gene function based on shared evolutionary history. This approach can be challenging, however, for pathways whose components do not exhibit a shared history but rather consist of distinct "evolutionary modules." We introduce a computational algorithm, clustering by inferred models of evolution (CLIME), which inputs a eukaryotic species tree, homology matrix, and pathway (gene set) of interest. CLIME partitions the gene set into disjoint evolutionary modules, simultaneously learning the number of modules and a tree-based evolutionary history that defines each module. CLIME then expands each module by scanning the genome for new components that likely arose under the inferred evolutionary model. Application of CLIME to ∼1,000 annotated human pathways and to the proteomes of yeast, red algae, and malaria reveals unanticipated evolutionary modularity and coevolving components. CLIME is freely available and should become increasingly powerful with the growing wealth of eukaryotic genomes.
Affiliation(s)
- Yang Li
- Howard Hughes Medical Institute and Department of Molecular Biology, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Statistics, Harvard University, Cambridge, MA 02138, USA
- Sarah E Calvo
- Howard Hughes Medical Institute and Department of Molecular Biology, Massachusetts General Hospital, Boston, MA 02114, USA; Broad Institute, Cambridge, MA 02141, USA
- Roee Gutman
- Department of Biostatistics, Brown University, Providence, RI 02912, USA
- Jun S Liu
- Department of Statistics, Harvard University, Cambridge, MA 02138, USA.
- Vamsi K Mootha
- Howard Hughes Medical Institute and Department of Molecular Biology, Massachusetts General Hospital, Boston, MA 02114, USA; Broad Institute, Cambridge, MA 02141, USA; Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA.
|
43
|
Dutilh BE, Cassman N, McNair K, Sanchez SE, Silva GGZ, Boling L, Barr JJ, Speth DR, Seguritan V, Aziz RK, Felts B, Dinsdale EA, Mokili JL, Edwards RA. A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes. Nat Commun 2014; 5:4498. [PMID: 25058116 PMCID: PMC4111155 DOI: 10.1038/ncomms5498] [Citation(s) in RCA: 491] [Impact Index Per Article: 49.1] [Received: 04/09/2014] [Accepted: 06/25/2014] [Indexed: 01/20/2023] Open
Abstract
Metagenomics, or sequencing of the genetic material from a complete microbial community, is a promising tool to discover novel microbes and viruses. Viral metagenomes typically contain many unknown sequences. Here we describe the discovery of a previously unidentified bacteriophage present in the majority of published human faecal metagenomes, which we refer to as crAssphage. Its ~97 kbp genome is six times more abundant in publicly available metagenomes than all other known phages together; it comprises up to 90% and 22% of all reads in virus-like particle (VLP)-derived metagenomes and total community metagenomes, respectively; and it totals 1.68% of all human faecal metagenomic sequencing reads in the public databases. The majority of crAssphage-encoded proteins match no known sequences in the database, which is why it was not detected before. Using a new co-occurrence profiling approach, we predict a Bacteroides host for this phage, consistent with Bacteroides-related protein homologues and a unique carbohydrate-binding domain encoded in the phage genome.
Metagenomic studies of microbial communities often report DNA sequences from unidentified viruses. Here, Dutilh et al. analyse metagenomic data to reveal the complete genome of an abundant, ubiquitous virus from human faeces, and predict that the virus infects bacteria of the Bacteroides group.
Affiliation(s)
- Bas E Dutilh
- [1] Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud university medical centre, Geert Grooteplein 28, 6525 GA Nijmegen, The Netherlands [2] Department of Computer Science, San Diego State University, 5500 Campanile Drive, San Diego, California 92182, USA [3] Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, California 92182, USA [4] Department of Marine Biology, Institute of Biology, Federal University of Rio de Janeiro, Av. Carlos Chagas Fo. 373, Prédio Anexo ao Bloco A do Centro de Ciências da Saúde, Ilha do Fundão, CEP 21941-902 Rio de Janeiro, Brazil
- Noriko Cassman
- [1] Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, California 92182, USA [2]
- Katelyn McNair
- Department of Computer Science, San Diego State University, 5500 Campanile Drive, San Diego, California 92182, USA
- Savannah E Sanchez
- Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, California 92182, USA
- Genivaldo G Z Silva
- Computational Science Research Center, San Diego State University, 5500 Campanile Drive, San Diego, California 92182, USA
- Lance Boling
- Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, California 92182, USA
- Jeremy J Barr
- Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, California 92182, USA
- Daan R Speth
- Department of Microbiology, Institute for Water and Wetland Research, Radboud University, Heyendaalseweg 135, 6525 AJ Nijmegen, The Netherlands
- Victor Seguritan
- Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, California 92182, USA
- Ramy K Aziz
- [1] Department of Computer Science, San Diego State University, 5500 Campanile Drive, San Diego, California 92182, USA [2] Department of Microbiology and Immunology, Faculty of Pharmacy, Cairo University, Kasr El-Aini Street, Cairo 11562, Egypt
- Ben Felts
- Department of Mathematics, San Diego State University, 5500 Campanile Drive, San Diego, California 92182, USA
- Elizabeth A Dinsdale
- [1] Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, California 92182, USA [2] Computational Science Research Center, San Diego State University, 5500 Campanile Drive, San Diego, California 92182, USA
- John L Mokili
- Department of Biology, San Diego State University, 5500 Campanile Drive, San Diego, California 92182, USA
- Robert A Edwards
- [1] Department of Computer Science, San Diego State University, 5500 Campanile Drive, San Diego, California 92182, USA [2] Department of Marine Biology, Institute of Biology, Federal University of Rio de Janeiro, Av. Carlos Chagas Fo. 373, Prédio Anexo ao Bloco A do Centro de Ciências da Saúde, Ilha do Fundão, CEP 21941-902 Rio de Janeiro, Brazil [3] Computational Science Research Center, San Diego State University, 5500 Campanile Drive, San Diego, California 92182, USA [4] Division of Mathematics and Computer Science, Argonne National Laboratory, 9700 S Cass Ave B109, Argonne, Illinois 60439, USA
|
44
|
Reynolds KA. Finding a common path: predicting gene function using inferred evolutionary trees. Dev Cell 2014; 30:4-5. [PMID: 25026031 DOI: 10.1016/j.devcel.2014.06.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
Abstract
Reporting in Cell, Li and colleagues (2014) describe an innovative method to functionally classify genes using evolutionary information. This approach demonstrates broad utility for eukaryotic gene annotation and suggests an intriguing new decomposition of pathways and complexes into evolutionarily conserved modules.
Affiliation(s)
- Kimberly A Reynolds
- Green Center for Systems Biology, University of Texas Southwestern Medical Center, 6001 Forest Park Road, Dallas, TX 75390-8597, USA.
|
45
|
Different subunits belonging to the same protein complex often exhibit discordant expression levels and evolutionary properties. Curr Opin Struct Biol 2014; 26:113-20. [DOI: 10.1016/j.sbi.2014.06.001] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 01/24/2014] [Revised: 04/27/2014] [Accepted: 06/04/2014] [Indexed: 11/21/2022]
|
46
|
Lua RC, Marciano DC, Katsonis P, Adikesavan AK, Wilkins AD, Lichtarge O. Prediction and redesign of protein-protein interactions. Prog Biophys Mol Biol 2014; 116:194-202. [PMID: 24878423 DOI: 10.1016/j.pbiomolbio.2014.05.004] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 02/25/2014] [Revised: 05/02/2014] [Accepted: 05/17/2014] [Indexed: 12/14/2022]
Abstract
Understanding the molecular basis of protein function remains a central goal of biology, with the hope to elucidate the role of human genes in health and in disease, and to rationally design therapies through targeted molecular perturbations. We review here some of the computational techniques and resources available for characterizing a critical aspect of protein function - those mediated by protein-protein interactions (PPI). We describe several applications and recent successes of the Evolutionary Trace (ET) in identifying molecular events and shapes that underlie protein function and specificity in both eukaryotes and prokaryotes. ET is a part of analytical approaches based on the successes and failures of evolution that enable the rational control of PPI.
Affiliation(s)
- Rhonald C Lua
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- David C Marciano
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Anbu K Adikesavan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Angela D Wilkins
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX 77030, USA
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX 77030, USA; Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX 77030, USA.
|
47
|
Frolov AA, Husek D, Polyakov PY, Snasel V. New BFA method based on attractor neural network and likelihood maximization. Neurocomputing 2014. [DOI: 10.1016/j.neucom.2013.07.047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
|
48
|
Konietzny SGA, Pope PB, Weimann A, McHardy AC. Inference of phenotype-defining functional modules of protein families for microbial plant biomass degraders. Biotechnol Biofuels 2014; 7:124. [PMID: 25342967 PMCID: PMC4189754 DOI: 10.1186/s13068-014-0124-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 04/19/2014] [Accepted: 08/05/2014] [Indexed: 05/14/2023]
Abstract
BACKGROUND: Efficient industrial processes for converting plant lignocellulosic materials into biofuels are a key to global efforts to come up with alternative energy sources to fossil fuels. Novel cellulolytic enzymes have been discovered in microbial genomes and metagenomes of microbial communities. However, the identification of relevant genes without known homologs, and the elucidation of the lignocellulolytic pathways and protein complexes for different microorganisms remain challenging. RESULTS: We describe a new computational method for the targeted discovery of functional modules of plant biomass-degrading protein families, based on their co-occurrence patterns across genomes and metagenome datasets, and the strength of association of these modules with the genomes of known degraders. From approximately 6.4 million family annotations for 2,884 microbial genomes, and 332 taxonomic bins from 18 metagenomes, we identified 5 functional modules that are distinctive for plant biomass degraders, which we term "plant biomass degradation modules" (PDMs). These modules incorporate protein families involved in the degradation of cellulose, hemicelluloses, and pectins, structural components of the cellulosome, and additional families with potential functions in plant biomass degradation. The PDMs were linked to 81 gene clusters in genomes of known lignocellulose degraders, including previously described clusters of lignocellulolytic genes. On average, 70% of the families of each PDM were found to map to gene clusters in known degraders, which served as an additional confirmation of their functional relationships. The presence of a PDM in a genome or taxonomic metagenome bin furthermore allowed us to accurately predict the ability of any particular organism to degrade plant biomass. For 15 draft genomes of a cow rumen metagenome, we used cross-referencing to confirmed cellulolytic enzymes to validate that the PDMs identified plant biomass degraders within a complex microbial community. CONCLUSIONS: Functional modules of protein families that are involved in different aspects of plant cell wall degradation can be inferred from co-occurrence patterns across (meta-)genomes with a probabilistic topic model. PDMs represent a new resource of protein families and candidate genes implicated in microbial plant biomass degradation. They can also be used to predict the plant biomass degradation ability for a genome or taxonomic bin. The method is also suitable for characterizing other microbial phenotypes.
Affiliation(s)
- Sebastian GA Konietzny
- Max-Planck Research Group for Computational Genomics and Epidemiology, Max-Planck Institute for Informatics, University Campus E1 4, Saarbrücken, 66123 Germany
- Department of Algorithmic Bioinformatics, Heinrich Heine University Düsseldorf, Düsseldorf, 40225 Germany
- Phillip B Pope
- Department of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Post Office Box 5003, 1432 Ås, Norway
- Aaron Weimann
- Department of Algorithmic Bioinformatics, Heinrich Heine University Düsseldorf, Düsseldorf, 40225 Germany
- Alice C McHardy
- Max-Planck Research Group for Computational Genomics and Epidemiology, Max-Planck Institute for Informatics, University Campus E1 4, Saarbrücken, 66123 Germany
- Department of Algorithmic Bioinformatics, Heinrich Heine University Düsseldorf, Düsseldorf, 40225 Germany
|
49
|
Abstract
The genome of budding yeast (Saccharomyces cerevisiae) contains approximately 5800 protein-encoding genes, the majority of which are associated with some known biological function. Yet the extent of amino acid sequence conservation of these genes over all phyla has only been partially examined. Here we provide a more comprehensive overview and visualization of the conservation of yeast genes and a means for browsing and exploring the data in detail, down to the individual yeast gene, at http://yeast-phylogroups.princeton.edu. We used data from the OrthoMCL database, which has defined orthologs from approximately 150 completely sequenced genomes, including diverse representatives of the archaeal, bacterial, and eukaryotic domains. By clustering genes based on similar patterns of conservation, we organized and visualized all the protein-encoding genes in yeast as a single heat map. Most genes fall into one of eight major clusters, called "phylogroups." Gene ontology analysis of the phylogroups revealed that they were associated with specific, distinct trends in gene function, generalizations likely to be of interest to a wide range of biologists.
|
50
|
Dutilh BE, Backus L, Edwards RA, Wels M, Bayjanov JR, van Hijum SAFT. Explaining microbial phenotypes on a genomic scale: GWAS for microbes. Brief Funct Genomics 2013; 12:366-80. [PMID: 23625995 PMCID: PMC3743258 DOI: 10.1093/bfgp/elt008] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Indexed: 12/24/2022] Open
Abstract
There is an increasing availability of complete or draft genome sequences for microbial organisms. These data form a potentially valuable resource for genotype-phenotype association and gene function prediction, provided that phenotypes are consistently annotated for all the sequenced strains. In this review, we address the requirements for successful gene-trait matching. We outline a basic protocol for microbial functional genomics, including genome assembly, annotation of genotypes (including single nucleotide polymorphisms, orthologous groups and prophages), data pre-processing, genotype-phenotype association, visualization and interpretation of results. The methodologies for association described herein can be applied to other data types, opening up possibilities to analyze transcriptome-phenotype associations, and correlate microbial population structure or activity, as measured by metagenomics, to environmental parameters.
Affiliation(s)
- Bas E Dutilh
- CMBI, NCMLS, Radboud University Medical Centre. Geert Grooteplein 28, 6525 GA Nijmegen, The Netherlands.
|