26
|
Guala D, Ogris C, Müller N, Sonnhammer ELL. Genome-wide functional association networks: background, data & state-of-the-art resources. Brief Bioinform 2019; 21:1224-1237. [PMID: 31281921 PMCID: PMC7373183 DOI: 10.1093/bib/bbz064] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2019] [Revised: 04/29/2019] [Accepted: 05/04/2019] [Indexed: 02/06/2023] Open
Abstract
The vast amount of experimental data from recent advances in the field of high-throughput biology begs for integration into more complex data structures such as genome-wide functional association networks. Such networks have been used for elucidation of the interplay of intra-cellular molecules to make advances ranging from the basic science understanding of evolutionary processes to the more translational field of precision medicine. The allure of the field has resulted in rapid growth of the number of available network resources, each with unique attributes exploitable to answer different biological questions. Unfortunately, the high volume of network resources makes it impossible for the intended user to select an appropriate tool for their particular research question. The aim of this paper is to provide an overview of the underlying data and representative network resources as well as to mention methods of integration, allowing a customized approach to resource selection. Additionally, this report will provide a primer for researchers venturing into the field of network integration.
Collapse
|
27
|
Morgan D, Tjärnberg A, Nordling TEM, Sonnhammer ELL. A generalized framework for controlling FDR in gene regulatory network inference. Bioinformatics 2018; 35:1026-1032. [DOI: 10.1093/bioinformatics/bty764] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2018] [Revised: 08/23/2018] [Accepted: 08/28/2018] [Indexed: 12/23/2022] Open
|
28
|
Guala D, Bernhem K, Blal HA, Jans D, Lundberg E, Brismar H, Sonnhammer ELL. Experimental validation of predicted cancer genes using FRET. Methods Appl Fluoresc 2018; 6:035007. [PMID: 29570091 DOI: 10.1088/2050-6120/aab932] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
Huge amounts of data are generated in genome wide experiments, designed to investigate diseases with complex genetic causes. Follow up of all potential leads produced by such experiments is currently cost prohibitive and time consuming. Gene prioritization tools alleviate these constraints by directing further experimental efforts towards the most promising candidate targets. Recently a gene prioritization tool called MaxLink was shown to outperform other widely used state-of-the-art prioritization tools in a large scale in silico benchmark. An experimental validation of predictions made by MaxLink has however been lacking. In this study we used Fluorescence Resonance Energy Transfer, an established experimental technique for detection of protein-protein interactions, to validate potential cancer genes predicted by MaxLink. Our results provide confidence in the use of MaxLink for selection of new targets in the battle with polygenic diseases.
Collapse
|
29
|
Carreras-Puigvert J, Zitnik M, Jemth AS, Carter M, Unterlass JE, Hallström B, Loseva O, Karem Z, Calderón-Montaño JM, Lindskog C, Edqvist PH, Matuszewski DJ, Ait Blal H, Berntsson RPA, Häggblad M, Martens U, Studham M, Lundgren B, Wählby C, Sonnhammer ELL, Lundberg E, Stenmark P, Zupan B, Helleday T. A comprehensive structural, biochemical and biological profiling of the human NUDIX hydrolase family. Nat Commun 2017; 8:1541. [PMID: 29142246 PMCID: PMC5688067 DOI: 10.1038/s41467-017-01642-w] [Citation(s) in RCA: 93] [Impact Index Per Article: 13.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2016] [Accepted: 10/06/2017] [Indexed: 01/04/2023] Open
Abstract
The NUDIX enzymes are involved in cellular metabolism and homeostasis, as well as mRNA processing. Although highly conserved throughout all organisms, their biological roles and biochemical redundancies remain largely unclear. To address this, we globally resolve their individual properties and inter-relationships. We purify 18 of the human NUDIX proteins and screen 52 substrates, providing a substrate redundancy map. Using crystal structures, we generate sequence alignment analyses revealing four major structural classes. To a certain extent, their substrate preference redundancies correlate with structural classes, thus linking structure and activity relationships. To elucidate interdependence among the NUDIX hydrolases, we pairwise deplete them generating an epistatic interaction map, evaluate cell cycle perturbations upon knockdown in normal and cancer cells, and analyse their protein and mRNA expression in normal and cancer tissues. Using a novel FUSION algorithm, we integrate all data creating a comprehensive NUDIX enzyme profile map, which will prove fundamental to understanding their biological functionality.
Collapse
|
30
|
Guala D, Sonnhammer ELL. A large-scale benchmark of gene prioritization methods. Sci Rep 2017; 7:46598. [PMID: 28429739 PMCID: PMC5399445 DOI: 10.1038/srep46598] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2016] [Accepted: 03/22/2017] [Indexed: 11/16/2022] Open
Abstract
In order to maximize the use of results from high-throughput experimental studies, e.g. GWAS, for identification and diagnostics of new disease-associated genes, it is important to have properly analyzed and benchmarked gene prioritization tools. While prospective benchmarks are underpowered to provide statistically significant results in their attempt to differentiate the performance of gene prioritization tools, a strategy for retrospective benchmarking has been missing, and new tools usually only provide internal validations. The Gene Ontology(GO) contains genes clustered around annotation terms. This intrinsic property of GO can be utilized in construction of robust benchmarks, objective to the problem domain. We demonstrate how this can be achieved for network-based gene prioritization tools, utilizing the FunCoup network. We use cross-validation and a set of appropriate performance measures to compare state-of-the-art gene prioritization algorithms: three based on network diffusion, NetRank and two implementations of Random Walk with Restart, and MaxLink that utilizes network neighborhood. Our benchmark suite provides a systematic and objective way to compare the multitude of available and future gene prioritization tools, enabling researchers to select the best gene prioritization tool for the task at hand, and helping to guide the development of more accurate methods.
Collapse
|
31
|
Kaduk M, Riegler C, Lemp O, Sonnhammer ELL. HieranoiDB: a database of orthologs inferred by Hieranoid. Nucleic Acids Res 2017; 45:D687-D690. [PMID: 27742821 PMCID: PMC5210627 DOI: 10.1093/nar/gkw923] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2016] [Revised: 09/30/2016] [Accepted: 10/05/2016] [Indexed: 02/04/2023] Open
Abstract
HieranoiDB (http://hieranoiDB.sbc.su.se) is a freely available on-line database for hierarchical groups of orthologs inferred by the Hieranoid algorithm. It infers orthologs at each node in a species guide tree with the InParanoid algorithm as it progresses from the leaves to the root. Here we present a database HieranoiDB with a web interface that makes it easy to search and visualize the output of Hieranoid, and to download it in various formats. Searching can be performed using protein description, identifier or sequence. In this first version, orthologs are available for the 66 Quest for Orthologs reference proteomes. The ortholog trees are shown graphically and interactively with marked speciation and duplication nodes that show the inferred evolutionary scenario, and allow for correct extraction of predicted orthologs from the Hieranoid trees.
Collapse
|
32
|
Tjärnberg A, Morgan DC, Studham M, Nordling TEM, Sonnhammer ELL. GeneSPIDER – gene regulatory network inference benchmarking with controlled network and data properties. MOLECULAR BIOSYSTEMS 2017; 13:1304-1312. [DOI: 10.1039/c7mb00058h] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/07/2023]
Abstract
A key question in network inference, that has not been properly answered, is what accuracy can be expected for a given biological dataset and inference method.
Collapse
|
33
|
Ogris C, Guala D, Helleday T, Sonnhammer ELL. A novel method for crosstalk analysis of biological networks: improving accuracy of pathway annotation. Nucleic Acids Res 2016; 45:e8. [PMID: 27664219 PMCID: PMC5314790 DOI: 10.1093/nar/gkw849] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2016] [Revised: 08/17/2016] [Accepted: 08/23/2016] [Indexed: 12/13/2022] Open
Abstract
Analyzing gene expression patterns is a mainstay to gain functional insights of biological systems. A plethora of tools exist to identify significant enrichment of pathways for a set of differentially expressed genes. Most tools analyze gene overlap between gene sets and are therefore severely hampered by the current state of pathway annotation, yet at the same time they run a high risk of false assignments. A way to improve both true positive and false positive rates (FPRs) is to use a functional association network and instead look for enrichment of network connections between gene sets. We present a new network crosstalk analysis method BinoX that determines the statistical significance of network link enrichment or depletion between gene sets, using the binomial distribution. This is a much more appropriate statistical model than previous methods have employed, and as a result BinoX yields substantially better true positive and FPRs than was possible before. A number of benchmarks were performed to assess the accuracy of BinoX and competing methods. We demonstrate examples of how BinoX finds many biologically meaningful pathway annotations for gene sets from cancer and other diseases, which are not found by other methods. BinoX is available at http://sonnhammer.org/BinoX.
Collapse
|
34
|
Saripella GV, Sonnhammer ELL, Forslund K. Benchmarking the next generation of homology inference tools. Bioinformatics 2016; 32:2636-41. [PMID: 27256311 PMCID: PMC5013910 DOI: 10.1093/bioinformatics/btw305] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2015] [Accepted: 05/05/2016] [Indexed: 12/21/2022] Open
Abstract
Motivation: Over the last decades, vast numbers of sequences were deposited in public databases. Bioinformatics tools allow homology and consequently functional inference for these sequences. New profile-based homology search tools have been introduced, allowing reliable detection of remote homologs, but have not been systematically benchmarked. To provide such a comparison, which can guide bioinformatics workflows, we extend and apply our previously developed benchmark approach to evaluate the ‘next generation’ of profile-based approaches, including CS-BLAST, HHSEARCH and PHMMER, in comparison with the non-profile based search tools NCBI-BLAST, USEARCH, UBLAST and FASTA. Method: We generated challenging benchmark datasets based on protein domain architectures within either the PFAM + Clan, SCOP/Superfamily or CATH/Gene3D domain definition schemes. From each dataset, homologous and non-homologous protein pairs were aligned using each tool, and standard performance metrics calculated. We further measured congruence of domain architecture assignments in the three domain databases. Results: CSBLAST and PHMMER had overall highest accuracy. FASTA, UBLAST and USEARCH showed large trade-offs of accuracy for speed optimization. Conclusion: Profile methods are superior at inferring remote homologs but the difference in accuracy between methods is relatively small. PHMMER and CSBLAST stand out with the highest accuracy, yet still at a reasonable computational cost. Additionally, we show that less than 0.1% of Swiss-Prot protein pairs considered homologous by one database are considered non-homologous by another, implying that these classifications represent equivalent underlying biological phenomena, differing mostly in coverage and granularity. Availability and Implementation: Benchmark datasets and all scripts are placed at (http://sonnhammer.org/download/Homology_benchmark). Contact:forslund@embl.de Supplementary information: Supplementary data are available at Bioinformatics online.
Collapse
|
35
|
Ogris C, Helleday T, Sonnhammer ELL. PathwAX: a web server for network crosstalk based pathway annotation. Nucleic Acids Res 2016; 44:W105-9. [PMID: 27151197 PMCID: PMC4987909 DOI: 10.1093/nar/gkw356] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2016] [Accepted: 04/19/2016] [Indexed: 12/22/2022] Open
Abstract
Pathway annotation of gene lists is often used to functionally analyse biomolecular data such as gene expression in order to establish which processes are activated in a given experiment. Databases such as KEGG or GO represent collections of how genes are known to be organized in pathways, and the challenge is to compare a given gene list with the known pathways such that all true relations are identified. Most tools apply statistical measures to the gene overlap between the gene list and pathway. It is however problematic to avoid false negatives and false positives when only using the gene overlap. The pathwAX web server (http://pathwAX.sbc.su.se/) applies a different approach which is based on network crosstalk. It uses the comprehensive network FunCoup to analyse network crosstalk between a query gene list and KEGG pathways. PathwAX runs the BinoX algorithm, which employs Monte-Carlo sampling of randomized networks and estimates a binomial distribution, for estimating the statistical significance of the crosstalk. This results in substantially higher accuracy than gene overlap methods. The system was optimized for speed and allows interactive web usage. We illustrate the usage and output of pathwAX.
Collapse
|
36
|
Tjärnberg A, Nordling TEM, Studham M, Nelander S, Sonnhammer ELL. Avoiding pitfalls in L1-regularised inference of gene networks. MOLECULAR BIOSYSTEMS 2015; 11:287-96. [DOI: 10.1039/c4mb00419a] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/15/2023]
Abstract
L1 regularisation methods fail to infer the correct network even when the data are so informative that all existing links can be proven to exist.
Collapse
|
37
|
Sonnhammer ELL, Östlund G. InParanoid 8: orthology analysis between 273 proteomes, mostly eukaryotic. Nucleic Acids Res 2014; 43:D234-9. [PMID: 25429972 PMCID: PMC4383983 DOI: 10.1093/nar/gku1203] [Citation(s) in RCA: 341] [Impact Index Per Article: 34.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Abstract
The InParanoid database (http://InParanoid.sbc.su.se) provides a user interface to orthologs inferred by the InParanoid algorithm. As there are now international efforts to curate and standardize complete proteomes, we have switched to using these resources rather than gathering and curating the proteomes ourselves. InParanoid release 8 is based on the 66 reference proteomes that the ‘Quest for Orthologs’ community has agreed on using, plus 207 additional proteomes from the UniProt complete proteomes—in total 273 species. These represent 246 eukaryotes, 20 bacteria and seven archaea. Compared to the previous release, this increases the number of species by 173% and the number of pairwise species comparisons by 650%. In turn, the number of ortholog groups has increased by 423%. We present the contents and usages of InParanoid 8, and a detailed analysis of how the proteome content has changed since the previous release.
Collapse
|
38
|
Sonnhammer ELL, Gabaldón T, Sousa da Silva AW, Martin M, Robinson-Rechavi M, Boeckmann B, Thomas PD, Dessimoz C. Big data and other challenges in the quest for orthologs. Bioinformatics 2014; 30:2993-8. [PMID: 25064571 PMCID: PMC4201156 DOI: 10.1093/bioinformatics/btu492] [Citation(s) in RCA: 98] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2014] [Revised: 06/25/2014] [Accepted: 07/16/2014] [Indexed: 01/29/2023] Open
Abstract
UNLABELLED Given the rapid increase of species with a sequenced genome, the need to identify orthologous genes between them has emerged as a central bioinformatics task. Many different methods exist for orthology detection, which makes it difficult to decide which one to choose for a particular application. Here, we review the latest developments and issues in the orthology field, and summarize the most recent results reported at the third 'Quest for Orthologs' meeting. We focus on community efforts such as the adoption of reference proteomes, standard file formats and benchmarking. Progress in these areas is good, and they are already beneficial to both orthology consumers and providers. However, a major current issue is that the massive increase in complete proteomes poses computational challenges to many of the ortholog database providers, as most orthology inference algorithms scale at least quadratically with the number of proteomes. The Quest for Orthologs consortium is an open community with a number of working groups that join efforts to enhance various aspects of orthology analysis, such as defining standard formats and datasets, documenting community resources and benchmarking. AVAILABILITY AND IMPLEMENTATION All such materials are available at http://questfororthologs.org.
Collapse
|
39
|
Studham ME, Tjärnberg A, Nordling TEM, Nelander S, Sonnhammer ELL. Functional association networks as priors for gene regulatory network inference. ACTA ACUST UNITED AC 2014; 30:i130-8. [PMID: 24931976 PMCID: PMC4058914 DOI: 10.1093/bioinformatics/btu285] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Abstract
Motivation: Gene regulatory network (GRN) inference reveals the influences genes have on one another in cellular regulatory systems. If the experimental data are inadequate for reliable inference of the network, informative priors have been shown to improve the accuracy of inferences. Results: This study explores the potential of undirected, confidence-weighted networks, such as those in functional association databases, as a prior source for GRN inference. Such networks often erroneously indicate symmetric interaction between genes and may contain mostly correlation-based interaction information. Despite these drawbacks, our testing on synthetic datasets indicates that even noisy priors reflect some causal information that can improve GRN inference accuracy. Our analysis on yeast data indicates that using the functional association databases FunCoup and STRING as priors can give a small improvement in GRN inference accuracy with biological data. Contact:matthew.studham@scilifelab.se Supplementary information:Supplementary data are available at Bioinformatics online.
Collapse
|
40
|
Alexeyenko A, Lindberg J, Pérez-Bercoff A, Sonnhammer ELL. Overview and comparison of ortholog databases. DRUG DISCOVERY TODAY. TECHNOLOGIES 2014; 3:137-43. [PMID: 24980400 DOI: 10.1016/j.ddtec.2006.06.002] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
Abstract
Orthologs are an indispensable bridge to transfer biological knowledge between species, from protein annotations to sophisticated disease models. However, orthology assignment is not trivial. A large number of resources now exist, each with its own idiosyncrasies. The goal of this review is to compare their contents and clarify which database is most suited for a certain task.:
Collapse
|
41
|
Guala D, Sjölund E, Sonnhammer ELL. MaxLink: network-based prioritization of genes tightly linked to a disease seed set. Bioinformatics 2014; 30:2689-90. [DOI: 10.1093/bioinformatics/btu344] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023] Open
|
42
|
Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, Heger A, Hetherington K, Holm L, Mistry J, Sonnhammer ELL, Tate J, Punta M. Pfam: the protein families database. Nucleic Acids Res 2013; 42:D222-30. [PMID: 24288371 PMCID: PMC3965110 DOI: 10.1093/nar/gkt1223] [Citation(s) in RCA: 4230] [Impact Index Per Article: 384.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023] Open
Abstract
Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures.
Collapse
|
43
|
Schmitt T, Ogris C, Sonnhammer ELL. FunCoup 3.0: database of genome-wide functional coupling networks. Nucleic Acids Res 2013; 42:D380-8. [PMID: 24185702 PMCID: PMC3965084 DOI: 10.1093/nar/gkt984] [Citation(s) in RCA: 90] [Impact Index Per Article: 8.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022] Open
Abstract
We present an update of the FunCoup database (http://FunCoup.sbc.su.se) of functional couplings, or functional associations, between genes and gene products. Identifying these functional couplings is an important step in the understanding of higher level mechanisms performed by complex cellular processes. FunCoup distinguishes between four classes of couplings: participation in the same signaling cascade, participation in the same metabolic process, co-membership in a protein complex and physical interaction. For each of these four classes, several types of experimental and statistical evidence are combined by Bayesian integration to predict genome-wide functional coupling networks. The FunCoup framework has been completely re-implemented to allow for more frequent future updates. It contains many improvements, such as a regularization procedure to automatically downweight redundant evidences and a novel method to incorporate phylogenetic profile similarity. Several datasets have been updated and new data have been added in FunCoup 3.0. Furthermore, we have developed a new Web site, which provides powerful tools to explore the predicted networks and to retrieve detailed information about the data underlying each prediction.
Collapse
|
44
|
Schreiber F, Sonnhammer ELL. Hieranoid: hierarchical orthology inference. J Mol Biol 2013; 425:2072-2081. [PMID: 23485417 DOI: 10.1016/j.jmb.2013.02.018] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2012] [Revised: 02/13/2013] [Accepted: 02/16/2013] [Indexed: 12/13/2022]
Abstract
An accurate inference of orthologs is essential in many research fields such as comparative genomics, molecular evolution, and genome annotation. Existing methods for genome-scale orthology inference are mostly based on all-versus-all similarity searches that scale quadratically with the number of species. This limits their application to the increasing number of available large-scale datasets. Here, we present Hieranoid, a new orthology inference method using a hierarchical approach. Hieranoid performs pairwise orthology analysis using InParanoid at each node in a guide tree as it progresses from its leaves to the root. This concept reduces the total runtime complexity from a quadratic to a linear function of the number of species. The tree hierarchy provides a natural structure in multi-species ortholog groups, and the aggregation of multiple sequences allows for multiple alignment similarity searching techniques, which can yield more accurate ortholog groups. Using the recently published orthobench benchmark, Hieranoid showed the overall best performance. Our progressive approach presents a new way to infer orthologs that combines efficient graph-based methodology with aspects of compute-intensive tree-based methods. The linear scaling with the number of species is a major advantage for large-scale applications and makes Hieranoid well suited to cope with vast amounts of sequenced genomes in the future. Hieranoid is an open source and can be downloaded at Hieranoid.sbc.su.se.
Collapse
|
45
|
Frings O, Alexeyenko A, Sonnhammer ELL. MGclus: network clustering employing shared neighbors. MOLECULAR BIOSYSTEMS 2013; 9:1670-5. [PMID: 23396516 DOI: 10.1039/c3mb25473a] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
Abstract
Network analysis is an important tool for functional annotation of genes and proteins. A common approach to discern structure in a global network is to infer network clusters, or modules, and assume a functional coherence within each module, which may represent a complex or a pathway. It is however not trivial to define optimal modules. Although many methods have been proposed, it is unclear which methods perform best in general. It seems that most methods produce far from optimal results but in different ways. MGclus is a new algorithm designed to detect modules with a strongly interconnected neighborhood in large scale biological interaction networks. In our benchmarks we found MGclus to outperform other methods when applied to random graphs with varying degree of noise, and to perform equally or better when applied to biological protein interaction networks. MGclus is implemented in Java and utilizes the JGraphT graph library. It has an easy to use command-line interface and is available for download from .
Collapse
|
46
|
McCormack T, Frings O, Alexeyenko A, Sonnhammer ELL. Statistical assessment of crosstalk enrichment between gene groups in biological networks. PLoS One 2013; 8:e54945. [PMID: 23372799 PMCID: PMC3553069 DOI: 10.1371/journal.pone.0054945] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2011] [Accepted: 12/21/2012] [Indexed: 11/19/2022] Open
Abstract
Motivation Analyzing groups of functionally coupled genes or proteins in the context of global interaction networks has become an important aspect of bioinformatic investigations. Assessing the statistical significance of crosstalk enrichment between or within groups of genes can be a valuable tool for functional annotation of experimental gene sets. Results Here we present CrossTalkZ, a statistical method and software to assess the significance of crosstalk enrichment between pairs of gene or protein groups in large biological networks. We demonstrate that the standard z-score is generally an appropriate and unbiased statistic. We further evaluate the ability of four different methods to reliably recover crosstalk within known biological pathways. We conclude that the methods preserving the second-order topological network properties perform best. Finally, we show how CrossTalkZ can be used to annotate experimental gene sets using known pathway annotations and that its performance at this task is superior to gene enrichment analysis (GEA). Availability and Implementation CrossTalkZ (available at http://sonnhammer.sbc.su.se/download/software/CrossTalkZ/) is implemented in C++, easy to use, fast, accepts various input file formats, and produces a number of statistics. These include z-score, p-value, false discovery rate, and a test of normality for the null distributions.
Collapse
|
47
|
Frings O, Mank JE, Alexeyenko A, Sonnhammer ELL. Network analysis of functional genomics data: application to avian sex-biased gene expression. ScientificWorldJournal 2012; 2012:130491. [PMID: 23319882 PMCID: PMC3540752 DOI: 10.1100/2012/130491] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2012] [Accepted: 11/25/2012] [Indexed: 12/03/2022] Open
Abstract
Gene expression analysis is often used to investigate the molecular and functional underpinnings of a phenotype. However, differential expression of individual genes is limited in that it does not consider how the genes interact with each other in networks. To address this shortcoming we propose a number of network-based analyses that give additional functional insights into the studied process. These were applied to a dataset of sex-specific gene expression in the chicken gonad and brain at different developmental stages. We first constructed a global chicken interaction network. Combining the network with the expression data showed that most sex-biased genes tend to have lower network connectivity, that is, act within local network environments, although some interesting exceptions were found. Genes of the same sex bias were generally more strongly connected with each other than expected. We further studied the fates of duplicated sex-biased genes and found that there is a significant trend to keep the same pattern of sex bias after duplication. We also identified sex-biased modules in the network, which reveal pathways or complexes involved in sex-specific processes. Altogether, this work integrates evolutionary genomics with systems biology in a novel way, offering new insights into the modular nature of sex-biased genes.
Collapse
|
48
|
Dessimoz C, Gabaldón T, Roos DS, Sonnhammer ELL, Herrero J. Toward community standards in the quest for orthologs. Bioinformatics 2012; 28:900-4. [PMID: 22332236 PMCID: PMC3307119 DOI: 10.1093/bioinformatics/bts050] [Citation(s) in RCA: 63] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/24/2023] Open
Abstract
The identification of orthologs—genes pairs descended from a common ancestor through speciation, rather than duplication—has emerged as an essential component of many bioinformatics applications, ranging from the annotation of new genomes to experimental target prioritization. Yet, the development and application of orthology inference methods is hampered by the lack of consensus on source proteomes, file formats and benchmarks. The second ‘Quest for Orthologs’ meeting brought together stakeholders from various communities to address these challenges. We report on achievements and outcomes of this meeting, focusing on topics of particular relevance to the research community at large. The Quest for Orthologs consortium is an open community that welcomes contributions from all researchers interested in orthology research and applications. Contact:dessimoz@ebi.ac.uk
Collapse
|
49
|
Punta M, Coggill PC, Eberhardt RY, Mistry J, Tate J, Boursnell C, Pang N, Forslund K, Ceric G, Clements J, Heger A, Holm L, Sonnhammer ELL, Eddy SR, Bateman A, Finn RD. The Pfam protein families database. Nucleic Acids Res 2011; 40:D290-301. [PMID: 22127870 PMCID: PMC3245129 DOI: 10.1093/nar/gkr1065] [Citation(s) in RCA: 2862] [Impact Index Per Article: 220.2] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022] Open
Abstract
Pfam is a widely used database of protein families, currently containing more than 13,000 manually curated protein families as of release 26.0. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/). Here, we report on changes that have occurred since our 2010 NAR paper (release 24.0). Over the last 2 years, we have generated 1840 new families and increased coverage of the UniProt Knowledgebase (UniProtKB) to nearly 80%. Notably, we have taken the step of opening up the annotation of our families to the Wikipedia community, by linking Pfam families to relevant Wikipedia pages and encouraging the Pfam and Wikipedia communities to improve and expand those pages. We continue to improve the Pfam website and add new visualizations, such as the 'sunburst' representation of taxonomic distribution of families. In this work we additionally address two topics that will be of particular interest to the Pfam community. First, we explain the definition and use of family-specific, manually curated gathering thresholds. Second, we discuss some of the features of domains of unknown function (also known as DUFs), which constitute a rapidly growing class of families within Pfam.
Collapse
|
50
|
Alexeyenko A, Schmitt T, Tjärnberg A, Guala D, Frings O, Sonnhammer ELL. Comparative interactomics with Funcoup 2.0. Nucleic Acids Res 2011; 40:D821-8. [PMID: 22110034 PMCID: PMC3245127 DOI: 10.1093/nar/gkr1062] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022] Open
Abstract
FunCoup (http://FunCoup.sbc.su.se) is a database that maintains and visualizes global gene/protein networks of functional coupling that have been constructed by Bayesian integration of diverse high-throughput data. FunCoup achieves high coverage by orthology-based integration of data sources from different model organisms and from different platforms. We here present release 2.0 in which the data sources have been updated and the methodology has been refined. It contains a new data type Genetic Interaction, and three new species: chicken, dog and zebra fish. As FunCoup extensively transfers functional coupling information between species, the new input datasets have considerably improved both coverage and quality of the networks. The number of high-confidence network links has increased dramatically. For instance, the human network has more than eight times as many links above confidence 0.5 as the previous release. FunCoup provides facilities for analysing the conservation of subnetworks in multiple species. We here explain how to do comparative interactomics on the FunCoup website.
Collapse
|