1
Occhipinti A, Hamadi Y, Kugler H, Wintersteiger CM, Yordanov B, Angione C. Discovering Essential Multiple Gene Effects Through Large Scale Optimization: An Application to Human Cancer Metabolism. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:2339-2352. [PMID: 32248120 DOI: 10.1109/tcbb.2020.2973386] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 06/11/2023]
Abstract
Computational modelling of metabolic processes has proven to be a useful approach to formulate our knowledge and improve our understanding of core biochemical systems that are crucial to maintaining cellular functions. Towards understanding the broader role of metabolism in cellular decision-making in health and disease, it is important to integrate the study of metabolism with other core regulatory systems and omics layers within the cell, including gene expression patterns. After quantitatively integrating gene expression profiles with a genome-scale reconstruction of human metabolism, we propose a set of combinatorial methods to reverse engineer gene expression profiles and to find pairs and higher-order combinations of genetic modifications that simultaneously optimize multi-objective cellular goals. This enables us to suggest classes of transcriptomic profiles that are most suitable for achieving given metabolic phenotypes. We demonstrate how our techniques can compute beneficial, neutral or "toxic" combinations of gene expression levels. We test our methods on nine tissue-specific cancer models, comparing our outcomes with the corresponding normal cells, and identify genes as targets for potential therapies. Our methods open the way to a broad class of applications that require an understanding of the interplay among genotype, metabolism, and cellular behaviour at scale.
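"Simultaneously optimizing" several cellular goals usually means keeping only the gene combinations that no other combination beats on every objective at once (the Pareto front). The Python sketch below applies such a Pareto filter to gene pairs. The gene names, the two-objective score table `EFFECTS` and the additive `score_pair` model are hypothetical placeholders; in the study, objectives come from a genome-scale metabolic model, and this is not the authors' method.

```python
from itertools import combinations

# Hypothetical per-gene scores on two objectives to maximize (e.g. growth and a
# target flux). In a real study these would come from constraint-based
# simulations of the metabolic model, not from a fixed table.
EFFECTS = {"g1": (0.9, 0.1), "g2": (0.5, 0.6), "g3": (0.2, 0.8), "g4": (0.1, 0.1)}


def score_pair(a, b):
    """Toy additive score for a gene pair; real pairs can interact non-additively."""
    return tuple(x + y for x, y in zip(EFFECTS[a], EFFECTS[b]))


def dominates(p, q):
    """True if p is at least as good as q on every objective and better on one."""
    return all(x >= y for x, y in zip(p, q)) and any(x > y for x, y in zip(p, q))


def pareto_front(candidates):
    """Keep the (name, scores) candidates that no other candidate dominates."""
    return [
        (name, scores)
        for name, scores in candidates
        if not any(dominates(other, scores) for _, other in candidates)
    ]


candidates = [((a, b), score_pair(a, b)) for a, b in combinations(sorted(EFFECTS), 2)]
front = pareto_front(candidates)
print([name for name, _ in front])  # [('g1', 'g2'), ('g1', 'g3'), ('g2', 'g3')]
```

The same filter applies unchanged to triples or larger combinations (`combinations(..., 3)`). The number of candidates grows combinatorially, however, which is why large-scale optimization rather than enumeration is needed at genome scale.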
2
Röhl A, Riou T, Bockmayr A. Computing irreversible minimal cut sets in genome-scale metabolic networks via flux cone projection. Bioinformatics 2020; 35:2618-2625. [PMID: 30590390 DOI: 10.1093/bioinformatics/bty1027] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/31/2017] [Revised: 12/06/2018] [Accepted: 12/14/2018] [Indexed: 12/19/2022]
Abstract
MOTIVATION Minimal cut sets (MCSs) for metabolic networks are sets of reactions which, if removed from the network, prevent a target reaction from carrying flux. Several methods exist to compute MCSs, but they may fail to find sufficiently many MCSs in larger genome-scale networks. RESULTS Here we introduce irreversible minimal cut sets (iMCSs), which are MCSs that consist of irreversible reactions only. The advantage of iMCSs is that they can be computed by projecting the flux cone of the metabolic network onto the set of irreversible reactions, which usually yields a smaller cone. Using oriented matroid theory, we show how the projected cone can be computed efficiently and how this can be applied to find iMCSs even in large genome-scale networks. AVAILABILITY AND IMPLEMENTATION Software is freely available at https://sourceforge.net/projects/irreversibleminimalcutsets/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
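The following sketch makes the MCS definition concrete by enumerating minimal cut sets by brute force on a hypothetical six-reaction toy network. It relies on a simplified topological feasibility test, in which a reaction can fire once all its substrates are producible, rather than the flux-based linear program used in constraint-based analysis. It also does not implement the flux cone projection or oriented matroid techniques of the paper. Because every toy reaction is irreversible, each MCS found here is also an iMCS.

```python
from itertools import combinations

# Hypothetical toy network: reaction -> (substrates, products).
# All reactions are irreversible, so every cut set found is also an iMCS.
REACTIONS = {
    "R1": ({"S"}, {"A"}),
    "R2": ({"A"}, {"B"}),
    "R3": ({"A"}, {"C"}),
    "R4": ({"B"}, {"D"}),
    "R5": ({"C"}, {"D"}),
    "T": ({"D"}, set()),  # target reaction that consumes D
}
SEEDS = {"S"}  # metabolites assumed to be supplied by the medium


def target_active(removed, target="T"):
    """Simplified topological test: can `target` still fire without `removed`?"""
    available = set(SEEDS)
    changed = True
    while changed:
        changed = False
        for name, (subs, prods) in REACTIONS.items():
            if name in removed or not subs <= available:
                continue
            if not prods <= available:
                available |= prods
                changed = True
    return target not in removed and REACTIONS[target][0] <= available


def minimal_cut_sets(max_size=3, target="T"):
    """Brute-force enumeration of minimal cut sets, smallest first."""
    candidates = [r for r in REACTIONS if r != target]
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(candidates, size):
            cut = set(combo)
            # A superset of a known cut set cannot be minimal.
            if any(mcs <= cut for mcs in found):
                continue
            if not target_active(cut, target):
                found.append(cut)
    return found


print(minimal_cut_sets())
```

Enumerating candidates by increasing size and skipping supersets of sets already found ensures that every reported set is minimal. The cost is exponential in the number of reactions, so this only works for toy examples, which is exactly why projection-based methods matter at genome scale.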
Affiliation(s)
- Annika Röhl
- Department of Mathematics and Computer Science, FB Mathematik und Informatik, Freie Universität Berlin, Berlin, Germany
- Tanguy Riou
- Ecole Centrale de Nantes, Nantes, France
- Alexander Bockmayr
- Department of Mathematics and Computer Science, FB Mathematik und Informatik, Freie Universität Berlin, Berlin, Germany
3
Shibuya Y, Comin M. Better quality score compression through sequence-based quality smoothing. BMC Bioinformatics 2019; 20:302. [PMID: 31757199 PMCID: PMC6873394 DOI: 10.1186/s12859-019-2883-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/30/2019] [Accepted: 05/07/2019] [Indexed: 11/10/2022]
Abstract
MOTIVATION Current NGS techniques are becoming exponentially cheaper. As a result, genomic data are growing exponentially, but storage capacity is not growing at the same rate, which makes compression necessary. Most of the entropy of NGS data lies in the quality values associated with each read. Those values are often more diversified than necessary. For this reason, many tools, such as Quartz or GeneCodeq, change (smooth) quality scores to improve compressibility without altering the important information they carry for downstream analyses such as SNP calling. RESULTS We use the FM-index, a type of compressed suffix array, to reduce the storage requirements of a dictionary of k-mers, together with an effective smoothing algorithm that maintains high precision in SNP calling pipelines while reducing quality-score entropy. We present YALFF (Yet Another Lossy Fastq Filter), a tool for quality-score compression by smoothing that improves the compressibility of FASTQ files. The succinct k-mer dictionary allows YALFF to run on consumer computers with only 5.7 GB of available free RAM. The YALFF smoothing algorithm can improve genotyping accuracy while using fewer resources. AVAILABILITY https://github.com/yhhshb/yalff.
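The smoothing idea can be sketched in a few lines. Bases covered by a k-mer found in a trusted dictionary receive a fixed high quality, and all other qualities are left unchanged. In this hypothetical Python sketch, a plain `set` stands in for the FM-index-backed dictionary, and the replacement value `'I'` (Phred 40 in Phred+33 encoding) is an assumption. YALFF's actual smoothing rule may differ.

```python
def smooth_qualities(read, quals, trusted_kmers, k, high="I"):
    """Set the quality of every base covered by a trusted k-mer to `high`.

    `trusted_kmers` is a plain set standing in for an FM-index dictionary;
    'I' is Phred 40 in Phred+33 encoding. Both choices are assumptions.
    """
    if len(read) != len(quals):
        raise ValueError("read and quality strings must have the same length")
    covered = [False] * len(read)
    for i in range(len(read) - k + 1):
        if read[i:i + k] in trusted_kmers:
            for j in range(i, i + k):
                covered[j] = True
    # Uncovered bases keep their original quality character.
    return "".join(high if c else q for c, q in zip(covered, quals))


print(smooth_qualities("ACGTACGTTT", "5?A+9;5?A+", {"ACGT"}, k=4))  # IIIIIIIIA+
```

Replacing many distinct quality characters with a single value lowers the entropy of the quality stream, which is what makes the smoothed FASTQ file compress better.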
Affiliation(s)
- Yoshihiro Shibuya
- Department of Information Engineering, University of Padova, via Gradenigo 6/A, Padova, Italy
- Laboratoire d’Informatique Gaspard-Monge (LIGM), University Paris-Est Marne-la-Vallée, Bâtiment Copernic - 5, bd Descartes, Champs sur Marne, France
- Matteo Comin
- Department of Information Engineering, University of Padova, via Gradenigo 6/A, Padova, Italy
4
Qian J, Comin M. MetaCon: unsupervised clustering of metagenomic contigs with probabilistic k-mers statistics and coverage. BMC Bioinformatics 2019; 20:367. [PMID: 31757198 PMCID: PMC6873667 DOI: 10.1186/s12859-019-2904-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/30/2019] [Accepted: 05/15/2019] [Indexed: 11/30/2022]
Abstract
Motivation Sequencing technologies allow microbial communities to be sequenced directly from the environment without prior culturing. Because assembly typically produces only genome fragments, also known as contigs, it is crucial to group them into putative species for further taxonomic profiling and downstream functional analysis. Taxonomic analysis of microbial communities requires contig clustering, a process referred to as binning, which is still one of the most challenging tasks when analyzing metagenomic data. The major problems are the lack of taxonomically related genomes in existing reference databases, the uneven abundance ratios of species, sequencing errors, and the limitations of binning contigs of different lengths. Results In this context we present MetaCon, a novel tool for unsupervised metagenomic contig binning based on probabilistic k-mer statistics and coverage. MetaCon uses a signature based on k-mer statistics that accounts for the different probabilities of a k-mer appearing in different species; in addition, contigs of different lengths are clustered in two separate phases. The effectiveness of MetaCon is demonstrated on both simulated and real datasets in comparison with state-of-the-art binning approaches such as CONCOCT, MaxBin and MetaBAT. Electronic supplementary material The online version of this article (10.1186/s12859-019-2904-4) contains supplementary material, which is available to authorized users.
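One common way to build a probabilistic k-mer signature is to compare observed k-mer counts with the counts expected under a background model. The Python sketch below assumes an i.i.d. base-composition model and appends contig coverage as an extra feature. The function names and the choice of background model are illustrative assumptions, not MetaCon's published signature.

```python
from collections import Counter
from itertools import product


def kmer_signature(seq, k=2, alphabet="ACGT"):
    """Observed/expected count ratio of each k-mer under an i.i.d. base model."""
    seq = seq.upper()
    n_windows = len(seq) - k + 1
    if n_windows <= 0:
        raise ValueError("sequence is shorter than k")
    base_counts = Counter(c for c in seq if c in alphabet)
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("sequence contains no bases from the alphabet")
    observed = Counter(seq[i:i + k] for i in range(n_windows))
    signature = {}
    for kmer in map("".join, product(alphabet, repeat=k)):
        # Expected count = number of windows x product of base probabilities.
        expected = n_windows
        for c in kmer:
            expected *= base_counts[c] / total
        signature[kmer] = observed[kmer] / expected if expected > 0 else 0.0
    return signature


def contig_features(seq, coverage, k=2):
    """Feature vector: k-mer signature in lexicographic order, then coverage."""
    sig = kmer_signature(seq, k)
    return [sig[kmer] for kmer in sorted(sig)] + [coverage]


features = contig_features("ACGTACGT", coverage=12.5)
print(len(features))  # 17
```

Ratios above 1 mark k-mers that are over-represented relative to the contig's own base composition. Normalizing this way makes signatures of contigs with different GC content more comparable than raw frequencies are.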
Affiliation(s)
- Jia Qian
- Department of Information Engineering, University of Padova, Via Giovanni Gradenigo 6, Padova, Italy
- Matteo Comin
- Department of Information Engineering, University of Padova, Via Giovanni Gradenigo 6, Padova, Italy
5
Wang R, Liu G, Wang C. Identifying protein complexes based on an edge weight algorithm and core-attachment structure. BMC Bioinformatics 2019; 20:471. [PMID: 31521132 PMCID: PMC6744658 DOI: 10.1186/s12859-019-3007-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 12/20/2018] [Accepted: 07/26/2019] [Indexed: 02/02/2023]
Abstract
Background Protein complex identification from protein-protein interaction (PPI) networks is crucial for understanding cellular organization principles and functional mechanisms. In recent decades, numerous computational methods have been proposed to identify protein complexes. However, most current state-of-the-art methods still face several challenges, including high false-positive rates, an inability to identify overlapping complexes, a lack of consideration for the inherent organization within protein complexes, and the absence of some biological attachment proteins. Results To overcome these limitations, we present a protein complex identification method based on an edge weight method and the core-attachment structure (EWCA), in which a complex consists of a core and some sparse attachment proteins. First, we propose a new weighting method to assess the reliability of interactions. Second, we identify protein complex cores by using the structural similarity between a seed and its direct neighbors. Third, we introduce a new method to detect attachment proteins that is able to distinguish and identify peripheral proteins and overlapping proteins. Finally, we bind attachment proteins to their corresponding complex cores to form protein complexes and discard redundant protein complexes. The experimental results indicate that EWCA outperforms existing state-of-the-art methods in terms of both accuracy and p-value. Furthermore, EWCA can identify many more protein complexes with statistical significance. Additionally, EWCA achieves a better balance between accuracy and efficiency than some state-of-the-art methods with high accuracy. Conclusions In summary, a comprehensive comparison with twelve algorithms across different evaluation metrics shows that EWCA performs better at protein complex identification. The datasets and software are freely available for academic research at https://github.com/RongquanWang/EWCA.
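The core-detection step can be illustrated with a simple structural similarity. The sketch below uses the Jaccard similarity of closed neighborhoods as the edge weight and a hypothetical threshold of 0.5 for adding a seed's neighbors to its core. Both choices are assumptions: EWCA's own weighting scheme and its attachment step are more elaborate and are not reproduced here.

```python
# Hypothetical toy PPI network as an adjacency dict: a triangle a-b-c plus a tail c-d-e.
PPI = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}


def edge_weight(graph, u, v):
    """Jaccard similarity of the closed neighborhoods of u and v (assumed weight)."""
    nu, nv = graph[u] | {u}, graph[v] | {v}
    return len(nu & nv) / len(nu | nv)


def core_from_seed(graph, seed, threshold=0.5):
    """Seed plus its direct neighbors whose edge weight reaches `threshold`."""
    return {seed} | {v for v in graph[seed] if edge_weight(graph, seed, v) >= threshold}


print(sorted(core_from_seed(PPI, "c")))  # ['a', 'b', 'c']
```

Here the edge c-d gets weight 0.4 because c and d share few neighbors, so d is left out of the densely connected core. In a core-attachment method, d would instead be a candidate attachment protein.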
Affiliation(s)
- Rongquan Wang
- College of Computer Science and Technology, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China
- Guixia Liu
- College of Computer Science and Technology, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, No. 2699 Qianjin Street, Changchun, 130012, China
- Caixia Wang
- School of International Economics, China Foreign Affairs University, 24 Zhanlanguan Road, Xicheng District, Beijing, 100037, China
6
Abstract
Protein complexes are known to play a major role in controlling cellular activity in living organisms. Identifying complexes from raw protein-protein interactions (PPIs) is an important area of research. Earlier work has been limited mostly to yeast and a few other model organisms, and such protein complex identification methods often perform poorly when applied to large human PPI networks. We introduce a novel method called ComFiR to detect such protein complexes and to rank disease-associated complexes with respect to a query disease. We show that it performs better at identifying protein complexes from human PPI data. The method is evaluated in terms of positive predictive value, sensitivity and accuracy. We also introduce a ranking approach and demonstrate its application to Alzheimer's disease.
7
Girotto S, Comin M, Pizzi C. Higher recall in metagenomic sequence classification exploiting overlapping reads. BMC Genomics 2017; 18:917. [PMID: 29244002 PMCID: PMC5731601 DOI: 10.1186/s12864-017-4273-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 11/10/2022]
Abstract
Background In recent years several different fields, such as ecology, medicine and microbiology, have experienced unprecedented development due to the possibility of directly sequencing microbiomic samples. Among the problems that researchers in the field have to deal with, taxonomic classification of metagenomic reads is one of the most challenging. State-of-the-art methods classify single reads with almost 100% precision. However, their recall very often falls to about 50%. As a consequence, state-of-the-art methods are capable of correctly classifying only about half of the reads in a sample. How to achieve better overall classification quality remains a largely unsolved problem. Results In this paper we propose a method for metagenomics CLassification Improvement with Overlapping Reads (CLIOR) that exploits the information carried by the overlapping-reads graph of the input read dataset to improve recall, F-measure, and the estimated abundance of species. In this work, we applied CLIOR on top of the classification produced by the classifier Clark-l. Experiments on simulated and synthetic metagenomes show that CLIOR can substantially improve the recall rate, sometimes doubling it. On average, on simulated datasets the increase in recall is paired with higher precision too, while on synthetic datasets it comes at the expense of a small loss of precision. In experiments on real metagenomes, CLIOR is able to assign many more reads while keeping the abundance ratios in line with previous studies. Conclusions Our results show that with CLIOR it is possible to boost the recall of a state-of-the-art metagenomic classifier by inferring and/or correcting the assignment of reads with missing or erroneous labeling. CLIOR is not restricted to the read classification algorithm used in our experiments; it may be applied to other methods too.
Finally, CLIOR does not need large computational resources and can be run on a laptop. Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-4273-6) contains supplementary material, which is available to authorized users.
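The overlap-graph idea can be sketched as label propagation. A read without a label takes the majority taxon of its already-labeled overlapping reads, and this repeats until nothing changes. This Python sketch is a hypothetical simplification: it only fills in missing labels (CLIOR can also correct erroneous ones), it assumes the overlap graph is given, and it skips ties rather than breaking them.

```python
from collections import Counter


def propagate_labels(overlaps, labels, max_rounds=10):
    """Give unlabeled reads the majority label of their labeled overlap neighbors.

    overlaps: read -> set of overlapping reads (assumed symmetric)
    labels:   read -> taxon, for the reads the base classifier assigned
    """
    labels = dict(labels)
    for _ in range(max_rounds):
        updates = {}
        for read, neighbors in overlaps.items():
            if read in labels:
                continue
            votes = Counter(labels[n] for n in neighbors if n in labels)
            if not votes:
                continue
            (top, top_count), *rest = votes.most_common()
            # Skip ties: only assign when the majority is unambiguous.
            if not rest or rest[0][1] < top_count:
                updates[read] = top
        if not updates:
            break
        labels.update(updates)  # apply a whole round at once
    return labels


# Hypothetical overlap graph: a chain r1-r2-r3-r4 and a separate pair r5-r6.
OVERLAPS = {
    "r1": {"r2"}, "r2": {"r1", "r3"}, "r3": {"r2", "r4"},
    "r4": {"r3"}, "r5": {"r6"}, "r6": {"r5"},
}
labels_out = propagate_labels(OVERLAPS, {"r1": "taxonA", "r6": "taxonB"})
print(labels_out)
```

Updates within a round are collected first and applied together, so the result does not depend on the order in which reads are visited. Labels therefore spread one overlap step per round.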
Affiliation(s)
- Samuele Girotto
- Department of Information Engineering, University of Padova, via Gradenigo 6/A, Padova, 35131, Italy
- Matteo Comin
- Department of Information Engineering, University of Padova, via Gradenigo 6/A, Padova, 35131, Italy
- Cinzia Pizzi
- Department of Information Engineering, University of Padova, via Gradenigo 6/A, Padova, 35131, Italy
8
Jelínek J, Škoda P, Hoksza D. Utilizing knowledge base of amino acids structural neighborhoods to predict protein-protein interaction sites. BMC Bioinformatics 2017; 18:492. [PMID: 29244012 PMCID: PMC5731498 DOI: 10.1186/s12859-017-1921-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022]
Abstract
BACKGROUND Protein-protein interactions (PPIs) are key to the investigation of various biochemical processes, and their identification is therefore of great importance. Although computational prediction of which amino acids take part in a PPI has been an active field of research for some time, the quality of in-silico methods is still far from perfect. RESULTS We have developed a novel prediction method called INSPiRE, which benefits from a knowledge base built from data available in the Protein Data Bank. All proteins involved in PPIs were converted into labeled graphs with nodes corresponding to amino acids and edges to pairs of neighboring amino acids. The structural neighborhood of each node was then encoded into a bit string and stored in the knowledge base. When predicting PPIs, INSPiRE labels amino acids of unknown proteins as interface or non-interface based on how often their structural neighborhood appears as interface or non-interface in the knowledge base. We evaluated INSPiRE's behavior with respect to different types and sizes of the structural neighborhood. Furthermore, we examined the suitability of several different features for labeling the nodes. Our evaluations showed that INSPiRE clearly outperforms existing methods with respect to the Matthews correlation coefficient. CONCLUSION In this paper we introduce INSPiRE, a new knowledge-based method for identifying protein-protein interaction sites. Its knowledge base utilizes structural patterns of known interaction sites in the Protein Data Bank, which are then used for PPI prediction. Extensive experiments on several well-established datasets show that INSPiRE significantly surpasses existing PPI approaches.
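The knowledge-base lookup can be sketched as counting how often each encoded neighborhood was seen at an interface. In the hypothetical Python sketch below, a neighborhood is encoded as a 20-bit mask of the amino-acid types it contains. This encoding, the 0.5 threshold and the default answer for unseen patterns are all assumptions, and the encoding is much coarser than INSPiRE's graph-based one.

```python
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_neighborhood(residue_types):
    """Hypothetical encoding: bit i is set if AMINO_ACIDS[i] occurs nearby."""
    bits = 0
    for aa in residue_types:
        bits |= 1 << AMINO_ACIDS.index(aa)
    return bits


class KnowledgeBase:
    def __init__(self):
        # encoded pattern -> [interface count, non-interface count]
        self.counts = defaultdict(lambda: [0, 0])

    def add(self, neighborhood, is_interface):
        self.counts[encode_neighborhood(neighborhood)][0 if is_interface else 1] += 1

    def predict(self, neighborhood, threshold=0.5):
        key = encode_neighborhood(neighborhood)
        if key not in self.counts:
            return False  # unseen pattern: default to non-interface (assumption)
        interface, non_interface = self.counts[key]
        return interface / (interface + non_interface) > threshold


kb = KnowledgeBase()
kb.add("LIVF", True)
kb.add("FVIL", True)  # same set of residue types, same bit pattern
kb.add("LIVF", False)
kb.add("DEKR", False)
print(kb.predict("VLIF"), kb.predict("KRED"), kb.predict("W"))  # True False False
```

Using the bit string as a dictionary key makes each prediction a single hash lookup. The trade-off is that the encoding must be coarse enough for patterns to recur across proteins.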
Affiliation(s)
- Jan Jelínek
- Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Ke Karlovu 3, Prague 2, Czech Republic
- Petr Škoda
- Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Ke Karlovu 3, Prague 2, Czech Republic
- David Hoksza
- Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Ke Karlovu 3, Prague 2, Czech Republic
9
Xu B, Wang Y, Wang Z, Zhou J, Zhou S, Guan J. An effective approach to detecting both small and large complexes from protein-protein interaction networks. BMC Bioinformatics 2017; 18:419. [PMID: 29072136 PMCID: PMC5657047 DOI: 10.1186/s12859-017-1820-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 11/10/2022]
Abstract
Background Predicting protein complexes from protein-protein interaction (PPI) networks has been studied for a decade. Various methods have been proposed to address challenging aspects of this problem, including overlapping clusters, high false positive/negative rates of PPI data and diverse complex structures. It is well known that most current methods can effectively detect only complexes of size ≥ 3, which account for only about half of all existing complexes. Recently, a method was proposed specifically for finding small complexes (size 2 and 3) from PPI networks. However, up to now there has been no effective approach that can predict both small (size ≤ 3) and large (size > 3) complexes from PPI networks. Results In this paper, we propose a novel method, called CPredictor2.0, that can detect both small and large complexes under a unified framework. Concretely, we first group proteins with similar functions. Then, the Markov clustering algorithm is employed to discover clusters in each group. Finally, we merge all discovered clusters that overlap with each other to a certain degree; the merged clusters together with the remaining clusters constitute the set of detected complexes. Extensive experiments show that the new method predicts both small and large complexes more effectively than state-of-the-art methods. Conclusions The proposed method, CPredictor2.0, can be applied to accurately predict both small and large protein complexes.
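The final merging step needs some measure of how much two clusters overlap. This Python sketch uses the overlap score |A ∩ B|² / (|A|·|B|), a measure commonly used in complex prediction, with a hypothetical merge threshold of 0.5. The abstract does not specify which score or threshold CPredictor2.0 uses, and clusters are assumed to be non-empty.

```python
def overlap_score(a, b):
    """Overlap score |A & B|^2 / (|A| * |B|) for two non-empty clusters."""
    inter = len(a & b)
    return inter * inter / (len(a) * len(b))


def merge_clusters(clusters, threshold=0.5):
    """Merge any pair of clusters scoring >= threshold until no pair qualifies."""
    clusters = [set(c) for c in clusters]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if overlap_score(clusters[i], clusters[j]) >= threshold:
                    clusters[i] |= clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return clusters


# Hypothetical Markov-clustering output: two overlapping clusters and a size-2 one.
result = merge_clusters([{"a", "b", "c", "d"}, {"b", "c", "d", "e"}, {"x", "y"}])
print(result)
```

The size-2 cluster survives untouched because it shares no proteins with the others, which is how small complexes can be kept alongside merged large ones.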
Affiliation(s)
- Bin Xu
- Department of Computer Science and Technology, Tongji University, 4800 Cao'an Road, Shanghai, 201804, China
- Yang Wang
- School of Software, Jiangxi Normal University, 99 Ziyang Avenue, Nanchang, 330022, China
- Zewei Wang
- Shanghai Southwest Model Middle School, 67 Huicheng Village-1, Baise Road, Shanghai, 200237, China
- Jiaogen Zhou
- Institute of Subtropical Agriculture, Chinese Academy of Sciences, 444 Yuandaer Road, Mapoling, Changsha, 410125, China
- Shuigeng Zhou
- Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, 220 Handan Road, Shanghai, 200433, China; The Bioinformatics Lab at Changzhou NO. 7 People's Hospital, Changzhou, Jiangsu, 213011, China
- Jihong Guan
- Department of Computer Science and Technology, Tongji University, 4800 Cao'an Road, Shanghai, 201804, China
10
Mueller AJ, Peffers MJ, Proctor CJ, Clegg PD. Systems approaches in osteoarthritis: Identifying routes to novel diagnostic and therapeutic strategies. J Orthop Res 2017; 35:1573-1588. [PMID: 28318047 PMCID: PMC5574007 DOI: 10.1002/jor.23563] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 12/13/2016] [Accepted: 03/06/2017] [Indexed: 02/04/2023]
Abstract
Systems orientated research offers the possibility of identifying novel therapeutic targets and relevant diagnostic markers for complex diseases such as osteoarthritis. This review demonstrates that the osteoarthritis research community has been slow to incorporate systems orientated approaches into research studies, although a number of key studies reveal novel insights into the regulatory mechanisms that contribute both to joint tissue homeostasis and to its dysfunction. The review introduces both top-down and bottom-up approaches employed in the study of osteoarthritis. A holistic and multiscale approach, in which clinical measurements may predict dysregulation and progression of joint degeneration, should be a key objective of future research. The review concludes with suggestions for further research and emerging trends, not least of which is the coupled development of diagnostic tests and therapeutics as part of a concerted effort by the osteoarthritis research community to meet clinical needs. © 2017 The Authors. Journal of Orthopaedic Research Published by Wiley Periodicals, Inc. on behalf of Orthopaedic Research Society. J Orthop Res 35:1573-1588, 2017.
Affiliation(s)
- Alan J. Mueller
- Faculty of Health and Life Sciences, Department of Musculoskeletal Biology, Institute of Ageing and Chronic Disease, University of Liverpool, William Henry Duncan Building, 6 West Derby Street, Liverpool L7 8TX, United Kingdom
- Mandy J. Peffers
- Faculty of Health and Life Sciences, Department of Musculoskeletal Biology, Institute of Ageing and Chronic Disease, University of Liverpool, William Henry Duncan Building, 6 West Derby Street, Liverpool L7 8TX, United Kingdom; The MRC-Arthritis Research UK Centre for Integrated Research into Musculoskeletal Ageing (CIMA), Liverpool, United Kingdom
- Carole J. Proctor
- The MRC-Arthritis Research UK Centre for Integrated Research into Musculoskeletal Ageing (CIMA), Liverpool, United Kingdom; Institute of Cellular Medicine, Newcastle University, Framlington Place, Newcastle upon Tyne NE2 4HH, United Kingdom
- Peter D. Clegg
- Faculty of Health and Life Sciences, Department of Musculoskeletal Biology, Institute of Ageing and Chronic Disease, University of Liverpool, William Henry Duncan Building, 6 West Derby Street, Liverpool L7 8TX, United Kingdom; The MRC-Arthritis Research UK Centre for Integrated Research into Musculoskeletal Ageing (CIMA), Liverpool, United Kingdom
11
Abstract
Background A metagenomic sample is a set of DNA fragments, randomly extracted from multiple cells in an environment, that belong to distinct, often unknown species. Unsupervised metagenomic clustering aims to partition a metagenomic sample into sets that approximate taxonomic units, without using reference genomes. Since samples are large and steadily growing, space-efficient clustering algorithms are strongly needed. Results We design and implement a space-efficient algorithmic framework that solves a number of core primitives in unsupervised metagenomic clustering using just the bidirectional Burrows-Wheeler index and a union-find data structure on the set of reads. When run on a sample of total length n, with m reads of maximum length ℓ each, on an alphabet of total size σ, our algorithms take O(n(t + log σ)) time and just 2n + o(n) + O(max{ℓσ log n, K log m}) bits of space in addition to the index and the union-find data structure, where K is a measure of the redundancy of the sample and t is the query time of the union-find data structure. Conclusions Our experimental results show that our algorithms are practical, that they can exploit multiple cores through a parallel traversal of the suffix-link tree, and that they are competitive with the state of the art in both space and time.
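The union-find structure mentioned above maintains a partition of the reads that can only be coarsened by merging. The sketch below is a standard union-find with path halving and union by rank, driven by a hypothetical rule that merges reads sharing an exact k-mer. A Python `dict` replaces the bidirectional Burrows-Wheeler index, so the sketch does not reproduce the paper's space bounds.

```python
class UnionFind:
    """Disjoint sets over 0..n-1 with path halving and union by rank."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1


def cluster_reads(reads, k):
    """Group reads that share at least one exact k-mer (hypothetical rule)."""
    uf = UnionFind(len(reads))
    first_seen = {}  # k-mer -> index of the first read containing it
    for idx, read in enumerate(reads):
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in first_seen:
                uf.union(idx, first_seen[kmer])
            else:
                first_seen[kmer] = idx
    groups = {}
    for idx in range(len(reads)):
        groups.setdefault(uf.find(idx), []).append(idx)
    return sorted(groups.values())


print(cluster_reads(["ACGTAC", "GTACGG", "TTTTTT", "ACGGAA"], k=4))  # [[0, 1, 3], [2]]
```

Reads 0 and 3 share no k-mer directly but end up together through read 1, which is the transitive merging that union-find provides cheaply. With both optimizations, each `find` runs in near-constant amortized time, so it contributes only a small t to the time bound quoted above.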
12
Röhl A, Bockmayr A. A mixed-integer linear programming approach to the reduction of genome-scale metabolic networks. BMC Bioinformatics 2017; 18:2. [PMID: 28049424 PMCID: PMC5210269 DOI: 10.1186/s12859-016-1412-z] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 08/04/2016] [Accepted: 12/07/2016] [Indexed: 01/08/2023]
Abstract
Background Constraint-based analysis has become a widely used method to study metabolic networks. While some of the associated algorithms can be applied to genome-scale network reconstructions with several thousand reactions, others are limited to small or medium-sized models. In 2015, Erdrich et al. introduced a method called NetworkReducer, which reduces large metabolic networks to smaller subnetworks while preserving a set of biological requirements that can be specified by the user. As early as 2001, Burgard et al. developed a mixed-integer linear programming (MILP) approach for computing minimal reaction sets under a given growth requirement. Results Here we present an MILP approach for computing minimum subnetworks with the given properties. Minimality (with respect to the number of active reactions) is not guaranteed by NetworkReducer, while the method by Burgard et al. does not allow the different biological requirements to be specified. Our procedure is about 5-10 times faster than NetworkReducer and can enumerate all minimum subnetworks when several exist. This allows identifying common reactions that are present in all subnetworks, as well as reactions appearing in alternative pathways. Conclusions Applying complex analysis methods to genome-scale metabolic networks is often not possible in practice. Thus it may become necessary to reduce the size of the network while keeping important functionalities. We propose an MILP solution to this problem. Compared to previous work, our approach is more efficient and allows computing not just one but all minimum subnetworks satisfying the required properties. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1412-z) contains supplementary material, which is available to authorized users.
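A generic MILP of this kind can be written with one binary indicator per reaction. The formulation below is a sketch under assumptions, not the paper's exact model. The notation is as follows:

- S is the stoichiometric matrix and v the flux vector.
- R is the set of reactions, with bounds lb_i ≤ 0 ≤ ub_i.
- z_i = 1 means reaction i is kept.
- The user's biological requirements are simplified to minimum forward fluxes μ_j on a hypothetical set P of protected reactions.

The second line is an integer cut, which excludes a found optimum Z* when the problem is re-solved.

```latex
\begin{aligned}
\min_{v,\,z}\quad & \sum_{i \in R} z_i \\
\text{s.t.}\quad & S\,v = 0, \\
& \mathrm{lb}_i\, z_i \le v_i \le \mathrm{ub}_i\, z_i, && i \in R, \\
& v_j \ge \mu_j, && j \in P, \\
& z_i \in \{0, 1\}, && i \in R.
\end{aligned}

\sum_{i \in Z^*} z_i \le |Z^*| - 1, \qquad Z^* = \{\, i \in R : z^*_i = 1 \,\}
```

Setting z_i = 0 forces v_i = 0, so the objective counts active reactions. Adding one such cut per solution and re-solving until the optimal value exceeds |Z*| is one standard way to enumerate all minimum subnetworks.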
Affiliation(s)
- Annika Röhl
- Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 6, Berlin, Germany
- Alexander Bockmayr
13
Meng L, Striegel A, Milenković T. Local versus global biological network alignment. Bioinformatics 2016; 32:3155-3164. [PMID: 27357169 PMCID: PMC5048063 DOI: 10.1093/bioinformatics/btw348] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 09/23/2015] [Revised: 02/18/2016] [Accepted: 05/23/2016] [Indexed: 11/13/2022]
Abstract
MOTIVATION Network alignment (NA) aims to find regions of similarity between the molecular networks of different species. There are two NA categories: local (LNA) and global (GNA). LNA finds small, highly conserved network regions and produces a many-to-many node mapping. GNA finds large conserved regions and produces a one-to-one node mapping. Because LNA and GNA produce different outputs, a new NA method is typically compared only against existing methods from the same category. However, both NA categories have the same goal: to allow the transfer of functional knowledge from well-studied to poorly studied species between conserved network regions. So which one should be chosen, LNA or GNA? To answer this, we introduce the first systematic evaluation of the two NA categories. RESULTS We introduce new measures of alignment quality that allow for a fair comparison of the different LNA and GNA outputs, as such measures did not previously exist. We provide user-friendly software for efficient alignment evaluation that implements the new and existing measures. We evaluate prominent LNA and GNA methods on synthetic and real-world biological networks. We study the effect on alignment quality of using different interaction types and confidence levels. We find that the superiority of one NA category over the other is context-dependent. Further, when we contrast LNA and GNA in the application of learning novel protein functional knowledge, the two produce very different predictions, indicating their complementarity. Our results and software provide guidelines for future NA method development and evaluation. AVAILABILITY AND IMPLEMENTATION Software: http://www.nd.edu/~cone/LNA_GNA CONTACT: tmilenko@nd.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
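For one-to-one (GNA-style) mappings, a standard quality measure is edge correctness: the fraction of edges in the first network that are mapped onto edges of the second. The Python sketch below computes it on a hypothetical pair of toy networks and also checks whether a mapping is one-to-one. Edge correctness is an existing measure, not one of the new LNA/GNA measures introduced in this paper.

```python
def is_one_to_one(pairs):
    """True if no node appears in more than one aligned pair (GNA-style)."""
    left = [u for u, _ in pairs]
    right = [v for _, v in pairs]
    return len(set(left)) == len(left) and len(set(right)) == len(right)


def edge_correctness(edges1, edges2, mapping):
    """Fraction of network-1 edges mapped onto network-2 edges (undirected)."""
    if not edges1:
        return 0.0
    e2 = {frozenset(e) for e in edges2}
    conserved = sum(
        1
        for u, v in edges1
        if u in mapping and v in mapping and frozenset((mapping[u], mapping[v])) in e2
    )
    return conserved / len(edges1)


# Hypothetical toy networks and alignment.
G1 = [(1, 2), (2, 3), (3, 1)]
G2 = [("a", "b"), ("b", "c")]
mapping = {1: "a", 2: "b", 3: "c"}
print(is_one_to_one(list(mapping.items())), edge_correctness(G1, G2, mapping))
```

A many-to-many LNA output cannot be stored as a single `dict` from nodes to nodes, which illustrates why measures designed for GNA do not directly apply to LNA and why the paper needed new measures for a fair comparison.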
Affiliation(s)
- Lei Meng
- Department of Computer Science and Engineering, ECK Institute of Global Health, and Interdisciplinary Center for Network Science and Applications
- Aaron Striegel
- Department of Computer Science and Engineering, Wireless Institute, University of Notre Dame, Notre Dame, IN 46556, USA
- Tijana Milenković
- Department of Computer Science and Engineering, ECK Institute of Global Health, and Interdisciplinary Center for Network Science and Applications
14
Girotto S, Pizzi C, Comin M. MetaProb: accurate metagenomic reads binning based on probabilistic sequence signatures. Bioinformatics 2016; 32:i567-i575. [DOI: 10.1093/bioinformatics/btw466] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Indexed: 12/15/2022]