1. Mansoor S, Hamid S, Tuan TT, Park JE, Chung YS. Advance computational tools for multiomics data learning. Biotechnol Adv 2024; 77:108447. [PMID: 39251098; DOI: 10.1016/j.biotechadv.2024.108447] [Received: 05/19/2024; Revised: 09/01/2024; Accepted: 09/05/2024; Indexed: 09/11/2024]
Abstract
The burgeoning field of bioinformatics has seen a surge in computational tools tailored for omics data analysis, driven by the heterogeneous and high-dimensional nature of omics data. In biomedical and plant science research, multi-omics data have become pivotal for predictive analytics in the era of big data, necessitating sophisticated computational methodologies. This review explores a diverse array of computational approaches that play a crucial role in processing, normalizing, integrating, and analyzing omics data. Notable methods, such as similarity-based methods, network-based approaches, correlation-based methods, Bayesian methods, fusion-based methods and multivariate techniques, among others, are discussed in detail, each offering unique functionalities to address the complexities of multi-omics data. Furthermore, this review underscores the significance of computational tools in advancing our understanding of data and their transformative impact on research.
Affiliation(s)
- Sheikh Mansoor
  - Department of Plant Resources and Environment, Jeju National University, 63243, Republic of Korea
- Saira Hamid
  - Watson Crick Centre for Molecular Medicine, Islamic University of Science and Technology, Awantipora, Pulwama, J&K, India
- Thai Thanh Tuan
  - Department of Plant Resources and Environment, Jeju National University, 63243, Republic of Korea; Multimedia Communications Laboratory, University of Information Technology, Ho Chi Minh City 70000, Vietnam; Multimedia Communications Laboratory, Vietnam National University, Ho Chi Minh City 70000, Vietnam
- Jong-Eun Park
  - Department of Animal Biotechnology, College of Applied Life Science, Jeju National University, Jeju, Jeju-do, Republic of Korea
- Yong Suk Chung
  - Department of Plant Resources and Environment, Jeju National University, 63243, Republic of Korea

2. Kaiser T, Jahansouz C, Staley C. Network-based approaches for the investigation of microbial community structure and function using metagenomics-based data. Future Microbiol 2022; 17:621-631. [PMID: 35360922; DOI: 10.2217/fmb-2021-0219] [Indexed: 11/21/2022]
Abstract
Network-based approaches offer a powerful framework to evaluate microbial community organization and function as it relates to a variety of environmental processes. Emerging studies are exploring network theory as a method for data integration that is likely to be critical for the integration of 'omics' data using systems biology approaches. Intricacies of network theory and methodological and computational complexities in network construction, however, impede the use of these tools for translational science. We provide a perspective on the methods of network construction, interpretation and emerging uses for these techniques in understanding host-microbiota interactions.
Affiliation(s)
- Thomas Kaiser
  - Department of Surgery, University of Minnesota, Minneapolis, MN 55455, USA; Biotechnology Institute, University of Minnesota, Saint Paul, MN 55108, USA
- Cyrus Jahansouz
  - Department of Surgery, University of Minnesota, Minneapolis, MN 55455, USA
- Christopher Staley
  - Department of Surgery, University of Minnesota, Minneapolis, MN 55455, USA; Biotechnology Institute, University of Minnesota, Saint Paul, MN 55108, USA

3. Wu S, Chen D, Snyder MP. Network biology bridges the gaps between quantitative genetics and multi-omics to map complex diseases. Curr Opin Chem Biol 2021; 66:102101. [PMID: 34861483; DOI: 10.1016/j.cbpa.2021.102101] [Received: 08/11/2021; Revised: 10/17/2021; Accepted: 10/27/2021; Indexed: 12/27/2022]
Abstract
With advances in high-throughput sequencing technologies, quantitative genetics approaches have provided insights into the genetic basis of many complex diseases. Emerging in-depth multi-omics profiling technologies have created exciting opportunities for systematically investigating intricate interaction networks with different layers of biological molecules underlying disease etiology. Herein, we summarize two main categories of biological networks: evidence-based and statistically inferred. These different types of molecular networks complement each other at both bulk and single-cell levels. We also review three main strategies to incorporate quantitative genetics results with multi-omics data by network analysis: (a) network propagation, (b) functional module-based methods, and (c) comparative/dynamic networks. These strategies not only aid in elucidating molecular mechanisms of complex diseases but can also guide the search for therapeutic targets.
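Network propagation, the first strategy named above, is commonly implemented as a random walk with restart (RWR) over a molecular network, so that scores placed on seed genes (for example, GWAS hits) diffuse to their network neighbours. The sketch below is a generic illustration, not the procedure of any study reviewed by Wu et al. It assumes an unweighted, undirected graph given as an adjacency dict with no isolated nodes and an arbitrary restart probability of 0.3; the function and node names are hypothetical.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Propagate seed scores over an undirected graph by random walk with restart.

    adj: dict mapping each node to a set of neighbours (assumed symmetric, no isolated nodes)
    seeds: iterable of seed nodes that receive the restart mass
    restart: probability of jumping back to the seed distribution at each step
    """
    nodes = sorted(adj)
    seeds = set(seeds)
    # Restart distribution: uniform over the seed nodes, zero elsewhere.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # One walk step: every node splits its current mass equally among its neighbours.
        spread = {n: 0.0 for n in nodes}
        for u in nodes:
            share = p[u] / len(adj[u])
            for v in adj[u]:
                spread[v] += share
        nxt = {n: (1 - restart) * spread[n] + restart * p0[n] for n in nodes}
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:  # stop once the distribution no longer changes
            break
    return p


# Usage: a hypothetical four-gene path A-B-C-D with A as the only seed.
ppi = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = random_walk_with_restart(ppi, seeds=["A"])
print(sorted(scores, key=scores.get, reverse=True))  # ['A', 'B', 'C', 'D']
```

A higher restart probability keeps scores concentrated near the seeds; a lower one spreads them further across the network.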
Affiliation(s)
- Si Wu
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Dijun Chen
  - State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, 210023, China
- Michael P Snyder
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA

4. Arici MK, Tuncbag N. Performance Assessment of the Network Reconstruction Approaches on Various Interactomes. Front Mol Biosci 2021; 8:666705. [PMID: 34676243; PMCID: PMC8523993; DOI: 10.3389/fmolb.2021.666705] [Received: 02/10/2021; Accepted: 07/14/2021; Indexed: 01/04/2023]
Abstract
Beyond lists of molecules, multiple sets of omic data need to be considered collectively in order to reconstruct the connections between the molecules. In particular, pathway reconstruction is crucial to understanding disease biology because abnormal cellular signaling may be pathological. The main challenge is how to integrate these data accurately. In this study, we aim to comparatively analyze the performance of a set of network reconstruction algorithms on multiple reference interactomes. We first explored several human protein interactomes, including PathwayCommons, OmniPath, HIPPIE, iRefWeb, STRING, and ConsensusPathDB. The comparison is based on the coverage of each interactome in terms of cancer driver proteins, structural information of protein interactions, and the bias toward well-studied proteins. We next used these interactomes to evaluate the performance of network reconstruction algorithms including all-pair shortest path, heat diffusion with flux, personalized PageRank with flux, and prize-collecting Steiner forest (PCSF) approaches. Each approach has its own merits and weaknesses. Among them, PCSF had the most balanced performance in terms of precision and recall scores when 28 pathways from NetPath were reconstructed using the listed algorithms. Additionally, the reference interactome affects the performance of the network reconstruction approaches. The coverage and disease- or tissue-specificity of each interactome may vary, which may result in differences in the reconstructed networks.
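Of the reconstruction approaches compared above, all-pair shortest path is the simplest to state: connect every pair of input (terminal) proteins by a shortest path in the interactome and take the union of those paths. The sketch below is our own minimal illustration, not the implementation benchmarked by Arici and Tuncbag. It assumes an unweighted, undirected interactome stored as an adjacency dict, keeps one shortest path per pair, and breaks ties alphabetically; the node names are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_path(adj, src, dst):
    """Return one shortest path from src to dst in an unweighted graph, or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            # Walk the parent pointers back to the source.
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in sorted(adj.get(u, ())):  # sorted for deterministic tie-breaking
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None


def all_pairs_shortest_path_subnetwork(adj, terminals):
    """Union of one shortest path between every pair of terminal proteins."""
    edges = set()
    for a, b in combinations(sorted(terminals), 2):
        path = bfs_path(adj, a, b)
        if path is None:
            continue  # terminals in different components stay disconnected
        for u, v in zip(path, path[1:]):
            edges.add(tuple(sorted((u, v))))
    return edges


interactome = {"T1": {"X"}, "X": {"T1", "T2", "Z"}, "T2": {"X", "Y"},
               "Y": {"T2", "T3"}, "T3": {"Y"}, "Z": {"X"}}
print(sorted(all_pairs_shortest_path_subnetwork(interactome, ["T1", "T2", "T3"])))
# [('T1', 'X'), ('T2', 'X'), ('T2', 'Y'), ('T3', 'Y')]
```

The unrelated neighbour Z is left out because it lies on no shortest path between terminals, which is the intended behaviour of this family of methods.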
Affiliation(s)
- M Kaan Arici
  - Graduate School of Informatics, Middle East Technical University, Ankara, Turkey; Foot and Mouth Diseases Institute, Ministry of Agriculture and Forestry, Ankara, Turkey
- Nurcan Tuncbag
  - Chemical and Biological Engineering, College of Engineering, Koc University, Istanbul, Turkey; School of Medicine, Koc University, Istanbul, Turkey

5. Zeng H, Zhang J, Preising GA, Rubel T, Singh P, Ritz A. Graphery: interactive tutorials for biological network algorithms. Nucleic Acids Res 2021; 49:W257-W262. [PMID: 34037782; PMCID: PMC8262715; DOI: 10.1093/nar/gkab420] [Received: 03/05/2021; Revised: 04/19/2021; Accepted: 05/03/2021; Indexed: 11/14/2022]
Abstract
Networks have been an excellent framework for modeling complex biological information, but the methodological details of network-based tools are often described for a technical audience. We have developed Graphery, an interactive tutorial webserver that illustrates foundational graph concepts frequently used in network-based methods. Each tutorial describes a graph concept along with executable Python code that can be interactively run on a graph. Users navigate each tutorial using their choice of real-world biological networks that highlight the diverse applications of network algorithms. Graphery also allows users to modify the code within each tutorial or write new programs, all of which can be executed without requiring an account. Graphery accepts ideas for new tutorials and datasets that will be shaped by both computational and biological researchers, growing into a community-contributed learning platform. Graphery is available at https://graphery.reedcompbio.org/.
Affiliation(s)
- Heyuan Zeng
  - Computer Science Department, Reed College, 3203 SE Woodstock Blvd, Portland, OR 97202, USA; Biology Department, Reed College, 3203 SE Woodstock Blvd, Portland, OR 97202, USA
- Jinbiao Zhang
  - Information and Communication Technology Department, Xiamen University Malaysia, Jalan Sunsuria, Bandar Sunsuria, 43900 Sepang, Selangor Darul Ehsan, Malaysia
- Gabriel A Preising
  - Biology Department, Reed College, 3203 SE Woodstock Blvd, Portland, OR 97202, USA
- Tobias Rubel
  - Biology Department, Reed College, 3203 SE Woodstock Blvd, Portland, OR 97202, USA
- Pramesh Singh
  - Biology Department, Reed College, 3203 SE Woodstock Blvd, Portland, OR 97202, USA
- Anna Ritz
  - Biology Department, Reed College, 3203 SE Woodstock Blvd, Portland, OR 97202, USA

6. Liu Z, Ma A, Mathé E, Merling M, Ma Q, Liu B. Network analyses in microbiome based on high-throughput multi-omics data. Brief Bioinform 2021; 22:1639-1655. [PMID: 32047891; PMCID: PMC7986608; DOI: 10.1093/bib/bbaa005] [Received: 10/18/2019; Revised: 01/07/2020; Accepted: 01/08/2020; Indexed: 02/06/2023]
Abstract
Together with various hosts and environments, ubiquitous microbes interact closely with each other, forming an intertwined system or community. Notably, shifts in the relationships between microbes and their hosts or environments are associated with critical diseases and ecological changes. While advances in high-throughput omics technologies offer a great opportunity for understanding the structures and functions of the microbiome, it is still challenging to analyse and interpret the omics data. Specifically, the heterogeneity and diversity of microbial communities, compounded with the large size of the datasets, pose a tremendous challenge to mechanistically elucidating the complex communities. Fortunately, network analyses provide an efficient way to tackle this problem, and several network approaches have recently been proposed to improve this understanding. Here, we systematically illustrate the network theories that have been used in biological and biomedical research. Then, we review existing network modelling methods of microbial studies at multiple layers, from metagenomics to metabolomics and further to multi-omics. Lastly, we discuss the limitations of present studies and provide a perspective on future directions in support of the understanding of microbial communities.
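A common first step in the microbial network analyses surveyed above is a co-occurrence network, in which two taxa are linked when their abundance profiles across samples are strongly correlated. The sketch below is a minimal illustration under our own assumptions: Pearson correlation, an arbitrary |r| threshold of 0.8, and non-constant abundance profiles measured on the same samples. Real microbiome pipelines often also transform compositional data and test correlations for significance, which this sketch omits; the taxon names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cooccurrence_network(abundance, threshold=0.8):
    """Link taxa whose abundance profiles correlate with |r| >= threshold.

    abundance: dict taxon -> list of abundances across the same samples
    Returns a dict (taxon_a, taxon_b) -> r; the sign separates co-occurrence (+)
    from mutual exclusion (-).
    """
    edges = {}
    for a, b in combinations(sorted(abundance), 2):
        r = pearson(abundance[a], abundance[b])
        if abs(r) >= threshold:
            edges[(a, b)] = r
    return edges


profiles = {"taxA": [1, 2, 3, 4], "taxB": [2, 4, 6, 8],
            "taxC": [4, 3, 2, 1], "taxD": [2, 1, 4, 3]}
for (a, b), r in cooccurrence_network(profiles).items():
    print(a, b, round(r, 2))
```

Here taxD correlates only weakly (|r| = 0.6) with the others, so it stays unconnected at the chosen threshold.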
Affiliation(s)
- Zhaoqian Liu
  - Department of Biomedical Informatics, College of Medicine, the Ohio State University, Columbus, OH 43210, USA
- Anjun Ma
  - Department of Biomedical Informatics, College of Medicine, the Ohio State University, Columbus, OH 43210, USA
- Ewy Mathé
  - Department of Biomedical Informatics, College of Medicine, the Ohio State University, Columbus, OH 43210, USA
- Marlena Merling
  - Department of Biomedical Informatics, College of Medicine, the Ohio State University, Columbus, OH 43210, USA
- Qin Ma
  - Department of Biomedical Informatics, College of Medicine, the Ohio State University, Columbus, OH 43210, USA
- Bingqiang Liu
  - Department of Biomedical Informatics, College of Medicine, the Ohio State University, Columbus, OH 43210, USA

7. Belyaeva A, Cammarata L, Radhakrishnan A, Squires C, Yang KD, Shivashankar GV, Uhler C. Causal network models of SARS-CoV-2 expression and aging to identify candidates for drug repurposing. Nat Commun 2021; 12:1024. [PMID: 33589624; PMCID: PMC7884845; DOI: 10.1038/s41467-021-21056-z] [Received: 06/20/2020; Accepted: 01/05/2021; Indexed: 12/21/2022]
Abstract
Given the severity of the SARS-CoV-2 pandemic, a major challenge is to rapidly repurpose existing approved drugs for clinical interventions. While a number of data-driven and experimental approaches have been suggested in the context of drug repurposing, a platform that systematically integrates available transcriptomic, proteomic and structural data is missing. More importantly, given that SARS-CoV-2 pathogenicity is highly age-dependent, it is critical to integrate aging signatures into drug discovery platforms. We here take advantage of large-scale transcriptional drug screens combined with RNA-seq data of the lung epithelium with SARS-CoV-2 infection as well as the aging lung. To identify robust druggable protein targets, we propose a principled causal framework that makes use of multiple data modalities. Our analysis highlights the importance of serine/threonine and tyrosine kinases as potential targets that intersect the SARS-CoV-2 and aging pathways. By integrating transcriptomic, proteomic and structural data that is available for many diseases, our drug discovery platform is broadly applicable. Rigorous in vitro experiments as well as clinical trials are needed to validate the identified candidate drugs.
Affiliation(s)
- G V Shivashankar
  - ETH Zurich, Zurich, Switzerland; Paul Scherrer Institute, Villigen, Switzerland
- Caroline Uhler
  - Massachusetts Institute of Technology, Cambridge, MA, USA

8. Jiang Y, Liang Y, Wang D, Xu D, Joshi T. A dynamic programing approach to integrate gene expression data and network information for pathway model generation. Bioinformatics 2020; 36:169-176. [PMID: 31168616; DOI: 10.1093/bioinformatics/btz467] [Received: 06/07/2018; Revised: 05/15/2019; Accepted: 05/31/2019; Indexed: 11/13/2022]
Abstract
MOTIVATION As large amounts of biological data continue to be rapidly generated, a major focus of bioinformatics research has been to integrate these data to identify active pathways or modules under certain experimental conditions or phenotypes. Although biologically significant modules can often be detected globally by many existing methods, it is often hard to interpret or make use of the results toward pathway model generation and testing. RESULTS To address this gap, we have developed the IMPRes algorithm, a new step-wise active pathway detection method using a dynamic programming approach. IMPRes takes advantage of the existing pathway interaction knowledge in the Kyoto Encyclopedia of Genes and Genomes (KEGG). Omics data are then used to assign penalties to genes, interactions and pathways. Finally, starting from one or multiple seed genes, a shortest path algorithm is applied to detect downstream pathways that best explain the gene expression data. Since dynamic programming enables pathway detection one step at a time, it is easy for researchers to trace the pathways, which may lead to more accurate drug design and more effective treatment strategies. The evaluation experiments conducted on three yeast datasets have shown that IMPRes can achieve competitive or better performance than other state-of-the-art methods. Furthermore, a case study on a human lung cancer dataset was performed, and we provided several insights into genes and mechanisms involved in lung cancer that had not been discovered before. AVAILABILITY AND IMPLEMENTATION The IMPRes visualization tool is available via a web server at http://digbio.missouri.edu/impres. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
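The abstract describes a seed-rooted shortest-path search in which omics-derived penalties are attached to genes and interactions. As a rough illustration only, the sketch below uses Dijkstra's algorithm over a directed graph where stepping from gene u to gene v costs the edge penalty plus the penalty of v. The graph, penalty values and function names are hypothetical, penalties are assumed non-negative, and IMPRes's actual scoring and dynamic-programming scheme are described in the paper rather than reproduced here.

```python
import heapq


def penalized_shortest_paths(graph, node_penalty, seeds):
    """Dijkstra search from seed genes where both interactions and genes carry penalties.

    graph: dict u -> dict v -> edge penalty (directed, e.g. pathway interactions)
    node_penalty: dict gene -> penalty (e.g. low for strongly changed genes); must cover every gene
    seeds: starting genes
    Returns (cost, pred): cheapest total penalty and predecessor for every reachable gene.
    """
    cost = {s: node_penalty[s] for s in seeds}
    pred = {s: None for s in seeds}
    heap = [(c, s) for s, c in cost.items()]
    heapq.heapify(heap)
    while heap:
        c, u = heapq.heappop(heap)
        if c > cost[u]:
            continue  # stale queue entry; a cheaper route to u was already found
        for v, w in graph.get(u, {}).items():
            new_cost = c + w + node_penalty[v]
            if new_cost < cost.get(v, float("inf")):
                cost[v] = new_cost
                pred[v] = u
                heapq.heappush(heap, (new_cost, v))
    return cost, pred


def trace(pred, target):
    """Follow predecessor links back from target to its seed."""
    path = []
    while target is not None:
        path.append(target)
        target = pred[target]
    return path[::-1]


graph = {"S": {"A": 1, "B": 1}, "A": {"T": 1}, "B": {"T": 1}}
node_penalty = {"S": 0, "A": 5, "B": 0.5, "T": 0}
cost, pred = penalized_shortest_paths(graph, node_penalty, seeds=["S"])
print(cost["T"], trace(pred, "T"))  # 2.5 ['S', 'B', 'T']
```

Because the route through A carries a high gene penalty, the search prefers the branch through B, which is the kind of expression-guided choice the abstract describes.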
Affiliation(s)
- Yuexu Jiang
  - Department of Computer Science and Technology, Jilin University, Changchun 130012, China; Department of Electrical Engineering and Computer Science, Columbia, MO 65211, USA
- Yanchun Liang
  - Department of Computer Science and Technology, Jilin University, Changchun 130012, China
- Duolin Wang
  - Department of Computer Science and Technology, Jilin University, Changchun 130012, China; Department of Electrical Engineering and Computer Science, Columbia, MO 65211, USA
- Dong Xu
  - Department of Computer Science and Technology, Jilin University, Changchun 130012, China; Department of Electrical Engineering and Computer Science, Columbia, MO 65211, USA; Informatics Institute and Christopher S. Bond Life Sciences Center, Columbia, MO 65211, USA
- Trupti Joshi
  - Department of Electrical Engineering and Computer Science, Columbia, MO 65211, USA; Informatics Institute and Christopher S. Bond Life Sciences Center, Columbia, MO 65211, USA; Department of Health Management and Informatics, University of Missouri, Columbia, MO 65211, USA

9. Basha O, Mauer O, Simonovsky E, Shpringer R, Yeger-Lotem E. ResponseNet v.3: revealing signaling and regulatory pathways connecting your proteins and genes across human tissues. Nucleic Acids Res 2019; 47:W242-W247. [PMID: 31114913; PMCID: PMC6602570; DOI: 10.1093/nar/gkz421] [Received: 03/04/2019; Revised: 04/23/2019; Accepted: 05/06/2019; Indexed: 12/13/2022]
Abstract
ResponseNet v.3 is an enhanced version of ResponseNet, a web server that is designed to highlight signaling and regulatory pathways connecting user-defined proteins and genes by using the ResponseNet network optimization approach (http://netbio.bgu.ac.il/respnet). Users run ResponseNet by defining source and target sets of proteins, genes and/or microRNAs, and by specifying a molecular interaction network (interactome). The output of ResponseNet is a sparse, high-probability interactome subnetwork that connects the two sets, thereby revealing additional molecules and interactions that are involved in the studied condition. In recent years, massive efforts were invested in profiling the transcriptomes of human tissues, enabling the inference of human tissue interactomes. ResponseNet v.3 expands ResponseNet2.0 by harnessing ∼11,600 RNA-sequenced human tissue profiles made available by the Genotype-Tissue Expression consortium, to support context-specific analysis of 44 human tissues. Thus, ResponseNet v.3 allows users to illuminate the signaling and regulatory pathways potentially active in the context of a specific tissue, and to compare them with active pathways in other tissues. In the era of precision medicine, such analyses open the door for tissue- and patient-specific analyses of pathways and diseases.
Affiliation(s)
- Omer Basha
  - Department of Clinical Biochemistry & Pharmacology, Faculty of Health Sciences
- Omry Mauer
  - Department of Clinical Biochemistry & Pharmacology, Faculty of Health Sciences
- Eyal Simonovsky
  - Department of Clinical Biochemistry & Pharmacology, Faculty of Health Sciences
- Rotem Shpringer
  - Department of Clinical Biochemistry & Pharmacology, Faculty of Health Sciences
- Esti Yeger-Lotem
  - Department of Clinical Biochemistry & Pharmacology, Faculty of Health Sciences; National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel

10. Peng J, Zhu L, Wang Y, Chen J. Mining Relationships among Multiple Entities in Biological Networks. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:769-776. [PMID: 30872239; DOI: 10.1109/tcbb.2019.2904965] [Indexed: 06/09/2023]
Abstract
Identifying topological relationships among multiple entities in biological networks is critical towards the understanding of the organizational principles of network functionality. Theoretically, this problem can be solved using minimum Steiner tree (MSTT) algorithms. However, due to large network size, it remains computationally challenging, and the predictive value of multi-entity topological relationships is still unclear. We present a novel solution called Cluster-based Steiner Tree Miner (CST-Miner) to instantly identify multi-entity topological relationships in biological networks. Given a list of user-specified entities, CST-Miner decomposes a biological network into nested cluster-based subgraphs, on which multiple minimum Steiner trees are identified. By merging all of them into a minimum cost tree, the optimal topological relationships among all the user-specified entities are revealed. Experimental results showed that CST-Miner can finish in nearly log-linear time and that the tree constructed by CST-Miner is close to the global minimum.
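CST-Miner itself works on nested cluster-based subgraphs, and the sketch below does not reproduce it. It shows instead the classic metric-closure heuristic for the underlying minimum Steiner tree problem: compute shortest-path distances between the user's entities, build a minimum spanning tree over those distances (Prim's algorithm here), and expand each tree edge back into its network path. This is a well-known 2-approximation. The sketch assumes a connected, undirected graph with non-negative weights; the full Kou-Markowsky-Berman procedure adds a final spanning-tree and leaf-pruning pass that is omitted here, so the returned edge set may occasionally contain a cycle. Node names are hypothetical.

```python
import heapq


def dijkstra(graph, src):
    """Shortest distances and predecessors from src in a weighted graph (dict u -> dict v -> w)."""
    dist, pred = {src: 0.0}, {src: None}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], pred[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, pred


def steiner_tree_approx(graph, terminals):
    """Metric-closure MST heuristic for connecting all terminals (graph assumed symmetric, connected)."""
    terminals = sorted(terminals)
    sp = {t: dijkstra(graph, t) for t in terminals}
    # Prim's algorithm on the complete graph of terminals, weighted by shortest-path distance.
    in_tree = {terminals[0]}
    closure_edges = []
    while len(in_tree) < len(terminals):
        _, a, b = min((sp[a][0][b], a, b)
                      for a in in_tree for b in terminals if b not in in_tree)
        in_tree.add(b)
        closure_edges.append((a, b))
    # Replace every closure edge by the underlying shortest path in the network.
    edges = set()
    for a, b in closure_edges:
        pred = sp[a][1]
        v = b
        while pred[v] is not None:
            edges.add(tuple(sorted((v, pred[v]))))
            v = pred[v]
    return edges


graph = {"C": {"T1": 1, "T2": 1, "T3": 1},
         "T1": {"C": 1, "T2": 3},
         "T2": {"C": 1, "T1": 3, "T3": 3},
         "T3": {"C": 1, "T2": 3}}
print(sorted(steiner_tree_approx(graph, ["T1", "T2", "T3"])))
# [('C', 'T1'), ('C', 'T2'), ('C', 'T3')]
```

In this toy network the heuristic recovers the optimal star through the non-terminal hub C (total weight 3) rather than the direct terminal-to-terminal edges.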
11. Bersanelli M, Mosca E, Milanesi L, Bazzani A, Castellani G. Frailness and resilience of gene networks predicted by detection of co-occurring mutations via a stochastic perturbative approach. Sci Rep 2020; 10:2643. [PMID: 32060296; PMCID: PMC7021762; DOI: 10.1038/s41598-020-59036-w] [Received: 06/07/2019; Accepted: 11/22/2019; Indexed: 11/13/2022]
Abstract
In recent years, complex networks have been identified as powerful mathematical frameworks for the adequate modeling of many applied problems in disparate research fields. Assuming a Master Equation (ME) modeling the exchange of information within the network, we set up a perturbative approach in order to investigate how node alterations impact the network information flow. The main assumption of the perturbed ME (pME) model is that the simultaneous presence of multiple node alterations causes more or less intense network frailties depending on the specific features of the perturbation. From this perspective, the collective behavior of a set of molecular alterations on a gene network is a particularly apt scenario for a first application of the proposed method, since most diseases are related neither to a single mutation nor to an established set of molecular alterations. Therefore, after characterizing the method numerically, we applied the pME approach, as a proof of principle, to breast cancer (BC) somatic mutation data downloaded from The Cancer Genome Atlas (TCGA) database. For each patient, we measured the network frailness of over 90 significant subnetworks of the protein-protein interaction network, where each perturbation was defined by patient-specific somatic mutations. Interestingly, the frailness measures depend on the position of the alterations on the gene network more than on their amount, unlike most traditional enrichment scores. In particular, low-degree mutations play an important role in causing high frailness measures. The potential applicability of the proposed method is wide and suggests future development in the control theory context.
Affiliation(s)
- Matteo Bersanelli
  - Department of Physics and Astronomy, University of Bologna, Bologna, 40127, Italy; National Institute for Nuclear Physics (INFN), Bologna, 40127, Italy
- Ettore Mosca
  - Institute of Biomedical Technologies, National Research Council, Segrate, Milan, 20090, Italy
- Luciano Milanesi
  - Institute of Biomedical Technologies, National Research Council, Segrate, Milan, 20090, Italy
- Armando Bazzani
  - Department of Physics and Astronomy, University of Bologna, Bologna, 40127, Italy
- Gastone Castellani
  - Department of Physics and Astronomy, University of Bologna, Bologna, 40127, Italy

12. IMPRes-Pro: A high dimensional multiomics integration method for in silico hypothesis generation. Methods 2020; 173:16-23. [DOI: 10.1016/j.ymeth.2019.06.013] [Received: 03/04/2019; Revised: 06/08/2019; Accepted: 06/13/2019; Indexed: 01/18/2023]

13. Jeggari A, Alekseenko Z, Petrov I, Dias JM, Ericson J, Alexeyenko A. EviNet: a web platform for network enrichment analysis with flexible definition of gene sets. Nucleic Acids Res 2018; 46:W163-W170. [PMID: 29893885; PMCID: PMC6030852; DOI: 10.1093/nar/gky485] [Received: 02/25/2018; Accepted: 05/29/2018; Indexed: 12/18/2022]
Abstract
The new web resource EviNet provides an easy-to-run interface to network enrichment analysis for exploration of novel, experimentally defined gene sets. The major advantages of this analysis are (i) applicability to any genes found in the global network rather than only to those with pathway/ontology term annotations, (ii) ability to connect genes via different molecular mechanisms rather than within one high-throughput platform, and (iii) statistical power sufficient to detect enrichment of very small sets, down to individual genes. The users' gene sets are either defined prior to upload or derived interactively from an uploaded file by differential expression criteria. The pathways and networks used in the analysis can be chosen from the collection menu. The calculation is typically done within seconds or minutes, and a stable URL is provided immediately. The results are presented in both visual (network graphs) and tabular formats using jQuery libraries. Uploaded data and analysis results are kept in separate project directories that are not accessible to other users. EviNet is available at https://www.evinet.org/.
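Network enrichment analysis, in its basic form, counts the network links between a query gene set and a pathway and compares that count with what would be expected by chance. The sketch below is a simplified illustration, not EviNet's own statistic. It assumes disjoint gene sets, estimates the expected link count from node degrees under a configuration model (sum of degrees in one set times the sum in the other, divided by twice the number of edges), and approximates the variance as Poisson. The function and gene names are hypothetical.

```python
from math import sqrt


def network_enrichment(edges, gene_set, pathway):
    """Simplified network enrichment score between two disjoint gene sets.

    edges: iterable of undirected (u, v) pairs forming the global network
    Returns (observed, expected, z).
    """
    edge_list = [tuple(e) for e in edges]
    a, b = set(gene_set), set(pathway)
    if a & b:
        raise ValueError("this sketch assumes the two gene sets are disjoint")
    degree = {}
    for u, v in edge_list:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    m = len(edge_list)
    # Observed: links with one end in the query set and the other in the pathway.
    observed = sum(1 for u, v in edge_list
                   if (u in a and v in b) or (u in b and v in a))
    # Expected under a configuration model that preserves every node's degree.
    k_a = sum(degree.get(g, 0) for g in a)
    k_b = sum(degree.get(g, 0) for g in b)
    expected = k_a * k_b / (2 * m)
    if expected == 0:
        raise ValueError("one of the sets has no links in the network")
    z = (observed - expected) / sqrt(expected)  # Poisson approximation of the variance
    return observed, expected, z


network = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b2"),
           ("x1", "x2"), ("x2", "x3"), ("x3", "x4"), ("x4", "x1")]
print(network_enrichment(network, {"a1", "a2"}, {"b1", "b2"}))  # (4, 1.0, 3.0)
```

A degree-based expectation matters because hub genes link to many sets by chance; comparing against it keeps highly connected genes from dominating the score.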
Affiliation(s)
- Ashwini Jeggari
  - Department of Cell and Molecular Biology, Karolinska Institutet, 171 77 Stockholm, Sweden
- Zhanna Alekseenko
  - Department of Cell and Molecular Biology, Karolinska Institutet, 171 77 Stockholm, Sweden
- Iurii Petrov
  - Department of Microbiology, Tumor and Cell Biology (MTC), Karolinska Institutet, Stockholm, Sweden
- José M Dias
  - Department of Cell and Molecular Biology, Karolinska Institutet, 171 77 Stockholm, Sweden
- Johan Ericson
  - Department of Cell and Molecular Biology, Karolinska Institutet, 171 77 Stockholm, Sweden
- Andrey Alexeyenko
  - Department of Microbiology, Tumor and Cell Biology (MTC), Karolinska Institutet, Stockholm, Sweden; National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Box 1031, 171 21 Solna, Sweden

14. Fernández-Tajes J, Gaulton KJ, van de Bunt M, Torres J, Thurner M, Mahajan A, Gloyn AL, Lage K, McCarthy MI. Developing a network view of type 2 diabetes risk pathways through integration of genetic, genomic and functional data. Genome Med 2019; 11:19. [PMID: 30914061; PMCID: PMC6436236; DOI: 10.1186/s13073-019-0628-8] [Received: 10/06/2018; Accepted: 03/08/2019; Indexed: 12/30/2022]
Abstract
BACKGROUND Genome-wide association studies (GWAS) have identified several hundred susceptibility loci for type 2 diabetes (T2D). One critical, but unresolved, issue concerns the extent to which the mechanisms through which these diverse signals influence T2D predisposition converge on a limited set of biological processes. However, the causal variants identified by GWAS mostly fall into non-coding sequence, complicating the task of defining the effector transcripts through which they operate. METHODS Here, we describe the implementation of an analytical pipeline to address this question. First, we integrate multiple sources of genetic, genomic and biological data to assign positional candidacy scores to the genes that map to T2D GWAS signals. Second, we introduce genes with high scores as seeds within a network optimization algorithm (the asymmetric prize-collecting Steiner tree approach) which uses external, experimentally confirmed protein-protein interaction (PPI) data to generate high-confidence sub-networks. Third, we use GWAS data to test the T2D association enrichment of the "non-seed" proteins introduced into the network, as a measure of the overall functional connectivity of the network. RESULTS We find (a) non-seed proteins in the T2D protein-interaction network so generated (comprising 705 nodes) are enriched for association to T2D (p = 0.0014) but not control traits, (b) stronger T2D-enrichment for islets than other tissues when we use RNA expression data to generate tissue-specific PPI networks and (c) enhanced enrichment (p = 3.9 × 10^-5) when we combine the analysis of the islet-specific PPI network with a focus on the subset of T2D GWAS loci which act through defective insulin secretion. CONCLUSIONS These analyses reveal a pattern of non-random functional connectivity between candidate causal genes at T2D GWAS loci and highlight the products of genes including YWHAG, SMAD4 or CDK2 as potential contributors to T2D-relevant islet dysfunction.
The approach we describe can be applied to other complex genetic and genomic datasets, facilitating integration of diverse data types into disease-associated networks.
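The network step in this pipeline relies on the prize-collecting Steiner tree problem. For orientation, its standard undirected form is shown below; the study itself uses an asymmetric variant whose exact formulation is given in the paper. Here G = (V, E) is the interaction network, c(e) ≥ 0 is an edge cost (for example, derived from interaction confidence) and p(v) ≥ 0 is a node prize. Mapping the prizes to the positional candidacy scores of seed genes is our assumption for illustration.

```latex
\min_{T = (V_T, E_T)\ \text{a tree in } G}
  \;\; \sum_{e \in E_T} c(e) \;+\; \sum_{v \in V \setminus V_T} p(v)
```

Including a node pays off only if its prize exceeds the cost of the edges needed to connect it, which is how the optimisation trades seed coverage against the introduction of intermediate non-seed proteins.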
Collapse
Affiliation(s)
- Juan Fernández-Tajes
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Kyle J. Gaulton
- Department of Pediatrics, University of California, San Diego, CA USA
- Martijn van de Bunt
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK; Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, UK; Present Address: Department of Bioinformatics and Data Mining, Novo Nordisk A/S, Maaloev, Denmark
- Jason Torres
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK; Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, UK
- Matthias Thurner
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK; Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, UK
- Anubha Mahajan
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Anna L. Gloyn
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK; Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, UK; Oxford NIHR Biomedical Research Centre, Churchill Hospital, Oxford, UK
- Kasper Lage
- Department of Surgery, Massachusetts General Hospital, Boston, MA USA; Broad Institute of MIT and Harvard, Cambridge, MA USA; Harvard Medical School, Boston, MA USA
- Mark I. McCarthy
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK; Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, UK; Oxford NIHR Biomedical Research Centre, Churchill Hospital, Oxford, UK
15
Abstract
Viruses utilize a number of host factors in order to carry out their replication cycles. Influenza A virus (IAV) and human respiratory syncytial virus (RSV) both infect the tissues of the respiratory tract, and as such we hypothesize that they might require similar host factors. Several published genome-wide screens have identified putative IAV host factors; however, there is significant discordance between their hits. In order to build on this work, we integrated a variety of "OMICS" data sources using two complementary network analyses, yielding 51 genes enriched for both IAV and RSV replication. We designed a targeted small interfering RNA (siRNA)-based assay to screen these genes against IAV under robust conditions and identified 13 genes supported by two IAV subtypes in both primary and transformed human lung cells. One of these hits, RNA binding motif 14 (RBM14), was validated as a required host factor and furthermore was shown to relocalize to the nucleolus upon IAV infection but not with other viruses. Additionally, the IAV NS1 protein is both necessary and sufficient for RBM14 relocalization, and relocalization also requires the double-stranded RNA (dsRNA) binding capacity of NS1. This work reports the discovery of a new host requirement for IAV replication and exposes a novel example of interplay between IAV NS1 and the host protein, RBM14.
IMPORTANCE Influenza A virus (IAV) and respiratory syncytial virus (RSV) present major global disease burdens. There are high economic costs associated with morbidity as well as significant mortality rates, especially in developing countries, in children, and in the elderly. There are currently limited therapeutic options for these viruses, which underscores the need for novel research into virus biology that may lead to the discovery of new therapeutic approaches. This work extends existing research into host factors involved in virus replication and explores the interaction between IAV and one such host factor, RBM14. 
Further study to fully characterize this interaction may elucidate novel mechanisms used by the virus during its replication cycle and open new avenues for understanding virus biology.
16
Misra BB, Langefeld CD, Olivier M, Cox LA. Integrated Omics: Tools, Advances, and Future Approaches. J Mol Endocrinol 2018; 62:JME-18-0055. [PMID: 30006342 DOI: 10.1530/jme-18-0055] [Citation(s) in RCA: 220] [Impact Index Per Article: 36.7] [Received: 02/24/2018] [Revised: 07/02/2018] [Accepted: 07/12/2018] [Indexed: 12/13/2022]
Abstract
With the rapid adoption of high-throughput omic approaches such as genomics, transcriptomics, proteomics, and metabolomics to analyze biological samples, each analysis can generate tera- to petabyte-sized data files on a daily basis. These data file sizes, together with differences in nomenclature among these data types, make it challenging to integrate these multi-dimensional omics data into a biologically meaningful context. Variously named integrated omics, multi-omics, poly-omics, trans-omics, pan-omics, or shortened to just 'omics', the field faces challenges that include differences in data cleaning, normalization, biomolecule identification, data dimensionality reduction, biological contextualization, statistical validation, data storage and handling, sharing, and data archiving. The ultimate goal is a holistic, 'systems biology' understanding of the biological question at hand. Commonly used approaches in these efforts are currently limited by the 3 i's: integration, interpretation, and insights. After integration, these very large datasets are expected to yield unprecedented views of cellular systems at exquisite resolution, offering transformative insights into processes, events, and diseases through various computational and informatics frameworks. With the continued reduction in costs and processing time for sample analyses, and the increasing types of omics datasets generated, such as glycomics, lipidomics, microbiomics, and phenomics, an increasing number of scientists in this interdisciplinary domain of bioinformatics face these challenges. We discuss recent approaches, existing tools, and potential caveats in the integration of omics datasets for the development of standardized analytical pipelines that could be adopted by the global omics research community.
Affiliation(s)
- Biswapriya B Misra
- B Misra, Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
- Carl D Langefeld
- C Langefeld, Biostatistical Sciences, Wake Forest University School of Medicine, Winston-Salem, United States
- Michael Olivier
- M Olivier, Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
- Laura A Cox
- L Cox, Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
17
Abstract
The diversity and sheer volume of omics data have carried biology and biomedicine research and applications into a big data era, much as happened in wider society a decade ago. They pose a new challenge, moving from horizontal data ensembles (e.g., similar types of data collected by different labs or companies) to vertical data ensembles (e.g., different types of data collected for a group of people with matched information), which requires integrative analysis in biology and biomedicine and calls for the urgent development of data integration methods to address the shift from population-guided to individual-guided investigations.
Data integration is an effective concept for solving complex problems or understanding complicated systems. Several benchmark studies have revealed the heterogeneity and trade-offs that exist in the analysis of omics data. Integrative analysis can combine and investigate many datasets in a cost-effective, reproducible way. Current integration approaches for biological data operate in two modes: a "bottom-up integration" mode with follow-up manual integration, and a "top-down integration" mode with follow-up in silico integration.
This paper first summarizes combinatory analysis approaches to suggest candidate protocols for biological experiment design in integrative genomics studies, and then surveys data fusion approaches to guide computational model development for detecting biological significance; these approaches have also provided new data resources and analysis tools to support precision medicine based on big biomedical data. Finally, the problems and future directions for integrative analysis of omics big data are highlighted.
Affiliation(s)
- Xiang-Tian Yu
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy Science, Shanghai, China
- Tao Zeng
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy Science, Shanghai, China
18
Sun Y, Ma C, Halgamuge S. The node-weighted Steiner tree approach to identify elements of cancer-related signaling pathways. BMC Bioinformatics 2017; 18:551. [PMID: 29297291 PMCID: PMC5751691 DOI: 10.1186/s12859-017-1958-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 01/23/2023]
Abstract
Background Cancer constitutes a major health burden in our society. Critical information on cancer may be hidden in its signaling pathways. However, even though a large amount of money has been spent on cancer research, some critical information on cancer-related signaling pathways still remains elusive. Hence, new work towards a complete understanding of cancer-related signaling pathways will greatly benefit the prevention, diagnosis, and treatment of cancer. Results We propose the node-weighted Steiner tree approach to identify important elements of cancer-related signaling pathways at the level of proteins. This new approach has an advantage over previous approaches in that it is fast when processing large protein-protein interaction networks. We apply this new approach to identify important elements of two well-known cancer-related signaling pathways: PI3K/Akt and MAPK. First, we generate a node-weighted protein-protein interaction network using protein and signaling pathway data. Second, we modify and use two preprocessing techniques and a state-of-the-art Steiner tree algorithm to identify a subnetwork in the generated network. Third, we propose two new metrics to select important elements from this subnetwork. On a commonly used personal computer, this new approach takes less than 2 s to identify the important elements of PI3K/Akt and MAPK signaling pathways in a large node-weighted protein-protein interaction network with 16,843 vertices and 1,736,922 edges. We further analyze and demonstrate the significance of these identified elements to cancer signal transduction by exploring previously reported experimental evidence. Conclusions Our node-weighted Steiner tree approach is shown to be both fast and effective at identifying important elements of cancer-related signaling pathways. Furthermore, it may provide new perspectives on the identification of signaling pathways for other human diseases.
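The core idea, connecting a set of terminal proteins through a subnetwork whose total node weight is small, can be sketched with the classic shortest-path heuristic adapted to node weights. This is an illustrative stand-in, assuming a small adjacency-list graph with non-negative node weights; it is not the preprocessing or the state-of-the-art solver the authors used, and the function names and toy graph are hypothetical.

```python
import heapq


def node_weighted_dijkstra(graph, weights, sources):
    """Multi-source Dijkstra in which entering node v costs weights[v].

    Returns (dist, parent); parent maps each reached node to its
    predecessor, with None for the source nodes themselves.
    """
    dist = {s: 0.0 for s in sources}
    parent = {s: None for s in sources}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v in graph[u]:
            nd = d + weights[v]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent


def node_weighted_steiner_tree(graph, weights, terminals):
    """Greedy heuristic: grow a tree from the first terminal, repeatedly
    attaching the remaining terminal that is cheapest to reach."""
    terminals = list(terminals)
    tree = {terminals[0]}
    remaining = set(terminals[1:]) - tree
    while remaining:
        dist, parent = node_weighted_dijkstra(graph, weights, tree)
        reachable = [t for t in sorted(remaining) if t in dist]
        if not reachable:
            raise ValueError(f"unreachable terminals: {sorted(remaining)}")
        target = min(reachable, key=dist.__getitem__)
        node = target
        while node not in tree:  # walk the cheapest path back to the tree
            tree.add(node)
            node = parent[node]
        remaining -= tree
    return tree


# Hypothetical toy network: A, B, C are terminals; X and Y are connectors.
graph = {
    "A": ["X", "Y"], "B": ["X", "Y"], "C": ["Y"],
    "X": ["A", "B"], "Y": ["A", "B", "C"],
}
weights = {"A": 0.0, "B": 0.0, "C": 0.0, "X": 2.0, "Y": 1.0}
tree = node_weighted_steiner_tree(graph, weights, ["A", "B", "C"])
print(sorted(tree), sum(weights[v] for v in tree))  # ['A', 'B', 'C', 'Y'] 1.0
```

Because it attaches terminals one at a time, this heuristic can return a heavier tree than the optimum on other graphs; stronger solvers trade some speed for better solutions.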
Affiliation(s)
- Yahui Sun
- Department of Mechanical Engineering, The University of Melbourne, Melbourne, 3010, Australia
- Chenkai Ma
- Department of Surgery, The University of Melbourne, Melbourne, 3010, Australia
- Saman Halgamuge
- Research School of Engineering, College of Engineering & Computer Science, The Australian National University, Canberra, 2601, ACT, Australia
19
Mohammadi S, Grama A. A convex optimization approach for identification of human tissue-specific interactomes. Bioinformatics 2017; 32:i243-i252. [PMID: 27307623 PMCID: PMC4908329 DOI: 10.1093/bioinformatics/btw245] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 11/13/2022]
Abstract
Motivation: Analysis of organism-specific interactomes has yielded novel insights into cellular function and coordination, understanding of pathology, and identification of markers and drug targets. Genes, however, can exhibit varying levels of cell type specificity in their expression, and their coordinated expression manifests in tissue-specific function and pathology. Tissue-specific/tissue-selective interaction mechanisms have significant applications in drug discovery, as they are more likely to reveal drug targets. Furthermore, tissue-specific transcription factors (tsTFs) are significantly implicated in human disease, including cancers. Finally, disease genes and protein complexes have the tendency to be differentially expressed in tissues in which defects cause pathology. These observations motivate the construction of refined tissue-specific interactomes from organism-specific interactomes. Results: We present a novel technique for constructing human tissue-specific interactomes. Using a variety of validation tests (Edge Set Enrichment Analysis, Gene Ontology Enrichment, Disease-Gene Subnetwork Compactness), we show that our proposed approach significantly outperforms state-of-the-art techniques. Finally, using case studies of Alzheimer’s and Parkinson’s diseases, we show that tissue-specific interactomes derived from our study can be used to construct pathways implicated in pathology and demonstrate the use of these pathways in identifying novel targets. Availability and implementation: http://www.cs.purdue.edu/homes/mohammas/projects/ActPro.html Contact: mohammadi@purdue.edu
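For contrast with an optimization-based construction, the simplest way to derive a tissue-specific network from an organism-wide one is to keep only interactions whose two proteins are both expressed in the tissue. The sketch below shows that naive filter; it is not the authors' convex-optimization technique, and the function name, the expression threshold and the toy values are illustrative assumptions.

```python
def tissue_filtered_edges(edges, expression, tissue, threshold=1.0):
    """Naive baseline: keep interactions whose endpoints are both expressed
    at or above `threshold` in `tissue`.

    edges:      iterable of (protein_u, protein_v) pairs
    expression: protein -> {tissue: expression level}; missing values count as 0
    """
    kept = []
    for u, v in edges:
        level_u = expression.get(u, {}).get(tissue, 0.0)
        level_v = expression.get(v, {}).get(tissue, 0.0)
        if level_u >= threshold and level_v >= threshold:
            kept.append((u, v))
    return kept


# Hypothetical proteins and expression levels (arbitrary units).
edges = [("P1", "P2"), ("P2", "P3"), ("P1", "P3")]
expression = {
    "P1": {"islet": 5.0, "liver": 0.2},
    "P2": {"islet": 3.0, "liver": 4.0},
    "P3": {"islet": 0.1, "liver": 6.0},
}
print(tissue_filtered_edges(edges, expression, "islet"))  # [('P1', 'P2')]
print(tissue_filtered_edges(edges, expression, "liver"))  # [('P2', 'P3')]
```

The result of a hard filter like this depends strongly on the chosen threshold.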
Affiliation(s)
- Shahin Mohammadi
- Department of Computer Sciences, Purdue University, West Lafayette, IN 47907, USA
- Ananth Grama
- Department of Computer Sciences, Purdue University, West Lafayette, IN 47907, USA
20
PCSF: An R-package for network-based interpretation of high-throughput data. PLoS Comput Biol 2017; 13:e1005694. [PMID: 28759592 PMCID: PMC5552342 DOI: 10.1371/journal.pcbi.1005694] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 05/03/2017] [Revised: 08/10/2017] [Accepted: 07/23/2017] [Indexed: 11/19/2022]
Abstract
With recent technological developments, a vast amount of high-throughput data has been profiled to understand the mechanisms of complex diseases. The current bioinformatics challenge is to interpret the data and the underlying biology, and efficient algorithms for analyzing heterogeneous high-throughput data using biological networks are becoming increasingly valuable. In this paper, we propose a software package based on the Prize-collecting Steiner Forest graph optimization approach. The PCSF package performs fast and user-friendly network analysis of high-throughput data by mapping the data onto a biological network such as a protein-protein interaction, gene-gene interaction or any other correlation- or coexpression-based network. Using the interaction network as a template, it determines high-confidence subnetworks relevant to the data, which potentially leads to predictions of functional units. It also interactively visualizes the resulting subnetwork with functional enrichment analysis.
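The prize-collecting trade-off can be made concrete by scoring candidate forests. The sketch below uses one common form of the objective: the prizes of nodes left out (scaled by a weight `beta`), plus the cost of the chosen edges, plus a per-tree penalty `omega`. It only evaluates candidates and does not search for the best one; `pcsf_cost`, `beta`, `omega` and the toy network are illustrative assumptions, not the PCSF package's interface or solver.

```python
def pcsf_cost(prizes, edge_costs, nodes, edges, beta=1.0, omega=0.0):
    """Score a candidate forest (lower is better).

    prizes:     node -> prize (e.g. a data-derived score for terminals)
    edge_costs: frozenset({u, v}) -> cost of an interaction edge
    nodes:      set of nodes in the candidate forest
    edges:      list of (u, v) pairs in the candidate forest
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        if u not in parent or v not in parent:
            raise ValueError(f"edge ({u}, {v}) uses a node outside `nodes`")
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            raise ValueError(f"edge ({u}, {v}) closes a cycle; not a forest")
        parent[root_u] = root_v

    n_trees = len({find(n) for n in nodes})
    missed_prizes = sum(p for n, p in prizes.items() if n not in nodes)
    edge_total = sum(edge_costs[frozenset((u, v))] for u, v in edges)
    return beta * missed_prizes + edge_total + omega * n_trees


# Hypothetical prizes and interaction costs.
prizes = {"A": 3.0, "B": 2.0, "C": 0.5}
edge_costs = {frozenset(("A", "B")): 1.0, frozenset(("B", "C")): 2.0}

candidates = {
    "A-B": ({"A", "B"}, [("A", "B")]),
    "A-B-C": ({"A", "B", "C"}, [("A", "B"), ("B", "C")]),
    "A alone": ({"A"}, []),
}
for name, (nodes, edges) in candidates.items():
    print(name, pcsf_cost(prizes, edge_costs, nodes, edges))
# A-B 1.5
# A-B-C 3.0
# A alone 2.5
```

With these toy numbers, bringing in C costs 2.0 in edge weight to recover a prize of only 0.5, so the two-node tree scores best; raising `omega` penalizes forests that split into many small trees.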
21
From Proteomic Analysis to Potential Therapeutic Targets: Functional Profile of Two Lung Cancer Cell Lines, A549 and SW900, Widely Studied in Pre-Clinical Research. PLoS One 2016; 11:e0165973. [PMID: 27814385 PMCID: PMC5096714 DOI: 10.1371/journal.pone.0165973] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 08/05/2016] [Accepted: 10/20/2016] [Indexed: 12/18/2022]
Abstract
Lung cancer is a serious health problem and the leading cause of cancer death worldwide. The standard use of cell lines as in vitro pre-clinical models to study the molecular mechanisms that drive tumorigenesis and to assess drug sensitivity/effectiveness is of indisputable importance. Label-free mass spectrometry and bioinformatics were employed to study the proteomic profiles of two representative lung cancer cell lines and to unravel their specific biological processes. Adenocarcinoma A549 cells were enriched in proteins related to cellular respiration, ubiquitination, apoptosis and response to drug/hypoxia/oxidative stress. In turn, squamous carcinoma SW900 cells were enriched in proteins related to translation, apoptosis, response to inorganic/organic substances and cytoskeleton organization. Several proteins with differential expression were related to cancer transformation, tumor resistance, proliferation, migration, invasion and metastasis. Combined analysis of proteome and interactome data highlighted key proteins and suggested that adenocarcinoma might be more susceptible to PI3K/Akt/mTOR and topoisomerase IIα inhibitors, and squamous carcinoma to Ck2 inhibitors. Moreover, ILF3 overexpression in adenocarcinoma, and PCNA and NEDD8 overexpression in squamous carcinoma, mark them as promising candidates for therapeutic purposes. This study highlights the functional proteomic differences between two main subtypes of lung cancer models and hints at several targeted therapies that might assist in treating this type of cancer.
22
Alanis-Lobato G, Andrade-Navarro MA, Schaefer MH. HIPPIE v2.0: enhancing meaningfulness and reliability of protein-protein interaction networks. Nucleic Acids Res 2016; 45:D408-D414. [PMID: 27794551 PMCID: PMC5210659 DOI: 10.1093/nar/gkw985] [Citation(s) in RCA: 286] [Impact Index Per Article: 35.8] [Received: 08/11/2016] [Revised: 09/28/2016] [Accepted: 10/14/2016] [Indexed: 01/01/2023]
Abstract
The increasing number of experimentally detected interactions between proteins makes it difficult for researchers to extract the interactions relevant for specific biological processes or diseases. This makes it necessary to accompany the large-scale detection of protein–protein interactions (PPIs) with strategies and tools to generate meaningful PPI subnetworks. To this end, we generated the Human Integrated Protein–Protein Interaction rEference or HIPPIE (http://cbdm.uni-mainz.de/hippie/). HIPPIE is a one-stop resource for the generation and interpretation of PPI networks relevant to a specific research question. We provide means to generate highly reliable, context-specific PPI networks and to make sense out of them. We just released the second major update of HIPPIE, implementing various new features. HIPPIE grew substantially over the last years and now contains more than 270 000 confidence scored and annotated PPIs. We integrated different types of experimental information for the confidence scoring and the construction of context-specific networks. We implemented basic graph algorithms that highlight important proteins and interactions. HIPPIE's graphical interface implements several ways for wet lab and computational scientists alike to access the PPI data.
Affiliation(s)
- Gregorio Alanis-Lobato
- Faculty of Biology, Johannes Gutenberg Universität, Mainz, Germany
- Institute of Molecular Biology, Mainz, Germany
- Miguel A Andrade-Navarro
- Faculty of Biology, Johannes Gutenberg Universität, Mainz, Germany
- Institute of Molecular Biology, Mainz, Germany
- Martin H Schaefer
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
23
Lu S, Cai C, Yan G, Zhou Z, Wan Y, Chen V, Chen L, Cooper GF, Obeid LM, Hannun YA, Lee AV, Lu X. Signal-Oriented Pathway Analyses Reveal a Signaling Complex as a Synthetic Lethal Target for p53 Mutations. Cancer Res 2016; 76:6785-6794. [PMID: 27758891 DOI: 10.1158/0008-5472.can-16-1740] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/27/2016] [Revised: 08/31/2016] [Accepted: 09/18/2016] [Indexed: 11/16/2022]
Abstract
Defining processes that are synthetic lethal with p53 mutations in cancer cells may reveal possible therapeutic strategies. In this study, we report the development of a signal-oriented computational framework for cancer pathway discovery in this context. We applied our bipartite graph-based functional module discovery algorithm to identify transcriptomic modules abnormally expressed in multiple tumors, such that the genes in a module were likely regulated by a common, perturbed signal. For each transcriptomic module, we applied our weighted k-path merge algorithm to search for a set of somatic genome alterations (SGA) that likely perturbed the signal, that is, the candidate members of the pathway that regulate the transcriptomic module. Computational evaluations indicated that the pathways identified by our methods were perturbed by SGA. In particular, our analyses revealed that SGA affecting TP53, PTK2, YWHAZ, and MED1 perturbed a set of signals that promote cell proliferation, anchor-free colony formation, and epithelial-mesenchymal transition (EMT). These proteins formed a signaling complex that mediates these oncogenic processes in a coordinated fashion. Disruption of this signaling complex by knocking down PTK2, YWHAZ, or MED1 attenuated and reversed oncogenic phenotypes caused by mutant p53 in a synthetic lethal manner. This signal-oriented framework for searching pathways and therapeutic targets is applicable to all cancer types, thus potentially impacting precision medicine in cancer. Cancer Res; 76(23); 6785-94. ©2016 AACR.
Affiliation(s)
- Songjian Lu
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania; Center for Causal Discovery, University of Pittsburgh, Pittsburgh, Pennsylvania
- Chunhui Cai
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania; Center for Causal Discovery, University of Pittsburgh, Pittsburgh, Pennsylvania
- Gonghong Yan
- University of Pittsburgh Cancer Institute, Pittsburgh, Pennsylvania; Department of Pharmacology and Chemical Biology, University of Pittsburgh, Pittsburgh, Pennsylvania; Magee-Womens Research Institute, Pittsburgh, Pennsylvania
- Zhuan Zhou
- University of Pittsburgh Cancer Institute, Pittsburgh, Pennsylvania; Department of Cell Biology, University of Pittsburgh, Pittsburgh, Pennsylvania
- Yong Wan
- University of Pittsburgh Cancer Institute, Pittsburgh, Pennsylvania; Department of Cell Biology, University of Pittsburgh, Pittsburgh, Pennsylvania
- Vicky Chen
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania; Center for Causal Discovery, University of Pittsburgh, Pittsburgh, Pennsylvania
- Lujia Chen
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania; Center for Causal Discovery, University of Pittsburgh, Pittsburgh, Pennsylvania
- Gregory F Cooper
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania; Center for Causal Discovery, University of Pittsburgh, Pittsburgh, Pennsylvania
- Lina M Obeid
- Department of Medicine, the State University of New York at Stony Brook, Stony Brook, New York
- Yusuf A Hannun
- Department of Medicine, the State University of New York at Stony Brook, Stony Brook, New York
- Adrian V Lee
- Center for Causal Discovery, University of Pittsburgh, Pittsburgh, Pennsylvania; University of Pittsburgh Cancer Institute, Pittsburgh, Pennsylvania; Department of Pharmacology and Chemical Biology, University of Pittsburgh, Pittsburgh, Pennsylvania; Magee-Womens Research Institute, Pittsburgh, Pennsylvania
- Xinghua Lu
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania; Center for Causal Discovery, University of Pittsburgh, Pittsburgh, Pennsylvania
24
Keskin O, Tuncbag N, Gursoy A. Predicting Protein–Protein Interactions from the Molecular to the Proteome Level. Chem Rev 2016; 116:4884-909. [DOI: 10.1021/acs.chemrev.5b00683] [Citation(s) in RCA: 207] [Impact Index Per Article: 25.9] [Indexed: 01/17/2023]
Affiliation(s)
- Nurcan Tuncbag
- Graduate School of Informatics, Department of Health Informatics, Middle East Technical University, 06800 Ankara, Turkey
25
Abstract
Cancer is now increasingly studied from the perspective of dysregulated pathways, rather than as a disease resulting from mutations of individual genes. A pathway-centric view acknowledges the heterogeneity between genomic profiles from different cancer patients while assuming that the mutated genes are likely to belong to the same pathway and cause similar disease phenotypes. Indeed, network-centric approaches have proven to be helpful for finding genotypic causes of diseases, classifying disease subtypes, and identifying drug targets. In this review, we discuss how networks can be used to help understand patient-to-patient variations and how one can leverage this variability to elucidate interactions between cancer drivers.
Affiliation(s)
- Yoo-Ah Kim
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Dong-Yeon Cho
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Teresa M. Przytycka
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
26
Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, Mortazavi A. A survey of best practices for RNA-seq data analysis. Genome Biol 2016; 17:13. [PMID: 26813401 PMCID: PMC4728800 DOI: 10.1186/s13059-016-0881-8] [Citation(s) in RCA: 1405] [Impact Index Per Article: 175.6] [Indexed: 12/27/2022]
Abstract
RNA-sequencing (RNA-seq) has a wide variety of applications, but no single analysis pipeline can be used in all cases. We review all of the major steps in RNA-seq data analysis, including experimental design, quality control, read alignment, quantification of gene and transcript levels, visualization, differential gene expression, alternative splicing, functional analysis, gene fusion detection and eQTL mapping. We highlight the challenges associated with each step. We discuss the analysis of small RNAs and the integration of RNA-seq with other functional genomics techniques. Finally, we discuss the outlook for novel technologies that are changing the state of the art in transcriptomics.
Affiliation(s)
- Ana Conesa
- Institute for Food and Agricultural Sciences, Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, 32603, USA; Centro de Investigación Príncipe Felipe, Genomics of Gene Expression Laboratory, 46012, Valencia, Spain
- Pedro Madrigal
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK; Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, Anne McLaren Laboratory for Regenerative Medicine, Department of Surgery, University of Cambridge, Cambridge, CB2 0SZ, UK
- Sonia Tarazona
- Centro de Investigación Príncipe Felipe, Genomics of Gene Expression Laboratory, 46012, Valencia, Spain; Department of Applied Statistics, Operations Research and Quality, Universidad Politécnica de Valencia, 46020, Valencia, Spain
- David Gomez-Cabrero
- Unit of Computational Medicine, Department of Medicine, Karolinska Institutet, Karolinska University Hospital, 171 77, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, 17177, Stockholm, Sweden; Unit of Clinical Epidemiology, Department of Medicine, Karolinska University Hospital, L8, 17176, Stockholm, Sweden; Science for Life Laboratory, 17121, Solna, Sweden
- Alejandra Cervera
- Systems Biology Laboratory, Institute of Biomedicine and Genome-Scale Biology Research Program, University of Helsinki, 00014, Helsinki, Finland
- Andrew McPherson
- School of Computing Science, Simon Fraser University, Burnaby, V5A 1S6, BC, Canada
- Michał Wojciech Szcześniak
- Department of Bioinformatics, Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University in Poznań, 61-614, Poznań, Poland
- Daniel J Gaffney
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Laura L Elo
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, FI-20520, Turku, Finland
- Xuegong Zhang
- Key Lab of Bioinformatics/Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing, 100084, China; School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Ali Mortazavi
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, 92697-2300, USA; Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, 92697, USA
27
Bersanelli M, Mosca E, Remondini D, Giampieri E, Sala C, Castellani G, Milanesi L. Methods for the integration of multi-omics data: mathematical aspects. BMC Bioinformatics 2016; 17 Suppl 2:15. [PMID: 26821531 PMCID: PMC4959355 DOI: 10.1186/s12859-015-0857-9] [Citation(s) in RCA: 221] [Impact Index Per Article: 27.6] [Indexed: 11/10/2022]
Abstract
BACKGROUND Methods for the integrative analysis of multi-omics data are required to draw a more complete and accurate picture of the dynamics of molecular systems. The complexity of biological systems, the technological limits, the large number of biological variables and the relatively low number of biological samples make the analysis of multi-omics datasets a non-trivial problem. RESULTS AND CONCLUSIONS We review the most advanced strategies for integrating multi-omics datasets, focusing on mathematical and methodological aspects.
Affiliation(s)
- Matteo Bersanelli
- Department of Physics and Astronomy, Università di Bologna, Via B. Pichat 6/2, Bologna, 40127, Italy; Institute of Biomedical Technologies - CNR, Via Fratelli Cervi 93, Segrate MI, 20090, Italy
- Ettore Mosca
- Institute of Biomedical Technologies - CNR, Via Fratelli Cervi 93, Segrate MI, 20090, Italy
- Daniel Remondini
- Department of Physics and Astronomy, Università di Bologna, Via B. Pichat 6/2, Bologna, 40127, Italy
- Enrico Giampieri
- Department of Physics and Astronomy, Università di Bologna, Via B. Pichat 6/2, Bologna, 40127, Italy
- Claudia Sala
- Department of Physics and Astronomy, Università di Bologna, Via B. Pichat 6/2, Bologna, 40127, Italy
- Gastone Castellani
- Department of Physics and Astronomy, Università di Bologna, Via B. Pichat 6/2, Bologna, 40127, Italy
- Luciano Milanesi
- Institute of Biomedical Technologies - CNR, Via Fratelli Cervi 93, Segrate MI, 20090, Italy
28
Husson SJ, Moyson S, Valkenborg D, Baggerman G, Mertens I. Proteomics applications in Caenorhabditis elegans research. Biochem Biophys Res Commun 2015; 468:519-24. [DOI: 10.1016/j.bbrc.2015.11.026] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/03/2015] [Accepted: 11/04/2015] [Indexed: 01/04/2023]
29
Mahajan G, Mande SC. From System-Wide Differential Gene Expression to Perturbed Regulatory Factors: A Combinatorial Approach. PLoS One 2015; 10:e0142147. [PMID: 26562430 PMCID: PMC4642966 DOI: 10.1371/journal.pone.0142147] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/24/2015] [Accepted: 10/19/2015] [Indexed: 11/19/2022]
Abstract
High-throughput experiments such as microarrays and deep sequencing provide large scale information on the pattern of gene expression, which undergoes extensive remodeling as the cell dynamically responds to varying environmental cues or has its function disrupted under pathological conditions. An important initial step in the systematic analysis and interpretation of genome-scale expression alteration involves identification of a set of perturbed transcriptional regulators whose differential activity can provide a proximate hypothesis to account for these transcriptomic changes. In the present work, we propose an unbiased and logically natural approach to transcription factor enrichment. It involves overlaying a list of experimentally determined differentially expressed genes on a background regulatory network coming from e.g. literature curation or computational motif scanning, and identifying that subset of regulators whose aggregated target set best discriminates between the altered and the unaffected genes. In other words, our methodology entails testing of all possible regulatory subnetworks, rather than just the target sets of individual regulators as is followed in most standard approaches. We have proposed an iterative search method to efficiently find such a combination, and benchmarked it on E. coli microarray and regulatory network data available in the public domain. Comparative analysis carried out on artificially generated differential expression profiles, as well as empirical factor overexpression data for M. tuberculosis, shows that our methodology provides marked improvement in accuracy of regulatory inference relative to the standard method that involves evaluating factor enrichment in an individual manner.
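To make the idea of scoring a combined regulator target set (rather than each regulator's targets separately) concrete, here is a minimal sketch. It assumes a one-sided hypergeometric test as the discrimination score and a simple greedy forward search; it is a hypothetical simplification, not the authors' iterative search method, and the function names are illustrative only.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for a hypergeometric variable: the chance of seeing at least
    k differentially expressed (DE) genes among `draws` target genes when
    `successes` of the `population` genes are DE."""
    total = comb(population, draws)
    upper = min(successes, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return hits / total


def score(targets, de_genes, universe):
    """Enrichment p-value of DE genes within an aggregated target set."""
    overlap = len(targets & de_genes)
    return hypergeom_sf(overlap, len(universe), len(de_genes), len(targets))


def greedy_regulator_set(network, de_genes, universe):
    """Greedily add the regulator whose targets most lower the p-value of the
    union target set; stop when no further addition improves it.

    network: dict regulator -> set of target genes (hypothetical input)."""
    chosen, union, best = [], set(), 1.0
    while True:
        candidate, cand_p = None, best
        for reg, targets in network.items():
            if reg in chosen:
                continue
            p = score(union | targets, de_genes, universe)
            if p < cand_p:
                candidate, cand_p = reg, p
        if candidate is None:
            return chosen, best
        chosen.append(candidate)
        union |= network[candidate]
        best = cand_p
```

A greedy search only approximates the "all possible regulatory subnetworks" search described in the abstract; it avoids enumerating every subset at the cost of possibly missing the global optimum.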
30
Das P, Nutan KK, Singla-Pareek SL, Pareek A. Understanding salinity responses and adopting 'omics-based' approaches to generate salinity tolerant cultivars of rice. Front Plant Sci 2015; 6:712. [PMID: 26442026 PMCID: PMC4563168 DOI: 10.3389/fpls.2015.00712] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 06/05/2015] [Accepted: 08/25/2015] [Indexed: 05/21/2023]
Abstract
Soil salinity is one of the main constraints affecting production of rice worldwide, reducing the growth, pollen viability and yield of the plant. Therefore, detailed understanding of the response of rice towards soil salinity at the physiological and molecular level is a prerequisite for its effective management. Various approaches have been adopted by molecular biologists or breeders to understand the mechanism for salinity tolerance in plants and to develop salt tolerant rice cultivars. Genome-wide analysis using 'omics-based' tools followed by identification and functional validation of individual genes is becoming one of the popular approaches to tackle this task. On the other hand, mutation breeding and insertional mutagenesis have also been exploited to obtain salinity tolerant crop plants. This review looks into various responses at the cellular and whole-plant level generated in rice plants toward salinity stress, thus evaluating the suitability of functional genomics interventions to raise stress tolerant plants. We have tried to highlight the usefulness of contemporary 'omics-based' approaches such as genomics, proteomics, transcriptomics and phenomics for dissecting the salinity tolerance trait in rice. In addition, we have highlighted the importance of integrating various 'omics' approaches to develop an understanding of the machinery involved in salinity response in rice and to move forward to develop salt tolerant cultivars of rice.
Affiliation(s)
- Priyanka Das
- Stress Physiology and Molecular Biology Laboratory, School of Life Sciences, Jawaharlal Nehru University, New Delhi, India
- Kamlesh K. Nutan
- Stress Physiology and Molecular Biology Laboratory, School of Life Sciences, Jawaharlal Nehru University, New Delhi, India
- Sneh L. Singla-Pareek
- Plant Molecular Biology Group, International Centre for Genetic Engineering and Biotechnology, New Delhi, India
- Ashwani Pareek
- Stress Physiology and Molecular Biology Laboratory, School of Life Sciences, Jawaharlal Nehru University, New Delhi, India
31
Wachter A, Beißbarth T. pwOmics: an R package for pathway-based integration of time-series omics data using public database knowledge. Bioinformatics 2015; 31:3072-4. [PMID: 26002883 DOI: 10.1093/bioinformatics/btv323] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 02/27/2015] [Accepted: 05/17/2015] [Indexed: 02/07/2023]
Abstract
Characterization of biological processes is progressively enabled with the increased generation of omics data on different signaling levels. Here we present a straightforward approach for the integrative analysis of data from different high-throughput technologies based on pathway and interaction models from public databases. pwOmics performs pathway-based level-specific data comparison of coupled human proteomic and genomic/transcriptomic datasets based on their log fold changes. Separate downstream and upstream analyses on the functional levels of pathways, transcription factors and genes/transcripts are performed in the cross-platform consensus analysis. These provide a basis for the combined interpretation of regulatory effects over time. Via network reconstruction and inference methods (Steiner tree, dynamic Bayesian network inference), consensus graphical networks can be generated for further analyses and visualization. AVAILABILITY AND IMPLEMENTATION The R package pwOmics is freely available on Bioconductor (http://www.bioconductor.org/). CONTACT astrid.wachter@med.uni-goettingen.de.
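The first step, comparing two omics levels through their log fold changes, can be sketched roughly as below. This is a minimal illustration assuming paired (treated, control) abundances per gene, a pseudocount of 1 and a fixed |log2FC| threshold; it is not the pwOmics R interface and omits its pathway, transcription-factor and network-inference steps. All names and values are hypothetical.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2((treated + pseudocount) / (control + pseudocount))."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def consistent_changes(protein, transcript, threshold=1.0):
    """Genes whose protein and transcript log2 fold changes both reach
    `threshold` in absolute value and point in the same direction.

    protein, transcript: dict gene -> (treated, control) abundance."""
    result = {}
    for gene in protein.keys() & transcript.keys():
        lfc_p = log2_fold_change(*protein[gene])
        lfc_t = log2_fold_change(*transcript[gene])
        same_sign = lfc_p * lfc_t > 0
        if abs(lfc_p) >= threshold and abs(lfc_t) >= threshold and same_sign:
            result[gene] = (lfc_p, lfc_t)
    return result
```

Genes that change at only one level, or in opposite directions, are exactly the cases where a level-specific comparison carries information that a single-platform analysis would miss.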
Affiliation(s)
- Astrid Wachter
- Department of Medical Statistics, Georg-August-University Göttingen, Germany
- Tim Beißbarth
- Department of Medical Statistics, Georg-August-University Göttingen, Germany
32
Rameseder J, Krismer K, Dayma Y, Ehrenberger T, Hwang MK, Airoldi EM, Floyd SR, Yaffe MB. A Multivariate Computational Method to Analyze High-Content RNAi Screening Data. J Biomol Screen 2015; 20:985-97. [PMID: 25918037 DOI: 10.1177/1087057115583037] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 12/08/2014] [Accepted: 03/26/2015] [Indexed: 01/18/2023]
Abstract
High-content screening (HCS) using RNA interference (RNAi) in combination with automated microscopy is a powerful investigative tool to explore complex biological processes. However, despite the plethora of data generated from these screens, little progress has been made in analyzing HC data using multivariate methods that exploit the full richness of multidimensional data. We developed a novel multivariate method for HCS, multivariate robust analysis method (M-RAM), integrating image feature selection with ranking of perturbations for hit identification, and applied this method to an HC RNAi screen to discover novel components of the DNA damage response in an osteosarcoma cell line. M-RAM automatically selects the most informative phenotypic readouts and time points to facilitate the more efficient design of follow-up experiments and enhance biological understanding. Our method outperforms univariate hit identification and identifies relevant genes that these approaches would have missed. We found that statistical cell-to-cell variation in phenotypic responses is an important predictor of hits in RNAi-directed image-based screens. Genes that we identified as modulators of DNA damage signaling in U2OS cells include B-Raf, a cancer driver gene in multiple tumor types, whose role in DNA damage signaling we confirm experimentally, and multiple subunits of protein kinase A.
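A generic robust multivariate hit score, in the spirit of (but not equivalent to) M-RAM, might look like the sketch below: each readout is converted to a robust z-score against control wells using the median and MAD, and perturbations are ranked by the Euclidean norm across readouts. It assumes one summary value per readout per perturbation and omits M-RAM's feature and time-point selection; names and numbers are hypothetical.

```python
import math
from statistics import median

MAD_TO_SD = 1.4826  # scales the MAD to estimate the SD of normally distributed data


def robust_z(value, reference):
    """Robust z-score of `value` relative to a reference (control) distribution."""
    med = median(reference)
    mad = median(abs(x - med) for x in reference) * MAD_TO_SD
    if mad == 0:
        raise ValueError("reference distribution has zero MAD")
    return (value - med) / mad


def rank_hits(screen, controls):
    """Rank perturbations by the Euclidean norm of their per-readout robust z-scores.

    screen:   dict perturbation -> list of readout values
    controls: list of readout vectors measured in control wells
    """
    columns = list(zip(*controls))  # per-readout control distributions
    scores = {}
    for name, features in screen.items():
        z = [robust_z(v, col) for v, col in zip(features, columns)]
        scores[name] = math.sqrt(sum(zi * zi for zi in z))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Median and MAD are used instead of mean and standard deviation so that a few extreme control wells do not mask genuine hits.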
Affiliation(s)
- Jonathan Rameseder
- Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Computational Systems Biology Initiative, Massachusetts Institute of Technology, Cambridge, MA, USA
- Konstantin Krismer
- Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Yogesh Dayma
- Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Tobias Ehrenberger
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Mun Kyung Hwang
- Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Edoardo M Airoldi
- Department of Statistics and FAS Center for Systems Biology, Harvard University, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Scott R Floyd
- Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
- Michael B Yaffe
- Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA; Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
33
De Maeyer D, Weytjens B, Renkens J, De Raedt L, Marchal K. PheNetic: network-based interpretation of molecular profiling data. Nucleic Acids Res 2015; 43:W244-50. [PMID: 25878035 PMCID: PMC4489255 DOI: 10.1093/nar/gkv347] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 01/31/2015] [Accepted: 04/03/2015] [Indexed: 12/17/2022]
Abstract
Molecular profiling experiments have become standard in current wet-lab practices. Classically, enrichment analysis has been used to identify biological functions related to these experimental results. Combining molecular profiling results with the wealth of currently available interactomics data, however, offers the opportunity to identify the molecular mechanism behind an observed molecular phenotype. In this paper, we therefore introduce ‘PheNetic’, a user-friendly web server for inferring a sub-network based on probabilistic logical querying. PheNetic extracts from an interactome the sub-network that best explains genes prioritized through a molecular profiling experiment. Depending on its run mode, PheNetic searches either for a regulatory mechanism that explains the observed molecular phenotype or for the pathways (in)activated in the molecular phenotype. The web server provides access to a large number of interactomes, making sub-network inference readily applicable to a wide variety of organisms. The inferred sub-networks can be interactively visualized in the browser. PheNetic's method and use are illustrated using an example analysis of differential expression results of ampicillin-treated Escherichia coli cells. The PheNetic web service is available at http://bioinformatics.intec.ugent.be/phenetic/.
Affiliation(s)
- Dries De Maeyer
- Dept. of Microbial and Molecular Systems, KULeuven, Leuven, 3000, Belgium; Dept. of Information Technology (INTEC, iMINDS), U.Ghent, Ghent, 9052, Belgium
- Bram Weytjens
- Dept. of Microbial and Molecular Systems, KULeuven, Leuven, 3000, Belgium; Dept. of Information Technology (INTEC, iMINDS), U.Ghent, Ghent, 9052, Belgium
- Joris Renkens
- Dept. of Computer Science, KULeuven, Leuven, 3000, Belgium
- Luc De Raedt
- Dept. of Computer Science, KULeuven, Leuven, 3000, Belgium
- Kathleen Marchal
- Dept. of Microbial and Molecular Systems, KULeuven, Leuven, 3000, Belgium; Dept. of Information Technology (INTEC, iMINDS), U.Ghent, Ghent, 9052, Belgium; Dept. of Plant Biotechnology and Bioinformatics, U.Ghent, Ghent, 9052, Belgium
34
Gosline SJC, Oh C, Fraenkel E. SAMNetWeb: identifying condition-specific networks linking signaling and transcription. Bioinformatics 2014; 31:1124-6. [PMID: 25414365 DOI: 10.1093/bioinformatics/btu748] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 04/08/2014] [Accepted: 11/07/2014] [Indexed: 11/13/2022]
Abstract
MOTIVATION High-throughput datasets such as genetic screens, mRNA expression assays and global phospho-proteomic experiments are often difficult to interpret due to inherent noise in each experimental system. Computational tools have improved interpretation of these datasets by enabling the identification of biological processes and pathways that are most likely to explain the measured results. These tools are primarily designed to analyse data from a single experiment (e.g. drug treatment versus control), creating a need for computational algorithms that can handle heterogeneous datasets across multiple experimental conditions at once. SUMMARY We introduce SAMNetWeb, a web-based tool that enables functional enrichment analysis and visualization of high-throughput datasets. SAMNetWeb can analyse two distinct data types (e.g. mRNA expression and global proteomics) simultaneously across multiple experimental systems to identify pathways activated in these experiments and then visualize the pathways in a single interaction network. Through the use of a multi-commodity flow based algorithm that requires each experiment to 'share' underlying protein interactions, SAMNetWeb can identify distinct and common pathways across experiments. AVAILABILITY AND IMPLEMENTATION SAMNetWeb is freely available at http://fraenkel.mit.edu/samnetweb.
Affiliation(s)
- Sara J C Gosline
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Coyin Oh
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Ernest Fraenkel
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
35
Boyanova D, Nilla S, Klau GW, Dandekar T, Müller T, Dittrich M. Functional module search in protein networks based on semantic similarity improves the analysis of proteomics data. Mol Cell Proteomics 2014; 13:1877-89. [PMID: 24807868 DOI: 10.1074/mcp.m113.032839] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/04/2023]
Abstract
The continuously evolving field of proteomics produces increasing amounts of data while improving the quality of protein identifications. Although quantitative measurements are becoming more popular, many proteomic studies are still based on non-quantitative methods for protein identification. These studies result in potentially large sets of identified proteins, where the biological interpretation of proteins can be challenging. Systems biology develops innovative network-based methods, which allow an integrated analysis of these data. Here we present a novel approach, which combines prior knowledge of protein-protein interactions (PPI) with proteomics data using functional similarity measurements of interacting proteins. This integrated network analysis exactly identifies network modules with a maximal consistent functional similarity reflecting biological processes of the investigated cells. We validated our approach on small (H9N2 virus-infected gastric cells) and large (blood constituents) proteomic data sets. Using this novel algorithm, we identified characteristic functional modules in virus-infected cells, comprising key signaling proteins (e.g. the stress-related kinase RAF1), and demonstrated that this method allows a module-based functional characterization of cell types. Analysis of a large proteome data set of blood constituents resulted in clear separation of blood cells according to their developmental origin. A detailed investigation of the T-cell proteome further illustrates how the algorithm partitions large networks into functional subnetworks, each representing specific cellular functions. These results demonstrate that the integrated network approach not only allows a detailed analysis of proteome networks but also yields a functional decomposition of complex proteomic data sets and thereby provides deeper insights into the underlying cellular processes of the investigated system.
Affiliation(s)
- Desislava Boyanova
- Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, D-97074 Würzburg, Germany
- Santosh Nilla
- Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, D-97074 Würzburg, Germany
- Gunnar W Klau
- Life Sciences, Centrum Wiskunde & Informatica (CWI), Science Park 123, 1098 XG Amsterdam, The Netherlands
- Thomas Dandekar
- Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, D-97074 Würzburg, Germany
- Tobias Müller
- Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, D-97074 Würzburg, Germany
- Marcus Dittrich
- Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, D-97074 Würzburg, Germany
36
Lan A, Ziv-Ukelson M, Yeger-Lotem E. A context-sensitive framework for the analysis of human signalling pathways in molecular interaction networks. Bioinformatics 2013; 29:i210-6. [PMID: 23812986 PMCID: PMC3694656 DOI: 10.1093/bioinformatics/btt240] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/15/2023]
Abstract
MOTIVATION A major challenge in systems biology is to reveal the cellular pathways that give rise to specific phenotypes and behaviours. Current techniques often rely on a network representation of molecular interactions, where each node represents a protein or a gene and each interaction is assigned a single static score. However, the use of single interaction scores fails to capture the tendency of proteins to favour different partners under distinct cellular conditions. RESULTS Here, we propose a novel context-sensitive network model, in which gene and protein nodes are assigned multiple contexts based on their gene ontology annotations, and their interactions are associated with multiple context-sensitive scores. Using this model, we developed a new approach and a corresponding tool, ContextNet, based on a dynamic programming algorithm for identifying signalling paths linking proteins to their downstream target genes. ContextNet finds high-ranking context-sensitive paths in the interactome, thereby revealing the intermediate proteins in the path and their path-specific contexts. We validated the model using 18 348 manually curated cellular paths derived from the SPIKE database. We next applied our framework to elucidate the responses of human primary lung cells to influenza infection. Top-ranking paths were much more likely to contain infection-related proteins, and this likelihood was highly correlated with path score. Moreover, the contexts assigned by the algorithm pointed to putative, as well as previously known, responses to viral infection. Thus, context sensitivity is an important extension to current network biology models and can be efficiently used to elucidate cellular response mechanisms. AVAILABILITY ContextNet is publicly available at http://netbio.bgu.ac.il/ContextNet. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
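The dynamic-programming idea of finding a high-scoring signalling path from a protein to a target gene can be sketched as below. This is a deliberately simplified version: it keeps a single static score per interaction (the baseline model the abstract argues against) and does not implement ContextNet's context-sensitive scores. It assumes directed edges with scores in (0, 1], multiplicative path scores and a maximum path length; the function name and toy network are hypothetical.

```python
def best_path(edges, source, target, max_len):
    """Highest-product directed path from `source` to `target` using at most
    `max_len` edges, by dynamic programming over path length.

    edges: dict (u, v) -> interaction score in (0, 1].
    Returns (score, path); (0.0, None) if the target is unreachable."""
    # best[v] = (score, path) of the best walk with exactly l edges ending at v
    best = {source: (1.0, [source])}
    found = (0.0, None)
    for _ in range(max_len):
        nxt = {}
        for (u, v), w in edges.items():
            if u in best:
                s = best[u][0] * w
                if v not in nxt or s > nxt[v][0]:
                    nxt[v] = (s, best[u][1] + [v])
        best = nxt
        # keep the strictly better result; shorter paths win ties
        if target in best and best[target][0] > found[0]:
            found = best[target]
    return found


edges = {("EGFR", "GRB2"): 0.9, ("GRB2", "SOS1"): 0.8, ("SOS1", "FOS"): 0.7,
         ("EGFR", "FOS"): 0.3, ("GRB2", "FOS"): 0.4}
print(best_path(edges, "EGFR", "FOS", 3))
```

Because every score is at most 1, adding a cycle can never raise a path's score, so the shortest optimal walk the loop returns is a simple path.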
Affiliation(s)
- Alexander Lan
- Department of Computer Science, National Center for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel
37
Gwinner F, Acosta-Martin AE, Boytard L, Chwastyniak M, Beseme O, Drobecq H, Duban-Deweer S, Juthier F, Jude B, Amouyel P, Pinet F, Schwikowski B. Identification of additional proteins in differential proteomics using protein interaction networks. Proteomics 2013; 13:1065-76. [PMID: 23386401 DOI: 10.1002/pmic.201200482] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 10/22/2012] [Revised: 12/22/2012] [Accepted: 01/07/2013] [Indexed: 01/08/2023]
Abstract
In this study, we developed a novel computational approach based on protein-protein interaction networks to identify a list of proteins that might have remained undetected in differential proteomic profiling experiments. We tested our computational approach on two sets of human smooth muscle cell protein extracts that were affected differently by DNase I treatment. Differential proteomic analysis by saturation DIGE resulted in the identification of 41 human proteins. The application of our approach to these 41 input proteins consisted of four steps: (i) compilation of a human protein-protein interaction network from public databases; (ii) calculation of interaction scores based on functional similarity; (iii) determination of a set of candidate proteins that are needed to efficiently and confidently connect the 41 input proteins; and (iv) ranking of the resulting 25 candidate proteins. Two of the three highest-ranked proteins, beta-arrestin 1 and beta-arrestin 2, were experimentally tested, revealing that their abundance levels in human smooth muscle cell samples were indeed affected by DNase I treatment. These proteins had not been detected during the experimental proteomic analysis. Our study suggests that our computational approach may represent a simple, universal, and cost-effective means to identify additional proteins that remain elusive for current 2D gel-based proteomic profiling techniques.
Affiliation(s)
- Frederik Gwinner
- Department of Genomes and Genetics, Systems Biology Laboratory, Institut Pasteur, Paris, France
38
Basha O, Tirman S, Eluk A, Yeger-Lotem E. ResponseNet2.0: Revealing signaling and regulatory pathways connecting your proteins and genes--now with human data. Nucleic Acids Res 2013; 41:W198-203. [PMID: 23761447 PMCID: PMC3692079 DOI: 10.1093/nar/gkt532] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 12/11/2022]
Abstract
Genome sequencing and transcriptomic profiling are two widely used approaches for the identification of human disease pathways. However, each approach typically provides a limited view of disease pathways: Genome sequencing can identify disease-related mutations but rarely reveals their mode-of-action, while transcriptomic assays do not reveal the series of events that lead to the transcriptomic change. ResponseNet is an integrative network-optimization approach that we developed to fill these gaps by highlighting major signaling and regulatory molecular interaction paths that connect disease-related mutations and genes. The ResponseNet web-server provides a user-friendly interface to ResponseNet. Specifically, users can upload weighted lists of proteins and genes and obtain a sparse, weighted, molecular interaction subnetwork connecting them, that is biased toward regulatory and signaling pathways. ResponseNet2.0 enhances the functionality of the ResponseNet web-server in two important ways. First, it supports analysis of human data by offering a human interactome composed of proteins, genes and micro-RNAs. Second, it offers a new informative view of the output, including a randomization analysis, to help users assess the biological relevance of the output subnetwork. ResponseNet2.0 is available at http://netbio.bgu.ac.il/respnet .
Affiliation(s)
- Omer Basha
- Department of Clinical Biochemistry & Pharmacology, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel
39
Abstract
High-throughput experimental technologies are generating increasingly massive and complex genomic data sets. The sheer enormity and heterogeneity of these data threaten to make the arising problems computationally infeasible. Fortunately, powerful algorithmic techniques lead to software that can answer important biomedical questions in practice. In this Review, we sample the algorithmic landscape, focusing on state-of-the-art techniques, the understanding of which will aid the bench biologist in analysing omics data. We spotlight specific examples that have facilitated and enriched analyses of sequence, transcriptomic and network data sets.
Affiliation(s)
- Bonnie Berger
- Department of Mathematics and Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
40
Csermely P, Korcsmáros T, Kiss HJM, London G, Nussinov R. Structure and dynamics of molecular networks: a novel paradigm of drug discovery: a comprehensive review. Pharmacol Ther 2013; 138:333-408. [PMID: 23384594 PMCID: PMC3647006 DOI: 10.1016/j.pharmthera.2013.01.016] [Citation(s) in RCA: 512] [Impact Index Per Article: 46.5] [Received: 01/22/2013] [Accepted: 01/22/2013] [Indexed: 02/02/2023]
Abstract
Despite considerable progress in genome- and proteome-based high-throughput screening methods and in rational drug design, the increase in approved drugs in the past decade did not match the increase of drug development costs. Network description and analysis not only give a systems-level understanding of drug action and disease complexity, but can also help to improve the efficiency of drug design. We give a comprehensive assessment of the analytical tools of network topology and dynamics. The state-of-the-art use of chemical similarity, protein structure, protein-protein interaction, signaling, genetic interaction and metabolic networks in the discovery of drug targets is summarized. We propose that network targeting follows two basic strategies. The "central hit strategy" selectively targets central nodes/edges of the flexible networks of infectious agents or cancer cells to kill them. The "network influence strategy" works against other diseases, where an efficient reconfiguration of rigid networks needs to be achieved by targeting the neighbors of central nodes/edges. It is shown how network techniques can help in the identification of single-target, edgetic, multi-target and allo-network drug target candidates. We review the recent boom in network methods helping hit identification, lead selection optimizing drug efficacy, as well as minimizing side-effects and drug toxicity. Successful network-based drug development strategies are shown through the examples of infections, cancer, metabolic diseases, neurodegenerative diseases and aging. Summarizing >1200 references we suggest an optimized protocol of network-aided drug development, and provide a list of systems-level hallmarks of drug quality. Finally, we highlight network-related drug development trends helping to achieve these hallmarks by a cohesive, global approach.
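The contrast between the "central hit" and "network influence" strategies can be illustrated on a toy interaction network. The sketch below assumes degree centrality as the measure of centrality (the review covers many other measures) and, as a simplification, treats the most central node as a central-hit candidate and its direct neighbours as network-influence candidates. The network and function names are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edge_list):
    """Undirected adjacency sets from a list of (u, v) interactions."""
    adj = defaultdict(set)
    for u, v in edge_list:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def degree_centrality(adj):
    """Normalized degree centrality: degree / (n - 1)."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def target_candidates(edge_list):
    """Return (hub, neighbours): the most central node, a 'central hit'-style
    candidate, and its neighbours, 'network influence'-style candidates."""
    adj = build_adjacency(edge_list)
    centrality = degree_centrality(adj)
    hub = max(centrality, key=centrality.get)
    return hub, sorted(adj[hub])


toy = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
print(target_candidates(toy))
```

In a real setting the choice of centrality measure and of which neighbours to target would depend on the network type and disease context the review discusses.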
Affiliation(s)
- Peter Csermely
- Department of Medical Chemistry, Semmelweis University, P.O. Box 260, H-1444 Budapest 8, Hungary.
41
Sales G, Calura E, Martini P, Romualdi C. Graphite Web: Web tool for gene set analysis exploiting pathway topology. Nucleic Acids Res 2013; 41:W89-97. [PMID: 23666626 PMCID: PMC3977659 DOI: 10.1093/nar/gkt386] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Indexed: 12/15/2022]
Abstract
Graphite web is a novel web tool for pathway analyses and network visualization for gene expression data of both microarray and RNA-seq experiments. Several pathway analyses have been proposed, either in the univariate or in the global and multivariate context, to tackle the complexity and the interpretation of expression results. These methods can be further divided into ‘topological’ and ‘non-topological’ methods according to their ability to gain power from pathway topology. Biological pathways are, in fact, not only gene lists but can be represented through a network where genes and connections are, respectively, nodes and edges. To this day, the most used approaches are non-topological and univariate, although they miss the relationships among genes. By contrast, topological and multivariate approaches are more powerful, but difficult to use for researchers without bioinformatic skills. Here we present Graphite web, the first public web server for pathway analysis on gene expression data that combines topological and multivariate pathway analyses with an efficient system of interactive network visualizations for easy interpretation of results. Specifically, Graphite web implements five different gene set analyses on three model organisms and two pathway databases. Graphite Web is freely available at http://graphiteweb.bio.unipd.it/.
Affiliation(s)
- Gabriele Sales
- Department of Biology, University of Padova, Via U. Bassi 58/B, 35121 Padova, Italy
42
Sadeghi A, Fröhlich H. Steiner tree methods for optimal sub-network identification: an empirical study. BMC Bioinformatics 2013; 14:144. [PMID: 23627667 PMCID: PMC3674966 DOI: 10.1186/1471-2105-14-144] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 07/24/2012] [Accepted: 03/27/2013] [Indexed: 01/19/2023]
Abstract
Background Analysis and interpretation of biological networks is one of the primary goals of systems biology. In this context, identification of sub-networks connecting sets of seed proteins or seed genes plays a crucial role. Given that no natural node and edge weighting scheme is available, retrieval of a minimum-size sub-graph leads to the classical Steiner tree problem, which is known to be NP-complete. Many approximate solutions have been published and theoretically analyzed in the computer science literature, but far less is known about their practical performance in the bioinformatics field. Results Here we conducted a systematic simulation study of four approximate algorithms and one exact algorithm on a large human protein-protein interaction network with ~14,000 nodes and ~400,000 edges. Moreover, we devised our own algorithm to retrieve a sub-graph of merged Steiner trees. The application of our algorithms was demonstrated for two breast cancer signatures and a sub-network playing a role in male pattern baldness. Conclusion We found that a modified version of the shortest-paths-based approximation algorithm by Takahashi and Matsuyama leads to accurate solutions while being several orders of magnitude faster than the exact approach. Our algorithm for merged Steiner trees, a further development of the Takahashi and Matsuyama algorithm, proved useful for small seed lists. All our implemented methods are available in the R-package SteinerNet on CRAN (http://www.r-project.org) and as a supplement to this paper.
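The basic shortest-path heuristic of Takahashi and Matsuyama can be sketched in Python as follows: start the tree at one seed (terminal) node, then repeatedly attach the closest remaining seed via a shortest path from the current tree, found with a multi-source Dijkstra search. This assumes an undirected graph with positive edge weights in which all seeds are reachable; it is the textbook heuristic, not the modified version or the merged-Steiner-tree algorithm implemented in SteinerNet, and all function names are hypothetical.

```python
import heapq


def build_adjacency(weighted_edges):
    """Undirected weighted adjacency: node -> {neighbour: weight}."""
    adj = {}
    for u, v, w in weighted_edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def shortest_path_to_set(adj, sources, targets):
    """Multi-source Dijkstra from every node in `sources`; return the path
    (starting inside `sources`) to the closest node in `targets`."""
    dist = {s: 0.0 for s in sources}
    prev = {}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        if u in targets:
            path = [u]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return path[::-1]
        for v, w in adj.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    raise ValueError("remaining seed nodes are unreachable")


def takahashi_matsuyama(adj, seeds):
    """Grow a Steiner tree from the first seed, attaching the nearest
    remaining seed by a shortest path each round. Returns the tree edges."""
    seeds = list(seeds)
    tree_nodes, tree_edges = {seeds[0]}, set()
    remaining = set(seeds[1:])
    while remaining:
        path = shortest_path_to_set(adj, tree_nodes, remaining)
        for u, v in zip(path, path[1:]):
            tree_edges.add(frozenset((u, v)))
        tree_nodes.update(path)
        remaining -= tree_nodes  # seeds passed on the way are now connected
    return tree_edges


graph = build_adjacency([("A", "B", 1), ("B", "C", 1), ("A", "C", 3),
                         ("C", "D", 1), ("D", "E", 1), ("B", "E", 5)])
print(takahashi_matsuyama(graph, ["A", "C", "E"]))
```

Searching from the whole current tree at once, rather than from each tree node separately, lets one Dijkstra run per round find the nearest seed, which is what keeps this heuristic fast on large interaction networks.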
Affiliation(s)
- Afshin Sadeghi
- Bonn-Aachen International Center for IT, Rheinische Friedrich-Wilhelms-Universität Bonn, Dahlmannstr 2, 53113 Bonn, Germany.
43
Ma NL, Rahmat Z, Lam SS. A review of the "Omics" approach to biomarkers of oxidative stress in Oryza sativa. Int J Mol Sci 2013; 14:7515-41. [PMID: 23567269 PMCID: PMC3645701 DOI: 10.3390/ijms14047515] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 01/31/2013] [Revised: 03/20/2013] [Accepted: 03/20/2013] [Indexed: 12/27/2022]
Abstract
Physiological and ecological constraints that slow crop growth and deplete production have raised a major concern in the agriculture industry, as they represent a possible threat to future food supply. The key feature that regulates the stress signaling pathway is always related to reactive oxygen species (ROS). The accumulation of ROS in plant cells leaves traces of biomarkers at the genome, proteome, and metabolome levels, which can be identified thanks to recent technological breakthroughs coupled with improved bioinformatics performance. This review highlights recent breakthroughs in molecular strategies (comprising transcriptomics, proteomics, and metabolomics) for identifying oxidative stress biomarkers, as well as the opportunities and obstacles arising in research on biomarkers in rice. The major issue in incorporating bioinformatics to validate biomarkers from different omic platforms for use in rice-breeding programs is also discussed. The development of powerful techniques for identifying oxidative stress-related biomarkers and the integration of data from different disciplines shed light on the oxidative response pathways in plants.
Affiliation(s)
- Nyuk Ling Ma
- Department of Biology, Faculty of Science and Technology, University Malaysia Terengganu, 21030 Kuala Terengganu, Terengganu, Malaysia
- Zaidah Rahmat
- Department of Biotechnology and Medical Engineering, Faculty of Biosciences and Medical Engineering, University Technology Malaysia, 81310 Johor Bahru, Johor, Malaysia
- Su Shiung Lam
- Department of Engineering Science, Faculty of Science and Technology, University Malaysia Terengganu, 21030 Kuala Terengganu, Terengganu, Malaysia
44
Atias N, Sharan R. iPoint: an integer programming based algorithm for inferring protein subnetworks. Mol Biosyst 2013; 9:1662-9. [PMID: 23385645 DOI: 10.1039/c3mb25432a] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Indexed: 01/06/2023]
Abstract
Large-scale screening experiments have become the workhorse of molecular biology, producing data at an ever-increasing scale. The interpretation of such data, particularly in the context of a protein interaction network, has the potential to shed light on the molecular pathways underlying the phenotype or the process in question. A host of approaches has been developed in recent years to tackle this reconstruction challenge. These approaches aim to infer a compact subnetwork that connects the genes revealed by the screen while optimizing local (individual path lengths) or global (likelihood) aspects of the subnetwork. Yosef et al. [Mol. Syst. Biol., 2009, 5, 248] were the first to provide a joint optimization of both criteria, albeit an approximate one. Here we devise an integer linear programming formulation for the joint optimization problem, allowing us to solve it to optimality in minutes on current networks. We apply our algorithm, iPoint, to various data sets in yeast and human and evaluate its performance against state-of-the-art algorithms. We show that iPoint attains very compact and accurate solutions that outperform previous network inference algorithms with respect to their local and global attributes, their consistency across multiple experiments targeting the same pathway, and their agreement with current biological knowledge.
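The trade-off between local and global criteria described in this abstract can be made concrete on a toy problem. The sketch below is a hypothetical illustration, not iPoint's formulation: it assumes each edge carries a confidence p in (0, 1], scores the global aspect as the sum of -log p over the chosen edges (a negative log-likelihood), scores the local aspect as the total hop length of each source-target path inside the subnetwork, and combines the two with a made-up weight `lam`. Where iPoint solves an integer linear program, this sketch simply enumerates every edge subset, which finds the exact optimum of this toy objective but is only feasible for a handful of edges.

```python
import itertools
import math
from collections import deque


def hop_distance(edges, source, target):
    """Edges on a shortest source-target path using only `edges`
    (treated as undirected); None if the target is unreachable."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for neighbor in adjacency.get(node, []):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return None


def best_joint_subnetwork(confidence, pairs, lam=1.0):
    """Score every edge subset that connects all pairs and keep the cheapest.

    cost = sum(-log p(e) for chosen edges)   # global: negative log-likelihood
         + lam * sum(hops for each pair)     # local: individual path lengths
    """
    edges = list(confidence)
    best, best_cost = None, math.inf
    for k in range(1, len(edges) + 1):
        for subset in itertools.combinations(edges, k):
            lengths = [hop_distance(subset, s, t) for s, t in pairs]
            if any(d is None for d in lengths):
                continue  # this subset does not connect every pair
            global_cost = sum(-math.log(confidence[e]) for e in subset)
            cost = global_cost + lam * sum(lengths)
            if cost < best_cost:
                best, best_cost = set(subset), cost
    return best, best_cost


# Hypothetical toy network: edge -> confidence that the interaction is real.
toy_confidence = {
    ("S", "T"): 0.1,                                    # direct but unreliable
    ("S", "A"): 0.9, ("A", "T"): 0.9,                   # two reliable hops
    ("S", "B"): 0.5, ("B", "C"): 0.5, ("C", "T"): 0.5,  # long and weak
}
toy_pairs = [("S", "T")]

if __name__ == "__main__":
    for lam in (1.0, 3.0):
        subnet, cost = best_joint_subnetwork(toy_confidence, toy_pairs, lam)
        print(lam, sorted(subnet), round(cost, 3))
```

With `lam=1.0` the reliable two-hop route S-A-T wins; raising `lam` to 3.0 makes path length dominate, and the single low-confidence edge S-T is chosen instead. The number of candidate subsets grows as 2 to the power of the edge count, which is why an exact method on real networks needs a solver such as an integer linear program rather than enumeration.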
Affiliation(s)
- Nir Atias
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
45
Mosca E, Milanesi L. Network-based analysis of omics with multi-objective optimization. Mol Biosyst 2013; 9:2971-80. [DOI: 10.1039/c3mb70327d] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 01/18/2023]
46
Abstract
Complex diseases are caused by a combination of genetic and environmental factors. Uncovering the molecular pathways through which genetic factors affect a phenotype is always difficult, but for complex diseases it is further complicated because the genetic factors may differ between affected individuals. In recent years, systems biology approaches and, more specifically, network-based approaches have emerged as powerful tools for studying complex diseases. These approaches are often built on knowledge of physical or functional interactions between molecules, which are usually represented as an interaction network. An interaction network not only reports the binary relationships between individual nodes but also encodes a hidden, higher-level organization of cellular communication. Computational biologists were challenged with uncovering this organization and using it to understand disease complexity, which prompted a rich and diverse set of algorithmic approaches. We start this chapter with a description of the general characteristics of complex diseases, followed by a brief introduction to physical and functional networks. Next, we show how these networks are used to leverage genotype, gene expression, and other types of data to identify dysregulated pathways, infer the relationships between genotype and phenotype, and explain disease heterogeneity. We group the methods by common underlying principles and first provide a high-level description of the principles, followed by more specific examples. We hope that this chapter will give readers an appreciation for the wealth of algorithmic techniques that have been developed for studying complex diseases, as well as insight into their strengths and limitations.
Affiliation(s)
- Dong-Yeon Cho
- National Center for Biotechnology Information, NLM, NIH, Bethesda, Maryland, United States of America
- Yoo-Ah Kim
- National Center for Biotechnology Information, NLM, NIH, Bethesda, Maryland, United States of America
- Teresa M. Przytycka
- National Center for Biotechnology Information, NLM, NIH, Bethesda, Maryland, United States of America