1
James K, Alsobhe A, Cockell SJ, Wipat A, Pocock M. Integration of probabilistic functional networks without an external Gold Standard. BMC Bioinformatics 2022; 23:302. [PMID: 35879662 PMCID: PMC9316706 DOI: 10.1186/s12859-022-04834-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2021] [Accepted: 07/11/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND Probabilistic functional integrated networks (PFINs) are designed to aid our understanding of cellular biology and can be used to generate testable hypotheses about protein function. PFINs are generally created by scoring the quality of interaction datasets against a Gold Standard dataset, usually chosen from a separate high-quality data source, prior to their integration. Use of an external Gold Standard has several drawbacks, including data redundancy, data loss and the need for identifier mapping, which can complicate the network build and affect PFIN performance. Additionally, there are typically no Gold Standard data for non-model organisms. RESULTS We describe the development of an integration technique, ssNet, that scores and integrates both high-throughput and low-throughput data from a single source database in a consistent manner without the need for an external Gold Standard dataset. Using data from Saccharomyces cerevisiae, we show that ssNet is easier and faster, overcoming the challenges of data redundancy, Gold Standard bias and ID mapping. In addition, ssNet results in less loss of data and produces a more complete network. CONCLUSIONS The ssNet method allows PFINs to be built successfully from a single database, while producing comparable network performance to networks scored using an external Gold Standard source and with reduced data loss.
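The abstract above refers to scoring an interaction dataset against a Gold Standard but does not state the formula. As an illustration only, the sketch below assumes the log-likelihood score that is widely used in the functional-network literature, ln of the odds of a true link among the dataset's pairs divided by the prior odds in the Gold Standard. The function name, the set-of-pairs representation and the toy data are all hypothetical and are not taken from ssNet.

```python
import math


def log_likelihood_score(dataset_pairs, gold_positive, gold_negative):
    """Score one dataset against a gold standard (assumed formula, see lead-in).

    LLS = ln( (P(L|E) / P(~L|E)) / (P(L) / P(~L)) )
    where L means "the pair is truly linked" and E is the evidence (the dataset).
    Each pair is a frozenset of two protein identifiers, so order does not matter.
    """
    positives_seen = len(dataset_pairs & gold_positive)
    negatives_seen = len(dataset_pairs & gold_negative)
    if positives_seen == 0 or negatives_seen == 0:
        # The ratio is undefined without both kinds of overlap.
        raise ValueError("dataset needs overlap with both gold-standard sets")
    posterior_odds = positives_seen / negatives_seen
    prior_odds = len(gold_positive) / len(gold_negative)
    return math.log(posterior_odds / prior_odds)


# Toy example with made-up identifiers.
gold_positive = {frozenset(p) for p in [("A", "B"), ("C", "D")]}
gold_negative = {frozenset(p) for p in [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")]}
dataset = {frozenset(p) for p in [("A", "B"), ("C", "D"), ("A", "C"), ("E", "F")]}

score = log_likelihood_score(dataset, gold_positive, gold_negative)
print(f"LLS = {score:.3f}")  # a positive value means enrichment for true links
```

Pairs that appear in neither Gold Standard set, such as ("E", "F") here, do not contribute to the score, which is one way the data loss mentioned in the abstract can arise.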
Affiliation(s)
- Katherine James
- Department of Applied Sciences, Northumbria University, Sandyford Rd, Newcastle upon Tyne, NE1 8ST, UK; Interdisciplinary Computing and Complex BioSystems Group, Newcastle University, Science Square, Newcastle upon Tyne, NE4 5TG, UK
- Aoesha Alsobhe
- Interdisciplinary Computing and Complex BioSystems Group, Newcastle University, Science Square, Newcastle upon Tyne, NE4 5TG, UK; Saudi Electronic University, Abi Bakr As Siddiq Branch Rd, Riyadh, 1332, Saudi Arabia
- Simon J Cockell
- School of Biomedical, Nutritional and Sports Science, Faculty of Medical Sciences, Newcastle University, Framlington Place, Newcastle upon Tyne, NE2 4HH, UK
- Anil Wipat
- Interdisciplinary Computing and Complex BioSystems Group, Newcastle University, Science Square, Newcastle upon Tyne, NE4 5TG, UK
- Matthew Pocock
- Interdisciplinary Computing and Complex BioSystems Group, Newcastle University, Science Square, Newcastle upon Tyne, NE4 5TG, UK
2
OUP accepted manuscript. Brief Funct Genomics 2022; 21:243-269. [DOI: 10.1093/bfgp/elac007] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/02/2021] [Revised: 03/17/2022] [Accepted: 03/18/2022] [Indexed: 11/14/2022]
3
Nepomuceno-Chamorro IA, Nepomuceno JA, Galván-Rojas JL, Vega-Márquez B, Rubio-Escudero C. Using prior knowledge in the inference of gene association networks. Appl Intell 2020. [DOI: 10.1007/s10489-020-01705-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
4
Hwang S, Kim CY, Yang S, Kim E, Hart T, Marcotte EM, Lee I. HumanNet v2: human gene networks for disease research. Nucleic Acids Res 2019; 47:D573-D580. [PMID: 30418591 PMCID: PMC6323914 DOI: 10.1093/nar/gky1126] [Citation(s) in RCA: 114] [Impact Index Per Article: 28.5] [Received: 08/14/2018] [Accepted: 10/25/2018] [Indexed: 12/15/2022]
Abstract
Human gene networks have proven useful in many aspects of disease research, with numerous network-based strategies developed for generating hypotheses about gene-disease-drug associations. The ability to predict and organize genes most relevant to a specific disease has proven especially important. We previously developed a human functional gene network, HumanNet, by integrating diverse types of omics data using a Bayesian statistics framework and demonstrated its ability to retrieve disease genes. Here, we present HumanNet v2 (http://www.inetbio.org/humannet), a database of human gene networks, which was updated by incorporating new data types, extending data sources and improving network inference algorithms. HumanNet now comprises a hierarchy of human gene networks, allowing for more flexible incorporation of network information into studies. HumanNet performs well in ranking disease-linked gene sets with minimal literature-dependent biases. We observe that incorporating model organisms’ protein–protein interactions does not markedly improve disease gene predictions, suggesting that many of the disease gene associations are now captured directly in human-derived datasets. With an improved interactive user interface for disease network analysis, we expect HumanNet will be a useful resource for network medicine.
Affiliation(s)
- Sohyun Hwang
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea; Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas, Austin, TX 78712, USA; Department of Biomedical Science, College of Life Science, CHA University, Seongnam-si 13496, Korea
- Chan Yeong Kim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
- Sunmo Yang
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
- Eiru Kim
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX USA
- Traver Hart
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX USA
- Edward M Marcotte
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas, Austin, TX 78712, USA; Department of Molecular Biosciences, University of Texas at Austin, TX 78712, USA
- Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
5
James K, Olson PD. The tapeworm interactome: inferring confidence scored protein-protein interactions from the proteome of Hymenolepis microstoma. BMC Genomics 2020; 21:346. [PMID: 32380953 PMCID: PMC7204028 DOI: 10.1186/s12864-020-6710-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/17/2019] [Accepted: 03/30/2020] [Indexed: 12/14/2022]
Abstract
Background Reference genome and transcriptome assemblies of helminths have reached a level of completion whereby secondary analyses that rely on accurate gene estimation or syntenic relationships can now be conducted with a high level of confidence. Recent public release of the v.3 assembly of the mouse bile-duct tapeworm, Hymenolepis microstoma, provides chromosome-level characterisation of the genome and a stabilised set of protein coding gene models underpinned by bioinformatic and empirical data. However, interactome data have not been produced. Conserved protein-protein interactions in other organisms, termed interologs, can be used to transfer interactions between species, allowing systems-level analysis in non-model organisms. Results Here, we describe a probabilistic, integrated network of interologs for the H. microstoma proteome, based on conserved protein interactions found in eukaryote model species. Almost a third of the 10,139 gene models in the v.3 assembly could be assigned interaction data and assessment of the resulting network indicates that topologically-important proteins are related to essential cellular pathways, and that the network clusters into biologically meaningful components. Moreover, network parameters are similar to those of single-species interaction networks that we constructed in the same way for S. cerevisiae, C. elegans and H. sapiens, demonstrating that information-rich, system-level analyses can be conducted even on species separated by a large phylogenetic distance from the major model organisms on which most protein interaction evidence is based. Using the interolog network, we then focused on sub-networks of interactions assigned to discrete suites of genes of interest, including signalling components and transcription factors, germline multipotency genes, and genes differentially-expressed between larval and adult worms. Results show not only an expected bias toward highly-conserved proteins, such as components of intracellular signal transduction, but in some cases predicted interactions with transcription factors that aid in identifying their target genes. Conclusions With key helminth genomes now complete, systems-level analyses can provide an important predictive framework to guide basic and applied research on helminths and will become increasingly informative as new protein-protein interaction data accumulate.
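The interolog idea in the abstract above, transferring an interaction from a model species to a target species when both partners have orthologs there, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name, the dictionary-based ortholog map and the identifiers are hypothetical, and the confidence scoring mentioned in the abstract is omitted.

```python
from itertools import product


def transfer_interologs(source_interactions, orthologs):
    """Predict target-species interactions from source-species interactions.

    source_interactions: iterable of (protein_a, protein_b) pairs in the model species.
    orthologs: dict mapping a source protein to a list of target-species orthologs.
    Returns a set of frozensets, each holding two target-species proteins.
    """
    predicted = set()
    for protein_a, protein_b in source_interactions:
        targets_a = orthologs.get(protein_a, ())
        targets_b = orthologs.get(protein_b, ())
        # Every combination of orthologs of the two partners is a candidate.
        for target_a, target_b in product(targets_a, targets_b):
            if target_a != target_b:  # design choice: skip self-interactions
                predicted.add(frozenset((target_a, target_b)))
    return predicted


# Toy example with made-up identifiers: Y* are source proteins, H* are targets.
source_interactions = [("Y1", "Y2"), ("Y2", "Y3")]
orthologs = {"Y1": ["H1"], "Y2": ["H2", "H3"]}  # Y3 has no known ortholog

predicted = transfer_interologs(source_interactions, orthologs)
print(sorted(sorted(pair) for pair in predicted))
```

The second source interaction is dropped because one partner has no ortholog, which mirrors why only part of a proteome can be assigned interaction data by this kind of transfer.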
Affiliation(s)
- Katherine James
- Department of Applied Sciences, Northumbria University, Newcastle Upon Tyne, UK; Department of Life Sciences, The Natural History Museum, Cromwell Road, London, UK
- Peter D Olson
- Department of Life Sciences, The Natural History Museum, Cromwell Road, London, UK
6
Peng J, Zhu L, Wang Y, Chen J. Mining Relationships among Multiple Entities in Biological Networks. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:769-776. [PMID: 30872239 DOI: 10.1109/tcbb.2019.2904965] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 06/09/2023]
Abstract
Identifying topological relationships among multiple entities in biological networks is critical to understanding the organizational principles of network functionality. Theoretically, this problem can be solved using minimum Steiner tree (MSTT) algorithms. However, due to large network size, it remains computationally challenging, and the predictive value of multi-entity topological relationships is still unclear. We present a novel solution called Cluster-based Steiner Tree Miner (CST-Miner) to instantly identify multi-entity topological relationships in biological networks. Given a list of user-specified entities, CST-Miner decomposes a biological network into nested cluster-based subgraphs, on which multiple minimum Steiner trees are identified. By merging all of them into a minimum cost tree, the optimal topological relationships among all the user-specified entities are revealed. Experimental results showed that CST-Miner can finish in nearly log-linear time and the tree constructed by CST-Miner is close to the global minimum.
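To make the Steiner tree problem named in the abstract above concrete, the sketch below connects a set of terminal nodes with a generic shortest-path heuristic: start from one terminal and repeatedly attach the nearest remaining terminal along its cheapest path. This is not CST-Miner, whose cluster-based decomposition is not described here in enough detail to reproduce; the function names, the dict-of-dicts graph format and the toy graph are assumptions made for illustration.

```python
import heapq


def shortest_path_from_tree(graph, tree_nodes, targets):
    """Multi-source Dijkstra: cheapest path from any tree node to the nearest target."""
    dist = {node: 0.0 for node in tree_nodes}
    prev = {}
    heap = [(0.0, node) for node in tree_nodes]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u in targets:
            path = [u]
            while path[-1] in prev:  # walk back until a tree node is reached
                path.append(prev[path[-1]])
            return d, path[::-1]
        for v, weight in graph[u].items():
            candidate = d + weight
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                prev[v] = u
                heapq.heappush(heap, (candidate, v))
    raise ValueError("terminals are not all connected")


def steiner_tree_heuristic(graph, terminals):
    """Approximate Steiner tree; returns (set of undirected edges, total weight)."""
    terminals = list(terminals)
    tree_nodes = {terminals[0]}
    remaining = set(terminals[1:])
    tree_edges, total = set(), 0.0
    while remaining:
        cost, path = shortest_path_from_tree(graph, tree_nodes, remaining)
        total += cost
        tree_edges.update(frozenset(edge) for edge in zip(path, path[1:]))
        tree_nodes.update(path)
        remaining.difference_update(path)
    return tree_edges, total


# Toy undirected graph: hub "c" links a, b and d cheaply; a-b also has a costly direct edge.
graph = {
    "a": {"c": 1.0, "b": 3.0},
    "b": {"c": 1.0, "a": 3.0},
    "d": {"c": 1.0},
    "c": {"a": 1.0, "b": 1.0, "d": 1.0},
}
edges, weight = steiner_tree_heuristic(graph, ["a", "b", "d"])
print(sorted(sorted(edge) for edge in edges), weight)
```

The hub "c" is not a terminal but ends up in the tree, which is the defining feature of a Steiner tree compared with a spanning tree over the terminals alone. The heuristic gives no optimality guarantee in general.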
7
Gysi DM, Nowick K. Construction, comparison and evolution of networks in life sciences and other disciplines. J R Soc Interface 2020; 17:20190610. [PMID: 32370689 PMCID: PMC7276545 DOI: 10.1098/rsif.2019.0610] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/03/2019] [Accepted: 04/09/2020] [Indexed: 12/12/2022]
Abstract
Network approaches have become pervasive in many research fields. They allow for a more comprehensive understanding of complex relationships between entities as well as their group-level properties and dynamics. Many networks change over time, be it within seconds or millions of years, depending on the nature of the network. Our focus will be on comparative network analyses in life sciences, where deciphering temporal network changes is a core interest of molecular, ecological, neuropsychological and evolutionary biologists. Further, we will take a journey through different disciplines, such as social sciences, finance and computational gastronomy, to present commonalities and differences in how networks change and can be analysed. Finally, we envision how borrowing ideas from these disciplines could enrich the future of life science research.
Affiliation(s)
- Deisy Morselli Gysi
- Department of Computer Science, Interdisciplinary Center of Bioinformatics, University of Leipzig, 04109 Leipzig, Germany
- Swarm Intelligence and Complex Systems Group, Faculty of Mathematics and Computer Science, University of Leipzig, 04109 Leipzig, Germany
- Center for Complex Networks Research, Northeastern University, 177 Huntington Avenue, Boston, MA 02115, USA
- Katja Nowick
- Human Biology Group, Institute for Biology, Faculty of Biology, Chemistry, Pharmacy, Freie Universität Berlin, Königin-Luise-Straße 1-3, 14195 Berlin, Germany
8
Lee S, Lee T, Yang S, Lee I. BarleyNet: A Network-Based Functional Omics Analysis Server for Cultivated Barley, Hordeum vulgare L. Front Plant Sci 2020; 11:98. [PMID: 32133024 PMCID: PMC7040090 DOI: 10.3389/fpls.2020.00098] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/12/2019] [Accepted: 01/22/2020] [Indexed: 05/14/2023]
Abstract
Cultivated barley (Hordeum vulgare L.) is one of the most produced cereal crops worldwide after maize, bread wheat, and rice. Barley is an important crop species not only as a food source, but also in plant genetics because it harbors numerous stress response alleles in its genome that can be exploited for crop engineering. However, the functional annotation of its genome is relatively poor compared with other major crops. Moreover, bioinformatics tools for system-wide analyses of omics data from barley are not yet available. We have thus developed BarleyNet, a co-functional network of 26,145 barley genes, along with a web server for network-based predictions (http://www.inetbio.org/barleynet). We demonstrated that BarleyNet's prediction of biological processes is more accurate than that of an existing barley gene network. We implemented three complementary network-based algorithms for prioritizing genes or functional concepts to study genetic components of complex traits such as environmental stress responses: (i) a pathway-centric search for candidate genes of pathways or complex traits; (ii) a gene-centric search to infer novel functional concepts for genes; and (iii) a context-centric search for novel genes associated with stress response. We demonstrated the usefulness of these network analysis tools in the study of stress response using proteomics and transcriptomics data from barley leaves and roots upon drought or heat stresses. These results suggest that BarleyNet will facilitate our understanding of the underlying genetic components of complex traits in barley.
Affiliation(s)
- Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, South Korea
9
Network Integrative Genomic and Transcriptomic Analysis of Carbapenem-Resistant Klebsiella pneumoniae Strains Identifies Genes for Antibiotic Resistance and Virulence. mSystems 2019; 4:e00202-19. [PMID: 31117026 PMCID: PMC6589436 DOI: 10.1128/msystems.00202-19] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 02/04/2023]
Abstract
Global increases in the use of carbapenems have resulted in several strains of Gram-negative bacteria acquiring carbapenem resistance, thereby limiting treatment options. Klebsiella pneumoniae is a common carbapenem-resistant pathogenic bacterium that is widely studied to identify novel antibiotic resistance mechanisms and drug targets. Antibiotic-resistant clinical isolates generally harbor many genetic alterations, and the identification of responsible mutations would provide insights into the molecular mechanisms of antibiotic resistance. We propose a method to prioritize mutated genes responsible for antibiotic resistance on the basis of expression changes in their local subnetworks, hypothesizing that mutated genes that show significant expression changes among the corresponding functionally associated genes are more likely to be involved in carbapenem resistance. For network-based gene prioritization, we developed KlebNet (www.inetbio.org/klebnet), a genome-scale cofunctional network of K. pneumoniae genes. Using KlebNet, we reconstructed the functional modules for carbapenem resistance and virulence and identified the functional association between antibiotic resistance and virulence. Using complementation assays with the top candidate genes, we were able to validate a novel gene that negatively regulated carbapenem resistance and four novel genes that positively regulated virulence in Galleria mellonella larvae. Therefore, our study demonstrated the feasibility of network-based identification of genes required for antibiotic resistance and virulence of human-pathogenic bacteria. IMPORTANCE Klebsiella pneumoniae is a major bacterial pathogen that causes pneumonia and urinary tract infections in humans. K. pneumoniae infections are treated with carbapenem, but carbapenem-resistant K. pneumoniae has been spreading worldwide. We are able to identify antimicrobial-resistant genes among mutated genes of the antibiotic-resistant clinical isolates. However, they usually harbor many mutated genes, including those that cause weak or neutral functional effects. Therefore, we need to prioritize the mutated genes to identify the more likely candidates for the follow-up functional analysis. For this study, we present a functional network of K. pneumoniae genes and propose a network-based method of prioritizing the mutated genes of the resistant clinical isolates. We also reconstructed the network-based functional modules for carbapenem resistance and virulence and retrieved the functional association between antibiotic resistance and virulence. This study demonstrated the feasibility of network-based analysis of clinical genomics data for the study of K. pneumoniae infection.
10
Combined haplotype blocks regression and multi-locus mixed model analysis reveals novel candidate genes associated with milk traits in dairy sheep. Livest Sci 2019. [DOI: 10.1016/j.livsci.2018.11.020] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/20/2022]
11
Peng J, Zhang X, Hui W, Lu J, Li Q, Liu S, Shang X. Improving the measurement of semantic similarity by combining gene ontology and co-functional network: a random walk based approach. BMC Syst Biol 2018; 12:18. [PMID: 29560823 PMCID: PMC5861498 DOI: 10.1186/s12918-018-0539-0] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Indexed: 02/08/2023]
Abstract
BACKGROUND Gene Ontology (GO) is one of the most popular bioinformatics resources. In the past decade, Gene Ontology-based gene semantic similarity has been effectively used to model gene-to-gene interactions in multiple research areas. However, most existing semantic similarity approaches rely only on GO annotations and structure, or incorporate only local interactions in the co-functional network. This may lead to inaccurate GO-based similarity resulting from the incomplete GO topology structure and gene annotations. RESULTS We present NETSIM2, a new network-based method that allows researchers to measure GO-based gene functional similarities by considering the global structure of the co-functional network with a random walk with restart (RWR)-based method, and by selecting the significant term pairs to reduce noise. Based on the EC number (Enzyme Commission)-based groups of yeast and Arabidopsis, an evaluation test shows that NETSIM2 can enhance the accuracy of Gene Ontology-based gene functional similarity. CONCLUSIONS Using NETSIM2 as an example, we found that the accuracy of semantic similarities can be significantly improved after effectively incorporating the global gene-to-gene interactions in the co-functional network, especially for species whose gene annotations in GO are far from complete.
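The random walk with restart (RWR) mentioned in the abstract above can be illustrated on a small weighted network. The sketch below is a generic RWR by power iteration, not NETSIM2 itself: at each step the walker follows an edge with probability 1 - restart (chosen in proportion to edge weight) or jumps back to the seed with probability restart. The function name, the default restart probability of 0.3 and the toy network are assumptions, and every node is assumed to have at least one neighbour.

```python
def random_walk_with_restart(adjacency, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Return the steady-state visiting probability of each node for a walk from `seed`.

    adjacency: dict of dicts, adjacency[u][v] = positive edge weight (undirected
    networks list each edge in both directions).
    """
    nodes = sorted(adjacency)
    strength = {u: sum(adjacency[u].values()) for u in nodes}
    prob = {u: 1.0 if u == seed else 0.0 for u in nodes}
    for _ in range(max_iter):
        # Restart mass goes to the seed; the rest is spread along edges.
        updated = {u: (restart if u == seed else 0.0) for u in nodes}
        for u in nodes:
            if strength[u] == 0:
                continue  # isolated node: nothing to propagate
            share = (1.0 - restart) * prob[u] / strength[u]
            for v, weight in adjacency[u].items():
                updated[v] += share * weight
        change = sum(abs(updated[u] - prob[u]) for u in nodes)
        prob = updated
        if change < tol:
            break
    return prob


# Toy co-functional network: a simple chain g1 - g2 - g3 with unit weights.
network = {
    "g1": {"g2": 1.0},
    "g2": {"g1": 1.0, "g3": 1.0},
    "g3": {"g2": 1.0},
}
scores = random_walk_with_restart(network, seed="g1")
for gene, value in scores.items():
    print(gene, round(value, 4))
```

Because the scores reflect all paths from the seed rather than direct neighbours only, they capture the global network structure that the abstract contrasts with local interactions.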
Affiliation(s)
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, China; Centre for Multidisciplinary Convergence Computing (CMCC), School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Xuanshuo Zhang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Weiwei Hui
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Junya Lu
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Qianqian Li
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Shuhui Liu
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, China
12
Meng J, Xu WY, Chen X, Lin T, Deng XY. Gene locations may contribute to predicting gene regulatory relationships. J Zhejiang Univ Sci B 2018; 19:25-37. [PMID: 29308605 DOI: 10.1631/jzus.b1700303] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/11/2022]
Abstract
We propose that locations of genes on chromosomes can contribute to the prediction of gene regulatory relationships. We constructed a time-based gene regulatory network of zebrafish cardiogenesis on the basis of a spatio-temporal neighborhood method. Through the network, specific regulatory pathways and the order of gene expression during zebrafish cardiogenesis were obtained. By comparing this order with the locations of these genes on chromosomes, we discovered a reversal phenomenon between the order of gene expression and the order of gene locations. The discovery provides an inherent rule to guide exploration of gene regulatory relationships. Specifically, the discovery can help to predict whether regulatory relationships between genes exist and contribute to evaluating the correctness of discovered gene regulatory relationships.
Affiliation(s)
- Jun Meng
- Department of System Science and Engineering, School of Electrical Engineering, Zhejiang University, Hangzhou 310027, China
- Wen-Yuan Xu
- Department of System Science and Engineering, School of Electrical Engineering, Zhejiang University, Hangzhou 310027, China
- Xiao Chen
- Department of System Science and Engineering, School of Electrical Engineering, Zhejiang University, Hangzhou 310027, China
- Tao Lin
- Laboratory of Machine Learning and Optimization, École Polytechnique Fédérale de Lausanne (EPFL), Route Cantonale, 1015 Lausanne 999034, Switzerland
- Xiao-Yu Deng
- Department of System Science and Engineering, School of Electrical Engineering, Zhejiang University, Hangzhou 310027, China
13
Maekawa S, Ishida T, Yanagisawa S. Reduced Expression of APUM24, Encoding a Novel rRNA Processing Factor, Induces Sugar-Dependent Nucleolar Stress and Altered Sugar Responses in Arabidopsis thaliana. Plant Cell 2018; 30:209-227. [PMID: 29242314 PMCID: PMC5810573 DOI: 10.1105/tpc.17.00778] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 10/02/2017] [Revised: 11/08/2017] [Accepted: 12/04/2017] [Indexed: 05/16/2023]
Abstract
Ribosome biogenesis is one of the most energy-consuming events in the cell and must therefore be coordinated with changes in cellular energy status. Here, we show that the sugar-inducible gene ARABIDOPSIS PUMILIO PROTEIN24 (APUM24) encodes a Pumilio homology domain-containing protein involved in pre-rRNA processing in Arabidopsis thaliana. Null mutation of APUM24 resulted in aborted embryos due to abnormal gametogenesis and embryogenesis, whereas reduced expression of APUM24 caused several phenotypes characteristic of ribosome biogenesis or function-related mutants. APUM24 interacted with other pre-rRNA processing factors and a putative endonuclease for the removal of the internal transcribed spacer 2 (ITS2) of pre-rRNA in the nucleolus. The APUM24-containing complex also interacted with ITS2, and reduced APUM24 expression caused the overaccumulation of processing intermediates containing ITS2. Thus, APUM24 likely functions as an ITS2 removal-associated factor. Most importantly, the apum24 knockdown mutant was hypersensitive to highly concentrated sugar, and the mutant showed sugar-dependent overaccumulation of processing intermediates and nucleolar stress (changes in nucleolar size). Furthermore, reduced APUM24 expression diminished sugar-induced promotion of leaf and root growth. Hence, a breakdown in the coordinated expression of ribosome biogenesis-related genes with energy status may induce nucleolar stress and disturb proper sugar responses in Arabidopsis.
Affiliation(s)
- Shugo Maekawa
- Biotechnology Research Center, The University of Tokyo, Bunkyo-ku, Tokyo 113-8657, Japan
- Tetsuya Ishida
- Biotechnology Research Center, The University of Tokyo, Bunkyo-ku, Tokyo 113-8657, Japan
- Shuichi Yanagisawa
- Biotechnology Research Center, The University of Tokyo, Bunkyo-ku, Tokyo 113-8657, Japan
14
Peng J, Wang H, Lu J, Hui W, Wang Y, Shang X. Identifying term relations cross different gene ontology categories. BMC Bioinformatics 2017; 18:573. [PMID: 29297309 PMCID: PMC5751813 DOI: 10.1186/s12859-017-1959-3] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Indexed: 11/24/2022]
Abstract
Background The Gene Ontology (GO) is a community-based bioinformatics resource that employs ontologies to represent biological knowledge and describes information about gene and gene product function. GO includes three independent categories: molecular function, biological process and cellular component. For better biological reasoning, identifying the biological relationships between terms in different categories is important. However, the existing measurements for calculating similarity between terms in different categories either are developed using the GO data only or use only part of the combined gene co-function network information. Results We propose an iterative ranking-based method called CroGO2 to measure the cross-category GO term similarities by incorporating level information of GO terms with both direct and indirect interactions in the gene co-function network. Conclusions The evaluation test shows that CroGO2 performs better than the existing methods. A genome-specific term association network for yeast is also generated by connecting terms with high confidence scores. The linkages in the term association network could be supported by the literature. Given a gene set, the related terms identified by using the association network overlap with the related terms identified by GO enrichment analysis.
Affiliation(s)
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Honggang Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Junya Lu
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Weiwei Hui
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
15
Aluru M, McKinney T, Venero AKL, Choudhury S, Torres M. Mitogen-activated protein kinases, Fus3 and Kss1, regulate chronological lifespan in yeast. Aging (Albany NY) 2017; 9:2587-2609. [PMID: 29273704 PMCID: PMC5764394 DOI: 10.18632/aging.101350] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 07/14/2017] [Accepted: 12/11/2017] [Indexed: 04/19/2023]
Abstract
Using a systems-based approach, we have identified several genes not previously evaluated for a role(s) in chronological aging. Here, we have thoroughly investigated the chronological lifespan (CLS) of three of these genes (FUS3, KSS1 and HOG1) and their protein products, each of which has well-defined cell signaling roles in young cells. The importance of FUS3 and KSS1 in CLS is largely unknown and is analyzed here for the first time. Using both qualitative and quantitative CLS assays, we show that deletion of any of the three MAPKs increases yeast lifespan. Furthermore, combined deletion of any MAPK and TOR1, most prominently fus3Δ/tor1Δ, produces a two-stage CLS response ending in a lifespan increase greater than that of tor1Δ. Similar effects are achieved upon endogenous expression of a non-activatable form of Fus3. We speculate that the autophagy-promoting role of FUS3, which is inherently antagonistic to the role of TOR1, may in part be responsible for the differential aging phenotype of fus3Δ/tor1Δ. Consistent with this notion, we show that nitrogen starvation, which promotes autophagy by deactivating Tor1, results in decreased CLS if FUS3 is deleted. Taken together, these results reveal a previously unrealized effect of mating-specific MAPKs in the chronological lifespan of yeast.
Affiliation(s)
- Maneesha Aluru
- Georgia Institute of Technology, School of Biological Sciences, Atlanta, GA 30332, USA
- Tori McKinney
- Georgia Institute of Technology, School of Biological Sciences, Atlanta, GA 30332, USA
- Shilpa Choudhury
- Georgia Institute of Technology, School of Biological Sciences, Atlanta, GA 30332, USA
- Matthew Torres
- Georgia Institute of Technology, School of Biological Sciences, Atlanta, GA 30332, USA
16
Teng Z, Guo M, Liu X, Tian Z, Che K. Revealing protein functions based on relationships of interacting proteins and GO terms. J Biomed Semantics 2017; 8:27. [PMID: 29297388 PMCID: PMC5763294 DOI: 10.1186/s13326-017-0139-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 12/24/2022]
Abstract
BACKGROUND In recent years, numerous computational methods have predicted protein function from the protein-protein interaction (PPI) network. These methods assume that two proteins share the same function if they interact with each other. However, recent studies report that the functions of two interacting proteins may be merely related, which can mislead the prediction of protein function. Therefore, the functional relationship between interacting proteins needs to be investigated. RESULTS In this paper, the functional relationship between interacting proteins is studied and a novel method, called GoDIN, is proposed to annotate the functions of interacting proteins in the Gene Ontology (GO) context. It is assumed that the functional difference between interacting proteins can be expressed by the semantic difference between a GO term and its relatives. Thus, the method uses a GO term and its relatives to annotate the interacting proteins separately according to their functional roles in the PPI network. The method is validated by a series of experiments and compared with the related method. The experimental results confirm the assumption and suggest that GoDIN is effective at predicting protein functions. CONCLUSIONS This study demonstrates that: (1) interacting proteins are not equal in the PPI network, and their functions may be the same, similar, or merely related; (2) the functional difference between interacting proteins can be measured by their degrees in the PPI network; (3) the functional relationship between interacting proteins can be expressed by the relationship between a GO term and its relatives.
Affiliation(s)
- Zhixia Teng
- Department of Information Management and Information System, Northeast Forestry University, Harbin, 150040, China.
- Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, China.
- Maozu Guo
- Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, China.
- Xiaoyan Liu
- Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, China
- Zhen Tian
- Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, China
- Kai Che
- Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, China
|
17
|
Kominakis A, Hager-Theodorides AL, Zoidis E, Saridaki A, Antonakos G, Tsiamis G. Combined GWAS and 'guilt by association'-based prioritization analysis identifies functional candidate genes for body size in sheep. Genet Sel Evol 2017; 49:41. [PMID: 28454565 PMCID: PMC5408376 DOI: 10.1186/s12711-017-0316-3] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Received: 08/16/2016] [Accepted: 04/19/2017] [Indexed: 12/30/2022] Open
Abstract
Background Body size in sheep is an important indicator of productivity, growth and health as well as of environmental adaptation. It is a composite quantitative trait that has been studied with high-throughput genomic methods, i.e. genome-wide association studies (GWAS) in various mammalian species. Several genomic markers have been associated with body size traits and genes have been identified as causative candidates in humans, dog and cattle. A limited number of related GWAS have been performed in various sheep breeds and have identified genomic regions and candidate genes that partly account for body size variability. Here, we conducted a GWAS in Frizarta dairy sheep with phenotypic data from 10 body size measurements and genotypic data (from Illumina ovineSNP50 BeadChip) for 459 ewes. Results The 10 body size measurements were subjected to principal component analysis and three independent principal components (PC) were constructed, interpretable as width, height and length dimensions, respectively. The GWAS performed for each PC identified 11 significant SNPs, at the chromosome level, one on each of the chromosomes 3, 8, 9, 10, 11, 12, 19, 20, 23 and two on chromosome 25. Nine out of the 11 SNPs were located on previously identified quantitative trait loci for sheep meat, production or reproduction. One hundred and ninety-seven positional candidate genes within a 1-Mb distance from each significant SNP were found. A guilt-by-association-based (GBA) prioritization analysis (PA) was performed to identify the most plausible functional candidate genes. GBA-based PA identified 39 genes that were significantly associated with gene networks relevant to body size traits. Prioritized genes were identified in the vicinity of all significant SNPs except for those on chromosomes 10 and 12. The top five ranking genes were TP53, BMPR1A, PIK3R5, RPL26 and PRKDC. 
Conclusions The results of this GWAS provide evidence for 39 causative candidate genes across nine chromosomal regions for body size traits, some of which are novel and some of which are previously identified candidates from other studies (e.g. TP53, NTN1 and ZNF521). GBA-based PA has proved to be a useful tool to identify genes with increased biological relevance, but it is subject to certain limitations. Electronic supplementary material The online version of this article (doi:10.1186/s12711-017-0316-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Antonios Kominakis
- Department of Animal Science and Aquaculture, Agricultural University of Athens, Iera Odos 75, 11855, Athens, Greece
- Ariadne L Hager-Theodorides
- Department of Animal Science and Aquaculture, Agricultural University of Athens, Iera Odos 75, 11855, Athens, Greece.
- Evangelos Zoidis
- Department of Animal Science and Aquaculture, Agricultural University of Athens, Iera Odos 75, 11855, Athens, Greece
- Aggeliki Saridaki
- Department of Environmental and Natural Resources Management, University of Patras, Seferi 2, 30100, Agrinio, Greece
- George Antonakos
- Agricultural and Livestock Union of Western Greece, 13rd Km N.R. Agrinio-Ioannina, 30100, Lepenou, Greece
- George Tsiamis
- Department of Environmental and Natural Resources Management, University of Patras, Seferi 2, 30100, Agrinio, Greece
|
18
|
Abstract
Characterizing genetic interactions is crucial to understanding cellular and organismal response to gene-level perturbations. Such knowledge can inform the selection of candidate disease therapy targets, yet experimentally determining whether genes interact is technically nontrivial and time-consuming. High-fidelity prediction of different classes of genetic interactions in multiple organisms would substantially alleviate this experimental burden. Under the hypothesis that functionally related genes tend to share common genetic interaction partners, we evaluate a computational approach to predict genetic interactions in Homo sapiens, Drosophila melanogaster, and Saccharomyces cerevisiae. By leveraging knowledge of functional relationships between genes, we cross-validate predictions on known genetic interactions and observe high predictive power of multiple classes of genetic interactions in all three organisms. Additionally, our method suggests high-confidence candidate interaction pairs that can be directly experimentally tested. A web application is provided for users to query genes for predicted novel genetic interaction partners. Finally, by subsampling the known yeast genetic interaction network, we found that novel genetic interactions are predictable even when knowledge of currently known interactions is minimal.
|
19
|
Kim E, Hwang S, Lee I. SoyNet: a database of co-functional networks for soybean Glycine max. Nucleic Acids Res 2017; 45:D1082-D1089. [PMID: 27492285 PMCID: PMC5210602 DOI: 10.1093/nar/gkw704] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Received: 06/02/2016] [Revised: 07/27/2016] [Accepted: 07/27/2016] [Indexed: 01/09/2023] Open
Abstract
Soybean (Glycine max) is a legume crop with substantial economic value, providing a source of oil and protein for humans and livestock. More than 50% of edible oils consumed globally are derived from this crop. Soybean plants are also important for soil fertility, as they fix atmospheric nitrogen by symbiosis with microorganisms. The latest soybean genome annotation (version 2.0) lists 56 044 coding genes, yet their functional contributions to crop traits remain mostly unknown. Co-functional networks have proven useful for identifying genes that are involved in a particular pathway or phenotype with various network algorithms. Here, we present SoyNet (available at www.inetbio.org/soynet), a database of co-functional networks for G. max and a companion web server for network-based functional predictions. SoyNet maps 1 940 284 co-functional links between 40 812 soybean genes (72.8% of the coding genome), which were inferred from 21 distinct types of genomics data including 734 microarrays and 290 RNA-seq samples from soybean. SoyNet provides a new route to functional investigation of the soybean genome, elucidating genes and pathways of agricultural importance.
Affiliation(s)
- Eiru Kim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Sohyun Hwang
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
|
20
|
PoplarGene: poplar gene network and resource for mining functional information for genes from woody plants. Sci Rep 2016; 6:31356. [PMID: 27515999 PMCID: PMC4981870 DOI: 10.1038/srep31356] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 04/21/2016] [Accepted: 07/18/2016] [Indexed: 01/05/2023] Open
Abstract
Poplar is not only an important resource for the production of paper, timber and other wood-based products, but it has also emerged as an ideal model system for studying woody plants. To better understand the biological processes underlying various traits in poplar, e.g., wood development, a comprehensive functional gene interaction network is highly needed. Here, we constructed a genome-wide functional gene network for poplar (covering ~70% of the 41,335 poplar genes) and created the network web service PoplarGene, offering comprehensive functional interactions and extensive poplar gene functional annotations. PoplarGene incorporates two network-based gene prioritization algorithms, neighborhood-based prioritization and context-based prioritization, which can be used to perform gene prioritization in a complementary manner. Furthermore, the co-functional information in PoplarGene can be applied to other woody plant proteomes with high efficiency via orthology transfer. In addition to poplar gene sequences, the webserver also accepts Arabidopsis reference gene as input to guide the search for novel candidate functional genes in PoplarGene. We believe that PoplarGene (http://bioinformatics.caf.ac.cn/PoplarGene and http://124.127.201.25/PoplarGene) will greatly benefit the research community, facilitating studies of poplar and other woody plants.
|
21
|
Yang YT, Ting YH, Liang KJ, Lo KY. The Roles of Puf6 and Loc1 in 60S Biogenesis Are Interdependent, and Both Are Required for Efficient Accommodation of Rpl43. J Biol Chem 2016; 291:19312-23. [PMID: 27458021 DOI: 10.1074/jbc.m116.732800] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 04/17/2016] [Indexed: 12/22/2022] Open
Abstract
Puf6 and Loc1 have two important functional roles in the cells, asymmetric mRNA distribution and ribosome biogenesis. Puf6 and Loc1 are localized predominantly in the nucleolus. They bind ASH1 mRNA, repress its translation, and facilitate the transport to the daughter cells. Asymmetric mRNA distribution is important for cell differentiation. Besides their roles in mRNA localization, Puf6 and Loc1 have been shown to be involved in 60S biogenesis. In puf6Δ or loc1Δ cells, pre-rRNA processing and 60S export are impaired and 60S subunits are underaccumulated. The functional studies of Puf6 and Loc1 have been focused on ASH1 mRNA pathway, but their roles in 60S biogenesis are still not clear. In this study, we found that Puf6 and Loc1 interact directly with each other and both proteins interact with the ribosomal protein Rpl43 (L43e). Notably, the roles of Puf6 and Loc1 in 60S biogenesis are interdependent, and both are required for efficient accommodation of Rpl43. Loc1 is further required to maintain the protein level of Rpl43. Additionally, the recruitment of Rpl43 is required for the release of Puf6 and Loc1. We propose that Puf6 and Loc1 facilitate Rpl43 loading and are sequentially released from 60S after incorporation of Rpl43 into ribosomes in yeast.
Affiliation(s)
- Yi-Ting Yang
- Department of Agricultural Chemistry, National Taiwan University, Taipei 10617, Taiwan
- Ya-Han Ting
- Department of Agricultural Chemistry, National Taiwan University, Taipei 10617, Taiwan
- Kei-Jen Liang
- Department of Agricultural Chemistry, National Taiwan University, Taipei 10617, Taiwan
- Kai-Yin Lo
- Department of Agricultural Chemistry, National Taiwan University, Taipei 10617, Taiwan
|
22
|
Gómez-Vela F, Barranco CD, Díaz-Díaz N. Incorporating biological knowledge for construction of fuzzy networks of gene associations. Appl Soft Comput 2016. [DOI: 10.1016/j.asoc.2016.01.014] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 11/25/2022]
|
23
|
Al-Dalky R, Taha K, Al Homouz D, Qasaimeh M. Applying Monte Carlo Simulation to Biomedical Literature to Approximate Genetic Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2016; 13:494-504. [PMID: 26415184 DOI: 10.1109/tcbb.2015.2481399] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 06/05/2023]
Abstract
Biologists often need to know the set of genes associated with a given set of genes or a given disease. We propose in this paper a classifier system called Monte Carlo for Genetic Network (MCforGN) that can construct genetic networks, identify functionally related genes, and predict gene-disease associations. MCforGN identifies functionally related genes based on their co-occurrences in the abstracts of biomedical literature. For a given gene g, the system first extracts the set of genes found within the abstracts of biomedical literature associated with g. It then ranks these genes to determine the ones with high co-occurrences with g. It overcomes the limitations of current approaches that employ analytical deterministic algorithms by applying Monte Carlo Simulation to approximate genetic networks. It does so by conducting repeated random sampling to obtain numerical results and to optimize these results. Moreover, it analyzes results to obtain the probabilities of different genes' co-occurrences using a series of statistical tests. MCforGN can detect gene-disease associations by employing a combination of centrality measures (to identify the central genes in disease-specific genetic networks) and Monte Carlo Simulation. MCforGN aims at enhancing state-of-the-art biological text mining by applying novel extraction techniques. We evaluated MCforGN by comparing it experimentally with nine approaches. Results showed marked improvement.
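The co-occurrence ranking and repeated random sampling described in this abstract can be illustrated with a small sketch. This is not the MCforGN algorithm itself, whose details the abstract does not give; it is a generic Monte Carlo significance estimate under stated assumptions. The toy corpus, the function names `cooccurrence_counts` and `monte_carlo_p_value`, and the choice of null model (random abstracts drawn without replacement) are all hypothetical.

```python
import random
from collections import Counter


def cooccurrence_counts(abstracts, query):
    """Count, for every other gene, the abstracts in which it appears with `query`."""
    counts = Counter()
    for genes in abstracts:
        if query in genes:
            counts.update(g for g in genes if g != query)
    return counts


def monte_carlo_p_value(abstracts, query, partner, n_samples=2000, seed=0):
    """Empirical p-value for the observed co-occurrence of `partner` with `query`.

    Assumed null model: draw as many abstracts at random as mention `query`
    and count how often `partner` appears at least as often as observed.
    """
    rng = random.Random(seed)
    observed = cooccurrence_counts(abstracts, query)[partner]
    k = sum(1 for genes in abstracts if query in genes)
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(abstracts, k)
        if sum(1 for genes in sample if partner in genes) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_samples + 1)


# Hypothetical toy corpus: each abstract is the set of gene symbols it mentions.
abstracts = [
    {"TP53", "MDM2"}, {"TP53", "MDM2", "ATM"}, {"TP53", "MDM2"},
    {"BRCA1", "ATM"}, {"BRCA1"}, {"EGFR"}, {"EGFR", "KRAS"}, {"KRAS"},
]
ranked = cooccurrence_counts(abstracts, "TP53").most_common()
p_value = monte_carlo_p_value(abstracts, "TP53", "MDM2")
```

Here `ranked` orders candidate partners of the query gene by raw co-occurrence, and `p_value` indicates how unusual the top partner's count is under the assumed null.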
|
24
|
Hsiao YT, Lee WP, Yang W, Müller S, Flamm C, Hofacker I, Kügler P. Practical Guidelines for Incorporating Knowledge-Based and Data-Driven Strategies into the Inference of Gene Regulatory Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2016; 13:64-75. [PMID: 26441429 DOI: 10.1109/tcbb.2015.2465954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Modeling gene regulatory networks (GRNs) is essential for conceptualizing how genes are expressed and how they influence each other. Typically, a reverse engineering approach is employed; this strategy is effective in reproducing possible fitting models of GRNs. To use this strategy, however, two daunting tasks must be undertaken: one task is to optimize the accuracy of inferred network behaviors; and the other task is to designate valid biological topologies for target networks. Although existing studies have addressed these two tasks for years, few of the studies can satisfy both of the requirements simultaneously. To address these difficulties, we propose an integrative modeling framework that combines knowledge-based and data-driven input sources to construct biological topologies with their corresponding network behaviors. To validate the proposed approach, a real dataset collected from the cell cycle of the yeast S. cerevisiae is used. The results show that the proposed framework can successfully infer solutions that meet the requirements of both the network behaviors and biological structures. Therefore, the outcomes are exploitable for future in vivo experimental design.
|
25
|
Modeling Gene Networks in Saccharomyces cerevisiae Based on Gene Expression Profiles. Computational and Mathematical Methods in Medicine 2015; 2015:621264. [PMID: 26839582 PMCID: PMC4709922 DOI: 10.1155/2015/621264] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 07/14/2015] [Revised: 10/14/2015] [Accepted: 11/16/2015] [Indexed: 11/30/2022]
Abstract
Detailed and innovative analysis of gene regulatory network structures may reveal novel insights to biological mechanisms. Here we study how gene regulatory network in Saccharomyces cerevisiae can differ under aerobic and anaerobic conditions. To achieve this, we discretized the gene expression profiles and calculated the self-entropy of down- and upregulation of gene expression as well as joint entropy. Based on these quantities the uncertainty coefficient was calculated for each gene triplet, following which, separate gene logic networks were constructed for the aerobic and anaerobic conditions. Four structural parameters such as average degree, average clustering coefficient, average shortest path, and average betweenness were used to compare the structure of the corresponding aerobic and anaerobic logic networks. Five genes were identified to be putative key components of the two energy metabolisms. Furthermore, community analysis using the Newman fast algorithm revealed two significant communities for the aerobic but only one for the anaerobic network. David Gene Functional Classification suggests that, under aerobic conditions, one such community reflects the cell cycle and cell replication, while the other one is linked to the mitochondrial respiratory chain function.
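The entropy quantities named in this abstract can be made concrete with a short sketch. The abstract computes the uncertainty coefficient per gene triplet but does not give the formula; the code below shows only the standard pairwise form, U(X|Y) = (H(X) + H(Y) - H(X,Y)) / H(X), as a simplifying assumption. The binary expression vectors and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete symbols."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def uncertainty_coefficient(x, y):
    """Fraction of the uncertainty in x that is removed by knowing y.

    Pairwise form: U(X|Y) = (H(X) + H(Y) - H(X, Y)) / H(X).
    Returns 0.0 when x is constant, since there is no uncertainty to explain.
    """
    hx = entropy(x)
    if hx == 0:
        return 0.0
    joint = entropy(list(zip(x, y)))
    return (hx + entropy(y) - joint) / hx


# Hypothetical discretized profiles: 1 = upregulated, 0 = downregulated.
gene_a = [1, 1, 0, 0, 1, 0, 1, 0]
gene_b = [1, 1, 0, 0, 1, 0, 1, 0]  # identical to gene_a
gene_c = [1, 0, 1, 0, 1, 0, 1, 0]  # only weakly related to gene_a

u_identical = uncertainty_coefficient(gene_a, gene_b)
u_weak = uncertainty_coefficient(gene_a, gene_c)
```

A coefficient near 1 means one profile is almost fully determined by the other, while a value near 0 means the profiles are close to independent; thresholding such values is one plausible way to decide which links enter a network.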
|
26
|
Peng J, Wang T, Wang J, Wang Y, Chen J. Extending gene ontology with gene association networks. Bioinformatics 2015; 32:1185-94. [PMID: 26644414 DOI: 10.1093/bioinformatics/btv712] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 06/17/2015] [Accepted: 11/26/2015] [Indexed: 01/01/2023] Open
Abstract
MOTIVATION Gene ontology (GO) is a widely used resource to describe the attributes of gene products. However, automatic GO maintenance remains difficult because of the complex logical reasoning and the need for biological knowledge that are not explicitly represented in the GO. Existing studies either construct the whole GO based on network data or only infer the relations between existing GO terms. None is designed to add new terms automatically to the existing GO. RESULTS We propose a new algorithm, 'GOExtender', to efficiently identify all the connected gene pairs labeled by the same parent GO terms. GOExtender is used to predict new GO terms with biological network data, and connect them to the existing GO. Evaluation tests on biological process and cellular component categories of different GO releases showed that GOExtender can extend new GO terms automatically based on the biological network. Furthermore, we applied GOExtender to the recent release of GO and discovered new GO terms with strong support from literature. AVAILABILITY AND IMPLEMENTATION Software and supplementary document are available at www.msu.edu/%7Ejinchen/GOExtender CONTACT jinchen@msu.edu or ydwang@hit.edu.cn SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jiajie Peng
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China, Department of Energy Plant Research Laboratory, Michigan State University, East Lansing, MI 48824, USA
- Tao Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Jixuan Wang
- School of Software, Harbin Institute of Technology, Harbin, China
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Jin Chen
- Department of Energy Plant Research Laboratory, Michigan State University, East Lansing, MI 48824, USA, Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
|
27
|
Peters TW, Miller AW, Tourette C, Agren H, Hubbard A, Hughes RE. Genomic Analysis of ATP Efflux in Saccharomyces cerevisiae. G3 (Bethesda, Md.) 2015; 6:161-70. [PMID: 26585826 PMCID: PMC4704715 DOI: 10.1534/g3.115.023267] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 07/31/2015] [Accepted: 11/06/2015] [Indexed: 01/12/2023]
Abstract
Adenosine triphosphate (ATP) plays an important role as a primary molecule for the transfer of chemical energy to drive biological processes. ATP also functions as an extracellular signaling molecule in a diverse array of eukaryotic taxa in a conserved process known as purinergic signaling. Given the important roles of extracellular ATP in cell signaling, we sought to comprehensively elucidate the pathways and mechanisms governing ATP efflux from eukaryotic cells. Here, we present results of a genomic analysis of ATP efflux from Saccharomyces cerevisiae by measuring extracellular ATP levels in cultures of 4609 deletion mutants. This screen revealed key cellular processes that regulate extracellular ATP levels, including mitochondrial translation and vesicle sorting in the late endosome, indicating that ATP production and transport through vesicles are required for efflux. We also observed evidence for altered ATP efflux in strains deleted for genes involved in amino acid signaling, and mitochondrial retrograde signaling. Based on these results, we propose a model in which the retrograde signaling pathway potentiates amino acid signaling to promote mitochondrial respiration. This study advances our understanding of the mechanism of ATP secretion in eukaryotes and implicates TOR complex 1 (TORC1) and nutrient signaling pathways in the regulation of ATP efflux. These results will facilitate analysis of ATP efflux mechanisms in higher eukaryotes.
Affiliation(s)
- Aaron W Miller
- The Buck Institute for Research on Aging, Novato, California 94945
- Hannah Agren
- The Buck Institute for Research on Aging, Novato, California 94945
- Alan Hubbard
- School of Public Health, Division of Biostatistics, University of California, Berkeley, California 94729-7358
- Robert E Hughes
- The Buck Institute for Research on Aging, Novato, California 94945
|
28
|
Gonzalez GH, Tahsin T, Goodale BC, Greene AC, Greene CS. Recent Advances and Emerging Applications in Text and Data Mining for Biomedical Discovery. Brief Bioinform 2015; 17:33-42. [PMID: 26420781 PMCID: PMC4719073 DOI: 10.1093/bib/bbv087] [Citation(s) in RCA: 103] [Impact Index Per Article: 11.4] [Received: 02/17/2015] [Indexed: 02/06/2023] Open
Abstract
Precision medicine will revolutionize the way we treat and prevent disease. A major barrier to the implementation of precision medicine that clinicians and translational scientists face is understanding the underlying mechanisms of disease. We are starting to address this challenge through automatic approaches for information extraction, representation and analysis. Recent advances in text and data mining have been applied to a broad spectrum of key biomedical questions in genomics, pharmacogenomics and other fields. We present an overview of the fundamental methods for text and data mining, as well as recent advances and emerging applications toward precision medicine.
|
29
|
Shin J, Lee I. Co-Inheritance Analysis within the Domains of Life Substantially Improves Network Inference by Phylogenetic Profiling. PLoS One 2015; 10:e0139006. [PMID: 26394049 PMCID: PMC4578931 DOI: 10.1371/journal.pone.0139006] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 05/05/2015] [Accepted: 09/07/2015] [Indexed: 01/23/2023] Open
Abstract
Phylogenetic profiling, a network inference method based on gene inheritance profiles, has been widely used to construct functional gene networks in microbes. However, its utility for network inference in higher eukaryotes has been limited. An improved algorithm with an in-depth understanding of pathway evolution may overcome this limitation. In this study, we investigated the effects of taxonomic structures on co-inheritance analysis using 2,144 reference species in four query species: Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana, and Homo sapiens. We observed three clusters of reference species based on a principal component analysis of the phylogenetic profiles, which correspond to the three domains of life (Archaea, Bacteria, and Eukaryota), suggesting that pathways inherit primarily within specific domains or lower-ranked taxonomic groups during speciation. Hence, the co-inheritance pattern within a taxonomic group may be eroded by confounding inheritance patterns from irrelevant taxonomic groups. We demonstrated that co-inheritance analysis within domains substantially improved network inference not only in microbe species but also in the higher eukaryotes, including humans. Although we observed two sub-domain clusters of reference species within Eukaryota, co-inheritance analysis within these sub-domain taxonomic groups only marginally improved network inference. Therefore, we conclude that co-inheritance analysis within domains is the optimal approach to network inference with the given reference species. The construction of a series of human gene networks with increasing sample sizes of the reference species for each domain revealed that the size of the high-accuracy networks increased as additional reference species genomes were included, suggesting that within-domain co-inheritance analysis will continue to expand human gene networks as genomes of additional species are sequenced. Taken together, we propose that co-inheritance analysis within the domains of life will greatly potentiate the use of the expected onslaught of sequenced genomes in the study of molecular pathways in higher eukaryotes.
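The contrast between pooled and within-domain co-inheritance analysis can be sketched on toy data. The abstract does not state which similarity score the authors use, so the Jaccard index below is an assumption chosen for brevity, and the species identifiers, gene profiles, and function names are hypothetical.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity of two presence profiles, each a set of species."""
    union = profile_a | profile_b
    return len(profile_a & profile_b) / len(union) if union else 0.0


def within_domain_scores(profile_a, profile_b, species_by_domain):
    """Score co-inheritance separately over each domain's reference species."""
    return {
        domain: jaccard(profile_a & species, profile_b & species)
        for domain, species in species_by_domain.items()
    }


# Hypothetical reference species grouped by domain of life.
species_by_domain = {
    "Archaea": {"a1", "a2"},
    "Bacteria": {"b1", "b2", "b3", "b4"},
    "Eukaryota": {"e1", "e2", "e3", "e4"},
}

# Two genes co-inherited across eukaryotes, with unrelated bacterial homologs.
gene_x = {"e1", "e2", "e3", "b1", "b2"}
gene_y = {"e1", "e2", "e3", "b3", "b4"}

pooled_score = jaccard(gene_x, gene_y)
domain_scores = within_domain_scores(gene_x, gene_y, species_by_domain)
```

In this toy case the pooled score is diluted by the unrelated bacterial pattern, while the Eukaryota-only score is perfect, which mirrors the abstract's point that inheritance patterns from irrelevant taxonomic groups can erode the signal.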
Affiliation(s)
- Junha Shin
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
|
30
|
Yao S, Yoo S, Yu D. Prior knowledge driven Granger causality analysis on gene regulatory network discovery. BMC Bioinformatics 2015; 16:273. [PMID: 26316173 PMCID: PMC4551367 DOI: 10.1186/s12859-015-0710-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 03/20/2015] [Accepted: 08/17/2015] [Indexed: 12/20/2022] Open
Abstract
Background Our study focuses on discovering gene regulatory networks from time series gene expression data using the Granger causality (GC) model. However, the number of available time points (T) usually is much smaller than the number of target genes (n) in biological datasets. The widely applied pairwise GC model (PGC) and other regularization strategies can lead to a significant number of false identifications when n >> T. Results In this study, we proposed a new method, viz., CGC-2SPR (CGC using two-step prior Ridge regularization) to resolve the problem by incorporating prior biological knowledge about a target gene data set. In our simulation experiments, the proposed CGC-2SPR methodology showed significant performance improvement in terms of accuracy over other widely used GC modeling (PGC, Ridge and Lasso) and MI-based (MRNET and ARACNE) methods. In addition, we applied CGC-2SPR to a real biological dataset, i.e., the yeast metabolic cycle, and discovered more true positive edges with CGC-2SPR than with the other existing methods. Conclusions In our research, we noticed a “1+1>2” effect when we combined prior knowledge and gene expression data to discover regulatory networks. Based on causality networks, we made a functional prediction that the Abm1 gene (whose functions were previously unknown) might be related to the yeast’s responses to different levels of glucose. Our research improves causality modeling by combining heterogeneous knowledge, which is well aligned with the future direction in systems biology. Furthermore, we proposed a method of Monte Carlo significance estimation (MCSE) to calculate the edge significances, which provide statistical meaning to the discovered causality networks. All of our data and source code will be available under the link https://bitbucket.org/dtyu/granger-causality/wiki/Home.
Affiliation(s)
- Shun Yao
- Department of Biochemistry and Cell Biology, Stony Brook University, Stony Brook, 11790, NY, USA; Computational Science Center, Brookhaven National Laboratory, Upton, 11793, NY, USA.
- Shinjae Yoo
- Computational Science Center, Brookhaven National Laboratory, Upton, 11793, NY, USA.
- Dantong Yu
- Computational Science Center, Brookhaven National Laboratory, Upton, 11793, NY, USA.
|
31
|
Zhu F, Panwar B, Guan Y. Algorithms for modeling global and context-specific functional relationship networks. Brief Bioinform 2015; 17:686-95. [PMID: 26254431 DOI: 10.1093/bib/bbv065] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/26/2015] [Indexed: 02/07/2023] Open
Abstract
Functional genomics has enormous potential to facilitate our understanding of normal and disease-specific physiology. In the past decade, intensive research efforts have been focused on modeling functional relationship networks, which summarize the probability of gene co-functionality relationships. Such modeling can be based on either expression data only or heterogeneous data integration. Numerous methods have been deployed to infer the functional relationship networks, while most of them target the global (non-context-specific) functional relationship networks. However, it is expected that functional relationships consistently reprogram under different tissues or biological processes. Thus, advanced methods have been developed targeting tissue-specific or developmental stage-specific networks. This article brings together the state-of-the-art functional relationship network modeling methods, emphasizes the need for heterogeneous genomic data integration and context-specific network modeling and outlines future directions for functional relationship networks.
|
32
|
Paredes-Sánchez FA, Sifuentes-Rincón AM, Segura Cabrera A, García Pérez CA, Parra Bracamonte GM, Ambriz Morales P. Associations of SNPs located at candidate genes to bovine growth traits, prioritized with an interaction networks construction approach. BMC Genet 2015. [PMID: 26198337 PMCID: PMC4511253 DOI: 10.1186/s12863-015-0247-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 01/17/2023] Open
Abstract
Background For most domestic animal species, including bovines, it is difficult to identify causative genetic variants involved in economically relevant traits. The candidate gene approach is efficient because it investigates genes that are expected to be associated with the expression of a trait and defines whether the genetic variation present in a population is associated with phenotypic diversity. A potential limitation of this approach is the identification of candidates. This study used a bioinformatics approach to identify candidate genes via a search guided by a functional interaction network. Results A functional interaction network tool, BosNet, was constructed for Bos taurus. Predictions for candidate genes were performed using the guilt-by-association principle in BosNet. Association analyses identified five novel markers within BosNet-prioritized genes that had significant effects on different growth traits in Charolais and Brahman cattle. Conclusions BosNet is an excellent tool for the identification of single nucleotide polymorphisms that are potentially associated with complex traits.
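The guilt-by-association principle named in this abstract can be sketched in a few lines of Python: a candidate gene is scored by the total weight of its links to genes already known to be involved in the trait. This is a minimal sketch under assumed inputs, not BosNet's actual scoring scheme; the gene names, edge weights and the function name are hypothetical.

```python
def guilt_by_association(edges, seeds):
    """Rank non-seed genes by the summed weight of their edges to seed genes.

    edges: iterable of (gene_a, gene_b, weight) for an undirected network.
    seeds: genes already associated with the trait of interest.
    Returns (gene, score) pairs sorted from highest to lowest score.
    """
    seeds = set(seeds)
    scores = {}
    for a, b, w in edges:
        # Each undirected edge can support either endpoint.
        for gene, neighbor in ((a, b), (b, a)):
            if neighbor in seeds and gene not in seeds:
                scores[gene] = scores.get(gene, 0.0) + w
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy network: G1 and G3 are known trait genes.
edges = [
    ("G1", "G2", 0.9),
    ("G2", "G3", 0.4),
    ("G1", "G4", 0.2),
    ("G3", "G4", 0.7),
]
ranked = guilt_by_association(edges, seeds={"G1", "G3"})
```

Here G2 ranks above G4 because its links to the seed genes carry more total weight; in practice the top-ranked genes would be the candidates taken forward to association analysis.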
Affiliation(s)
- Francisco Alejandro Paredes-Sánchez
- Laboratorio de Biotecnología Animal, Centro de Biotecnología Genómica. IPN, Boulevard del Maestro esq. Elías Piña, Col. Narciso Mendoza, Cd. Reynosa, Tam, C.P. 88710, Mexico.
- Ana María Sifuentes-Rincón
- Laboratorio de Biotecnología Animal, Centro de Biotecnología Genómica. IPN, Boulevard del Maestro esq. Elías Piña, Col. Narciso Mendoza, Cd. Reynosa, Tam, C.P. 88710, Mexico.
- Aldo Segura Cabrera
- Red de Estudios Moleculares Avanzados, Instituto de Ecología, A.C., Xalapa, Mexico.
- Carlos Armando García Pérez
- Laboratorio de Bioinformática, Centro de Biotecnología Genómica. IPN, Boulevard del Maestro esq. Elías Piña, Col. Narciso Mendoza, Cd. Reynosa, Tam, C.P. 88710, Mexico.
- Gaspar Manuel Parra Bracamonte
- Laboratorio de Biotecnología Animal, Centro de Biotecnología Genómica. IPN, Boulevard del Maestro esq. Elías Piña, Col. Narciso Mendoza, Cd. Reynosa, Tam, C.P. 88710, Mexico.
- Pascuala Ambriz Morales
- Laboratorio de Biotecnología Animal, Centro de Biotecnología Genómica. IPN, Boulevard del Maestro esq. Elías Piña, Col. Narciso Mendoza, Cd. Reynosa, Tam, C.P. 88710, Mexico.
|
33
|
Gene network coherence based on prior knowledge using direct and indirect relationships. Comput Biol Chem 2015; 56:142-51. [DOI: 10.1016/j.compbiolchem.2015.03.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 07/22/2014] [Revised: 03/06/2015] [Accepted: 03/20/2015] [Indexed: 12/21/2022]
|
34
|
Lee T, Oh T, Yang S, Shin J, Hwang S, Kim CY, Kim H, Shim H, Shim JE, Ronald PC, Lee I. RiceNet v2: an improved network prioritization server for rice genes. Nucleic Acids Res 2015; 43:W122-7. [PMID: 25813048 PMCID: PMC4489288 DOI: 10.1093/nar/gkv253] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Received: 02/02/2015] [Accepted: 03/12/2015] [Indexed: 11/20/2022] Open
Abstract
Rice is the most important staple food crop and a model grass for studies of bioenergy crops. We previously published a genome-scale functional network server called RiceNet, constructed by integrating diverse genomics data, and demonstrated the use of the network in genetic dissection of rice biotic stress responses and its usefulness for other grass species. Since the initial construction of the network, there has been a significant increase in the amount of publicly available rice genomics data. Here, we present an updated network prioritization server for Oryza sativa ssp. japonica, RiceNet v2 (http://www.inetbio.org/ricenet), which provides a network of 25 765 genes (70.1% of the coding genome) and 1 775 000 co-functional links. RiceNet v2 also provides two complementary methods for network prioritization based on: (i) network direct neighborhood and (ii) context-associated hubs. RiceNet v2 can use genes of the related subspecies O. sativa ssp. indica and the reference plant Arabidopsis for versatility in generating hypotheses. We demonstrate that RiceNet v2 effectively identifies candidate genes involved in rice root/shoot development and defense responses, demonstrating its usefulness for the grass research community.
Affiliation(s)
- Tak Lee
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul, Korea
- Taeyun Oh
- The Joint Bioenergy Institute, Emeryville CA and Department of Plant Pathology and the Genome Center, University of California, Davis, CA 95616, USA
- Sunmo Yang
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul, Korea
- Junha Shin
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul, Korea
- Sohyun Hwang
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul, Korea
- Chan Yeong Kim
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul, Korea
- Hyojin Kim
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul, Korea
- Hongseok Shim
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul, Korea
- Jung Eun Shim
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul, Korea
- Pamela C Ronald
- The Joint Bioenergy Institute, Emeryville CA and Department of Plant Pathology and the Genome Center, University of California, Davis, CA 95616, USA
- Insuk Lee
- Department of Biotechnology, College of Life Sciences and Biotechnology, Yonsei University, Seoul, Korea
|
35
|
Lee I, Kim E, Marcotte EM. Modes of interaction between individuals dominate the topologies of real world networks. PLoS One 2015; 10:e0121248. [PMID: 25793969 PMCID: PMC4368763 DOI: 10.1371/journal.pone.0121248] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/10/2014] [Accepted: 01/29/2015] [Indexed: 11/21/2022] Open
Abstract
We find that the topologies of real world networks, such as those formed within human societies, by the Internet, or among cellular proteins, are dominated by the mode of the interactions considered among the individuals. Specifically, a major dichotomy in previously studied networks arises from modeling networks in terms of pairwise versus group tasks. The former often intrinsically give rise to scale-free, disassortative, hierarchical networks, whereas the latter often give rise to single- or broad-scale, assortative, nonhierarchical networks. These dependencies explain contrasting observations among previous topological analyses of real world complex systems. We also observe this trend in systems with natural hierarchies, in which alternate representations of the same networks, but which capture different levels of the hierarchy, manifest these signature topological differences. For example, in both the Internet and cellular proteomes, networks of lower-level system components (routers within domains or proteins within biological processes) are assortative and nonhierarchical, whereas networks of upper-level system components (internet domains or biological processes) are disassortative and hierarchical. Our results demonstrate that network topologies of complex systems must be interpreted in light of their hierarchical natures and interaction types.
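The assortative versus disassortative distinction drawn in this abstract is commonly quantified by the degree assortativity coefficient, the Pearson correlation between the degrees found at the two ends of each edge. The Python sketch below computes it for a small undirected graph; the abstract does not say which exact measure the authors used, so this is one standard choice, and the function name and toy graph are illustrative.

```python
def degree_assortativity(edges):
    """Pearson correlation of node degrees across the ends of undirected edges.

    Positive values indicate assortative mixing (hubs link to hubs);
    negative values indicate disassortative mixing (hubs link to low-degree nodes).
    Raises ZeroDivisionError when all nodes have the same degree.
    """
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1

    # Count every undirected edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for a, b in edges:
        xs += [degree[a], degree[b]]
        ys += [degree[b], degree[a]]

    n = len(xs)
    mean = sum(xs) / n  # xs and ys hold the same values, so they share one mean
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var


# A star graph is the extreme disassortative case: one hub, many leaves.
star = [("hub", "leaf1"), ("hub", "leaf2"), ("hub", "leaf3"), ("hub", "leaf4")]
r = degree_assortativity(star)
```

For the star graph every edge joins the highest-degree node to a degree-1 node, so the coefficient sits at the negative extreme of its range.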
Affiliation(s)
- Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul 120-749, Korea
- Eiru Kim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul 120-749, Korea
- Edward M. Marcotte
- Center for Systems and Synthetic Biology, Department of Molecular Biosciences, and Institute for Cellular and Molecular Biology, MBB 3.148BA, University of Texas at Austin, 2500 Speedway, Austin, Texas 78712-1064, United States of America
|
36
|
Peng J, Uygun S, Kim T, Wang Y, Rhee SY, Chen J. Measuring semantic similarities by combining gene ontology annotations and gene co-function networks. BMC Bioinformatics 2015; 16:44. [PMID: 25886899 PMCID: PMC4339680 DOI: 10.1186/s12859-015-0474-7] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 06/25/2014] [Accepted: 01/26/2015] [Indexed: 01/18/2023] Open
Abstract
Background Gene Ontology (GO) has been used widely to study functional relationships between genes. The current semantic similarity measures rely only on GO annotations and GO structure. This limits the power of GO-based similarity because of the limited proportion of genes that are annotated to GO in most organisms. Results We introduce a novel approach called NETSIM (network-based similarity measure) that incorporates information from gene co-function networks in addition to using the GO structure and annotations. Using metabolic reaction maps of yeast, Arabidopsis, and human, we demonstrate that NETSIM can improve the accuracy of GO term similarities. We also demonstrate that NETSIM works well even for genomes with sparser gene annotation data. We applied NETSIM on large Arabidopsis gene families such as cytochrome P450 monooxygenases to group the members functionally and show that this grouping could facilitate functional characterization of genes in these families. Conclusions Using NETSIM as an example, we demonstrated that the performance of a semantic similarity measure could be significantly improved after incorporating genome-specific information. NETSIM incorporates both GO annotations and gene co-function network data as a priori knowledge in the model. Therefore, functional similarities of GO terms that are not explicitly encoded in GO but are relevant in a taxon-specific manner become measurable when GO annotations are limited. Supplementary information and software are available at http://www.msu.edu/~jinchen/NETSIM. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0474-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Jiajie Peng
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China; Department of Energy Plant Research Laboratory, Michigan State University, East Lansing, MI, 48824, USA.
- Sahra Uygun
- Department of Energy Plant Research Laboratory, Michigan State University, East Lansing, MI, 48824, USA; Genetics Program, Michigan State University, East Lansing, MI, 48824, USA.
- Taehyong Kim
- Department of Plant Biology, Carnegie Institution for Science, 260 Panama St, Stanford, CA, 94305, USA.
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China.
- Seung Y Rhee
- Department of Plant Biology, Carnegie Institution for Science, 260 Panama St, Stanford, CA, 94305, USA.
- Jin Chen
- Department of Energy Plant Research Laboratory, Michigan State University, East Lansing, MI, 48824, USA; Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA.
|
37
|
Kim H, Shim JE, Shin J, Lee I. EcoliNet: a database of cofunctional gene network for Escherichia coli. Database: The Journal of Biological Databases and Curation 2015; 2015:bav001. [PMID: 25650278 PMCID: PMC4314589 DOI: 10.1093/database/bav001] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Indexed: 12/31/2022]
Abstract
During the past several decades, Escherichia coli has been a treasure chest for molecular biology. The molecular mechanisms of many fundamental cellular processes have been discovered through research on this bacterium. Although much basic research now focuses on more complex model organisms, E. coli still remains important in metabolic engineering and synthetic biology. Despite its long history as a subject of molecular investigation, more than one-third of the E. coli genome has no pathway annotation supported by either experimental evidence or manual curation. Recently, a network-assisted genetics approach to the efficient identification of novel gene functions has increased in popularity. To accelerate the speed of pathway annotation for the remaining uncharacterized part of the E. coli genome, we have constructed a database of cofunctional gene network with near-complete genome coverage of the organism, dubbed EcoliNet. We find that EcoliNet is highly predictive for diverse bacterial phenotypes, including antibiotic response, indicating that it will be useful in prioritizing novel candidate genes for a wide spectrum of bacterial phenotypes. We have implemented a web server where biologists can easily run network algorithms over EcoliNet to predict novel genes involved in a pathway or novel functions for a gene. All integrated cofunctional associations can be downloaded, enabling orthology-based reconstruction of gene networks for other bacterial species as well. Database URL: http://www.inetbio.org/ecolinet.
Affiliation(s)
- Hanhae Kim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Jung Eun Shim
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Junha Shin
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
|
38
|
Abstract
Motivation: Recently, a shift was made from using Gene Ontology (GO) to evaluate molecular network data to using these data to construct and evaluate GO. Dutkowski et al. provide the first evidence that a large part of GO can be reconstructed solely from topologies of molecular networks. Motivated by this work, we develop a novel data integration framework that integrates multiple types of molecular network data to reconstruct and update GO. We ask how much of GO can be recovered by integrating various molecular interaction data. Results: We introduce a computational framework for integration of various biological networks using penalized non-negative matrix tri-factorization (PNMTF). It takes all network data in a matrix form and performs simultaneous clustering of genes and GO terms, inducing new relations between genes and GO terms (annotations) and between GO terms themselves. To improve the accuracy of our predicted relations, we extend the integration methodology to include additional topological information represented as the similarity in wiring around non-interacting genes. Surprisingly, by integrating topologies of baker’s yeast protein–protein interaction, genetic interaction (GI) and co-expression networks, our method reports as related 96% of GO terms that are directly related in GO. The inclusion of the wiring similarity of non-interacting genes contributes 6% to this large GO term association capture. Furthermore, we use our method to infer new relationships between GO terms solely from the topologies of these networks and validate 44% of our predictions in the literature. In addition, our integration method reproduces 48% of cellular component, 41% of molecular function and 41% of biological process GO terms, outperforming the previous method in the former two domains of GO. Finally, we predict new GO annotations of yeast genes and validate our predictions through GIs profiling.
Availability and implementation: Supplementary Tables of new GO term associations and predicted gene annotations are available at http://bio-nets.doc.ic.ac.uk/GO-Reconstruction/. Contact: natasha@imperial.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Vuk Janjić
- Department of Computing, Imperial College London SW7 2AZ, UK
- Nataša Pržulj
- Department of Computing, Imperial College London SW7 2AZ, UK
|
39
|
Hsiao YT, Lee WP. Reverse engineering gene regulatory networks: coupling an optimization algorithm with a parameter identification technique. BMC Bioinformatics 2014; 15 Suppl 15:S8. [PMID: 25474560 PMCID: PMC4271569 DOI: 10.1186/1471-2105-15-s15-s8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/23/2022] Open
Abstract
Background To infer gene regulatory networks from time series gene profiles, two important tasks that are related to biological systems must be undertaken. One task is to determine a valid network structure that has topological properties that can influence the network dynamics profoundly. The other task is to optimize the network parameters to minimize the accumulated discrepancy between the gene expression data and the values produced by the inferred network model. Though the above two tasks must be conducted simultaneously, most existing work addresses only one of the tasks. Results We propose an iterative approach that couples parameter identification and parameter optimization techniques to address the two tasks simultaneously during network inference. This approach first identifies the most influential parameters against internal perturbations; this identification is based on sensitivity measurements. Then, a hybrid GA-PSO optimization method infers parameters in accordance with their criticalities. The proposed approach has been applied to several datasets, including subsets of the SOS DNA repair system in E. coli, the rat central nervous system (CNS), and the protein glycosylation system of yeast S. cerevisiae. The results and analysis show that our approach can infer solutions that satisfy the requirements of both network structure and network behavior. Conclusions Network structure is an important though challenging issue to address in inferring sophisticated networks with biological details. Lacking prior structural knowledge, we instead measure parameter sensitivity to account for the network structure indirectly. By developing an integrated approach for considering both the network structure and behavior in the inference process, we can successfully infer critical gene interactions as well as valid time expression profiles.
|
40
|
Xu Y, Guo M, Zou Q, Liu X, Wang C, Liu Y. System-level insights into the cellular interactome of a non-model organism: inferring, modelling and analysing functional gene network of soybean (Glycine max). PLoS One 2014; 9:e113907. [PMID: 25423109 PMCID: PMC4244207 DOI: 10.1371/journal.pone.0113907] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 06/23/2014] [Accepted: 10/24/2014] [Indexed: 01/30/2023] Open
Abstract
The cellular interactome, in which genes and/or their products interact on several levels, forming transcriptional regulatory, protein interaction, metabolic and signal transduction networks, etc., has attracted decades of research focus. However, any one such specific type of network alone can hardly explain the various interactive activities among genes. These networks characterize different interaction relationships, implying their unique intrinsic properties and defects, and covering different slices of biological information. The functional gene network (FGN), a consolidated interaction network that models a fuzzy and more generalized notion of gene-gene relations, has been proposed to combine heterogeneous networks with the goal of identifying functional modules supported by multiple interaction types. There are as yet no successful precedents of FGNs for sparsely studied non-model organisms, such as soybean (Glycine max), due to the absence of sufficient heterogeneous interaction data. We present an alternative solution for inferring the FGNs of soybean (SoyFGNs), in a pioneering study on the soybean interactome, which is also applicable to other organisms. SoyFGNs exhibit the typical characteristics of biological networks: scale-free, small-world architecture and modularization. Verified by co-expression and KEGG pathways, SoyFGNs are more extensive and accurate than an orthology network derived from Arabidopsis. As a case study, network-guided disease-resistance gene discovery indicates that SoyFGNs can provide system-level studies on gene functions and interactions. This work suggests that inferring and modelling the interactome of a non-model plant are feasible. It will speed up the discovery and definition of the functions and interactions of other genes that control important functions, such as nitrogen fixation and protein or lipid synthesis.
The efforts of the study are the basis of our further comprehensive studies on the soybean functional interactome at the genome and microRNome levels. Additionally, a web tool for information retrieval and analysis of SoyFGNs can be accessed at SoyFN: http://nclab.hit.edu.cn/SoyFN.
Affiliation(s)
- Yungang Xu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Maozu Guo
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Quan Zou
- School of Information Science and Technology, Xiamen University, Xiamen, China
- Xiaoyan Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Chunyu Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yang Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
|
41
|
Schaffrath R, Abdel-Fattah W, Klassen R, Stark MJR. The diphthamide modification pathway from Saccharomyces cerevisiae--revisited. Mol Microbiol 2014; 94:1213-26. [PMID: 25352115 DOI: 10.1111/mmi.12845] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.1] [Accepted: 10/27/2014] [Indexed: 01/09/2023]
Abstract
Diphthamide is a conserved modification in archaeal and eukaryal translation elongation factor 2 (EF2). Its name refers to the target function for diphtheria toxin, the disease-causing agent that, through ADP ribosylation of diphthamide, causes irreversible inactivation of EF2 and cell death. Although this clearly emphasizes a pathobiological role for diphthamide, its physiological function is unclear, and precisely why cells need EF2 to contain diphthamide is hardly understood. Nonetheless, the conservation of diphthamide biosynthesis together with syndromes (i.e. ribosomal frame-shifting, embryonic lethality, neurodegeneration and cancer) typical of mutant cells that cannot make it strongly suggests that diphthamide-modified EF2 occupies an important and translation-related role in cell proliferation and development. Whether this is structural and/or regulatory remains to be seen. However, recent progress in dissecting the diphthamide gene network (DPH1-DPH7) from the budding yeast Saccharomyces cerevisiae has significantly advanced our understanding of the mechanisms required to initiate and complete diphthamide synthesis on EF2. Here, we review recent developments in the field that not only have provided novel, previously overlooked and unexpected insights into the pathway and the biochemical players required for diphthamide synthesis but also are likely to foster innovative studies into the potential regulation of diphthamide, and importantly, its ill-defined biological role.
Affiliation(s)
- Raffael Schaffrath
- Department of Genetics, University of Leicester, Leicester, LE1 7RH, UK; Institut für Biologie, Abteilung Mikrobiologie, Universität Kassel, 34132, Kassel, Germany
|
42
|
Mahdevar G, Nowzari-Dalini A, Sadeghi M. Inferring gene correlation networks from transcription factor binding sites. Genes Genet Syst 2014; 88:301-9. [PMID: 24694393 DOI: 10.1266/ggs.88.301] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/23/2022] Open
Abstract
Gene expression is a highly regulated biological process that is fundamental to the existence of phenotypes of any living organism. The regulatory relations are usually modeled as a network: every gene is modeled as a node, and relations are shown as edges between two related genes. This paper presents a novel method for inferring correlation networks (networks constructed by connecting co-expressed genes) by predicting co-expression levels from the genes' promoter sequences. According to the results, this method works well on biological data, and its outcome is comparable to that of methods that use microarray data as input. The method is written in C++ and is available upon request from the corresponding author.
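For context, the conventional way to build the kind of correlation network this abstract describes is to connect two genes whenever their expression profiles are strongly correlated. The Python sketch below shows that expression-based construction only; it is not the paper's promoter-sequence predictor, and the gene names, profiles and the 0.8 threshold are assumptions chosen for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length numeric profiles."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (sd_u * sd_v)


def correlation_network(profiles, threshold=0.8):
    """Return edges (gene1, gene2, r) for gene pairs with |r| >= threshold.

    profiles: dict mapping a gene name to its expression values across samples.
    """
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression profiles over four samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [3.0, 1.0, 4.0, 1.0],
}
network = correlation_network(profiles)
```

A method that predicts co-expression from promoter sequences would replace the `pearson` call with its own predicted score while keeping the same thresholding step.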
Affiliation(s)
- Ghasem Mahdevar
- Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran
|
43
|
Taha K. Determining Semantically Related Significant Genes. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2014; 11:1119-1130. [PMID: 26357049 DOI: 10.1109/tcbb.2014.2344668] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 06/05/2023]
Abstract
GO relation embodies some aspects of existence dependency. If GO term x is existence-dependent on GO term y, the presence of y implies the presence of x. Therefore, the genes annotated with the function of the GO term y are usually functionally and semantically related to the genes annotated with the function of the GO term x. A large number of gene set enrichment analysis methods have been developed in recent years for analyzing gene set enrichment. However, most of these methods overlook the structural dependencies between GO terms in the GO graph by not considering the concept of existence dependency. We propose in this paper a biological search engine called RSGSearch that identifies enriched sets of genes annotated with different functions using the concept of existence dependency. We observe that GO term x cannot be existence-dependent on GO term y if x and y have the same specificity (biological characteristics). After encoding into a numeric format the contributions of GO terms annotating target genes to the semantics of their lowest common ancestors (LCAs), RSGSearch uses microarray experiments to identify the most significant LCA that annotates the result genes. We evaluated RSGSearch experimentally and compared it with five gene set enrichment systems. Results showed marked improvement.
|
44
|
Taha K. RGFinder: a system for determining semantically related genes using GO graph minimum spanning tree. IEEE Trans Nanobioscience 2014; 14:24-37. [PMID: 25343765 DOI: 10.1109/tnb.2014.2363295] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/08/2022]
Abstract
Biologists often need to know the set S' of genes that are the most functionally and semantically related to a given set S of genes. For determining the set S', most current gene similarity measures overlook the structural dependencies among the Gene Ontology (GO) terms annotating the set S, which may lead to erroneous results. We introduce in this paper a biological search engine called RGFinder that considers the structural dependencies among GO terms by employing the concept of existence dependency. RGFinder assigns a weight to each edge in GO graph to represent the degree of relatedness between the two GO terms connected by the edge. The value of the weight is determined based on the following factors: 1) type of the relation represented by the edge (e.g., an "is-a" relation is assigned a different weight than a "part-of" relation), 2) the functional relationship between the two GO terms connected by the edge, and 3) the string-substring relationship between the names of the two GO terms connected by the edge. RGFinder then constructs a minimum spanning tree of GO graph based on these weights. In the framework of RGFinder, the set S' is annotated to the GO terms located at the lowest convergences of the subtree of the minimum spanning tree that passes through the GO terms annotating set S. We evaluated RGFinder experimentally and compared it with four gene set enrichment systems. Results showed marked improvement.
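The minimum spanning tree step described in this abstract can be illustrated with Kruskal's algorithm, sketched below in Python. This is a generic MST routine, not RGFinder's implementation: it treats the GO graph as undirected, assumes that a lower weight means a stronger relatedness between two terms (the abstract does not state the direction), and uses made-up term names and weights.

```python
def minimum_spanning_tree(nodes, weighted_edges):
    """Kruskal's algorithm with a union-find structure.

    nodes: iterable of node names.
    weighted_edges: iterable of (weight, node_a, node_b) tuples.
    Returns the chosen edges as (node_a, node_b, weight) tuples.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the set representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for w, a, b in sorted(weighted_edges):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:          # adding this edge does not create a cycle
            parent[root_a] = root_b
            tree.append((a, b, w))
    return tree


# Hypothetical GO-like graph: four terms with assumed relatedness weights.
nodes = ["root", "term1", "term2", "term3"]
weighted_edges = [
    (1.0, "root", "term1"),
    (2.0, "root", "term2"),
    (0.5, "term1", "term2"),
    (3.0, "term2", "term3"),
]
tree = minimum_spanning_tree(nodes, weighted_edges)
```

In the toy graph the edge between `root` and `term2` is dropped because the two terms are already connected more cheaply through `term1`.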
|
45
|
Gene network biological validity based on gene-gene interaction relevance. ScientificWorldJournal 2014; 2014:540679. [PMID: 25295303 PMCID: PMC4175387 DOI: 10.1155/2014/540679] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/25/2014] [Accepted: 07/11/2014] [Indexed: 01/17/2023] Open
Abstract
In recent years, gene networks have become one of the most useful tools for modeling biological processes. Many gene network inference algorithms have been developed as techniques for extracting knowledge from gene expression data. Ensuring the reliability of the inferred gene relationships is a crucial task in any study in order to prove that the algorithms used are precise. Usually, this validation process can be carried out using prior biological knowledge. The metabolic pathways stored in KEGG are one of the most widely used knowledge sources for analyzing relationships between genes. This paper introduces a new methodology, GeneNetVal, to assess the biological validity of gene networks based on the relevance of the gene-gene interactions stored in KEGG metabolic pathways. Hence, a complete KEGG pathway conversion into a gene association network and a new matching distance based on gene-gene interaction relevance are proposed. The performance of GeneNetVal was established with three different experiments. Firstly, our proposal is tested in a comparative ROC analysis. Secondly, a randomness study is presented to show the behavior of GeneNetVal when the noise is increased in the input network. Finally, the ability of GeneNetVal to detect biological functionality of the network is shown.
|
46
|
Geppert T, Koeppen H. Biological Networks and Drug Discovery-Where Do We Stand? Drug Dev Res 2014; 75:271-82. [DOI: 10.1002/ddr.21207] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 12/21/2022]
Affiliation(s)
- Tim Geppert
- Lead Identification and Optimization Support; Boehringer Ingelheim Pharma GmbH & Co. KG; Biberach/Riss 88397 Germany
- Herbert Koeppen
- Lead Identification and Optimization Support; Boehringer Ingelheim Pharma GmbH & Co. KG; Biberach/Riss 88397 Germany
|
47
|
Schramm SJ, Jayaswal V, Goel A, Li SS, Yang YH, Mann GJ, Wilkins MR. Molecular interaction networks for the analysis of human disease: utility, limitations, and considerations. Proteomics 2014; 13:3393-405. [PMID: 24166987 DOI: 10.1002/pmic.201200570] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 12/18/2012] [Revised: 09/11/2013] [Accepted: 10/07/2013] [Indexed: 01/01/2023]
Abstract
High-throughput '-omics' data can be combined with large-scale molecular interaction networks, for example, protein-protein interaction networks, to provide a unique framework for the investigation of human molecular biology. Interest in these integrative '-omics' methods is growing rapidly because of their potential to understand complexity and association with disease; such approaches have a focus on associations between phenotype and "network-type." The potential of this research is enticing, yet there remain a series of important considerations. Here, we discuss interaction data selection, data quality, the relative merits of using data from large high-throughput studies versus a meta-database of smaller literature-curated studies, and possible issues of sociological or inspection bias in interaction data. Other work underway, especially international consortia to establish data formats, quality standards and address data redundancy, and the improvements these efforts are making to the field, is also evaluated. We present options for researchers intending to use large-scale molecular interaction networks as a functional context for protein or gene expression data, including microRNAs, especially in the context of human disease.
Affiliation(s)
- Sarah-Jane Schramm
- Sydney Medical School, Westmead Millennium Institute for Medical Research, The University of Sydney, Sydney, NSW, Australia; Melanoma Institute Australia, Sydney, NSW, Australia
|
48
|
Qin T, Matmati N, Tsoi LC, Mohanty BK, Gao N, Tang J, Lawson AB, Hannun YA, Zheng WJ. Finding pathway-modulating genes from a novel Ontology Fingerprint-derived gene network. Nucleic Acids Res 2014; 42:e138. [PMID: 25063300 PMCID: PMC4191379 DOI: 10.1093/nar/gku678] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 01/25/2023] Open
Abstract
To enhance our knowledge regarding biological pathway regulation, we took an integrated approach, using the biomedical literature, ontologies, network analyses and experimental investigation to infer novel genes that could modulate biological pathways. We first constructed a novel gene network via a pairwise comparison of all yeast genes' Ontology Fingerprints--a set of Gene Ontology terms overrepresented in the PubMed abstracts linked to a gene along with those terms' corresponding enrichment P-values. The network was further refined using a Bayesian hierarchical model to identify novel genes that could potentially influence the pathway activities. We applied this method to the sphingolipid pathway in yeast and found that many top-ranked genes indeed displayed altered sphingolipid pathway functions, initially measured by their sensitivity to myriocin, an inhibitor of de novo sphingolipid biosynthesis. Further experiments confirmed the modulation of the sphingolipid pathway by one of these genes, PFA4, encoding a palmitoyl transferase. Comparative analysis showed that few of these novel genes could be discovered by other existing methods. Our novel gene network provides a unique and comprehensive resource to study pathway modulations and systems biology in general.
Affiliation(s)
- Tingting Qin
- Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
- Nabil Matmati
- The Stony Brook University Cancer Center and the Department of Medicine, Stony Brook, NY 11794, USA
- Lam C Tsoi
- Department of Biostatistics, Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
- Bidyut K Mohanty
- Department of Biochemistry & Molecular Biology, Medical University of South Carolina, Charleston, SC 29425, USA
- Nan Gao
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA
- Jijun Tang
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA; Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin, China
- Andrew B Lawson
- Department of Public Health Science, Medical University of South Carolina, Charleston, SC 29425, USA
- Yusuf A Hannun
- The Stony Brook University Cancer Center and the Department of Medicine, Stony Brook, NY 11794, USA
- W Jim Zheng
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
|
49
|
Li HD, Menon R, Omenn GS, Guan Y. The emerging era of genomic data integration for analyzing splice isoform function. Trends Genet 2014; 30:340-7. [PMID: 24951248 DOI: 10.1016/j.tig.2014.05.005] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Received: 03/18/2014] [Revised: 05/21/2014] [Accepted: 05/23/2014] [Indexed: 01/17/2023]
Abstract
The vast majority of multi-exon genes in humans undergo alternative splicing, which greatly increases the functional diversity of protein species. Predicting functions at the isoform level is essential to further our understanding of developmental abnormalities and cancers, which frequently exhibit aberrant splicing and dysregulation of isoform expression. However, determination of isoform function is very difficult, and efforts to predict isoform function have been limited in the functional genomics field. Deep sequencing of RNA now provides an unprecedented amount of expression data at the transcript level. We describe here emerging computational approaches that integrate such large-scale whole-transcriptome sequencing (RNA-seq) data for predicting the functions of alternatively spliced isoforms, and we discuss their applications in developmental and cancer biology. We outline future directions for isoform function prediction, emphasizing the need for heterogeneous genomic data integration and tissue-specific, dynamic isoform-level network modeling, which will allow the field to realize its full potential.
Affiliation(s)
- Hong-Dong Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Rajasree Menon
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Gilbert S Omenn
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA; Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, MI, USA
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA; Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, MI, USA; Department of Electrical Engineering and Computer Science, Ann Arbor, MI, USA.
|
50
|
Kurt Z, Aydin N, Altay G. A comprehensive comparison of association estimators for gene network inference algorithms. Bioinformatics 2014; 30:2142-9. [DOI: 10.1093/bioinformatics/btu182] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 01/22/2023] Open
|