1
Zhan L, Wang Y, Wang A, Zhang Y, Cheng C, Zhao J, Zhang W, Chen J, Li P. A genome-scale deep learning model to predict gene expression changes of genetic perturbations from multiplex biological networks. Brief Bioinform 2024; 25:bbae433. [PMID: 39226889 PMCID: PMC11370636 DOI: 10.1093/bib/bbae433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Revised: 07/17/2024] [Accepted: 08/19/2024] [Indexed: 09/05/2024]
Abstract
Systematic characterization of biological effects to genetic perturbation is essential to the application of molecular biology and biomedicine. However, the experimental exhaustion of genetic perturbations on the genome-wide scale is challenging. Here, we show TranscriptionNet, a deep learning model that integrates multiple biological networks to systematically predict transcriptional profiles to three types of genetic perturbations based on transcriptional profiles induced by genetic perturbations in the L1000 project: RNA interference, clustered regularly interspaced short palindromic repeat, and overexpression. TranscriptionNet performs better than existing approaches in predicting inducible gene expression changes for all three types of genetic perturbations. TranscriptionNet can predict transcriptional profiles for all genes in existing biological networks and increases perturbational gene expression changes for each type of genetic perturbation from a few thousand to 26 945 genes. TranscriptionNet demonstrates strong generalization ability when comparing predicted and true gene expression changes on different external tasks. Overall, TranscriptionNet can systemically predict transcriptional consequences induced by perturbing genes on a genome-wide scale and thus holds promise to systemically detect gene function and enhance drug development and target discovery.
Affiliation(s)
- Lingmin Zhan
  - College of Basic Sciences, Shanxi Agricultural University, 1 Mingxian South Road, Taigu District, Jinzhong, 030801, China
- Yingdong Wang
  - College of Basic Sciences, Shanxi Agricultural University, 1 Mingxian South Road, Taigu District, Jinzhong, 030801, China
- Aoyi Wang
  - College of Basic Sciences, Shanxi Agricultural University, 1 Mingxian South Road, Taigu District, Jinzhong, 030801, China
- Yuanyuan Zhang
  - College of Basic Sciences, Shanxi Agricultural University, 1 Mingxian South Road, Taigu District, Jinzhong, 030801, China
- Caiping Cheng
  - College of Basic Sciences, Shanxi Agricultural University, 1 Mingxian South Road, Taigu District, Jinzhong, 030801, China
- Jinzhong Zhao
  - College of Basic Sciences, Shanxi Agricultural University, 1 Mingxian South Road, Taigu District, Jinzhong, 030801, China
- Wuxia Zhang
  - College of Basic Sciences, Shanxi Agricultural University, 1 Mingxian South Road, Taigu District, Jinzhong, 030801, China
- Jianxin Chen
  - School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, 11 North Third Ring Road East, Chaoyang District, Beijing 100029, China
- Peng Li
  - College of Basic Sciences, Shanxi Agricultural University, 1 Mingxian South Road, Taigu District, Jinzhong, 030801, China
2
Kwon JJ, Pan J, Gonzalez G, Hahn WC, Zitnik M. On knowing a gene: A distributional hypothesis of gene function. Cell Syst 2024; 15:488-496. [PMID: 38810640 PMCID: PMC11189734 DOI: 10.1016/j.cels.2024.04.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 02/25/2024] [Accepted: 04/30/2024] [Indexed: 05/31/2024]
Abstract
As words can have multiple meanings that depend on sentence context, genes can have various functions that depend on the surrounding biological system. This pleiotropic nature of gene function is limited by ontologies, which annotate gene functions without considering biological contexts. We contend that the gene function problem in genetics may be informed by recent technological leaps in natural language processing, in which representations of word semantics can be automatically learned from diverse language contexts. In contrast to efforts to model semantics as "is-a" relationships in the 1990s, modern distributional semantics represents words as vectors in a learned semantic space and fuels current advances in transformer-based models such as large language models and generative pre-trained transformers. A similar shift in thinking of gene functions as distributions over cellular contexts may enable a similar breakthrough in data-driven learning from large biological datasets to inform gene function.
Affiliation(s)
- Jason J Kwon
  - Dana-Farber Cancer Institute and Harvard Medical School, Department of Medical Oncology, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Joshua Pan
  - Dana-Farber Cancer Institute and Harvard Medical School, Department of Medical Oncology, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Guadalupe Gonzalez
  - Department of Computing, Faculty of Engineering, Imperial College, London SW7 2AZ, UK
- William C Hahn
  - Dana-Farber Cancer Institute and Harvard Medical School, Department of Medical Oncology, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Marinka Zitnik
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard Medical School, Department of Biomedical Informatics, Boston, MA 02115, USA; Harvard Data Science Initiative, Harvard University, Cambridge, MA 02138, USA; Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Allston, MA 02134, USA
3
BIONIC: discovering new biology through deep learning-based network integration. Nat Methods 2022; 19:1185-1186. [PMID: 36192466 DOI: 10.1038/s41592-022-01617-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
4
Forster DT, Li SC, Yashiroda Y, Yoshimura M, Li Z, Isuhuaylas LAV, Itto-Nakama K, Yamanaka D, Ohya Y, Osada H, Wang B, Bader GD, Boone C. BIONIC: biological network integration using convolutions. Nat Methods 2022; 19:1250-1261. [PMID: 36192463 PMCID: PMC11236286 DOI: 10.1038/s41592-022-01616-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 12/04/2020] [Accepted: 08/16/2022] [Indexed: 01/21/2023]
Abstract
Biological networks constructed from varied data can be used to map cellular function, but each data type has limitations. Network integration promises to address these limitations by combining and automatically weighting input information to obtain a more accurate and comprehensive representation of the underlying biology. We developed a deep learning-based network integration algorithm that incorporates a graph convolutional network framework. Our method, BIONIC (Biological Network Integration using Convolutions), learns features that contain substantially more functional information compared to existing approaches. BIONIC has unsupervised and semisupervised learning modes, making use of available gene function annotations. BIONIC is scalable in both size and quantity of the input networks, making it feasible to integrate numerous networks on the scale of the human genome. To demonstrate the use of BIONIC in identifying new biology, we predicted and experimentally validated essential gene chemical-genetic interactions from nonessential gene profiles in yeast.
Affiliation(s)
- Duncan T Forster
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
  - The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
  - Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
- Sheena C Li
  - The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
  - RIKEN Center for Sustainable Resource Science, Wako, Saitama, Japan
- Yoko Yashiroda
  - RIKEN Center for Sustainable Resource Science, Wako, Saitama, Japan
- Mami Yoshimura
  - RIKEN Center for Sustainable Resource Science, Wako, Saitama, Japan
- Zhijian Li
  - The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Kaori Itto-Nakama
  - Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Japan
- Daisuke Yamanaka
  - Laboratory for Immunopharmacology of Microbial Products, School of Pharmacy, Tokyo University of Pharmacy and Life Sciences, Hachioji, Tokyo, Japan
- Yoshikazu Ohya
  - Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Japan
  - Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Tokyo, Japan
- Hiroyuki Osada
  - RIKEN Center for Sustainable Resource Science, Wako, Saitama, Japan
- Bo Wang
  - Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
  - Peter Munk Cardiac Center, University Health Network, Toronto, Ontario, Canada
  - Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
  - Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada
- Gary D Bader
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
  - The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
  - Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
  - The Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario, Canada
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Charles Boone
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
  - The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
  - RIKEN Center for Sustainable Resource Science, Wako, Saitama, Japan
5
Pan J, Kwon JJ, Talamas JA, Borah AA, Vazquez F, Boehm JS, Tsherniak A, Zitnik M, McFarland JM, Hahn WC. Sparse dictionary learning recovers pleiotropy from human cell fitness screens. Cell Syst 2022; 13:286-303.e10. [PMID: 35085500 PMCID: PMC9035054 DOI: 10.1016/j.cels.2021.12.005] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 08/25/2021] [Revised: 10/30/2021] [Accepted: 12/21/2021] [Indexed: 12/28/2022]
Abstract
In high-throughput functional genomic screens, each gene product is commonly assumed to exhibit a singular biological function within a defined protein complex or pathway. In practice, a single gene perturbation may induce multiple cascading functional outcomes, a genetic principle known as pleiotropy. Here, we model pleiotropy in fitness screen collections by representing each gene perturbation as the sum of multiple perturbations of biological functions, each harboring independent fitness effects inferred empirically from the data. Our approach (Webster) recovered pleiotropic functions for DNA damage proteins from genotoxic fitness screens, untangled distinct signaling pathways upstream of shared effector proteins from cancer cell fitness screens, and predicted the stoichiometry of an unknown protein complex subunit from fitness data alone. Modeling compound sensitivity profiles in terms of genetic functions recovered compound mechanisms of action. Our approach establishes a sparse approximation mechanism for unraveling complex genetic architectures underlying high-dimensional gene perturbation readouts.
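The idea of writing each gene perturbation as a sparse sum of function-level effects can be illustrated with a toy sparse-coding step. The sketch below is not the Webster implementation: it assumes a fixed, hypothetical dictionary of unit-norm "function" profiles (the names and numbers are invented for illustration) and uses plain matching pursuit to select a few functions per gene. Learning the dictionary itself from the data is not shown.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def matching_pursuit(profile, dictionary, max_atoms=2):
    """Greedily approximate `profile` as a weighted sum of at most `max_atoms` picks."""
    residual = list(profile)
    loadings = {}
    for _ in range(max_atoms):
        # Correlate the unexplained part of the profile with every function.
        scores = {name: dot(residual, atom) for name, atom in dictionary.items()}
        best = max(scores, key=lambda name: abs(scores[name]))
        weight = scores[best]
        if abs(weight) < 1e-12:
            break  # nothing left to explain
        loadings[best] = loadings.get(best, 0.0) + weight
        residual = [r - weight * a for r, a in zip(residual, dictionary[best])]
    return loadings, residual

# Hypothetical function profiles across four screens (illustrative values only).
functions = {
    "dna_repair": unit([1.0, 1.0, 0.0, 0.0]),
    "mtor_signaling": unit([0.0, 0.0, 1.0, 1.0]),
    "proteasome": unit([1.0, -1.0, 0.0, 0.0]),
}

# A pleiotropic gene whose fitness profile mixes two functions.
gene_profile = [
    2.0 * a + 1.0 * b
    for a, b in zip(functions["dna_repair"], functions["mtor_signaling"])
]

loadings, residual = matching_pursuit(gene_profile, functions)
```

Here the gene loads on two of the three hypothetical functions, which is the kind of sparse, multi-function decomposition the abstract describes.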
Affiliation(s)
- Joshua Pan
  - Dana-Farber Cancer Institute, Department of Medical Oncology, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard Medical School, Boston, MA 02215, USA
- Jason J Kwon
  - Dana-Farber Cancer Institute, Department of Medical Oncology, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard Medical School, Boston, MA 02215, USA
- Jessica A Talamas
  - Dana-Farber Cancer Institute, Department of Medical Oncology, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard Medical School, Boston, MA 02215, USA
- Ashir A Borah
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Jesse S Boehm
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Aviad Tsherniak
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Marinka Zitnik
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard Medical School, Department of Biomedical Informatics, Boston, MA 02215, USA; Harvard University, Data Science Initiative, Cambridge, MA 02138, USA
- William C Hahn
  - Dana-Farber Cancer Institute, Department of Medical Oncology, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard Medical School, Boston, MA 02215, USA; Brigham and Women's Hospital and Harvard Medical School, Department of Medicine, Boston, MA 02215, USA
6
OUP accepted manuscript. Brief Funct Genomics 2022; 21:243-269. [DOI: 10.1093/bfgp/elac007] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/02/2021] [Revised: 03/17/2022] [Accepted: 03/18/2022] [Indexed: 11/14/2022]
7
Law JN, Akers K, Tasnina N, Santina CMD, Deutsch S, Kshirsagar M, Klein-Seetharaman J, Crovella M, Rajagopalan P, Kasif S, Murali TM. Interpretable network propagation with application to expanding the repertoire of human proteins that interact with SARS-CoV-2. Gigascience 2021; 10:giab082. [PMID: 34966926 PMCID: PMC8716363 DOI: 10.1093/gigascience/giab082] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/30/2021] [Revised: 09/21/2021] [Accepted: 11/28/2021] [Indexed: 01/02/2023]
Abstract
BACKGROUND Network propagation has been widely used for nearly 20 years to predict gene functions and phenotypes. Despite the popularity of this approach, little attention has been paid to the question of provenance tracing in this context, e.g., determining how much any experimental observation in the input contributes to the score of every prediction. RESULTS We design a network propagation framework with 2 novel components and apply it to predict human proteins that directly or indirectly interact with SARS-CoV-2 proteins. First, we trace the provenance of each prediction to its experimentally validated sources, which in our case are human proteins experimentally determined to interact with viral proteins. Second, we design a technique that helps to reduce the manual adjustment of parameters by users. We find that for every top-ranking prediction, the highest contribution to its score arises from a direct neighbor in a human protein-protein interaction network. We further analyze these results to develop functional insights on SARS-CoV-2 that expand on known biology such as the connection between endoplasmic reticulum stress, HSPA5, and anti-clotting agents. CONCLUSIONS We examine how our provenance-tracing method can be generalized to a broad class of network-based algorithms. We provide a useful resource for the SARS-CoV-2 community that implicates many previously undocumented proteins with putative functional relationships to viral infection. This resource includes potential drugs that can be opportunistically repositioned to target these proteins. We also discuss how our overall framework can be extended to other, newly emerging viruses.
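Provenance tracing of the kind described above rests on a simple property: many propagation schemes are linear in the seed vector, so a prediction's score splits exactly into one contribution per experimentally supported source. The sketch below illustrates this with random walk with restart on a tiny, hypothetical protein-protein interaction graph. The node names, edge weights, and restart parameter are assumptions for illustration, and the code is not the authors' framework.

```python
def propagate(adj, seeds, alpha=0.5, iters=200):
    """Random walk with restart; `adj` is a symmetric dict-of-dicts of edge weights."""
    nodes = sorted(adj)
    degree = {u: sum(adj[u].values()) for u in nodes}
    score = {u: seeds.get(u, 0.0) for u in nodes}
    for _ in range(iters):
        # Each node keeps a share of its seed value and receives mass from neighbors,
        # with every neighbor splitting its score in proportion to edge weight.
        score = {
            u: (1 - alpha) * seeds.get(u, 0.0)
            + alpha * sum(w / degree[v] * score[v] for v, w in adj[u].items())
            for u in nodes
        }
    return score

# Hypothetical path-shaped network: A - C - D - B - E.
ppi = {
    "A": {"C": 1.0},
    "C": {"A": 1.0, "D": 1.0},
    "D": {"C": 1.0, "B": 1.0},
    "B": {"D": 1.0, "E": 1.0},
    "E": {"B": 1.0},
}

# Hypothetical sources: human proteins with experimental evidence of viral interaction.
seeds = {"A": 1.0, "B": 1.0}

total = propagate(ppi, seeds)

# Provenance: propagate each source alone; by linearity the parts add up to `total`.
contributions = {s: propagate(ppi, {s: w}) for s, w in seeds.items()}
top_source_for_C = max(contributions, key=lambda s: contributions[s]["C"])
```

In this toy graph the largest contribution to candidate C comes from its direct neighbor A, which mirrors the abstract's observation about top-ranking predictions, although a five-node example says nothing about real networks.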
Affiliation(s)
- Jeffrey N Law
  - Interdisciplinary Ph.D. Program in Genetics, Bioinformatics, and Computational Biology, Virginia Tech, Blacksburg, VA 24061, USA
- Kyle Akers
  - Interdisciplinary Ph.D. Program in Genetics, Bioinformatics, and Computational Biology, Virginia Tech, Blacksburg, VA 24061, USA
- Nure Tasnina
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- Shay Deutsch
  - Department of Mathematics, University of California, Los Angeles, CA 90095, USA
- Mark Crovella
  - Department of Computer Science, Boston University, Boston, MA 02215, USA
- Simon Kasif
  - Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
- T M Murali
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
8
We need to keep a reproducible trace of facts, predictions, and hypotheses from gene to function in the era of big data. PLoS Biol 2020; 18:e3000999. [PMID: 33253151 PMCID: PMC7728211 DOI: 10.1371/journal.pbio.3000999] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Revised: 12/10/2020] [Indexed: 01/18/2023]
Abstract
How do we scale biological science to the demand of next generation biology and medicine to keep track of the facts, predictions, and hypotheses? These days, enormous amounts of DNA sequence and other omics data are generated. Since these data contain the blueprint for life, it is imperative that we interpret it accurately. The abundance of DNA is only one part of the challenge. Artificial Intelligence (AI) and network methods routinely build on large screens, single cell technologies, proteomics, and other modalities to infer or predict biological functions and phenotypes associated with proteins, pathways, and organisms. As a first step, how do we systematically trace the provenance of knowledge from experimental ground truth to gene function predictions and annotations? Here, we review the main challenges in tracking the evolution of biological knowledge and propose several specific solutions to provenance and computational tracing of evidence in functional linkage networks.
9
Fernando PC, Mabee PM, Zeng E. Integration of anatomy ontology data with protein-protein interaction networks improves the candidate gene prediction accuracy for anatomical entities. BMC Bioinformatics 2020; 21:442. [PMID: 33028186 PMCID: PMC7542696 DOI: 10.1186/s12859-020-03773-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/22/2020] [Accepted: 09/22/2020] [Indexed: 01/04/2023]
Abstract
Background Identification of genes responsible for anatomical entities is a major requirement in many fields including developmental biology, medicine, and agriculture. Current wet lab techniques used for this purpose, such as gene knockout, are high in resource and time consumption. Protein–protein interaction (PPI) networks are frequently used to predict disease genes for humans and gene candidates for molecular functions, but they are rarely used to predict genes for anatomical entities. Moreover, PPI networks suffer from network quality issues, which can be a limitation for their usage in predicting candidate genes. Therefore, we developed an integrative framework to improve the candidate gene prediction accuracy for anatomical entities by combining existing experimental knowledge about gene-anatomical entity relationships with PPI networks using anatomy ontology annotations. We hypothesized that this integration improves the quality of the PPI networks by reducing the number of false positive and false negative interactions and is better optimized to predict candidate genes for anatomical entities. We used existing Uberon anatomical entity annotations for zebrafish and mouse genes to construct gene networks by calculating semantic similarity between the genes. These anatomy-based gene networks were semantic networks, as they were constructed based on the anatomy ontology annotations that were obtained from the experimental data in the literature. We integrated these anatomy-based gene networks with mouse and zebrafish PPI networks retrieved from the STRING database and compared the performance of their network-based candidate gene predictions. 
Results According to evaluations of candidate gene prediction performance tested under four different semantic similarity calculation methods (Lin, Resnik, Schlicker, and Wang), the integrated networks, which were semantically improved PPI networks, showed better performances by having higher area under the curve values for receiver operating characteristic and precision-recall curves than PPI networks for both zebrafish and mouse. Conclusion Integration of existing experimental knowledge about gene-anatomical entity relationships with PPI networks via anatomy ontology improved the candidate gene prediction accuracy and optimized them for predicting candidate genes for anatomical entities.
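Two of the semantic similarity measures named above, Resnik and Lin, can be written down compactly from information content (IC). The sketch below computes term-level similarity on a toy ontology. The terms and annotation counts are hypothetical and are not real Uberon data, and the step that turns term-level similarities into gene-level network edges is not shown because the abstract does not specify it.

```python
import math

# Hypothetical toy anatomy ontology: each term maps to its parent terms.
PARENTS = {
    "anatomical_entity": [],
    "fin": ["anatomical_entity"],
    "pectoral_fin": ["fin"],
    "pelvic_fin": ["fin"],
    "heart": ["anatomical_entity"],
}

# Hypothetical number of genes annotated directly to each term.
DIRECT_COUNTS = {
    "anatomical_entity": 0,
    "fin": 2,
    "pectoral_fin": 4,
    "pelvic_fin": 2,
    "heart": 8,
}

def ancestors(term):
    """Return the term itself plus every ancestor reachable through PARENTS."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def information_content():
    """IC(t) = -log(p(t)), where p(t) counts annotations to t and its descendants."""
    totals = {term: 0 for term in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        for ancestor in ancestors(term):
            totals[ancestor] += count
    root_total = totals["anatomical_entity"]
    return {term: -math.log(totals[term] / root_total) for term in PARENTS}

IC = information_content()

def resnik(a, b):
    """IC of the most informative common ancestor of a and b."""
    return max(IC[t] for t in ancestors(a) & ancestors(b))

def lin(a, b):
    """Resnik similarity normalized by the IC of the two terms."""
    denominator = IC[a] + IC[b]
    return 2.0 * resnik(a, b) / denominator if denominator > 0 else 0.0

fin_pair_resnik = resnik("pectoral_fin", "pelvic_fin")
fin_pair_lin = lin("pectoral_fin", "pelvic_fin")
unrelated_resnik = resnik("pectoral_fin", "heart")
```

The two fin terms share the informative ancestor "fin" and so score above zero, while a fin term and "heart" share only the root and score zero under both measures.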
Affiliation(s)
- Pasan C Fernando
  - Department of Biology, University of South Dakota, Vermillion, SD, USA
- Paula M Mabee
  - Department of Biology, University of South Dakota, Vermillion, SD, USA
  - National Ecological Observatory Network, Battelle Memorial Institute, 1685 38th St., Suite 100, Boulder, CO, 80301, USA
- Erliang Zeng
  - Division of Biostatistics and Computational Biology, College of Dentistry, University of Iowa, Iowa City, IA, USA
  - Department of Preventive and Community Dentistry, College of Dentistry, University of Iowa, Iowa City, IA, USA
  - Department of Biostatistics, College of Public Health, University of Iowa, Iowa City, IA, USA
  - Department of Biomedical Engineering, College of Engineering, University of Iowa, Iowa City, IA, USA
10
Li W, Wang M, Sun J, Wang Y, Jiang R. Gene co-opening network deciphers gene functional relationships. MOLECULAR BIOSYSTEMS 2018; 13:2428-2439. [PMID: 28976510 DOI: 10.1039/c7mb00430c] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/22/2022]
Abstract
Genome sequencing technology has generated a vast amount of genomic and epigenomic data, and has provided us a great opportunity to study gene functions on a global scale from an epigenomic view. In the last decade, network-based studies, such as those based on PPI networks and co-expression networks, have shown good performance in capturing functional relationships between genes. However, the functions of a gene and the mechanism of interaction of genes with each other to elucidate their functions are still not entirely clear. Here, we construct a gene co-opening network based on chromatin accessibility of genes. We show that genes related to a specific biological process or the same disease tend to be clustered in the co-opening network. This understanding allows us to detect functional clusters from the network and to predict new functions for genes. We further apply the network to prioritize disease genes for Psoriasis, and demonstrate the power of the joint analysis of the co-opening network and GWAS data in identifying disease genes. Taken together, the co-opening network provides a new viewpoint for the elucidation of gene associations and the interpretation of disease mechanisms.
Affiliation(s)
- Wenran Li
  - MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing 100084, China
11
Abstract
Next Generation Sequencing (NGS) or deep sequencing technology enables parallel reading of multiple individual DNA fragments, thereby enabling the identification of millions of base pairs in several hours. Recent research has clearly shown that machine learning technologies can efficiently analyse large sets of genomic data and help to identify novel gene functions and regulation regions. A deep artificial neural network consists of a group of artificial neurons that mimic the properties of living neurons. These mathematical models, termed Artificial Neural Networks (ANN), can be used to solve artificial intelligence engineering problems in several different technological fields (e.g., biology, genomics, proteomics, and metabolomics). In practical terms, neural networks are non-linear statistical structures that are organized as modelling tools and are used to simulate complex genomic relationships between inputs and outputs. To date, Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNN) have been demonstrated to be the best tools for improving performance in problem solving tasks within the genomic field.
12
Quan Y, Liu MY, Liu YM, Zhu LD, Wu YS, Luo ZH, Zhang XZ, Xu SZ, Yang QY, Zhang HY. Facilitating Anti-Cancer Combinatorial Drug Discovery by Targeting Epistatic Disease Genes. Molecules 2018; 23:E736. [PMID: 29570606 PMCID: PMC6017788 DOI: 10.3390/molecules23040736] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 01/25/2018] [Revised: 03/15/2018] [Accepted: 03/20/2018] [Indexed: 12/19/2022]
Abstract
Due to synergistic effects, combinatorial drugs are widely used for treating complex diseases. However, combining drugs and making them synergetic remains a challenge. Genetic disease genes are considered a promising source of drug targets with important implications for navigating the drug space. Most diseases are not caused by a single pathogenic factor, but by multiple disease genes, in particular, interacting disease genes. Thus, it is reasonable to consider that targeting epistatic disease genes may enhance the therapeutic effects of combinatorial drugs. In this study, synthetic lethality gene pairs of tumors, similar to epistatic disease genes, were first targeted by combinatorial drugs, resulting in the enrichment of the combinatorial drugs with cancer treatment, which verified our hypothesis. Then, conventional epistasis detection software was used to identify epistatic disease genes from the genome wide association studies (GWAS) dataset. Furthermore, combinatorial drugs were predicted by targeting these epistatic disease genes, and five combinations were proven to have synergistic anti-cancer effects on MCF-7 cells through cell cytotoxicity assay. Combined with the three-dimensional (3D) genome-based method, the epistatic disease genes were filtered and were more closely related to disease. By targeting the filtered gene pairs, the efficiency of combinatorial drug discovery has been further improved.
Affiliation(s)
- Yuan Quan
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Meng-Yuan Liu
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Ye-Mao Liu
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Li-Da Zhu
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Yu-Shan Wu
  - School of Life Sciences, Shandong University of Technology; No. 12 Zhangzhou Road, Zibo 255049, China
- Zhi-Hui Luo
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Xiu-Zhen Zhang
  - School of Life Sciences, Shandong University of Technology; No. 12 Zhangzhou Road, Zibo 255049, China
- Shi-Zhong Xu
  - Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA
- Qing-Yong Yang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Hong-Yu Zhang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
13
Knüppel R, Kuttenberger C, Ferreira-Cerca S. Toward Time-Resolved Analysis of RNA Metabolism in Archaea Using 4-Thiouracil. Front Microbiol 2017; 8:286. [PMID: 28286499 PMCID: PMC5323407 DOI: 10.3389/fmicb.2017.00286] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 10/28/2016] [Accepted: 02/13/2017] [Indexed: 11/13/2022]
Abstract
Archaea are widespread organisms colonizing almost every habitat on Earth. However, the molecular biology of archaea still remains relatively uncharacterized. RNA metabolism is a central cellular process, which has been extensively analyzed in both bacteria and eukarya. In contrast, analysis of RNA metabolism dynamic in archaea has been limited to date. To facilitate analysis of the RNA metabolism dynamic at a system-wide scale in archaea, we have established non-radioactive pulse labeling of RNA, using the nucleotide analog 4-thiouracil (4TU) in two commonly used model archaea: the halophile Euryarchaeota Haloferax volcanii, and the thermo-acidophile Crenarchaeota Sulfolobus acidocaldarius. In this work, we show that 4TU pulse labeling can be efficiently performed in these two organisms in a dose- and time-dependent manner. In addition, our results suggest that uracil prototrophy had no critical impact on the overall 4TU incorporation in RNA molecules. Accordingly, our work suggests that 4TU incorporation can be widely performed in archaea, thereby expanding the molecular toolkit to analyze archaeal gene expression network dynamic in unprecedented detail.
Affiliation(s)
- Robert Knüppel
  - Biochemistry III, Institute for Biochemistry, Genetics and Microbiology, University of Regensburg, Regensburg, Germany
- Corinna Kuttenberger
  - Biochemistry III, Institute for Biochemistry, Genetics and Microbiology, University of Regensburg, Regensburg, Germany
- Sébastien Ferreira-Cerca
  - Biochemistry III, Institute for Biochemistry, Genetics and Microbiology, University of Regensburg, Regensburg, Germany
14
Abstract
Genetic comparison of the effects of mutant and wild-type alleles is a powerful way to define gene function. But those few disease-causing variants that provide qualitatively different insights into the disease mechanisms of more common sporadic diseases have the greatest translational value.
15
Abstract
The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets. Here, we provide an overview of machine learning applications for the analysis of genome sequencing data sets, including the annotation of sequence elements and epigenetic, proteomic or metabolomic data. We present considerations and recurrent challenges in the application of supervised, semi-supervised and unsupervised machine learning methods, as well as of generative and discriminative modelling approaches. We provide general guidelines to assist in the selection of these machine learning methods and their practical application for the analysis of genetic and genomic data sets.
Affiliation(s)
- Maxwell W Libbrecht
- Department of Computer Science and Engineering, University of Washington, 185 Stevens Way, Seattle, Washington 98195-2350, USA
- William Stafford Noble
- [1] Department of Computer Science and Engineering, University of Washington, 185 Stevens Way, Seattle, Washington 98195-2350, USA. [2] Department of Genome Sciences, University of Washington, 3720 15th Ave NE, Seattle, Washington 98195-5065, USA
16
Valentini G, Paccanaro A, Caniza H, Romero AE, Re M. An extensive analysis of disease-gene associations using network integration and fast kernel-based gene prioritization methods. Artif Intell Med 2014; 61:63-78. [PMID: 24726035 PMCID: PMC4070077 DOI: 10.1016/j.artmed.2014.03.003] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Received: 09/11/2013] [Revised: 03/05/2014] [Accepted: 03/10/2014] [Indexed: 02/07/2023]
Abstract
OBJECTIVE In the context of "network medicine", gene prioritization methods represent one of the main tools to discover candidate disease genes by exploiting the large amount of data covering different types of functional relationships between genes. Several works proposed to integrate multiple sources of data to improve disease gene prioritization, but to our knowledge no systematic studies focused on the quantitative evaluation of the impact of network integration on gene prioritization. In this paper, we aim at providing an extensive analysis of gene-disease associations not limited to genetic disorders, and a systematic comparison of different network integration methods for gene prioritization. MATERIALS AND METHODS We collected nine different functional networks representing different functional relationships between genes, and we combined them through both unweighted and weighted network integration methods. We then prioritized genes with respect to each of the considered 708 medical subject headings (MeSH) diseases by applying classical guilt-by-association, random walk and random walk with restart algorithms, and the recently proposed kernelized score functions. RESULTS The results obtained with classical random walk algorithms and the best single network achieved an average area under the curve (AUC) across the 708 MeSH diseases of about 0.82, while kernelized score functions and network integration boosted the average AUC to about 0.89. Weighted integration, by exploiting the different "informativeness" embedded in different functional networks, outperforms unweighted integration at 0.01 significance level, according to the Wilcoxon signed rank sum test. For each MeSH disease we provide the top-ranked unannotated candidate genes, available for further bio-medical investigation. CONCLUSIONS Network integration is necessary to boost the performances of gene prioritization methods. 
Moreover, the methods based on kernelized score functions can further enhance disease gene ranking results by adopting both local and global learning strategies that exploit the overall topology of the network.
Affiliation(s)
- Giorgio Valentini
- AnacletoLab - Dipartimento di Informatica, Università degli Studi di Milano, via Comelico 39/41, 20135 Milano, Italy
- Alberto Paccanaro
- Department of Computer Science and Centre for Systems and Synthetic Biology, Royal Holloway, University of London, Egham TW20 0EX, UK
- Horacio Caniza
- Department of Computer Science and Centre for Systems and Synthetic Biology, Royal Holloway, University of London, Egham TW20 0EX, UK
- Alfonso E Romero
- Department of Computer Science and Centre for Systems and Synthetic Biology, Royal Holloway, University of London, Egham TW20 0EX, UK
- Matteo Re
- AnacletoLab - Dipartimento di Informatica, Università degli Studi di Milano, via Comelico 39/41, 20135 Milano, Italy
17
Re M, Valentini G. Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories. IEEE/ACM Trans Comput Biol Bioinform 2013; 10:1359-1371. [PMID: 24407295 DOI: 10.1109/tcbb.2013.62] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 06/03/2023]
Abstract
Drug repositioning is a challenging computational problem involving the integration of heterogeneous sources of biomolecular data and the design of label ranking algorithms able to exploit the overall topology of the underlying pharmacological network. In this context, we propose a novel semisupervised drug ranking problem: prioritizing drugs in integrated biochemical networks according to specific DrugBank therapeutic categories. Algorithms for drug repositioning usually perform the inference step into an inhomogeneous similarity space induced by the relationships existing between drugs and a second type of entity (e.g., disease, target, ligand set), thus making unfeasible a drug ranking within a homogeneous pharmacological space. To deal with this problem, we designed a general framework based on bipartite network projections by which homogeneous pharmacological networks can be constructed and integrated from heterogeneous and complementary sources of chemical, biomolecular and clinical information. Moreover, we present a novel algorithmic scheme based on kernelized score functions that adopts both local and global learning strategies to effectively rank drugs in the integrated pharmacological space using different network combination methods. Detailed experiments with more than 80 DrugBank therapeutic categories involving about 1,300 FDA-approved drugs show the effectiveness of the proposed approach.
Affiliation(s)
- Matteo Re
- Università degli Studi di Milano, Milano
18
Dutkowski J, Kramer M, Surma MA, Balakrishnan R, Cherry JM, Krogan NJ, Ideker T. A gene ontology inferred from molecular networks. Nat Biotechnol 2013; 31:38-45. [PMID: 23242164 DOI: 10.1038/nbt.2463] [Citation(s) in RCA: 124] [Impact Index Per Article: 11.3] [Received: 06/27/2012] [Indexed: 12/20/2022]
Abstract
Ontologies have proven very useful for capturing knowledge as a hierarchy of terms and their interrelationships. In biology a major challenge has been to construct ontologies of gene function given incomplete biological knowledge and inconsistencies in how this knowledge is manually curated. Here we show that large networks of gene and protein interactions in Saccharomyces cerevisiae can be used to infer an ontology whose coverage and power are equivalent to those of the manually curated Gene Ontology (GO). The network-extracted ontology (NeXO) contains 4,123 biological terms and 5,766 term-term relations, capturing 58% of known cellular components. We also explore robust NeXO terms and term relations that were initially not cataloged in GO, a number of which have now been added based on our analysis. Using quantitative genetic interaction profiling and chemogenomics, we find further support for many of the uncharacterized terms identified by NeXO, including multisubunit structures related to protein trafficking or mitochondrial function. This work enables a shift from using ontologies to evaluate data to using data to construct and evaluate ontologies.
Affiliation(s)
- Janusz Dutkowski
- Department of Medicine, University of California San Diego, La Jolla, California, USA.
19
Lee I. Network approaches to the genetic dissection of phenotypes in animals and humans. Anim Cells Syst (Seoul) 2013. [DOI: 10.1080/19768354.2013.789076] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/26/2022] Open
20
Abstract
To what extent can variation in phenotypic traits such as disease risk be accurately predicted in individuals? In this Review, I highlight recent studies in model organisms that are relevant both to the challenge of accurately predicting phenotypic variation from individual genome sequences ('whole-genome reverse genetics') and for understanding why, in many cases, this may be impossible. These studies argue that only by combining genetic knowledge with in vivo measurements of biological states will it be possible to make accurate genetic predictions for individual humans.
21
Alshalalfa M, Bader GD, Goldenberg A, Morris Q, Alhajj R. Detecting microRNAs of high influence on protein functional interaction networks: a prostate cancer case study. BMC Syst Biol 2012; 6:112. [PMID: 22929553 PMCID: PMC3490713 DOI: 10.1186/1752-0509-6-112] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 05/23/2012] [Accepted: 08/14/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND The use of biological molecular network information for diagnostic and prognostic purposes and elucidation of molecular disease mechanism is a key objective in systems biomedicine. The network of regulatory miRNA-target and functional protein interactions is a rich source of information to elucidate the function and the prognostic value of miRNAs in cancer. The objective of this study is to identify miRNAs that have high influence on target protein complexes in prostate cancer as a case study. This could provide biomarkers or therapeutic targets relevant for prostate cancer treatment. RESULTS Our findings demonstrate that a miRNA's functional role can be explained by its target protein connectivity within a physical and functional interaction network. To detect miRNAs with high influence on target protein modules, we integrated miRNA and mRNA expression profiles with a sequence based miRNA-target network and human functional and physical protein interactions (FPI). miRNAs with high influence on target protein complexes play a role in prostate cancer progression and are promising diagnostic or prognostic biomarkers. We uncovered several miRNA-regulated protein modules which were enriched in focal adhesion and prostate cancer genes. Several miRNAs such as miR-96, miR-182, and miR-143 demonstrated high influence on their target protein complexes and could explain most of the gene expression changes in our analyzed prostate cancer data set. CONCLUSIONS We describe a novel method to identify active miRNA-target modules relevant to prostate cancer progression and outcome. miRNAs with high influence on protein networks are valuable biomarkers that can be used in clinical investigations for prostate cancer treatment.
22
Greene CS, Troyanskaya OG. Accurate evaluation and analysis of functional genomics data and methods. Ann N Y Acad Sci 2012; 1260:95-100. [PMID: 22268703 DOI: 10.1111/j.1749-6632.2011.06383.x] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 11/27/2022]
Abstract
The development of technology capable of inexpensively performing large-scale measurements of biological systems has generated a wealth of data. Integrative analysis of these data holds the promise of uncovering gene function, regulation, and, in the longer run, understanding complex disease. However, their analysis has proved very challenging, as it is difficult to quickly and effectively assess the relevance and accuracy of these data for individual biological questions. Here, we identify biases that present challenges for the assessment of functional genomics data and methods. We then discuss evaluation methods that, taken together, begin to address these issues. We also argue that the funding of systematic data-driven experiments and of high-quality curation efforts will further improve evaluation metrics so that they more-accurately assess functional genomics data and methods. Such metrics will allow researchers in the field of functional genomics to continue to answer important biological questions in a data-driven manner.
Affiliation(s)
- Casey S Greene
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA.
23
Quantitative utilization of prior biological knowledge in the Bayesian network modeling of gene expression data. BMC Bioinformatics 2011; 12:359. [PMID: 21884587 PMCID: PMC3203352 DOI: 10.1186/1471-2105-12-359] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 01/01/2011] [Accepted: 08/31/2011] [Indexed: 01/22/2023] Open
Abstract
Background Bayesian network (BN) modeling is a powerful approach to reconstructing genetic regulatory networks from gene expression data. However, expression data by itself suffers from high noise and lack of power. Incorporating prior biological knowledge can improve the performance. As each type of prior knowledge on its own may be incomplete or limited by quality issues, integrating multiple sources of prior knowledge to utilize their consensus is desirable. Results We introduce a new method to incorporate the quantitative information from multiple sources of prior knowledge. It first uses the Naïve Bayesian classifier to assess the likelihood of functional linkage between gene pairs based on prior knowledge. In this study we included cocitation in PubMed and schematic similarity in Gene Ontology annotation. A candidate network edge reservoir is then created in which the copy number of each edge is proportional to the estimated likelihood of linkage between the two corresponding genes. In network simulation the Markov Chain Monte Carlo sampling algorithm is adopted, sampling from this reservoir at each iteration to generate new candidate networks. We evaluated the new algorithm using both simulated and real gene expression data, including data from a yeast cell cycle study and a mouse pancreas development/growth study. Incorporating prior knowledge led to a ~2-fold increase in the number of known transcription regulations recovered, without significant change in false positive rate. In contrast, without the prior knowledge BN modeling is not always better than a random selection, demonstrating the necessity in network modeling to supplement the gene expression data with additional information. Conclusion Our new development provides a statistical means to utilize the quantitative information in prior biological knowledge in the BN modeling of gene expression data, which significantly improves the performance.
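The edge-reservoir idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `scale` parameter, and the likelihood values are hypothetical, and only the proportional-copy-number step and a uniform draw from the reservoir are shown (no MCMC acceptance step, no network scoring).

```python
import random


def build_edge_reservoir(edge_likelihoods, scale=10):
    """Repeat each edge so its copy number is proportional to its likelihood.

    `scale` (an assumption of this sketch) converts a likelihood in [0, 1]
    into an integer copy number by rounding.
    """
    reservoir = []
    for edge, likelihood in edge_likelihoods.items():
        copies = round(likelihood * scale)
        reservoir.extend([edge] * copies)
    return reservoir


def propose_edge(reservoir, rng):
    """Draw one candidate edge uniformly from the reservoir.

    Because better-supported edges have more copies, they are proposed
    more often.
    """
    return rng.choice(reservoir)


# Hypothetical prior-knowledge likelihoods for three gene pairs.
edge_likelihoods = {("A", "B"): 0.6, ("A", "C"): 0.3, ("B", "C"): 0.1}

reservoir = build_edge_reservoir(edge_likelihoods, scale=10)
rng = random.Random(0)
proposals = [propose_edge(reservoir, rng) for _ in range(5)]
```

In a full sampler, each proposed edge would be used to modify the current candidate network before the usual accept/reject step.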
24
Airoldi EM, Heller KA, Silva R. Small sets of interacting proteins suggest functional linkage mechanisms via Bayesian analogical reasoning. Bioinformatics 2011; 27:i374-82. [PMID: 21685095 PMCID: PMC3117334 DOI: 10.1093/bioinformatics/btr236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Proteins and protein complexes coordinate their activity to execute cellular functions. In a number of experimental settings, including synthetic genetic arrays, genetic perturbations and RNAi screens, scientists identify a small set of protein interactions of interest. A working hypothesis is often that these interactions are the observable phenotypes of some functional process, which is not directly observable. Confirmatory analysis requires finding other pairs of proteins whose interaction may be additional phenotypical evidence about the same functional process. Extant methods for finding additional protein interactions rely heavily on the information in the newly identified set of interactions. For instance, these methods leverage the attributes of the individual proteins directly, in a supervised setting, in order to find relevant protein pairs. A small set of protein interactions provides a small sample to train parameters of prediction methods, thus leading to low confidence. RESULTS We develop RBSets, a computational approach to ranking protein interactions rooted in analogical reasoning; that is, the ability to learn and generalize relations between objects. Our approach is tailored to situations where the training set of protein interactions is small, and leverages the attributes of the individual proteins indirectly, in a Bayesian ranking setting that is perhaps closest to propensity scoring in mathematical psychology. We find that RBSets leads to good performance in identifying additional interactions starting from a small evidence set of interacting proteins, for which an underlying biological logic in terms of functional processes and signaling pathways can be established with some confidence. Our approach is scalable and can be applied to large databases with minimal computational overhead. Our results suggest that analogical reasoning within a Bayesian ranking problem is a promising new approach for real-time biological discovery. 
AVAILABILITY Java code is available at: www.gatsby.ucl.ac.uk/~rbas. CONTACT airoldi@fas.harvard.edu; kheller@mit.edu; ricardo@stats.ucl.ac.uk.
Affiliation(s)
- Edoardo M Airoldi
- Department of Statistics and FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
25
Overton IM, Graham S, Gould KA, Hinds J, Botting CH, Shirran S, Barton GJ, Coote PJ. Global network analysis of drug tolerance, mode of action and virulence in methicillin-resistant S. aureus. BMC Syst Biol 2011; 5:68. [PMID: 21569391 PMCID: PMC3123200 DOI: 10.1186/1752-0509-5-68] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Received: 11/16/2010] [Accepted: 05/12/2011] [Indexed: 02/08/2023]
Abstract
BACKGROUND Staphylococcus aureus is a major human pathogen and strains resistant to existing treatments continue to emerge. Development of novel treatments is therefore important. Antimicrobial peptides represent a source of potential novel antibiotics to combat resistant bacteria such as Methicillin-Resistant Staphylococcus aureus (MRSA). A promising antimicrobial peptide is ranalexin, which has potent activity against Gram-positive bacteria, and particularly S. aureus. Understanding mode of action is a key component of drug discovery and network biology approaches enable a global, integrated view of microbial physiology, including mechanisms of antibiotic killing. We developed a systems-wide functional association network approach to integrate proteome and transcriptome profiles, enabling study of drug resistance and mode of action. RESULTS The functional association network was constructed by Bayesian logistic regression, providing a framework for identification of antimicrobial peptide (ranalexin) response modules from S. aureus MRSA-252 transcriptome and proteome profiling. These signatures of ranalexin treatment revealed multiple killing mechanisms, including cell wall activity. Cell wall effects were supported by gene disruption and osmotic fragility experiments. Furthermore, twenty-two novel virulence factors were inferred, while the VraRS two-component system and PhoU-mediated persister formation were implicated in MRSA tolerance to cationic antimicrobial peptides. CONCLUSIONS This work demonstrates a powerful integrative approach to study drug resistance and mode of action. Our findings are informative to the development of novel therapeutic strategies against Staphylococcus aureus and particularly MRSA.
Affiliation(s)
- Ian M Overton
- Biomedical Systems Analysis, MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK.
26
Lee I. Probabilistic functional gene societies. Prog Biophys Mol Biol 2011; 106:435-42. [PMID: 21281658 DOI: 10.1016/j.pbiomolbio.2011.01.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 09/13/2010] [Revised: 12/29/2010] [Accepted: 01/18/2011] [Indexed: 11/25/2022]
Abstract
A cellular system may be viewed as a social network of genes. Genes work together to conduct physiological processes in the cells. Thus, if we have a view of the functional association among genes, we may also be able to unravel the association between genotypes and phenotypes, the emergent properties of the interactive activities of genes. A gene network can be viewed from various standpoints. Perhaps the most common are protein-protein interaction networks (PPIN) and transcriptional regulatory networks (TRN). Here I introduce another type of view of the gene network: the probabilistic functional gene network (PFGN). A 'functional view' of association between genes enables us to have a holistic model of the gene society. A 'probabilistic view' makes the model of gene associations derived from noisy high-throughput data more robust. In addition, the dynamics of gene association may be presented in a single static network model by the probabilistic view. By combining the two modeling views, probabilistic functional gene networks have been constructed for various organisms and have proved to be highly useful in generating novel biological hypotheses, not only for simple unicellular microbes but also for highly complex multicellular animals and plants.
Affiliation(s)
- Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, 262 Seongsanno, Seodaemun-gu, Seoul 120-749, Republic of Korea.
27
Abstract
A large number of genome-scale networks, including protein-protein and genetic interaction networks, are now available for several organisms. In parallel, many studies have focused on analyzing, characterizing, and modeling these networks. Beyond investigating the topological characteristics such as degree distribution, clustering coefficient, and average shortest-path distance, another area of particular interest is the prediction of nodes (genes) with a given characteristic (labels) - for example prediction of genes that cause a particular phenotype or have a given function. In this chapter, we describe methods and algorithms for predicting node labels from network-based datasets with an emphasis on label propagation algorithms (LPAs) and their relation to local neighborhood methods.
Affiliation(s)
- Sara Mostafavi
- Department of Computer Science, Centre for Cellular and Biomolecular Research (CCBR), University of Toronto, Toronto, ON, Canada
28
Zhang ZG, Ye ZQ, Yu L, Shi P. Phylogenomic reconstruction of lactic acid bacteria: an update. BMC Evol Biol 2011; 11:1. [PMID: 21194491 PMCID: PMC3024227 DOI: 10.1186/1471-2148-11-1] [Citation(s) in RCA: 109] [Impact Index Per Article: 8.4] [Received: 06/30/2010] [Accepted: 01/01/2011] [Indexed: 01/28/2023] Open
Abstract
Background Lactic acid bacteria (LAB) are important in the food industry for the production of fermented food products and in human health as commensals in the gut. However, the phylogenetic relationships among LAB species remain under intensive debate owing to disagreements among different data sets. Results We performed a phylogenetic analysis of LAB species based on 232 genes from 28 LAB genome sequences. Regardless of the tree-building methods used, combined analyses yielded an identical, well-resolved tree topology with strong support for all nodes. The LAB species examined were divided into two groups. Group 1 included families Enterococcaceae and Streptococcaceae. Group 2 included families Lactobacillaceae and Leuconostocaceae. Within Group 2, the LAB species were divided into two clades. One clade comprised the acidophilus complex of genus Lactobacillus and two other species, Lb. sakei and Lb. casei. In the acidophilus complex, Lb. delbrueckii separated first, while Lb. acidophilus/Lb. helveticus and Lb. gasseri/Lb. johnsonii were clustered into a sister group. The other clade within Group 2 consisted of the salivarius subgroup, including five species, Lb. salivarius, Lb. plantarum, Lb. brevis, Lb. reuteri, Lb. fermentum, and the genera Pediococcus, Oenococcus, and Leuconostoc. In this clade, Lb. salivarius was positioned most basally, followed by two clusters, one corresponding to the Lb. plantarum/Lb. brevis pair and Pediococcus, and the other including the Oenococcus/Leuconostoc pair and the Lb. reuteri/Lb. fermentum pair. In addition, the phylogenetic utility of the 232 genes was analyzed to identify those that may be more useful than others. The genes identified as useful were related to translation and ribosomal structure and biogenesis (TRSB), and a three-gene set comprising genes encoding ultra-violet resistance protein B (uvrB), DNA polymerase III (polC) and penicillin binding protein 2B (pbpB).
Conclusions Our phylogenomic analyses provide important insights into the evolution and diversification of LAB species, and also reveal the phylogenetic utility of several genes. We infer that the occurrence of multiple, independent adaptation events in LAB species has resulted in their occupation of various habitats. Further analyses of more genes from additional, representative LAB species are needed to reveal the molecular mechanisms underlying adaptation of LAB species to various environmental niches.
Affiliation(s)
- Zhi-Gang Zhang
- State Key Laboratory of Genetic Resources and Evolution, Laboratory of Evolutionary and Functional Genomics, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, PR China
29
Information technology solutions for integration of biomolecular and clinical data in the identification of new cancer biomarkers and targets for therapy. Pharmacol Ther 2010; 128:488-98. [PMID: 20832425 DOI: 10.1016/j.pharmthera.2010.08.012] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 08/24/2010] [Accepted: 08/24/2010] [Indexed: 02/04/2023]
Abstract
The quest for new cancer biomarkers and targets for therapy requires not only the aggregation and analysis of heterogeneous biomolecular data but also the integration of clinical data. In this review we highlight information technology solutions for the integration of biomolecular and clinical data and focus on a solution at the departmental level, i.e., a decentralized, medium-scale solution for groups of labs working on a specific topic. Both hardware and software requirements are described, as well as bioinformatics methods and tools for the data analysis. The highlighted IT solutions include storage architecture, high-performance computing, and application servers. Additionally, the following computational approaches for data integration are reviewed: data aggregation, integrative data analysis including methodological aspects as well as examples, biomolecular pathways and network reconstruction, and mathematical modelling. Finally, a case study in cancer immunology, including the computational methods used, is shown, demonstrating how IT solutions for integrating biomolecular and clinical data can help to identify new cancer biomarkers for improving diagnosis and predicting clinical outcome.
30
Schaid DJ. Genomic similarity and kernel methods II: methods for genomic information. Hum Hered 2010; 70:132-40. [PMID: 20606458 DOI: 10.1159/000312643] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.5] [Received: 08/02/2009] [Accepted: 03/09/2010] [Indexed: 11/19/2022] Open
Abstract
Measures of genomic similarity are often the basis of flexible statistical analyses, and when based on kernel methods, they provide a powerful platform to take advantage of a broad and deep statistical theory, and a wide range of existing software; see the companion paper for a review of this material [1]. The kernel method converts information - perhaps complex or high-dimensional information - for a pair of subjects to a quantitative value representing either similarity or dissimilarity, with the requirement that it must create a positive semidefinite matrix when applied to all pairs of subjects. This approach provides enormous opportunities to enhance genetic analyses by including a wide range of publicly available data as structured kernel 'prior' information. Kernel methods are appealing for their generality, yet this generality can make it challenging to formulate measures of similarity that directly address a specific scientific aim, or that are most powerful to detect a specific genetic mechanism. Although it is difficult to create a cookbook of kernels for genetic studies, useful guidelines can be gleaned from a variety of novel published approaches. We review some novel developments of kernels for specific analyses and speculate on how to build kernels for complex genomic attributes based on publicly available data. The creativity of analysts, with rigorous evaluations by applications to real and simulated data, will ultimately provide a much stronger array of kernel 'tools' for genetic analyses.
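The pairwise-kernel requirement described in this abstract can be made concrete with a small Python sketch. It uses a linear kernel as an illustrative choice and is not taken from the paper: the genotype vectors, the function names, and the random spot check are all assumptions of this sketch. The spot check evaluates the quadratic form z'Kz for random vectors z, which can only reveal a violation of positive semidefiniteness and does not prove it.

```python
import random


def linear_kernel(x, y):
    """Similarity of two subjects as the inner product of their feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def gram_matrix(subjects, kernel):
    """Apply the kernel to all pairs of subjects."""
    return [[kernel(x, y) for y in subjects] for x in subjects]


def quadratic_form(K, z):
    """Compute z' K z; this is non-negative for every z when K is PSD."""
    n = len(K)
    return sum(z[i] * K[i][j] * z[j] for i in range(n) for j in range(n))


# Hypothetical genotypes coded as minor-allele counts (0, 1, 2) at four markers.
genotypes = [
    [0, 1, 2, 1],
    [1, 1, 0, 2],
    [2, 0, 1, 0],
]

K = gram_matrix(genotypes, linear_kernel)

# Symmetry is part of the requirement for a valid kernel matrix.
is_symmetric = all(
    K[i][j] == K[j][i] for i in range(len(K)) for j in range(len(K))
)

# Spot-check the quadratic form on random vectors (a check, not a proof).
rng = random.Random(1)
min_form = min(
    quadratic_form(K, [rng.uniform(-1, 1) for _ in genotypes])
    for _ in range(200)
)
```

A linear kernel always yields a positive semidefinite matrix, so the smallest sampled quadratic form stays non-negative; an ad hoc similarity score offers no such guarantee, which is why the requirement matters when designing new kernels.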
Affiliation(s)
- Daniel J Schaid
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minn., USA
31
Lee I, Lehner B, Vavouri T, Shin J, Fraser AG, Marcotte EM. Predicting genetic modifier loci using functional gene networks. Genome Res 2010; 20:1143-53. [PMID: 20538624 DOI: 10.1101/gr.102749.109] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.9] [Indexed: 12/23/2022]
Abstract
Most phenotypes are genetically complex, with contributions from mutations in many different genes. Mutations in more than one gene can combine synergistically to cause phenotypic change, and systematic studies in model organisms show that these genetic interactions are pervasive. However, in human association studies such nonadditive genetic interactions are very difficult to identify because of a lack of statistical power: simply put, the number of potential interactions is too vast. One approach to resolve this is to predict candidate modifier interactions between loci, and then to specifically test these for associations with the phenotype. Here, we describe a general method for predicting genetic interactions based on the use of integrated functional gene networks. We show that in both Saccharomyces cerevisiae and Caenorhabditis elegans a single high-coverage, high-quality functional network can successfully predict genetic modifiers for the majority of genes. For C. elegans we also describe the construction of a new, improved, and expanded functional network, WormNet 2. Using this network we demonstrate how it is possible to rapidly expand the number of modifier loci known for a gene, predicting and validating new genetic interactions for each of three signal transduction genes. We propose that this approach, termed network-guided modifier screening, provides a general strategy for predicting genetic interactions. This work thus suggests that a high-quality integrated human gene network will provide a powerful resource for modifier locus discovery in many different diseases.
Affiliation(s)
- Insuk Lee
- Department of Biotechnology, College of Life science and Biotechnology, Yonsei University, Seodaemun-ku, Seoul 120-749, South Korea.
|
32
|
Triviño JC, Pazos F. Quantitative global studies of reactomes and metabolomes using a vectorial representation of reactions and chemical compounds. BMC Systems Biology 2010; 4:46. [PMID: 20406431 PMCID: PMC2883543 DOI: 10.1186/1752-0509-4-46] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/27/2009] [Accepted: 04/20/2010] [Indexed: 12/02/2022]
Abstract
Background Global studies of the protein repertories of organisms are providing important information on the characteristics of the protein space. Many of these studies entail classification of the protein repertory on the basis of structure and/or sequence similarities. The situation is different for metabolism. Because there is no good way of measuring similarities between chemical reactions, there is a barrier to the development of global classifications of "metabolic space" and subsequent studies comparable to those done for protein sequences and structures. Results In this work, we propose a vectorial representation of chemical reactions, which allows them to be compared and classified. In this representation, chemical compounds, reactions and pathways may be represented in the same vectorial space. We show that the representation of chemical compounds reflects their physicochemical properties and can be used for predictive purposes. We use the vectorial representations of reactions to perform a global classification of the reactome of the model organism E. coli. Conclusions We show that this unsupervised clustering results in groups of enzymes more coherent in biological terms than equivalent groupings obtained from the EC hierarchy. This hierarchical clustering produces an optimal set of 21 groups which we analyzed for their biological meaning.
Affiliation(s)
- Juan C Triviño
- Computational Systems Biology Group, National Centre for Biotechnology (CNB-CSIC), C/Darwin, 3, Cantoblanco, 28049 Madrid, Spain
|
33
|
|
34
|
Abstract
Networks in biology can appear complex and difficult to decipher. We illustrate how to interpret biological networks with the help of frequently used visualization and analysis patterns.
|
35
|
Holmans P. Statistical methods for pathway analysis of genome-wide data for association with complex genetic traits. Advances in Genetics 2010; 72:141-79. [PMID: 21029852 DOI: 10.1016/b978-0-12-380862-2.00007-2] [Citation(s) in RCA: 78] [Impact Index Per Article: 5.6] [Indexed: 12/12/2022]
Abstract
A number of statistical methods have been developed to test for associations between pathways (collections of genes related biologically) and complex genetic traits. Pathway analysis methods were originally developed for analyzing gene expression data, but recently methods have been developed to perform pathway analysis on genome-wide association study (GWAS) data. The purpose of this review is to give an overview of these methods, enabling the reader to gain an understanding of what pathway analysis involves, and to select the method most suited to their purposes. This review describes the various types of statistical methods for pathway analysis, detailing the strengths and weaknesses of each. Factors influencing the power of pathway analyses, such as gene coverage and choice of pathways to analyze, are discussed, as well as various unresolved statistical issues. Finally, a list of computer programs for performing pathway analysis on genome-wide association data is provided.
Affiliation(s)
- Peter Holmans
- Biostatistics and Bioinformatics Unit, MRC Centre for Neuropsychiatric Genetics and Genomics, Department of Psychological Medicine and Neurology, Cardiff University School of Medicine, Heath Park, Cardiff, United Kingdom
|
36
|
Costello JC, Dalkilic MM, Beason SM, Gehlhausen JR, Patwardhan R, Middha S, Eads BD, Andrews JR. Gene networks in Drosophila melanogaster: integrating experimental data to predict gene function. Genome Biol 2009; 10:R97. [PMID: 19758432 PMCID: PMC2768986 DOI: 10.1186/gb-2009-10-9-r97] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.7] [Received: 06/11/2009] [Revised: 08/17/2009] [Accepted: 09/16/2009] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND Discovering the functions of all genes is a central goal of contemporary biomedical research. Despite considerable effort, we are still far from achieving this goal in any metazoan organism. Collectively, the growing body of high-throughput functional genomics data provides evidence of gene function, but remains difficult to interpret. RESULTS We constructed the first network of functional relationships for Drosophila melanogaster by integrating most of the available, comprehensive sets of genetic interaction, protein-protein interaction, and microarray expression data. The complete integrated network covers 85% of the currently known genes, which we refined to a high confidence network that includes 20,000 functional relationships among 5,021 genes. An analysis of the network revealed a remarkable concordance with prior knowledge. Using the network, we were able to infer a set of high-confidence Gene Ontology biological process annotations on 483 of the roughly 5,000 previously unannotated genes. We also show that this approach is a means of inferring annotations on a class of genes that cannot be annotated based solely on sequence similarity. Lastly, we demonstrate the utility of the network through reanalyzing gene expression data to both discover clusters of coregulated genes and compile a list of candidate genes related to specific biological processes. CONCLUSIONS Here we present the first genome-wide functional gene network in D. melanogaster. The network enables the exploration, mining, and reanalysis of experimental data, as well as the interpretation of new data. The inferred annotations provide testable hypotheses of previously uncharacterized genes.
Affiliation(s)
- James C Costello
- School of Informatics, Indiana University, E. Tenth St, Bloomington, Indiana 47408, USA
- Department of Biology, Indiana University, E. Third St, Bloomington, Indiana 47405, USA
- Mehmet M Dalkilic
- School of Informatics, Indiana University, E. Tenth St, Bloomington, Indiana 47408, USA
- Center for Genomics and Bioinformatics, Indiana University, E. Third St., Bloomington, Indiana 47405, USA
- Scott M Beason
- School of Informatics, Indiana University, E. Tenth St, Bloomington, Indiana 47408, USA
- Jeff R Gehlhausen
- School of Informatics, Indiana University, E. Tenth St, Bloomington, Indiana 47408, USA
- Rupali Patwardhan
- Center for Genomics and Bioinformatics, Indiana University, E. Third St., Bloomington, Indiana 47405, USA
- Current address: Department of Genome Sciences, University of Washington, NE Pacific St, Seattle, Washington 98195-5065, USA
- Sumit Middha
- Center for Genomics and Bioinformatics, Indiana University, E. Third St., Bloomington, Indiana 47405, USA
- Current address: Bioinformatics Core, Mayo Clinic, First St SW, Rochester, Minnesota 55905, USA
- Brian D Eads
- Department of Biology, Indiana University, E. Third St, Bloomington, Indiana 47405, USA
- Justen R Andrews
- School of Informatics, Indiana University, E. Tenth St, Bloomington, Indiana 47408, USA
- Department of Biology, Indiana University, E. Third St, Bloomington, Indiana 47405, USA
|
37
|
Friedlander MJ, Torres-Reveron J. The changing roles of neurons in the cortical subplate. Front Neuroanat 2009; 3:15. [PMID: 19688111 PMCID: PMC2727405 DOI: 10.3389/neuro.05.015.2009] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Received: 07/02/2009] [Accepted: 07/24/2009] [Indexed: 11/28/2022] Open
Abstract
Neurons may serve different functions over the course of an organism's life. Recent evidence suggests that cortical subplate (SP) neurons including those that reside in the white matter may perform longitudinal multi-tasking at different stages of development. These cells play a key role in early cortical development in coordinating thalamocortical reciprocal innervation. At later stages of development, they become integrated within the cortical microcircuitry. This type of longitudinal multi-tasking can enhance the capacity for information processing by populations of cells serving different functions over the lifespan. Subplate cells are initially derived when cells from the ventricular zone underlying the cortex migrate to the cortical preplate that is subsequently split by the differentiating neurons of the cortical plate with some neurons locating in the marginal zone and others settling below in the SP. While the cortical plate neurons form most of the cortical layers (layers 2–6), the marginal zone neurons form layer 1 and the SP neurons become interstitial cells of the white matter as well as forming a compact sublayer along the bottom of layer 6. After serving as transient innervation targets for thalamocortical axons, most of these cells die and layer 4 neurons become innervated by thalamic axons. However, 10–20% survive, remaining into adulthood along the bottom of layer 6 and as a scattered population of interstitial neurons in the white matter. Surviving SP cells' axons project throughout the overlying laminae, reaching layer 1 and issuing axon collaterals within white matter and in lower layer 6. This suggests that they participate in local synaptic networks, as well. Moreover, they receive excitatory and inhibitory synaptic inputs, potentially monitoring outputs from axon collaterals of cortical efferents, from cortical afferents and/or from each other. We explore our understanding of the functional connectivity of these cells at different stages of development.
|
38
|
A comparative genomics, network-based approach to understanding virulence in Vibrio cholerae. J Bacteriol 2009; 191:6262-72. [PMID: 19666715 DOI: 10.1128/jb.00475-09] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022] Open
Abstract
Our views of the genes that drive phenotypes have generally been built up one locus or operon at a time. However, a given phenotype, such as virulence, is a multilocus phenomenon. To gain a more comprehensive view of the genes and interactions underlying a phenotype, we propose an approach that incorporates information from comparative genomics and network biology and illustrate it by examining the virulence phenotype of Vibrio cholerae O1 El Tor N16961. We assessed the associations among the virulence-associated proteins from Vibrio cholerae and all the other proteins from this bacterium using a functional-association network map. In the context of this map, we were able to identify 262 proteins that are functionally linked to the virulence-associated genes more closely than is typical of the proteins in this strain and 240 proteins that are functionally linked to the virulence-associated proteins with a confidence score greater than 0.9. The roles of these genes were investigated using functional information from online data sources, comparative genomics, and the relationships shown by the protein association map. We also incorporated core proteome data from the family Vibrionaceae; 35% of the virulence-associated proteins have orthologs among the 1,822 orthologous groups of proteins in the core proteome, indicating that they may be dual-role virulence genes or encode functions that have value outside the human host. This approach is a valuable tool in searching for novel functional associations and in investigating the relationship between genotype and phenotype.
|
39
|
Jia F, Gampala SS, Mittal A, Luo Q, Rock CD. Cre-lox univector acceptor vectors for functional screening in protoplasts: analysis of Arabidopsis donor cDNAs encoding ABSCISIC ACID INSENSITIVE1-like protein phosphatases. Plant Molecular Biology 2009; 70:693-708. [PMID: 19499346 PMCID: PMC2755202 DOI: 10.1007/s11103-009-9502-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/13/2009] [Accepted: 05/15/2009] [Indexed: 05/27/2023]
Abstract
The 14,200 available full length Arabidopsis thaliana cDNAs in the universal plasmid system (UPS) donor vector pUNI51 should be applied broadly and efficiently to leverage a "functional map-space" of homologous plant genes. We have engineered Cre-lox UPS host acceptor vectors (pCR701-705) with N-terminal epitope tags in frame with the loxH site and downstream from the maize Ubiquitin promoter for use in transient protoplast expression assays and particle bombardment transformation of monocots. As an example of the utility of these vectors, we recombined them with several Arabidopsis cDNAs encoding Ser/Thr protein phosphatase type 2C (PP2Cs) known from genetic studies or predicted by hierarchical clustering meta-analysis to be involved in ABA and stress responses. Our functional results in Zea mays mesophyll protoplasts on ABA-inducible expression effects on the Late Embryogenesis Abundant promoter ProEm:GUS reporter were consistent with predictions and resulted in identification of novel activities of some PP2Cs. Deployment of these vectors can facilitate functional genomics and proteomics and identification of novel gene activities.
Affiliation(s)
- Fan Jia
- Department of Biological Sciences, Texas Tech University. Lubbock TX, U. S. A. 79409-3131
- Amandeep Mittal
- Department of Biological Sciences, Texas Tech University. Lubbock TX, U. S. A. 79409-3131
- Qingjun Luo
- Department of Biological Sciences, Texas Tech University. Lubbock TX, U. S. A. 79409-3131
- Christopher D. Rock
- Department of Biological Sciences, Texas Tech University. Lubbock TX, U. S. A. 79409-3131
|
40
|
Babu M, Musso G, Díaz-Mejía JJ, Butland G, Greenblatt JF, Emili A. Systems-level approaches for identifying and analyzing genetic interaction networks in Escherichia coli and extensions to other prokaryotes. Molecular BioSystems 2009; 5:1439-55. [PMID: 19763343 DOI: 10.1039/b907407d] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Indexed: 12/24/2022]
Abstract
Molecular interactions define the functional organization of the cell. Epistatic (genetic, or gene-gene) interactions, one of the most informative and commonly encountered forms of functional relationships, are increasingly being used to map process architecture in model eukaryotic organisms. In particular, 'systems-level' screens in yeast and worm aimed at elucidating genetic interaction networks have led to the generation of models describing the global modular organization of gene products and protein complexes within a cell. However, comparable data for prokaryotic organisms have not been available. Given its ease of growth and genetic manipulation, the Gram-negative bacterium Escherichia coli appears to be an ideal model system for performing comprehensive genome-scale examinations of genetic redundancy in bacteria. In this review, we highlight emerging experimental and computational techniques that have been developed recently to examine functional relationships and redundancy in E. coli at a systems-level, and their potential application to prokaryotes in general. Additionally, we have scanned PubMed abstracts and full-text published articles to manually curate a list of approximately 200 previously reported synthetic sick or lethal genetic interactions in E. coli derived from small-scale experimental studies.
Affiliation(s)
- Mohan Babu
- Banting and Best Department of Medical Research, Terrence Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada M5S 3E1
|
41
|
Widespread reorganization of metabolic enzymes into reversible assemblies upon nutrient starvation. Proc Natl Acad Sci U S A 2009; 106:10147-52. [PMID: 19502427 DOI: 10.1073/pnas.0812771106] [Citation(s) in RCA: 286] [Impact Index Per Article: 19.1] [Indexed: 12/30/2022] Open
Abstract
Proteins are likely to organize into complexes that assemble and disassemble depending on cellular needs. When approximately 800 yeast strains expressing GFP-tagged proteins were grown to stationary phase, a surprising number of proteins involved in intermediary metabolism and stress response were observed to form punctate cytoplasmic foci. The formation of these discrete physical structures was confirmed by immunofluorescence and mass spectrometry of untagged proteins. The purine biosynthetic enzyme Ade4-GFP formed foci in the absence of adenine, and cycling between punctate and diffuse phenotypes could be controlled by adenine subtraction and addition. Similarly, glutamine synthetase (Gln1-GFP) foci cycled reversibly in the absence and presence of glucose. The structures were neither targeted for vacuolar or autophagosome degradation nor colocalized with P bodies or major organelles. Thus, upon nutrient depletion we observe widespread protein assemblies displaying nutrient-specific formation and dissolution.
|
42
|
Pan W. Network-based multiple locus linkage analysis of expression traits. Bioinformatics 2009; 25:1390-6. [PMID: 19336446 PMCID: PMC2682520 DOI: 10.1093/bioinformatics/btp177] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 11/16/2008] [Revised: 03/24/2009] [Accepted: 03/26/2009] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION We consider the problem of multiple locus linkage analysis for expression traits of genes in a pathway or a network. To capitalize on co-expression of functionally related genes, we propose a penalized regression method that maps multiple expression quantitative trait loci (eQTLs) for all related genes simultaneously while accounting for their shared functions as specified a priori by a gene pathway or network. RESULTS An analysis of a mouse dataset and simulation studies clearly demonstrate the advantage of the proposed method over a standard approach that ignores biological knowledge of gene networks.
Affiliation(s)
- Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455-0378, USA.
|
43
|
Sidhu AS, Bellgard MI, Dillon TS. Classification of Information About Proteins. Bioinformatics 2009. [DOI: 10.1007/978-0-387-92738-1_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022] Open
|
44
|
Pan W. Network-based model weighting to detect multiple loci influencing complex diseases. Hum Genet 2008; 124:225-34. [PMID: 18719944 DOI: 10.1007/s00439-008-0545-1] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Received: 07/04/2008] [Accepted: 08/12/2008] [Indexed: 01/20/2023]
Abstract
For genome-wide association studies, it has been increasingly recognized that the popular locus-by-locus search for DNA variants associated with disease susceptibility may not be effective, especially when there are interactions between or among multiple loci, for which a multi-loci search strategy may be more productive. However, even if computationally feasible, a genome-wide search over all possible multiple loci requires exploring a huge model space and making costly adjustment for multiple testing, leading to reduced statistical power. On the other hand, there are accumulating data suggesting that protein products of many disease-causing genes tend to interact with each other, or cluster in the same biological pathway. To incorporate this prior knowledge and existing data on gene networks, we propose a gene network-based method to improve statistical power over that of the exhaustive search by giving higher weights to models involving genes nearby in a network. We use simulated data under realistic scenarios, including a large-scale human protein-protein interaction network and 23 known ataxia-causing genes, to demonstrate potential gain by our proposed method when disease-genes are clustered in a network.
Affiliation(s)
- Wei Pan
- Division of Biostatistics, MMC 303, School of Public Health, University of Minnesota, Minneapolis, MN 55455-0392, USA.
|
45
|
Park JY, Cho MO, Leonard S, Calder B, Mian IS, Kim WH, Wijnhoven S, van Steeg H, Mitchell J, van der Horst GTJ, Hoeijmakers J, Cohen P, Vijg J, Suh Y. Homeostatic imbalance between apoptosis and cell renewal in the liver of premature aging Xpd mice. PLoS One 2008; 3:e2346. [PMID: 18545656 PMCID: PMC2396506 DOI: 10.1371/journal.pone.0002346] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Received: 01/28/2008] [Accepted: 05/02/2008] [Indexed: 01/08/2023] Open
Abstract
Unrepaired or misrepaired DNA damage has been implicated as a causal factor in cancer and aging. Xpd(TTD) mice, harboring defects in nucleotide excision repair and transcription due to a mutation in the Xpd gene (R722W), display severe symptoms of premature aging but have a reduced incidence of cancer. To gain further insight into the molecular basis of the mutant-specific manifestation of age-related phenotypes, we used comparative microarray analysis of young and old female livers to discover gene expression signatures distinguishing Xpd(TTD) mice from their age-matched wild type controls. We found a transcription signature of increased apoptosis in the Xpd(TTD) mice, which was confirmed by in situ immunohistochemical analysis and found to be accompanied by increased proliferation. However, apoptosis rate exceeded the rate of proliferation, resulting in homeostatic imbalance. Interestingly, a metabolic response signature was observed involving decreased energy metabolism and reduced IGF-1 signaling, a major modulator of life span. We conclude that while the increased apoptotic response to endogenous DNA damage contributes to the accelerated aging phenotypes and the reduced cancer incidence observed in the Xpd(TTD) mice, the signature of reduced energy metabolism is likely to reflect a compensatory adjustment to limit the increased genotoxic stress in these mutants. These results support a general model for premature aging in DNA repair deficient mice based on cellular responses to DNA damage that impair normal tissue homeostasis.
Affiliation(s)
- Jung Yoon Park
- Department of Medicine, Albert Einstein College of Medicine, Bronx, New York, United States of America
- Department of Molecular Genetics, Albert Einstein College of Medicine, Bronx, New York, United States of America
- Mi-Ook Cho
- Department of Medicine, Albert Einstein College of Medicine, Bronx, New York, United States of America
- Department of Molecular Genetics, Albert Einstein College of Medicine, Bronx, New York, United States of America
- Shanique Leonard
- Department of Physiology, Barshop Institute for Longevity and Aging Studies, University of Texas Health Science Center at San Antonio, San Antonio, Texas, United States of America
- Brent Calder
- Buck Institute for Age Research, Novato, California, United States of America
- I. Saira Mian
- Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Woo Ho Kim
- Department of Pathology, Seoul National University College of Medicine, Seoul, Korea
- Susan Wijnhoven
- National Institute of Public Health and the Environment, Laboratory of Toxicology, Pathology and Genetics, Bilthoven, the Netherlands
- Harry van Steeg
- National Institute of Public Health and the Environment, Laboratory of Toxicology, Pathology and Genetics, Bilthoven, the Netherlands
- James Mitchell
- MGC-Department of Cell Biology and Genetics, Erasmus University Medical Center, Rotterdam, the Netherlands
- Jan Hoeijmakers
- MGC-Department of Cell Biology and Genetics, Erasmus University Medical Center, Rotterdam, the Netherlands
- Pinchas Cohen
- Pediatric Endocrinology, University of California Los Angeles, Los Angeles, California, United States of America
- Jan Vijg
- Buck Institute for Age Research, Novato, California, United States of America
- Yousin Suh
- Department of Medicine, Albert Einstein College of Medicine, Bronx, New York, United States of America
- Department of Molecular Genetics, Albert Einstein College of Medicine, Bronx, New York, United States of America
- * E-mail:
|
46
|
Lehner B, Lee I. Network-guided genetic screening: building, testing and using gene networks to predict gene function. Briefings in Functional Genomics and Proteomics 2008; 7:217-27. [PMID: 18445637 DOI: 10.1093/bfgp/eln020] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Indexed: 11/12/2022]
Abstract
A challenge facing nearly all biologists is to identify the complete set of genes that are important for a process or disease. This applies to scientists investigating fundamental pathways in model organisms, but also to clinicians trying to understand human disease. There are many different types of experimental data that can be used to predict the genes that are important for a process, but these data are normally dispersed across numerous publications and databases, and are of varying and unknown quality. Integrated functional gene networks aim to gather functional information from all of these data into a single intuitive graph model that can be used to predict gene functions. In this approach, the ability of each data set to predict functional associations between genes is first measured using a standard benchmark, and then the scored predictions by each data set are combined. The resulting integrated probabilistic gene network can be used by all researchers to predict gene function, with much greater coverage and accuracy than any individual data set. In this review, we discuss how such integrated gene networks are constructed, how their predictive power for gene function can be tested, and how experimental biologists can use these networks to guide their research. We pay particular attention to such networks constructed for Caenorhabditis elegans, because in this complex multicellular model system functional predictions for genes can be rapidly tested in vivo using RNAi. The approach is, however, widely applicable to any system, and might soon be a common method used to dissect the genetics of human complex diseases.
Affiliation(s)
- Ben Lehner
- EMBL-CRG Systems Biology Research Unit and Institució Catalana de Recerca i Estudis Avançats (ICREA), Centre for Genomic Regulation, UPF, Barcelona 08003, Spain.
|
47
|
Friedlander MJ. Lifespan longitudinal multitasking by cortical neurons. Future Neurology 2008. [DOI: 10.2217/14796708.3.2.117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
The large number of neurons (10^11) and synapses (10^14) in the mammalian brain provides a rich anatomical substrate for information processing. Many neurons perform very specialized functions, such as detecting or processing sensory stimuli, relaying or amplifying attributes of an afferent input to another brain area or making decisions to convert inputs into action. Some cell types, including the early-generated subplate cells of the developing cerebral cortex, play a special role during a restricted period of early brain development, acting transiently as scaffolds for the formation of thalamocortical and corticothalamic connections. However, many of these neurons (10–20%) survive elimination and become functionally integrated into the mature cortical circuitry. Thus, a single neuron type can perform different functions in the brain at different periods of life, potentially increasing the combinatorial capacity of the functional cellular architecture of the brain over the lifespan.
Affiliation(s)
- Michael J Friedlander
- Baylor College of Medicine, Department of Neuroscience, Director of Neuroscience Initiatives, One Baylor Plaza, Suite S740A, Houston, TX 77030, USA
|
48
|
Xu H, Xu H, Lin M, Wang W, Li Z, Huang J, Chen Y, Chen X. Learning the drug target-likeness of a protein. Proteomics 2007; 7:4255-63. [DOI: 10.1002/pmic.200700062] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 12/18/2022]
|
49
|
Lee I, Li Z, Marcotte EM. An improved, bias-reduced probabilistic functional gene network of baker's yeast, Saccharomyces cerevisiae. PLoS One 2007; 2:e988. [PMID: 17912365 PMCID: PMC1991590 DOI: 10.1371/journal.pone.0000988] [Citation(s) in RCA: 162] [Impact Index Per Article: 9.5] [Received: 06/15/2007] [Accepted: 09/10/2007] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Probabilistic functional gene networks are powerful theoretical frameworks for integrating heterogeneous functional genomics and proteomics data into objective models of cellular systems. Such networks provide syntheses of millions of discrete experimental observations, spanning DNA microarray experiments, physical protein interactions, genetic interactions, and comparative genomics; the resulting networks can then be easily applied to generate testable hypotheses regarding specific gene functions and associations. METHODOLOGY/PRINCIPAL FINDINGS We report a significantly improved version (v. 2) of a probabilistic functional gene network of the baker's yeast, Saccharomyces cerevisiae. We describe our optimization methods and illustrate their effects in three major areas: the reduction of functional bias in network training reference sets, the application of a probabilistic model for calculating confidences in pair-wise protein physical or genetic interactions, and the introduction of simple thresholds that eliminate many false positive mRNA co-expression relationships. Using the network, we predict and experimentally verify the function of the yeast RNA binding protein Puf6 in 60S ribosomal subunit biogenesis. CONCLUSIONS/SIGNIFICANCE YeastNet v. 2, constructed using these optimizations together with additional data, shows significant reduction in bias and improvements in precision and recall, in total covering 102,803 linkages among 5,483 yeast proteins (95% of the validated proteome). YeastNet is available from http://www.yeastnet.org.
Affiliation(s)
- Insuk Lee
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas, United States of America
- Zhihua Li
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas, United States of America
- Edward M. Marcotte
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas, United States of America
- Department of Chemistry and Biochemistry, Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas, United States of America
- * To whom correspondence should be addressed. E-mail:
|
50
|
McGarry K, Chambers J, Oatley G. A multi-layered approach to protein data integration for diabetes research. Artif Intell Med 2007; 41:129-43. [PMID: 17869073 DOI: 10.1016/j.artmed.2007.07.009] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 12/01/2006] [Revised: 07/26/2007] [Accepted: 07/26/2007] [Indexed: 01/15/2023]
Abstract
OBJECTIVE Recent advances in high-throughput experimental techniques have enabled many protein-protein interactions to be identified and stored in large databases. Understanding protein interactions is fundamental to the advancement of science and medical knowledge; unfortunately, the scale of the data requires an automated approach to analysis. We describe our graph-mining techniques to identify important structures within protein-protein interaction networks to aid in human comprehension and computerised analysis. METHODS AND MATERIALS We describe our techniques for characterizing the type and associated properties of a graph constructed from data collated from the Human Protein Reference Database. Using random graph rewiring comparative techniques and cross-validation with other identification methods, we present a further analysis of the identified essential proteins to illustrate the accuracy of these measures. We argue for using techniques based upon graph structure for separating and encapsulating proteins based upon functionality. RESULTS We demonstrate how rational Erdős numbers may be used as a method to identify collaborating proteins based solely upon network structure. Further, using a dynamic cut-off limit, we demonstrate how collaboration subgraphs can be generated for each protein within the network, and how graph containment can be used as a means of identifying which of many possible graphs are likely to be actual protein complexes. The demonstration protein interaction network built for diabetes is found to be a scale-free, small-world graph with a power-law degree distribution of interactions on nodes. These findings are consistent with many other protein interaction networks.
Affiliation(s)
- Ken McGarry
- School of Pharmacy, University of Sunderland, Wharncliffe Street, Sunderland SR1 3SD, UK.
|