1
Zhang G, Chen Y, Yan C, Wang J, Liang W, Luo J, Luo H. MPASL: multi-perspective learning knowledge graph attention network for synthetic lethality prediction in human cancer. Front Pharmacol 2024; 15:1398231. [PMID: 38835667 PMCID: PMC11148462 DOI: 10.3389/fphar.2024.1398231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2024] [Accepted: 04/26/2024] [Indexed: 06/06/2024]
Abstract
Synthetic lethality (SL) is widely used to discover anticancer drug targets. However, identifying SL interactions through wet-lab experiments is costly and inefficient. Hence, efficient and accurate computational methods for SL interaction prediction are of great significance. In this study, we propose MPASL, a multi-perspective learning knowledge graph attention network to enhance synthetic lethality prediction. MPASL uses knowledge graph hierarchy propagation to explore multi-source neighbor nodes related to genes. Knowledge graph ripple propagation expands gene representations through existing gene SL preference sets. MPASL learns gene representations from both a gene-entity perspective and an entity-entity perspective. Specifically, we use an aggregation method to learn gene-oriented entity embeddings. The gene representations are then refined by comparing the layer-wise neighborhood features of entities using a discrepancy contrastive technique. Finally, the learned gene representations are applied to SL prediction. Experimental results demonstrated that MPASL outperforms several state-of-the-art methods. Additionally, case studies validated the effectiveness of MPASL in identifying SL interactions between genes.
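The ripple propagation idea in this abstract can be made concrete with a short sketch. It assumes RippleNet-style hop-wise sets. Hop 1 collects the knowledge-graph triples whose head is in a gene's SL preference set, and hop k collects the triples whose head was a tail at hop k-1. The triple format, the helper name `build_ripple_sets` and the toy graph are hypothetical. This is not MPASL's released code, and real implementations usually also sample each hop to a fixed size.

```python
from collections import defaultdict

def build_ripple_sets(triples, seeds, n_hops):
    """Collect hop-wise knowledge-graph triples outward from seed entities.

    triples: iterable of (head, relation, tail) tuples (hypothetical format)
    seeds:   entities in a gene's SL preference set (e.g. known SL partners)
    """
    # Index triples by head entity for fast neighbor lookup.
    by_head = defaultdict(list)
    for h, r, t in triples:
        by_head[h].append((h, r, t))

    ripple_sets = []
    frontier = set(seeds)
    for _ in range(n_hops):
        # sorted() keeps the output order deterministic.
        hop = [trip for e in sorted(frontier) for trip in by_head.get(e, [])]
        ripple_sets.append(hop)
        # Tails reached at this hop become the heads searched at the next hop.
        frontier = {t for _, _, t in hop}
    return ripple_sets

# Toy graph with illustrative entities and relation names.
kg = [
    ("BRCA1", "participates_in", "DNA_repair"),
    ("PARP1", "participates_in", "DNA_repair"),
    ("DNA_repair", "has_go_term", "GO:0006281"),
]
sets = build_ripple_sets(kg, seeds={"BRCA1"}, n_hops=2)
print(sets[0])  # [('BRCA1', 'participates_in', 'DNA_repair')]
print(sets[1])  # [('DNA_repair', 'has_go_term', 'GO:0006281')]
```

A seed with no outgoing triples yields one empty list per hop. Each hop therefore stays aligned with its depth even when propagation stops early.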
Affiliation(s)
- Ge Zhang
- School of Computer and Information Engineering, Henan University, Kaifeng, Henan, China
- Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, Henan, China
- Yitong Chen
- School of Computer and Information Engineering, Henan University, Kaifeng, Henan, China
- Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, Henan, China
- Chaokun Yan
- School of Computer and Information Engineering, Henan University, Kaifeng, Henan, China
- Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, Henan, China
- Jianlin Wang
- School of Computer and Information Engineering, Henan University, Kaifeng, Henan, China
- Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, Henan, China
- Wenjuan Liang
- School of Computer and Information Engineering, Henan University, Kaifeng, Henan, China
- Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, Henan, China
- Junwei Luo
- College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan, China
- Huimin Luo
- School of Computer and Information Engineering, Henan University, Kaifeng, Henan, China
- Henan Key Laboratory of Big Data Analysis and Processing, Henan University, Kaifeng, Henan, China
2
Fan K, Tang S, Gökbağ B, Cheng L, Li L. Multi-view graph convolutional network for cancer cell-specific synthetic lethality prediction. Front Genet 2023; 13:1103092. [PMID: 36699450 PMCID: PMC9868610 DOI: 10.3389/fgene.2022.1103092] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/19/2022] [Accepted: 12/22/2022] [Indexed: 01/11/2023]
Abstract
Synthetic lethal (SL) genetic interactions have been regarded as a promising focus for investigating potential targeted therapeutics to tackle cancer. However, the costly investment of time and labor associated with wet-lab experimental screenings to discover potential SL relationships motivates the development of computational methods. Although graph neural network (GNN) models have performed well in predicting SL gene pairs, existing GNN-based models are not designed to predict cancer cell-specific SL interactions, which are more relevant to experimental validation in vitro. Moreover, existing methods have not fully utilized diverse graph representations of biological features to improve prediction performance. In this work, we propose MVGCN-iSL, a novel multi-view graph convolutional network (GCN) model that predicts cancer cell-specific SL gene pairs by incorporating five biological graph features and multi-omics data. A max pooling operation integrates the five graph-specific representations obtained from the GCN models. A deep neural network (DNN) then serves as the prediction module to predict SL interactions in individual cancer cells (iSL). Extensive experiments validated that the model successfully integrates the multiple graph features, achieves state-of-the-art performance in predicting potential SL gene pairs, and generalizes to novel genes.
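The max pooling step in this abstract can be sketched as an element-wise maximum over a gene's view-specific embeddings. This is a minimal illustration under several assumptions. The embeddings are plain lists of equal length, and the five toy vectors are invented. Concatenating the two pooled gene vectors as DNN input (`pair_features`) is a hypothetical choice, not MVGCN-iSL's documented design.

```python
from typing import List

def max_pool_views(view_embeddings: List[List[float]]) -> List[float]:
    """Element-wise max over one gene's per-view embeddings (equal lengths)."""
    if not view_embeddings:
        raise ValueError("need at least one view")
    dim = len(view_embeddings[0])
    if any(len(v) != dim for v in view_embeddings):
        raise ValueError("all views must share the same dimension")
    # zip(*...) walks the views dimension by dimension.
    return [max(values) for values in zip(*view_embeddings)]

def pair_features(gene_a_views, gene_b_views) -> List[float]:
    """Hypothetical pair input: concatenation of the two pooled gene vectors."""
    return max_pool_views(gene_a_views) + max_pool_views(gene_b_views)

# Five invented 3-d embeddings of one gene, one per biological graph.
views = [
    [0.1, 0.9, -0.2],
    [0.4, 0.3, 0.0],
    [-0.5, 0.2, 0.7],
    [0.0, 0.1, 0.1],
    [0.3, -0.1, 0.2],
]
print(max_pool_views(views))  # [0.4, 0.9, 0.7]
```

Max pooling keeps, for each dimension, the strongest signal from any view. One informative graph can then dominate a dimension without being diluted by the others, as it would be under averaging.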
Affiliation(s)
- Kunjie Fan
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, United States
- Shan Tang
- College of Pharmacy, The Ohio State University, Columbus, OH, United States
- Birkan Gökbağ
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, United States
- Lijun Cheng
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, United States
- Lang Li
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, United States
- College of Pharmacy, The Ohio State University, Columbus, OH, United States
3
Tang S, Gökbağ B, Fan K, Shao S, Huo Y, Wu X, Cheng L, Li L. Synthetic lethal gene pairs: Experimental approaches and predictive models. Front Genet 2022; 13:961611. [PMID: 36531238 PMCID: PMC9751344 DOI: 10.3389/fgene.2022.961611] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/05/2022] [Accepted: 11/07/2022] [Indexed: 03/27/2024]
Abstract
Synthetic lethality (SL) refers to a genetic interaction in which the simultaneous perturbation of two genes leads to cell or organism death, whereas viability is maintained when only one of the pair is altered. The experimental exploration of these pairs and predictive modeling in computational biology contribute to our understanding of cancer biology and the development of cancer therapies. We extensively reviewed experimental technologies, public data sources, and predictive models in the study of synthetic lethal gene pairs and herein detail biological assumptions, experimental data, statistical models, and computational schemes of various predictive models, speculate regarding their influence on individual sample- and population-based synthetic lethal interactions, discuss the pros and cons of existing SL data and models, and highlight potential research directions in SL discovery.
Affiliation(s)
- Shan Tang
- College of Pharmacy, The Ohio State University, Columbus, OH, United States
- Birkan Gökbağ
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, United States
- Kunjie Fan
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, United States
- Shuai Shao
- College of Pharmacy, The Ohio State University, Columbus, OH, United States
- Yang Huo
- Indiana University, Bloomington, IN, United States
- Xue Wu
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, United States
- Lijun Cheng
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, United States
- Lang Li
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, United States
4
Wang J, Zhang Q, Han J, Zhao Y, Zhao C, Yan B, Dai C, Wu L, Wen Y, Zhang Y, Leng D, Wang Z, Yang X, He S, Bo X. Computational methods, databases and tools for synthetic lethality prediction. Brief Bioinform 2022; 23:bbac106. [PMID: 35352098 PMCID: PMC9116379 DOI: 10.1093/bib/bbac106] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 12/09/2021] [Revised: 02/15/2022] [Accepted: 03/02/2022] [Indexed: 12/17/2022]
Abstract
Synthetic lethality (SL) occurs between two genes when the inactivation of either gene alone has no effect on cell survival but the inactivation of both genes results in cell death. SL-based therapy has become one of the most promising targeted cancer therapies of the last decade, as PARP inhibitors have achieved great success in the clinic. The key to exploiting SL-based cancer therapy is the identification of robust SL pairs. Although many wet-lab methods have been developed to screen SL pairs, known SL pairs account for less than 0.1% of all potential pairs because of the large number of human gene combinations. Computational prediction methods complement wet-lab methods by effectively reducing the search space of SL pairs. In this paper, we review recent applications of computational methods and commonly used databases for SL prediction. First, we introduce the concept of SL and its screening methods. Second, we summarize various SL-related data resources. Then, we summarize computational methods for SL prediction, including statistics-based methods, network-based methods, classical machine learning methods and deep learning methods. In particular, we elaborate on the negative sampling methods applied in these models. Next, we introduce representative tools for SL prediction. Finally, we discuss the challenges and future work for SL prediction.
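The pair-count argument and the negative sampling in this abstract can be illustrated with a small sketch. It assumes roughly 20,000 protein-coding human genes, a round figure that is not taken from the review. It shows uniform random negative sampling, which is only one of several strategies such reviews cover. `n_unordered_pairs` and `sample_negatives` are hypothetical helper names.

```python
import random

def n_unordered_pairs(n_genes: int) -> int:
    """Number of distinct unordered gene pairs (n choose 2)."""
    return n_genes * (n_genes - 1) // 2

def sample_negatives(genes, positives, k, seed=0):
    """Draw k distinct unordered pairs that are not known SL pairs.

    Unlabeled pairs are treated as negatives, the usual (noisy) assumption
    behind random negative sampling. Rejection sampling avoids enumerating
    every pair, which matters at genome scale.
    """
    rng = random.Random(seed)
    genes = sorted(set(genes))
    gene_set = set(genes)
    # Keep only well-formed known pairs whose genes are in the gene list.
    known = {frozenset(p) for p in positives
             if len(set(p)) == 2 and set(p) <= gene_set}
    available = n_unordered_pairs(len(genes)) - len(known)
    if k > available:
        raise ValueError(f"only {available} unlabeled pairs are available")
    chosen = set()
    while len(chosen) < k:
        pair = frozenset(rng.sample(genes, 2))
        if pair not in known:
            chosen.add(pair)
    return sorted(tuple(sorted(p)) for p in chosen)

# About 20,000 protein-coding genes is an assumed round figure.
total = n_unordered_pairs(20_000)
print(total)          # 199990000
print(total // 1000)  # 199990, i.e. roughly 0.1% of all pairs
```

Under this assumption, fewer than about 200,000 known SL pairs would fall below the 0.1% mark. Treating every unlabeled pair as negative inevitably mislabels some true SL pairs, which is why reviews such as this one discuss alternative negative sampling schemes.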
Affiliation(s)
- Jing Wang
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Qinglong Zhang
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Junshan Han
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Yanpeng Zhao
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Caiyun Zhao
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Bowei Yan
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Chong Dai
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Lianlian Wu
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Yuqi Wen
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Yixin Zhang
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Dongjin Leng
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Zhongming Wang
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Xiaoxi Yang
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Song He
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Xiaochen Bo
- Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
5
Long Y, Wu M, Liu Y, Zheng J, Kwoh CK, Luo J, Li X. Graph contextualized attention network for predicting synthetic lethality in human cancers. Bioinformatics 2021; 37:2432-2440. [PMID: 33609108 DOI: 10.1093/bioinformatics/btab110] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 09/09/2020] [Revised: 02/09/2021] [Accepted: 02/16/2021] [Indexed: 11/13/2022]
Abstract
MOTIVATION Synthetic lethality (SL) plays an increasingly critical role in targeted anticancer therapeutics. In addition, identifying SL interactions can create opportunities to selectively kill cancer cells without harming normal cells. Given the high cost of wet-lab experiments, in silico prediction of SL interactions can be a rapid and cost-effective alternative for guiding the experimental screening of candidate SL pairs. Several matrix factorization-based methods have recently been proposed for human SL prediction. However, they are limited in capturing the dependencies of neighbors. In addition, it is highly challenging to make accurate predictions for new genes without any known SL partners. RESULTS In this work, we propose a novel graph contextualized attention network named GCATSL to learn gene representations for SL prediction. First, we leverage different data sources to construct multiple feature graphs for genes, which serve as the feature inputs for GCATSL. Second, for each feature graph, we design a node-level attention mechanism to effectively capture the importance of local and global neighbors and to learn local and global representations for the nodes, respectively. We further use a multi-layer perceptron (MLP) to aggregate the original features with the local and global representations and derive the feature-specific representations. Third, to derive the final representations, we design feature-level attention to integrate feature-specific representations by taking the importance of different feature graphs into account. Extensive experimental results on three datasets under different settings demonstrated that GCATSL consistently outperforms 14 state-of-the-art methods. In addition, case studies further validated the effectiveness of the proposed model in identifying novel SL pairs.
AVAILABILITY Python codes and dataset are freely available on GitHub (https://github.com/longyahui/GCATSL) and Zenodo (https://zenodo.org/record/4522679) under the MIT license.
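Feature-level attention of the kind described in this abstract can be sketched as a softmax over per-graph scores followed by a weighted sum. The scoring function, a query vector dotted with tanh of each representation, is an assumption borrowed from common attention designs and is not taken from GCATSL. `feature_level_attention` is a hypothetical name, and the query vector would be learned in practice.

```python
import math
from typing import List, Sequence, Tuple

def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def feature_level_attention(
    reps: List[List[float]], query: List[float]
) -> Tuple[List[float], List[float]]:
    """Fuse one representation per feature graph into a single vector.

    reps:  feature-specific representations of one gene (equal lengths)
    query: attention vector (learned in practice, fixed here)
    """
    # Score each feature graph, normalize the scores, then take a weighted sum.
    scores = [sum(q * math.tanh(x) for q, x in zip(query, r)) for r in reps]
    weights = softmax(scores)
    dim = len(reps[0])
    fused = [sum(w * r[d] for w, r in zip(weights, reps)) for d in range(dim)]
    return fused, weights

reps = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fused, weights = feature_level_attention(reps, query=[0.0, 0.0])
print(weights)  # three equal weights of 1/3 when the query is all zeros
print(fused)    # about [0.5, 0.5]
```

With an all-zero query, every feature graph is scored equally and the fusion reduces to a plain average. A query such as `[1.0, -1.0]` instead favors the first representation. This is how the weights let the model emphasize more informative feature graphs.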
Affiliation(s)
- Yahui Long
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410000, China
- School of Computer Science and Engineering, Nanyang Technological University, Singapore, 639798, Singapore
- Min Wu
- Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), 138632, Singapore
- Yong Liu
- Joint NTU-UBC Research Centre of Excellence in Active Living for the Elderly, Nanyang Technological University, 639798, Singapore
- Jie Zheng
- School of Information Science and Technology, ShanghaiTech University, Shanghai, 201210, China
- Chee Keong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, Singapore, 639798, Singapore
- Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410000, China
- Xiaoli Li
- Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), 138632, Singapore
6
Mondal P, Sadhukhan AK, Ganguly A, Gupta P. Optimization of process parameters for bio-enzymatic and enzymatic saccharification of waste broken rice for ethanol production using response surface methodology and artificial neural network-genetic algorithm. 3 Biotech 2021; 11:28. [PMID: 33442526 DOI: 10.1007/s13205-020-02553-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/18/2019] [Accepted: 11/12/2020] [Indexed: 12/13/2022]
Abstract
A reducing sugar solution has been produced from waste broken rice by a novel saccharification process using a combination of a bio-enzyme (bakhar) and a commercial enzyme (α-amylase). The resulting reducing sugar solution is a promising raw material for producing bioethanol by fermentation. Response surface methodology (RSM) and an artificial neural network-genetic algorithm (ANN-GA) were used separately to optimize the multivariable process parameters for maximum yield of total reducing sugar (TRS) in the saccharification process. The ANN-GA model predicts a maximum TRS yield of 0.704 g/g at a temperature of 93 °C, a saccharification time of 250 min, pH 6.5 and an enzyme dosage of 1.25 mL/kg, while RSM predicts a maximum yield of 0.7025 g/g under slightly different process conditions. Fresh experimental validation of the ANN-GA and RSM model predictions was satisfactory, with relative mean errors of 2.4% and 3.8% and coefficients of determination of 0.997 and 0.996, respectively.
Affiliation(s)
- Payel Mondal
- Chemical Engineering Department, National Institute of Technology, Durgapur, 713209 India
- Anup Kumar Sadhukhan
- Chemical Engineering Department, National Institute of Technology, Durgapur, 713209 India
- Amit Ganguly
- CSIR-Central Mechanical Engineering Research Institute, Durgapur, 713209 India
- Parthapratim Gupta
- Chemical Engineering Department, National Institute of Technology, Durgapur, 713209 India
7
Fan J, Li XC, Crovella M, Leiserson MDM. Matrix (factorization) reloaded: flexible methods for imputing genetic interactions with cross-species and side information. Bioinformatics 2020; 36:i866-i874. [PMID: 33381837 DOI: 10.1093/bioinformatics/btaa818] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 09/09/2020] [Indexed: 01/02/2023]
Abstract
MOTIVATION Mapping genetic interactions (GIs) can reveal important insights into cellular function and has potential translational applications. There has been great progress in developing high-throughput experimental systems for measuring GIs (e.g. with double knockouts) as well as in defining computational methods for inferring (imputing) unknown interactions. However, existing computational methods for imputation have largely been developed for and applied in baker's yeast, even as experimental systems have begun to allow measurements in other contexts. Importantly, existing methods face a number of limitations, both in requiring specific side information and in computational cost. Further, few have addressed how GIs can be imputed when data are scarce. RESULTS In this article, we address these limitations by presenting a new imputation framework, called Extensible Matrix Factorization (EMF). EMF is a framework of composable models that flexibly exploit cross-species information in the form of GI data across multiple species, and arbitrary side information in the form of kernels (e.g. from protein-protein interaction networks). We perform a rigorous set of experiments with these models on matched GI datasets from baker's and fission yeast. These include the first such experiments on genome-scale GI datasets in multiple species in the same study. We find that EMF models that exploit side and cross-species information improve imputation, especially in data-scarce settings. Further, we show that EMF outperforms the state-of-the-art deep learning method, even when using strictly less data, and incurs orders of magnitude less computational cost. AVAILABILITY Implementations of models and experiments are available at: https://github.com/lrgr/EMF. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jason Fan
- Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
- Xuan Cindy Li
- Program in Computational Biology, Bioinformatics, and Genomics, University of Maryland, College Park, MD 20742, USA
- Mark Crovella
- Department of Computer Science, Boston University, MA, 02215, USA
- Mark D M Leiserson
- Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
8
Liany H, Jeyasekharan A, Rajan V. Predicting synthetic lethal interactions using heterogeneous data sources. Bioinformatics 2020; 36:2209-2216. [PMID: 31782759 DOI: 10.1093/bioinformatics/btz893] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 06/04/2019] [Revised: 10/31/2019] [Accepted: 11/27/2019] [Indexed: 11/14/2022]
Abstract
MOTIVATION A synthetic lethal (SL) interaction is a relationship between two functional entities in which the loss of either entity alone is viable but the loss of both is lethal to the cell. Such pairs can be used as drug targets in targeted anticancer therapies, and many methods have therefore been developed to identify potential candidate SL pairs. However, these methods use only a subset of the data available from multiple platforms at the genomic, epigenomic and transcriptomic levels, and hence are limited in their ability to learn from complex associations in heterogeneous data sources. RESULTS In this article, we develop techniques that can seamlessly integrate multiple heterogeneous data sources to predict SL interactions. Our approach obtains latent representations by collective matrix factorization-based techniques, which in turn are used for prediction through matrix completion. Our experiments on a variety of biological datasets illustrate the efficacy and versatility of our approach, which outperforms state-of-the-art methods for predicting SL interactions and can be used with heterogeneous data sources with minimal feature engineering. AVAILABILITY AND IMPLEMENTATION Software available at https://github.com/lianyh. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Herty Liany
- Department of Computer Science, School of Computing, National University of Singapore, Singapore, Singapore
- Anand Jeyasekharan
- Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Vaibhav Rajan
- Department of Information Systems and Analytics, School of Computing, National University of Singapore, Singapore, Singapore
9
G2G: A web-server for the prediction of human synthetic lethal interactions. Comput Struct Biotechnol J 2020; 18:1028-1031. [PMID: 32419903 PMCID: PMC7215103 DOI: 10.1016/j.csbj.2020.04.012] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/20/2020] [Revised: 04/18/2020] [Accepted: 04/19/2020] [Indexed: 12/04/2022]
Abstract
Genetic interactions (GIs) are fundamental to our understanding of biological processes in the cell. While GIs have been systematically mapped in yeast, there is scarce information about them in humans. Recently, we proposed a state-of-the-art hierarchical method that leverages Gene Ontology information for predicting GIs in yeast. Here, we adapt this method and apply it for the first time to predict GIs in humans. We introduce a web service called G2G for this task that is available at http://bnet.cs.tau.ac.il/g2g/.
10
Liu Y, Wu M, Liu C, Li XL, Zheng J. SL²MF: Predicting Synthetic Lethality in Human Cancers via Logistic Matrix Factorization. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:748-757. [PMID: 30969932 DOI: 10.1109/tcbb.2019.2909908] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Indexed: 06/09/2023]
Abstract
Synthetic lethality (SL) is a promising concept for the discovery of novel anticancer drug targets. However, wet-lab experiments for detecting SLs face various challenges, such as high cost and low consistency across platforms or cell lines. Computational prediction methods are therefore needed to address these issues. This paper proposes a novel SL prediction method, named SL²MF, which employs logistic matrix factorization to learn latent representations of genes from the observed SL data. The probability that two genes form an SL pair is modeled by a linear combination of gene latent vectors. As known SL pairs are more trustworthy than unknown pairs, we design importance weighting schemes in SL²MF that assign higher importance weights to known SL pairs and lower importance weights to unknown pairs. Moreover, we incorporate biological knowledge about genes from protein-protein interaction (PPI) data and Gene Ontology (GO). In particular, we calculate the similarity between genes based on their GO annotations and topological properties in the PPI network. Extensive experiments on SL interaction data from the SynLethDB database demonstrate the effectiveness of SL²MF.
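The importance-weighting idea in this abstract can be sketched as weighted logistic matrix factorization trained with plain stochastic gradient steps. The sketch rests on several assumptions. The SL probability of genes i and j is modeled as the sigmoid of the inner product of their latent vectors, the usual logistic-MF form. Known pairs get weight `c` and all unlabeled pairs weight 1. The GO/PPI similarity terms are omitted. Function names and hyperparameter values are hypothetical and are not SL²MF's published settings.

```python
import math
import random

def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sl_probability(U, i, j):
    """Modeled SL probability: sigmoid of the inner product of gene vectors."""
    return sigmoid(sum(a * b for a, b in zip(U[i], U[j])))

def train_weighted_lmf(n_genes, known_pairs, dim=4, c=5.0, lam=0.01,
                       lr=0.1, epochs=200, seed=0):
    """Weighted logistic MF over all unordered gene pairs (toy scale only).

    Known SL pairs: label 1, weight c. Unlabeled pairs: label 0, weight 1.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_genes)]
    known = {frozenset(p) for p in known_pairs}
    pairs = [(i, j) for i in range(n_genes) for j in range(i + 1, n_genes)]
    for _ in range(epochs):
        for i, j in pairs:
            y = 1.0 if frozenset((i, j)) in known else 0.0
            w = c if y == 1.0 else 1.0
            # Gradient of the weighted log-likelihood w.r.t. the inner product.
            g = w * (y - sl_probability(U, i, j))
            for d in range(dim):
                ui, uj = U[i][d], U[j][d]
                # Gradient ascent step with L2 shrinkage on both vectors.
                U[i][d] += lr * (g * uj - lam * ui)
                U[j][d] += lr * (g * ui - lam * uj)
    return U
```

For example, `U = train_weighted_lmf(4, [(0, 1)])` fits four genes with a single known SL pair. After training, `sl_probability(U, 0, 1)` should exceed the scores of the unlabeled pairs. Looping over every pair is quadratic in the number of genes, so this form only suits toy inputs.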
11
Wan F, Li S, Tian T, Lei Y, Zhao D, Zeng J. EXP2SL: A Machine Learning Framework for Cell-Line-Specific Synthetic Lethality Prediction. Front Pharmacol 2020; 11:112. [PMID: 32184722 PMCID: PMC7058988 DOI: 10.3389/fphar.2020.00112] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 10/27/2019] [Accepted: 01/28/2020] [Indexed: 12/13/2022]
Abstract
Synthetic lethality (SL), an important type of genetic interaction, can provide useful insight into the target identification process for the development of anticancer therapeutics. Although several well-established SL gene pairs have been verified to be conserved in humans, most SL interactions remain cell-line specific. Here, we demonstrate that the cell-line-specific gene expression profiles derived from the shRNA perturbation experiments performed in the LINCS L1000 project can provide useful features for predicting SL interactions in humans. We developed a semi-supervised neural network-based method called EXP2SL to accurately identify SL interactions from the L1000 gene expression profiles. Through a systematic evaluation on the SL datasets of three different cell lines, we demonstrated that our model achieved better performance than the baseline methods, and we verified the effectiveness of the L1000 gene expression features and the semi-supervised training technique in SL prediction.
Affiliation(s)
- Fangping Wan
- Institute of Interdisciplinary Information Science, Tsinghua University, Beijing, China
- Shuya Li
- Institute of Interdisciplinary Information Science, Tsinghua University, Beijing, China
- Tingzhong Tian
- Institute of Interdisciplinary Information Science, Tsinghua University, Beijing, China
- Yipin Lei
- Machine Learning Department, Silexon AI Technology Co. Ltd., Nanjing, China
- Dan Zhao
- Institute of Interdisciplinary Information Science, Tsinghua University, Beijing, China
- Jianyang Zeng
- Institute of Interdisciplinary Information Science, Tsinghua University, Beijing, China
12
Panchy NL, Lloyd JP, Shiu SH. Improved recovery of cell-cycle gene expression in Saccharomyces cerevisiae from regulatory interactions in multiple omics data. BMC Genomics 2020; 21:159. [PMID: 32054475 PMCID: PMC7020519 DOI: 10.1186/s12864-020-6554-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/02/2019] [Accepted: 02/04/2020] [Indexed: 12/11/2022]
Abstract
BACKGROUND Gene expression is regulated by DNA-binding transcription factors (TFs). Together with their target genes, these factors and their interactions collectively form a gene regulatory network (GRN), which is responsible for producing patterns of transcription, including cyclical processes such as genome replication and cell division. However, identifying how this network regulates the timing of these patterns, including important interactions and regulatory motifs, remains a challenging task. RESULTS We employed four in vivo and in vitro regulatory data sets to investigate the regulatory basis of expression timing and phase-specific patterns of cell-cycle expression in Saccharomyces cerevisiae. Specifically, we considered interactions based on direct binding between TF and target gene, indirect effects of TF deletion on gene expression, and computational inference. We found that the source of regulatory information significantly impacts the accuracy and completeness of recovering known cell-cycle expressed genes. The best approach involved combining TF-target and TF-TF interaction features from multiple datasets in a single model. In addition, TFs important to multiple phases of cell-cycle expression also have the greatest impact on individual phases. Important TFs regulating a cell-cycle phase also tend to form modules in the GRN, including two sub-modules composed entirely of unannotated cell-cycle regulators (STE12-TEC1 and RAP1-HAP1-MSN4). CONCLUSION Our findings illustrate the importance of integrating both multiple omics data and regulatory motifs in order to understand the significance of regulatory interactions involved in timing gene expression. This integrated approach allowed us to recover both known cell-cycle interactions and the overall pattern of phase-specific expression across the cell cycle better than any single data set.
Likewise, by looking at regulatory motifs in the form of TF-TF interactions, we identified sets of TFs whose co-regulation of target genes was important for cell-cycle expression, even when regulation by individual TFs was not. Overall, this demonstrates the power of integrating multiple data sets and models of interaction in order to understand the regulatory basis of established biological processes and their associated gene regulatory networks.
Affiliation(s)
- Nicholas L Panchy
- Genetics Graduate Program, Michigan State University, East Lansing, MI, 48824, USA
- Present address: National Institute for Mathematical and Biological Synthesis, University of Tennessee, 1122 Volunteer Blvd., Suite 106, Knoxville, TN, 37996-3410, USA
- John P Lloyd
- Department of Human Genetics and Internal Medicine, University of Michigan, Ann Arbor, MI, 48109, USA
- Shin-Han Shiu
- Genetics Graduate Program, Michigan State University, East Lansing, MI, 48824, USA
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA
- Michigan State University, Plant Biology Laboratories, 612 Wilson Road, Room 166, East Lansing, MI, 48824-1312, USA
13
Zitnik M, Nguyen F, Wang B, Leskovec J, Goldenberg A, Hoffman MM. Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities. Inf Fusion 2019; 50:71-91. [PMID: 30467459 PMCID: PMC6242341 DOI: 10.1016/j.inffus.2018.09.012] [Citation(s) in RCA: 215] [Impact Index Per Article: 43.0] [Indexed: 05/10/2023]
Abstract
New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include myriad properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.
Affiliation(s)
- Marinka Zitnik
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Francis Nguyen
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, Toronto, ON, Canada
- Bo Wang
- Hikvision Research Institute, Santa Clara, CA, USA
- Jure Leskovec
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Anna Goldenberg
- Genetics & Genome Biology, SickKids Research Institute, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
- Michael M. Hoffman
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
14
HRGPred: Prediction of herbicide resistant genes with k-mer nucleotide compositional features and support vector machine. Sci Rep 2019; 9:778. [PMID: 30692561 PMCID: PMC6349872 DOI: 10.1038/s41598-018-37309-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/18/2018] [Accepted: 12/03/2018] [Indexed: 02/07/2023]
Abstract
Herbicide resistance (HR) is a major concern for agricultural producers as well as environmentalists. Resistance to commonly used herbicides is conferred by mutations in the genes encoding herbicide target sites/proteins (GETS). Identifying these genes through wet-lab experiments is time-consuming and expensive. This study therefore proposes a supervised learning-based computational model, the first of its kind, for predicting seven classes of GETS. The cDNA sequences of the genes were first transformed into numeric features based on their k-mer compositions and then supplied as input to a support vector machine (SVM). In the proposed SVM-based model, prediction occurs in two stages: a binary classifier in the first stage discriminates genes that confer herbicide resistance from other genes, and a multi-class classifier in the second stage assigns each gene predicted as herbicide resistant in the first stage to one of the seven resistance classes. Overall classification accuracies were ~89% and >97% for the binary and multi-class classifications, respectively. The proposed model achieved higher accuracy than the homology-based algorithms BLAST and hidden Markov models. In addition, the model achieved ~87% accuracy when tested on an independent dataset. An online prediction server, HRGPred (http://cabgrid.res.in:8080/hrgpred), has also been established to facilitate the prediction of GETS by the scientific community.
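The k-mer compositional encoding can be illustrated with a minimal sketch. It assumes overlapping k-mers counted over the A/C/G/T alphabet and normalised by the number of valid windows. The choice of k = 1 to 3 and the helper names `kmer_composition` and `feature_vector` are illustrative assumptions, not HRGPred's published settings. This is not the authors' code.

```python
from itertools import product


def kmer_composition(seq, k):
    """Normalised frequencies of all 4**k DNA k-mers (overlapping windows) in seq."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]  # fixed, sorted order
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for start in range(len(seq) - k + 1):
        i = index.get(seq[start:start + k])
        if i is not None:  # skip windows containing N or other ambiguity codes
            counts[i] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


def feature_vector(seq, ks=(1, 2, 3)):
    """Concatenate compositions for several k (4 + 16 + 64 = 84 features by default)."""
    features = []
    for k in ks:
        features.extend(kmer_composition(seq, k))
    return features


print(kmer_composition("ACGT", 1))   # each nucleotide occurs once
print(len(feature_vector("ACGTAC")))
```

The resulting fixed-length vectors could then be passed to an off-the-shelf SVM, for example scikit-learn's `sklearn.svm.SVC`. In the two-stage design described above, one classifier would be trained for the binary stage, and a second classifier would be applied only to the genes predicted positive in that stage.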
15
Jiao X, Li Z, Wang M, Katiyar S, Di Sante G, Farshchian M, South AP, Cocola C, Colombo D, Reinbold R, Zucchi I, Wu K, Tabas I, Spike BT, Pestell RG. Dachshund Depletion Disrupts Mammary Gland Development and Diverts the Composition of the Mammary Gland Progenitor Pool. Stem Cell Reports 2018; 12:135-151. [PMID: 30554919 PMCID: PMC6335505 DOI: 10.1016/j.stemcr.2018.11.010] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 04/07/2018] [Revised: 11/14/2018] [Accepted: 11/14/2018] [Indexed: 12/31/2022]
Abstract
DACH1 abundance is reduced in human malignancies, including breast cancer. Here, DACH1 was detected among multipotent fetal mammary stem cells in the embryo, among mixed-lineage precursors, and in adult basal cells and (ERα+) luminal progenitors. Dach1 gene deletion at 6 weeks in transgenic mice reduced ductal branching, reduced the proportion of mammary basal cells (Lin− CD24med CD29high) and reduced the abundance of basal cytokeratin 5, whereas DACH1 overexpression induced ductal branching, increased Gata3 and Notch1, and expanded mammosphere formation in LA-7 breast cells. Mammary gland transforming growth factor β (TGF-β) activity, known to reduce ductal branching and the basal cell population, increased upon Dach1 deletion and was associated with increased SMAD phosphorylation. Endogenous DACH1 reduced the association of the scaffold protein Smad anchor for receptor activation (SARA) with Smad2/3, which facilitates TGF-β activation. DACH1 increases basal cells, enhances ductal formation and restrains TGF-β activity in vivo. Highlights: Dach1 is expressed in mammary gland fetal stem cells and adult luminal cells; Dach1 expands mammary gland basal/myoepithelial cells; Dach1 induces post-natal mammary gland ductal formation; Dach1 restrains TGF-β activity in the mammary gland in vivo.
Affiliation(s)
- Xuanmao Jiao
- Pennsylvania Cancer and Regenerative Medicine Research Center, Baruch S. Blumberg Institute, 3805 Old Easton Road, Doylestown, PA 18902, USA
- Zhiping Li
- Pennsylvania Cancer and Regenerative Medicine Research Center, Baruch S. Blumberg Institute, 3805 Old Easton Road, Doylestown, PA 18902, USA
- Min Wang
- Pennsylvania Cancer and Regenerative Medicine Research Center, Baruch S. Blumberg Institute, 3805 Old Easton Road, Doylestown, PA 18902, USA
- Sanjay Katiyar
- Department of Cancer Biology, Thomas Jefferson University, Bluemle Life Sciences Building, 233 South 10th Street, Philadelphia, PA 19107, USA
- Gabriele Di Sante
- Pennsylvania Cancer and Regenerative Medicine Research Center, Baruch S. Blumberg Institute, 3805 Old Easton Road, Doylestown, PA 18902, USA
- Mehdi Farshchian
- Department of Dermatology and Cutaneous Biology, Thomas Jefferson University, Bluemle Life Sciences Building, 233 South 10th Street, Philadelphia, PA 19107, USA
- Andrew P South
- Department of Dermatology and Cutaneous Biology, Thomas Jefferson University, Bluemle Life Sciences Building, 233 South 10th Street, Philadelphia, PA 19107, USA
- Cinzia Cocola
- Istituto Tecnologie Biomediche, Consiglio Nazionale Delle Ricerche, Via Cervi 93, Segrate, 20090 Milano, Italy
- Daniele Colombo
- Istituto Tecnologie Biomediche, Consiglio Nazionale Delle Ricerche, Via Cervi 93, Segrate, 20090 Milano, Italy
- Rolland Reinbold
- Istituto Tecnologie Biomediche, Consiglio Nazionale Delle Ricerche, Via Cervi 93, Segrate, 20090 Milano, Italy
- Ileana Zucchi
- Istituto Tecnologie Biomediche, Consiglio Nazionale Delle Ricerche, Via Cervi 93, Segrate, 20090 Milano, Italy
- Kongming Wu
- Department of Oncology, Tongji Hospital of Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, P.R. China
- Ira Tabas
- Department of Medicine, Columbia University Irving Medical Center, New York, NY 10032, USA; Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, NY 10032, USA; Department of Physiology and Cellular Biophysics, Columbia University Irving Medical Center, New York, NY 10032, USA
- Benjamin T Spike
- Huntsman Cancer Institute, Department of Oncological Sciences, University of Utah, 2000 Circle of Hope, Room 2505, Salt Lake City, UT 84112, USA
- Richard G Pestell
- Pennsylvania Cancer and Regenerative Medicine Research Center, Baruch S. Blumberg Institute, 3805 Old Easton Road, Doylestown, PA 18902, USA; Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore 637551, Singapore.
16
El Naqa I, Pandey G, Aerts H, Chien JT, Andreassen CN, Niemierko A, Ten Haken RK. Radiation Therapy Outcomes Models in the Era of Radiomics and Radiogenomics: Uncertainties and Validation. Int J Radiat Oncol Biol Phys 2018; 102:1070-1073. [PMID: 30353869 DOI: 10.1016/j.ijrobp.2018.08.022] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 04/10/2018] [Revised: 08/08/2018] [Accepted: 08/12/2018] [Indexed: 01/24/2023]
Affiliation(s)
- Issam El Naqa
- Department of Radiation Oncology, University of Michigan, Ann Arbor, Michigan.
- Gaurav Pandey
- Icahn Institute for Genomics and Multiscale Biology and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York
- Hugo Aerts
- Department of Radiation Oncology, Dana-Farber Cancer Institute, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts; Department of Radiology, Dana-Farber Cancer Institute, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts
- Jen-Tzung Chien
- Department of Electrical and Computer Engineering, National Chiao Tung University, Hsinchu, Taiwan; Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan
- Andrzej Niemierko
- Department of Radiation Oncology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts
- Randall K Ten Haken
- Department of Radiation Oncology, University of Michigan, Ann Arbor, Michigan
17
Ma J, Yu MK, Fong S, Ono K, Sage E, Demchak B, Sharan R, Ideker T. Using deep learning to model the hierarchical structure and function of a cell. Nat Methods 2018; 15:290-298. [PMID: 29505029 PMCID: PMC5882547 DOI: 10.1038/nmeth.4627] [Citation(s) in RCA: 206] [Impact Index Per Article: 34.3] [Received: 07/19/2017] [Accepted: 02/07/2018] [Indexed: 01/20/2023]
Abstract
Although artificial neural networks simulate a variety of human functions, their internal structures are hard to interpret. In the life sciences, extensive knowledge of cell biology provides an opportunity to design visible neural networks (VNNs) which couple the model’s inner workings to those of real systems. Here we develop DCell, a VNN embedded in the hierarchical structure of 2526 subsystems comprising a eukaryotic cell (http://d-cell.ucsd.edu/). Trained on several million genotypes, DCell simulates cellular growth nearly as accurately as laboratory observations. During simulation, genotypes induce patterns of subsystem activities, enabling in-silico investigations of the molecular mechanisms underlying genotype-phenotype associations. These mechanisms can be validated and many are unexpected; some are governed by Boolean logic. Cumulatively, 80% of the importance for growth prediction is captured by 484 subsystems (21%), reflecting the emergence of a complex phenotype. DCell provides a foundation for decoding the genetics of disease, drug resistance, and synthetic life.
Affiliation(s)
- Jianzhu Ma
- Department of Medicine, University of California San Diego, La Jolla, California, USA
- Michael Ku Yu
- Department of Medicine, University of California San Diego, La Jolla, California, USA; Program in Bioinformatics, University of California San Diego, La Jolla, California, USA
- Samson Fong
- Department of Medicine, University of California San Diego, La Jolla, California, USA; Department of Bioengineering, University of California San Diego, La Jolla, California, USA
- Keiichiro Ono
- Department of Medicine, University of California San Diego, La Jolla, California, USA
- Eric Sage
- Department of Medicine, University of California San Diego, La Jolla, California, USA
- Barry Demchak
- Department of Medicine, University of California San Diego, La Jolla, California, USA
- Roded Sharan
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Trey Ideker
- Department of Medicine, University of California San Diego, La Jolla, California, USA; Program in Bioinformatics, University of California San Diego, La Jolla, California, USA; Department of Bioengineering, University of California San Diego, La Jolla, California, USA
18
Stanescu A, Pandey G. Learning Parsimonious Ensembles for Unbalanced Computational Genomics Problems. Pac Symp Biocomput 2017; 22:288-299. [PMID: 27896983 DOI: 10.1142/9789813207813_0028] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 12/17/2022]
Abstract
Prediction problems in biomedical sciences are generally quite difficult, partially due to incomplete knowledge of how the phenomenon of interest is influenced by the variables and measurements used for prediction, as well as a lack of consensus regarding the ideal predictor(s) for specific problems. In these situations, a powerful approach to improving prediction performance is to construct ensembles that combine the outputs of many individual base predictors, which have been successful for many biomedical prediction tasks. Moreover, selecting a parsimonious ensemble can be of even greater value for biomedical sciences, where it is not only important to learn an accurate predictor, but also to interpret what novel knowledge it can provide about the target problem. Ensemble selection is a promising approach for this task because of its ability to select a collectively predictive subset, often a relatively small one, of all input base predictors. One of the most well-known algorithms for ensemble selection, CES (Caruana et al.'s Ensemble Selection), generally performs well in practice, but faces several challenges due to the difficulty of choosing the right values of its various parameters. Since the choices made for these parameters are usually ad-hoc, good performance of CES is difficult to guarantee for a variety of problems or datasets. To address these challenges with CES and other such algorithms, we propose a novel heterogeneous ensemble selection approach based on the paradigm of reinforcement learning (RL), which offers a more systematic and mathematically sound methodology for exploring the many possible combinations of base predictors that can be selected into an ensemble. We develop three RL-based strategies for constructing ensembles and analyze their results on two unbalanced computational genomics problems, namely the prediction of protein function and splice sites in eukaryotic genomes. 
We show that the resultant ensembles are indeed substantially more parsimonious as compared to the full set of base predictors, yet still offer almost the same classification power, especially for larger datasets. The RL ensembles also yield a better combination of parsimony and predictive performance as compared to CES.
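The abstract only names CES. The Caruana-style greedy forward selection it refers to can be sketched roughly as follows, under stated assumptions. Each base predictor's scores on a held-out selection set are precomputed. The ensemble score is the plain average of its members. At each step, the predictor whose addition (with replacement) most improves the metric is added, and the best ensemble seen is returned. Accuracy at a 0.5 threshold is used only for brevity; for unbalanced problems such as those in the abstract, a rank-based metric would be more appropriate. The helper names are hypothetical. CES options such as sorted initialisation and bagging are omitted. This is not the RL-based method the authors propose.

```python
def average_scores(members, preds):
    """Average the per-example scores of the listed (possibly repeated) members."""
    n = len(next(iter(preds.values())))
    return [sum(preds[m][i] for m in members) / len(members) for i in range(n)]


def accuracy(scores, labels, threshold=0.5):
    """Fraction of examples whose thresholded score matches the 0/1 label."""
    return sum((s >= threshold) == bool(y) for s, y in zip(scores, labels)) / len(labels)


def ensemble_selection(preds, labels, iterations=10, metric=accuracy):
    """Greedy forward selection with replacement; returns (best_members, best_score)."""
    members = []
    best_members, best_score = [], float("-inf")
    for _ in range(iterations):
        # Try adding each base predictor (already-selected ones may be added again).
        candidate, candidate_score = None, float("-inf")
        for name in preds:
            score = metric(average_scores(members + [name], preds), labels)
            if score > candidate_score:  # ties keep the first predictor tried
                candidate, candidate_score = name, score
        members.append(candidate)
        if candidate_score > best_score:  # remember the best ensemble seen so far
            best_members, best_score = list(members), candidate_score
    return best_members, best_score


labels = [1, 0, 1, 0]
preds = {"a": [0.9, 0.6, 0.4, 0.1], "b": [0.4, 0.1, 0.9, 0.6]}
print(ensemble_selection(preds, labels, iterations=3))
```

In this toy example each predictor alone reaches 0.5 accuracy, but their average is correct on all four examples. The selected ensemble therefore contains both predictors. The number of iterations and the metric are exactly the kind of ad hoc parameters whose choice the abstract identifies as a difficulty for CES.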
Affiliation(s)
- Ana Stanescu
- Icahn Institute for Genomics and Multiscale Biology and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
19
Benstead-Hume G, Wooller SK, Pearl FMG. 'Big data' approaches for novel anti-cancer drug discovery. Expert Opin Drug Discov 2017; 12:599-609. [PMID: 28462602 DOI: 10.1080/17460441.2017.1319356] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 12/16/2022]
Abstract
INTRODUCTION The development of improved cancer therapies is frequently cited as an urgent unmet medical need. Recent advances in platform technologies and the increasing availability of biological 'big data' are providing an unparalleled opportunity to systematically identify the key genes and pathways involved in tumorigenesis. The discoveries made using these new technologies may lead to novel therapeutic interventions. Areas covered: The authors discuss the current approaches that use 'big data' to identify cancer drivers. These approaches include the analysis of genomic sequencing data, pathway data, multi-platform data, identifying genetic interactions such as synthetic lethality and using cell line data. They review how big data is being used to identify novel drug targets. The authors then provide an overview of the available data repositories and tools being used at the forefront of cancer drug discovery. Expert opinion: Targeted therapies based on the genomic events driving the tumour will eventually inform treatment protocols. However, using a tailored approach to treat all tumour patients may require developing a large repertoire of targeted drugs.
Affiliation(s)
- Graeme Benstead-Hume
- Bioinformatics Group, School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Sarah K Wooller
- Bioinformatics Group, School of Life Sciences, University of Sussex, Brighton, United Kingdom
- Frances M G Pearl
- Bioinformatics Group, School of Life Sciences, University of Sussex, Brighton, United Kingdom
20
Abstract
Characterizing genetic interactions is crucial to understanding cellular and organismal response to gene-level perturbations. Such knowledge can inform the selection of candidate disease therapy targets, yet experimentally determining whether genes interact is technically nontrivial and time-consuming. High-fidelity prediction of different classes of genetic interactions in multiple organisms would substantially alleviate this experimental burden. Under the hypothesis that functionally related genes tend to share common genetic interaction partners, we evaluate a computational approach to predict genetic interactions in Homo sapiens, Drosophila melanogaster, and Saccharomyces cerevisiae. By leveraging knowledge of functional relationships between genes, we cross-validate predictions on known genetic interactions and observe high predictive power of multiple classes of genetic interactions in all three organisms. Additionally, our method suggests high-confidence candidate interaction pairs that can be directly experimentally tested. A web application is provided for users to query genes for predicted novel genetic interaction partners. Finally, by subsampling the known yeast genetic interaction network, we found that novel genetic interactions are predictable even when knowledge of currently known interactions is minimal.
21
Akerman I, Tu Z, Beucher A, Rolando DMY, Sauty-Colace C, Benazra M, Nakic N, Yang J, Wang H, Pasquali L, Moran I, Garcia-Hurtado J, Castro N, Gonzalez-Franco R, Stewart AF, Bonner C, Piemonti L, Berney T, Groop L, Kerr-Conte J, Pattou F, Argmann C, Schadt E, Ravassard P, Ferrer J. Human Pancreatic β Cell lncRNAs Control Cell-Specific Regulatory Networks. Cell Metab 2017; 25:400-411. [PMID: 28041957 PMCID: PMC5300904 DOI: 10.1016/j.cmet.2016.11.016] [Citation(s) in RCA: 166] [Impact Index Per Article: 23.7] [Received: 05/05/2016] [Revised: 10/01/2016] [Accepted: 11/29/2016] [Indexed: 12/28/2022]
Abstract
Recent studies have uncovered thousands of long non-coding RNAs (lncRNAs) in human pancreatic β cells. β cell lncRNAs are often cell type specific and exhibit dynamic regulation during differentiation or upon changing glucose concentrations. Although these features hint at a role of lncRNAs in β cell gene regulation and diabetes, the function of β cell lncRNAs remains largely unknown. In this study, we investigated the function of β cell-specific lncRNAs and transcription factors using transcript knockdowns and co-expression network analysis. This revealed lncRNAs that function in concert with transcription factors to regulate β cell-specific transcriptional networks. We further demonstrate that the lncRNA PLUTO affects local 3D chromatin structure and transcription of PDX1, encoding a key β cell transcription factor, and that both PLUTO and PDX1 are downregulated in islets from donors with type 2 diabetes or impaired glucose tolerance. These results implicate lncRNAs in the regulation of β cell-specific transcription factor networks.
Affiliation(s)
- Ildem Akerman
- Section of Epigenomics and Disease, Department of Medicine, Imperial College London, London W12 0NN, United Kingdom; Genomic Programming of Beta Cells Laboratory, Institut d'Investigacions Biomediques August Pi I Sunyer (IDIBAPS), Barcelona 08036, Spain; Centro de Investigación Biomédica en Red de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM), Madrid 28029, Spain
- Zhidong Tu
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Anthony Beucher
- Section of Epigenomics and Disease, Department of Medicine, Imperial College London, London W12 0NN, United Kingdom
- Delphine M Y Rolando
- Section of Epigenomics and Disease, Department of Medicine, Imperial College London, London W12 0NN, United Kingdom
- Claire Sauty-Colace
- Sorbonne Universités, UPMC Univ Paris 06, INSERM, CNRS, Institut du cerveau et de la moelle (ICM) - Hôpital Pitié-Salpêtrière, Boulevard de l'Hôpital, Paris 75013, France
- Marion Benazra
- Sorbonne Universités, UPMC Univ Paris 06, INSERM, CNRS, Institut du cerveau et de la moelle (ICM) - Hôpital Pitié-Salpêtrière, Boulevard de l'Hôpital, Paris 75013, France
- Nikolina Nakic
- Section of Epigenomics and Disease, Department of Medicine, Imperial College London, London W12 0NN, United Kingdom
- Jialiang Yang
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Huan Wang
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Lorenzo Pasquali
- Centro de Investigación Biomédica en Red de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM), Madrid 28029, Spain; Germans Trias i Pujol University Hospital and Research Institute and Josep Carreras Leukaemia Research Institute, Badalona 08916, Spain
- Ignasi Moran
- Section of Epigenomics and Disease, Department of Medicine, Imperial College London, London W12 0NN, United Kingdom
- Javier Garcia-Hurtado
- Genomic Programming of Beta Cells Laboratory, Institut d'Investigacions Biomediques August Pi I Sunyer (IDIBAPS), Barcelona 08036, Spain; Centro de Investigación Biomédica en Red de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM), Madrid 28029, Spain
- Natalia Castro
- Genomic Programming of Beta Cells Laboratory, Institut d'Investigacions Biomediques August Pi I Sunyer (IDIBAPS), Barcelona 08036, Spain; Centro de Investigación Biomédica en Red de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM), Madrid 28029, Spain
- Roser Gonzalez-Franco
- Section of Epigenomics and Disease, Department of Medicine, Imperial College London, London W12 0NN, United Kingdom
- Andrew F Stewart
- Diabetes, Obesity, and Metabolism Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Caroline Bonner
- European Genomic Institute for Diabetes, INSERM UMR 1190, Lille 59800, France
- Lorenzo Piemonti
- Diabetes Research Institute (HSR-DRI), San Raffaele Scientific Institute, Milano 20132, Italy
- Thierry Berney
- Cell Isolation and Transplantation Center, University of Geneva, 1211 Geneva 4, Switzerland
- Leif Groop
- Department of Clinical Sciences, Lund University Diabetes Centre, Lund University, Lund 20502, Sweden
- Julie Kerr-Conte
- European Genomic Institute for Diabetes, INSERM UMR 1190, Lille 59800, France
- Francois Pattou
- European Genomic Institute for Diabetes, INSERM UMR 1190, Lille 59800, France
- Carmen Argmann
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Eric Schadt
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Philippe Ravassard
- Sorbonne Universités, UPMC Univ Paris 06, INSERM, CNRS, Institut du cerveau et de la moelle (ICM) - Hôpital Pitié-Salpêtrière, Boulevard de l'Hôpital, Paris 75013, France
- Jorge Ferrer
- Section of Epigenomics and Disease, Department of Medicine, Imperial College London, London W12 0NN, United Kingdom; Genomic Programming of Beta Cells Laboratory, Institut d'Investigacions Biomediques August Pi I Sunyer (IDIBAPS), Barcelona 08036, Spain; Centro de Investigación Biomédica en Red de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM), Madrid 28029, Spain.
22
Cho H, Berger B, Peng J. Compact Integration of Multi-Network Topology for Functional Analysis of Genes. Cell Syst 2016; 3:540-548.e5. [PMID: 27889536 DOI: 10.1016/j.cels.2016.10.017] [Citation(s) in RCA: 141] [Impact Index Per Article: 17.6] [Received: 06/09/2016] [Revised: 08/14/2016] [Accepted: 10/19/2016] [Indexed: 01/18/2023]
Abstract
The topological landscape of molecular or functional interaction networks provides a rich source of information for inferring functional patterns of genes or proteins. However, a pressing yet-unsolved challenge is how to combine multiple heterogeneous networks, each having different connectivity patterns, to achieve more accurate inference. Here, we describe the Mashup framework for scalable and robust network integration. In Mashup, the diffusion in each network is first analyzed to characterize the topological context of each node. Next, the high-dimensional topological patterns in individual networks are canonically represented using low-dimensional vectors, one per gene or protein. These vectors can then be plugged into off-the-shelf machine learning methods to derive functional insights about genes or proteins. We present tools based on Mashup that achieve state-of-the-art performance in three diverse functional inference tasks: protein function prediction, gene ontology reconstruction, and genetic interaction prediction. Mashup enables deeper insights into the structure of rapidly accumulating and diverse biological network data and can be broadly applied to other network science domains.
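The per-network diffusion step can be sketched with random walk with restart (RWR), a standard way to characterise a node's topological context. This sketch assumes a symmetric, non-negative adjacency matrix with no isolated nodes and a restart probability of 0.5. Those values and the helper names are illustrative assumptions, not parameters taken from the abstract. Each node's converged visiting distribution serves as its diffusion profile. Mashup's subsequent step of compressing the profiles from several networks into shared low-dimensional vectors is not shown.

```python
def random_walk_with_restart(adj, start, restart=0.5, tol=1e-12, max_iter=10_000):
    """Stationary visiting probabilities of a walker that restarts at `start`."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    if any(d == 0 for d in degree):
        raise ValueError("this sketch assumes every node has at least one edge")
    p = [0.0] * n
    p[start] = 1.0
    for _ in range(max_iter):
        nxt = [0.0] * n
        for j in range(n):
            if p[j] == 0.0:
                continue
            # With probability (1 - restart) the walker follows an edge out of j,
            # chosen in proportion to the edge weight.
            share = (1.0 - restart) * p[j] / degree[j]
            for i, w in enumerate(adj[j]):
                if w:
                    nxt[i] += share * w
        nxt[start] += restart  # with probability `restart` it jumps back to start
        converged = sum(abs(a - b) for a, b in zip(nxt, p)) < tol
        p = nxt
        if converged:
            break
    return p


def diffusion_profiles(adj, restart=0.5):
    """One RWR distribution (row) per node of a single network."""
    return [random_walk_with_restart(adj, s, restart) for s in range(len(adj))]


path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # toy path graph 0-1-2
print(random_walk_with_restart(path, 0))
```

On the toy path graph, the profile of node 0 places the most mass on node 0 itself and the least on the farthest node, 2. Solving the fixed-point equations by hand gives 7/12, 1/3 and 1/12. Nodes with similar profiles occupy similar topological contexts, which is what makes such vectors usable as features for downstream learners.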
Affiliation(s)
- Hyunghoon Cho
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA
- Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA; Department of Mathematics, MIT, Cambridge, MA 02139, USA.
- Jian Peng
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA; Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL 61801, USA.
23
Yu MK, Kramer M, Dutkowski J, Srivas R, Licon K, Kreisberg J, Ng CT, Krogan N, Sharan R, Ideker T. Translation of Genotype to Phenotype by a Hierarchy of Cell Subsystems. Cell Syst 2016; 2:77-88. [PMID: 26949740 PMCID: PMC4772745 DOI: 10.1016/j.cels.2016.02.003] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Indexed: 11/26/2022]
Abstract
Accurately translating genotype to phenotype requires accounting for the functional impact of genetic variation at many biological scales. Here we present a strategy for genotype-phenotype reasoning based on existing knowledge of cellular subsystems. These subsystems and their hierarchical organization are defined by the Gene Ontology or a complementary ontology inferred directly from previously published datasets. Guided by the ontology's hierarchical structure, we organize genotype data into an "ontotype," that is, a hierarchy of perturbations representing the effects of genetic variation at multiple cellular scales. The ontotype is then interpreted using logical rules generated by machine learning to predict phenotype. This approach substantially outperforms previous, non-hierarchical methods for translating yeast genotype to cell growth phenotype, and it accurately predicts the growth outcomes of two new screens of 2,503 double gene knockouts impacting DNA repair or nuclear lumen. Ontotypes also generalize to larger knockout combinations, setting the stage for interpreting the complex genetics of disease.
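The idea of an "ontotype" can be made concrete with a minimal sketch. It assumes that each ontology term's state is simply the number of perturbed genes annotated to that term or to any of its descendants. The ontology is represented as a parent-to-children mapping plus per-term gene annotations. The term names, gene names and helper functions are hypothetical. The machine-learned logical rules that the authors use to interpret ontotypes are not shown.

```python
def genes_under(term, children, annotations, cache):
    """All genes annotated to `term` or to any descendant term (memoised)."""
    if term not in cache:
        genes = set(annotations.get(term, ()))
        for child in children.get(term, ()):
            genes |= genes_under(child, children, annotations, cache)
        cache[term] = genes
    return cache[term]


def ontotype(perturbed, children, annotations):
    """Map every term to the count of perturbed genes within it."""
    terms = set(children) | set(annotations)
    for kids in children.values():
        terms |= set(kids)
    cache = {}
    hit = set(perturbed)
    return {t: len(genes_under(t, children, annotations, cache) & hit) for t in sorted(terms)}


# Hypothetical three-term hierarchy: "root" has two child subsystems.
children = {"root": ["repair", "lumen"]}
annotations = {"repair": {"g1", "g2"}, "lumen": {"g3"}}
print(ontotype({"g1", "g3"}, children, annotations))  # a double knockout
```

Because the same genotype is summarised at every level of the hierarchy, a double knockout that hits two different subsystems and one that hits the same subsystem twice produce different ontotypes, even though both perturb two genes.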
Affiliation(s)
- Michael Ku Yu
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla CA 92093, USA
- Department of Medicine, University of California San Diego, La Jolla CA 92093, USA
- Michael Kramer
- Department of Medicine, University of California San Diego, La Jolla CA 92093, USA
- Biomedical Sciences Program, University of California San Diego, La Jolla CA 92093, USA
- Janusz Dutkowski
- Department of Medicine, University of California San Diego, La Jolla CA 92093, USA
- Data4Cure, La Jolla, CA 92037, USA
- Rohith Srivas
- Department of Medicine, University of California San Diego, La Jolla CA 92093, USA
- Department of Bioengineering, University of California San Diego, La Jolla CA 92093, USA
- Katherine Licon
- Department of Medicine, University of California San Diego, La Jolla CA 92093, USA
- Jason Kreisberg
- Department of Medicine, University of California San Diego, La Jolla CA 92093, USA
- Nevan Krogan
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, San Francisco 94143, USA
- Roded Sharan
- Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv 69978, Israel
- Trey Ideker
- Department of Medicine, University of California San Diego, La Jolla CA 92093, USA
24
Madhukar NS, Elemento O, Pandey G. Prediction of Genetic Interactions Using Machine Learning and Network Properties. Front Bioeng Biotechnol 2015; 3:172. [PMID: 26579514 PMCID: PMC4620407 DOI: 10.3389/fbioe.2015.00172] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 09/01/2015] [Accepted: 10/12/2015] [Indexed: 12/04/2022]
Abstract
A genetic interaction (GI) is a type of interaction in which the effect of one gene is modified by the effect of one or several other genes. These interactions are important for delineating functional relationships among genes and their corresponding proteins, as well as for elucidating complex biological processes and diseases. An important type of GI, synthetic sickness or synthetic lethality, involves two or more genes, where the loss of any one gene alone has little impact on cell viability, but the combined loss of all of them leads to a severe decrease in fitness (sickness) or cell death (lethality). The identification of GIs is an important problem because it can help delineate pathways, protein complexes, and regulatory dependencies. Synthetic lethal interactions have important clinical and biological significance, such as providing therapeutically exploitable weaknesses in tumors. While near-systematic high-content screening for GIs is possible in single-celled organisms such as yeast, the systematic discovery of GIs is extremely difficult in mammalian cells. Therefore, there is a great need for computational approaches to reliably predict GIs, including synthetic lethal interactions, in these organisms. Here, we review the state-of-the-art approaches, strategies, and rigorous evaluation methods for learning and predicting GIs, both under general (healthy/standard laboratory) conditions and under specific contexts, such as diseases.
Affiliation(s)
- Neel S Madhukar
- Department of Physiology and Biophysics, Meyer Cancer Center, Institute for Precision Medicine and Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY, USA; Tri-Institutional Training Program in Computational Biology and Medicine, New York, NY, USA
- Olivier Elemento
- Department of Physiology and Biophysics, Meyer Cancer Center, Institute for Precision Medicine and Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY, USA; Tri-Institutional Training Program in Computational Biology and Medicine, New York, NY, USA
- Gaurav Pandey
- Department of Genetics and Genomic Sciences and Graduate School of Biomedical Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
25
Whalen S, Pandey OP, Pandey G. Predicting protein function and other biomedical characteristics with heterogeneous ensembles. Methods 2015; 93:92-102. [PMID: 26342255 DOI: 10.1016/j.ymeth.2015.08.016] [Received: 05/21/2015] [Revised: 08/03/2015] [Accepted: 08/23/2015]
Abstract
Prediction problems in biomedical sciences, including protein function prediction (PFP), are generally quite difficult. This is due in part to incomplete knowledge of the cellular phenomenon of interest, the appropriateness and data quality of the variables and measurements used for prediction, as well as a lack of consensus regarding the ideal predictor for specific problems. In such scenarios, a powerful approach to improving prediction performance is to construct heterogeneous ensemble predictors that combine the output of diverse individual predictors that capture complementary aspects of the problems and/or datasets. In this paper, we demonstrate the potential of such heterogeneous ensembles, derived from stacking and ensemble selection methods, for addressing PFP and other similar biomedical prediction problems. Deeper analysis of these results shows that the superior predictive ability of these methods, especially stacking, can be attributed to their attention to the following aspects of the ensemble learning process: (i) better balance of diversity and performance, (ii) more effective calibration of outputs and (iii) more robust incorporation of additional base predictors. Finally, to make the effective application of heterogeneous ensembles to large complex datasets (big data) feasible, we present DataSink, a distributed ensemble learning framework, and demonstrate its sound scalability using the examined datasets. DataSink is publicly available from https://github.com/shwhalen/datasink.
Affiliation(s)
- Sean Whalen
- Gladstone Institutes, University of California, San Francisco, CA, USA.
- Om Prakash Pandey
- Icahn Institute for Genomics and Multiscale Biology and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
- Gaurav Pandey
- Icahn Institute for Genomics and Multiscale Biology and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Graduate School of Biomedical Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
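Stacking, as discussed in the abstract above, fits a second-level (meta) learner on the scores produced by diverse base predictors. The sketch below is a minimal, hypothetical illustration and not the DataSink API: it assumes two toy base-predictor scores per example and fits a logistic-regression meta-learner by plain batch gradient descent. In practice the meta-learner should be trained on out-of-fold base predictions, so that it never sees scores a base model produced on its own training data.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit meta-learner weights by batch gradient descent on log-loss; w[-1] is the bias."""
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            # zip stops at the shorter list, so the bias w[-1] is added separately.
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            err = sigmoid(z) - yi  # derivative of log-loss with respect to z
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def stack_predict(w, base_scores):
    """Combine one example's base-predictor scores into a single stacked probability."""
    z = sum(wj * s for wj, s in zip(w, base_scores)) + w[-1]
    return sigmoid(z)


# Hypothetical out-of-fold scores from two base predictors (columns) for six examples.
meta_X = [[0.9, 0.2], [0.8, 0.4], [0.7, 0.1], [0.3, 0.7], [0.1, 0.9], [0.2, 0.6]]
labels = [1, 1, 1, 0, 0, 0]
weights = fit_logistic(meta_X, labels)
```

The meta-learner learns how much to trust each base predictor, which is one way a stacked ensemble can balance diversity against individual performance.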
26
Lu X, Megchelenbrink W, Notebaart RA, Huynen MA. Predicting human genetic interactions from cancer genome evolution. PLoS One 2015; 10:e0125795. [PMID: 25933428 PMCID: PMC4416779 DOI: 10.1371/journal.pone.0125795] [Received: 12/10/2014] [Accepted: 03/25/2015]
Abstract
Synthetic Lethal (SL) genetic interactions play a key role in various types of biological research, ranging from understanding genotype-phenotype relationships to identifying drug targets against cancer. Despite recent advances in empirically measuring SL interactions in human cells, the human genetic interaction map is far from complete. Here, we present a novel approach to predict this map by exploiting patterns in cancer genome evolution. First, we show that empirically determined SL interactions are reflected in various gene presence, absence, and duplication patterns in hundreds of cancer genomes. The most evident pattern that we discovered is that when one member of an SL interaction gene pair is lost, the other gene tends not to be lost, i.e., the absence of co-loss. This observation is in line with expectation, because the loss of an SL interacting pair will be lethal to the cancer cell. SL interactions are also reflected in gene expression profiles, such as an underrepresentation of cases where the genes in an SL pair are both underexpressed, and an overrepresentation of cases where one gene of an SL pair is underexpressed while the other is overexpressed. We integrated the various previously unknown cancer genome patterns and the gene expression patterns into a computational model to identify SL pairs. This simple, genome-wide model achieves a high prediction power (AUC = 0.75) for known genetic interactions. It allows us to present for the first time a comprehensive genome-wide list of SL interactions with a high estimated prediction precision, covering up to 591,000 gene pairs. This unique list can potentially be used in various application areas ranging from biotechnology to medical genetics.
Affiliation(s)
- Xiaowen Lu
- Department of Bioinformatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Nijmegen, The Netherlands
- Wout Megchelenbrink
- Department of Bioinformatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Nijmegen, The Netherlands
- Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen, The Netherlands
- Richard A. Notebaart
- Department of Bioinformatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Nijmegen, The Netherlands
- Centre for Systems Biology and Bioenergetics, Radboud University Medical Centre, Nijmegen, The Netherlands
- Martijn A. Huynen
- Department of Bioinformatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Centre, Nijmegen, The Netherlands
- Centre for Systems Biology and Bioenergetics, Radboud University Medical Centre, Nijmegen, The Netherlands
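The co-loss signal described above can be checked for a single gene pair with a one-sided hypergeometric test: given how often each gene is lost across genomes, how unlikely is a co-loss count this low if the two losses were independent? A minimal sketch, assuming binary per-genome loss calls; the function name is hypothetical and this is not the authors' model, which integrated several genome and expression patterns.

```python
from math import comb


def co_loss_depletion_pvalue(loss_a, loss_b):
    """One-sided hypergeometric p-value for observing this few co-losses of genes A and B.

    loss_a, loss_b: equal-length sequences of 0/1 loss calls, one entry per cancer genome.
    Returns (observed co-loss count, P(X <= observed)) under independent loss.
    """
    if len(loss_a) != len(loss_b):
        raise ValueError("loss vectors must cover the same genomes")
    n = len(loss_a)
    k_a = sum(loss_a)  # genomes in which gene A is lost
    k_b = sum(loss_b)  # genomes in which gene B is lost
    observed = sum(1 for a, b in zip(loss_a, loss_b) if a and b)
    lowest = max(0, k_a + k_b - n)  # smallest overlap the margins allow
    # Sum the hypergeometric probabilities of `observed` or fewer co-losses.
    tail = sum(comb(k_a, x) * comb(n - k_a, k_b - x) for x in range(lowest, observed + 1))
    return observed, tail / comb(n, k_b)


# Hypothetical pair: A is lost in the first five genomes, B only in the last five.
observed, p = co_loss_depletion_pvalue([1] * 5 + [0] * 5, [0] * 5 + [1] * 5)
```

In a genome-wide scan over many pairs, such p-values would need correction for multiple testing.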
27
Žitnik M, Zupan B. Data Imputation in Epistatic MAPs by Network-Guided Matrix Completion. J Comput Biol 2015; 22:595-608. [PMID: 25658751 DOI: 10.1089/cmb.2014.0158]
Abstract
Epistatic miniarray profile (E-MAP) is a popular large-scale genetic interaction discovery platform. E-MAPs benefit from quantitative output, which makes it possible to detect subtle interactions with greater precision. However, due to the limits of biotechnology, E-MAP studies fail to measure genetic interactions for up to 40% of gene pairs in an assay. Missing measurements can be recovered by computational techniques for data imputation, in this way completing the interaction profiles and enabling downstream analysis algorithms that could otherwise be sensitive to missing data values. We introduce a new interaction data imputation method called network-guided matrix completion (NG-MC). The core part of NG-MC is low-rank probabilistic matrix completion that incorporates prior knowledge presented as a collection of gene networks. NG-MC assumes that interactions are transitive, such that latent gene interaction profiles inferred by NG-MC depend on the profiles of their direct neighbors in gene networks. As the NG-MC inference algorithm progresses, it propagates latent interaction profiles through each of the networks and updates gene network weights toward improved prediction. In a study with four different E-MAP data assays and considered protein-protein interaction and gene ontology similarity networks, NG-MC significantly surpassed existing alternative techniques. Inclusion of information from gene networks also allowed NG-MC to predict interactions for genes that were not included in original E-MAP assays, a task that could not be considered by current imputation approaches.
Affiliation(s)
- Marinka Žitnik
- Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
- Blaž Zupan
- Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
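The core idea of network-guided completion, low-rank factorization of the observed entries plus a penalty that keeps neighbouring genes' latent profiles similar, can be sketched in a few lines. This is a simplified stand-in, not NG-MC: it assumes a single gene network, a squared-error objective, and stochastic gradient descent, whereas NG-MC is probabilistic, uses several networks, and learns their weights. All function names and parameter values are hypothetical.

```python
import random


def predict(U, V, i, j):
    return sum(u * v for u, v in zip(U[i], V[j]))


def network_guided_mf(R, neighbors, rank=2, lam_net=0.1, lam_l2=0.01,
                      lr=0.05, epochs=500, seed=0):
    """Low-rank completion of R (None marks a missing entry) with a gene-network penalty.

    Fits R[i][j] ~ dot(U[i], V[j]) on observed entries by stochastic gradient descent.
    neighbors maps a row (gene) index to the row indices adjacent to it in a gene network;
    each U[i] is pulled toward the mean of its neighbours' latent vectors.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(R), len(R[0])
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    observed = [(i, j) for i in range(n_rows) for j in range(n_cols) if R[i][j] is not None]
    for _ in range(epochs):
        rng.shuffle(observed)
        for i, j in observed:
            err = R[i][j] - predict(U, V, i, j)
            nbrs = neighbors.get(i, [])
            for k in range(rank):
                u_ik, v_jk = U[i][k], V[j][k]
                # Genes without neighbours get no network pull (nbr_mean equals u_ik).
                nbr_mean = sum(U[n][k] for n in nbrs) / len(nbrs) if nbrs else u_ik
                U[i][k] += lr * (err * v_jk - lam_l2 * u_ik - lam_net * (u_ik - nbr_mean))
                V[j][k] += lr * (err * u_ik - lam_l2 * v_jk)
    return U, V


# Hypothetical 3x3 profile matrix; gene 1 is missing its third measurement
# and is linked to gene 0 in the (hypothetical) gene network.
R = [[1.0, 2.0, 3.0],
     [1.0, 2.0, None],
     [2.0, 4.0, 6.0]]
U, V = network_guided_mf(R, neighbors={0: [1], 1: [0]})
imputed = predict(U, V, 1, 2)
```

Because genes 0 and 1 are neighbours, gene 1's latent vector is drawn toward gene 0's, and the imputed entry is expected to land near 3.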
28
Žitnik M, Zupan B. Data Fusion by Matrix Factorization. IEEE Trans Pattern Anal Mach Intell 2015; 37:41-53. [PMID: 26353207 DOI: 10.1109/tpami.2014.2343973]
Abstract
For most problems in science and engineering we can obtain data sets that describe the observed system from various perspectives and record the behavior of its individual components. Heterogeneous data sets can be collectively mined by data fusion. Fusion can focus on a specific target relation and exploit directly associated data together with contextual data and data about the system's constraints. In this paper we describe a data fusion approach with penalized matrix tri-factorization (DFMF) that simultaneously factorizes data matrices to reveal hidden associations. The approach can directly consider any data that can be expressed in a matrix, including those from feature-based representations, ontologies, associations and networks. We demonstrate the utility of DFMF for a gene function prediction task with eleven different data sources and for prediction of pharmacologic actions by fusing six data sources. Our data fusion algorithm compares favorably to alternative data integration approaches and achieves higher accuracy than can be obtained from any single data source alone.
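The "penalized matrix tri-factorization" in this abstract can be summarized as an optimization problem. As we read the DFMF paper (this is a paraphrase, not a quotation, and notation may differ from the original), each relation matrix $R_{ij}$ between object types $i$ and $j$ is approximated by $G_i S_{ij} G_j^{\mathsf{T}}$, where $G_i \ge 0$ is a low-rank factor shared by every relation that involves type $i$, $S_{ij}$ is a relation-specific backbone matrix, and constraint matrices $\Theta_i^{(t)}$ carry within-type data such as networks or ontologies:

```latex
\min_{G_i \ge 0,\; S_{ij}} \;
\sum_{R_{ij} \in \mathcal{R}} \bigl\lVert R_{ij} - G_i S_{ij} G_j^{\mathsf{T}} \bigr\rVert_F^2
\;+\; \sum_{t} \sum_{i} \operatorname{tr}\!\bigl( G_i^{\mathsf{T}} \Theta_i^{(t)} G_i \bigr)
```

Because each $G_i$ appears in every relation involving type $i$, data from one source can shape the factors used to reconstruct another; this sharing is the intuition behind fusion outperforming any single source.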
29
Wu M, Li X, Zhang F, Li X, Kwoh CK, Zheng J. In silico prediction of synthetic lethality by meta-analysis of genetic interactions, functions, and pathways in yeast and human cancer. Cancer Inform 2014; 13:71-80. [PMID: 25452682 PMCID: PMC4224103 DOI: 10.4137/cin.s14026] [Received: 05/20/2014] [Revised: 08/15/2014] [Accepted: 08/18/2014]
Abstract
A major goal in cancer medicine is to find selective drugs with reduced side effects. A pair of genes is called synthetic lethal (SL) if mutations of both genes will kill a cell while mutation of either gene alone will not. Hence, a gene in an SL interaction with a cancer-specific mutated gene will be a promising drug target with anti-cancer selectivity. Wet-lab screening is still so costly that even for yeast only a small fraction of gene pairs has been covered. Computational methods are therefore important for large-scale discovery of SL interactions. Most existing approaches focus on individual features or machine-learning methods, which are prone to noise or overfitting. In this paper, we propose an approach named MetaSL for predicting yeast SL, which integrates 17 genomic and proteomic features and the outputs of 10 classification methods. MetaSL thus combines the strengths of existing methods and achieves the highest area under the Receiver Operating Characteristic (ROC) curve (AUC) of 87.1% among all competitors on yeast data. Moreover, through orthologous mapping from yeast to human genes, we predicted several lists of candidate SL pairs in human cancer. Our method and predictions would thus shed light on mechanisms of SL and lead to the discovery of novel anti-cancer drugs. In addition, all the experimental results can be downloaded from http://www.ntu.edu.sg/home/zhengjie/data/MetaSL.
Affiliation(s)
- Min Wu
- School of Computer Engineering, Nanyang Technological University, Singapore; Institute for Infocomm Research, A*STAR, 1 Fusionopolis Way, Singapore
- Xuejuan Li
- School of Computer Engineering, Nanyang Technological University, Singapore
- Fan Zhang
- School of Computer Engineering, Nanyang Technological University, Singapore
- Xiaoli Li
- Institute for Infocomm Research, A*STAR, 1 Fusionopolis Way, Singapore
- Chee-Keong Kwoh
- School of Computer Engineering, Nanyang Technological University, Singapore
- Jie Zheng
- School of Computer Engineering, Nanyang Technological University, Singapore; Genome Institute of Singapore, A*STAR, Biopolis, Singapore
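The AUC that MetaSL reports (87.1% on yeast data) has a direct probabilistic reading: it is the chance that a randomly chosen true SL pair receives a higher score than a randomly chosen non-SL pair, with ties counted as one half. A minimal, illustrative sketch of that definition on hypothetical scores; it is not part of MetaSL, and because it compares every positive with every negative, large evaluations would use a rank-based formula instead.

```python
def roc_auc(scores, labels):
    """AUC as the Mann-Whitney probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predictor scores for two known SL pairs (label 1) and two non-SL pairs (label 0).
auc = roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
```

Here one of the four positive/negative comparisons is misordered (0.4 < 0.6), so the AUC is 3/4.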
30
Žitnik M, Zupan B. Matrix factorization-based data fusion for drug-induced liver injury prediction. Syst Biomed 2014. [DOI: 10.4161/sysb.29072]
31
Skinner MK, Savenkova MI, Zhang B, Gore AC, Crews D. Gene bionetworks involved in the epigenetic transgenerational inheritance of altered mate preference: environmental epigenetics and evolutionary biology. BMC Genomics 2014; 15:377. [PMID: 24885959 PMCID: PMC4073506 DOI: 10.1186/1471-2164-15-377] [Received: 10/24/2013] [Accepted: 04/28/2014]
Abstract
BACKGROUND Mate preference behavior is an essential first step in sexual selection and is a critical determinant in evolutionary biology. Previously, an environmental compound (the fungicide vinclozolin) was found to promote the epigenetic transgenerational inheritance of an altered sperm epigenome and modified mate preference characteristics for three generations after exposure of a gestating female. RESULTS The current study investigated gene networks in various regions of the brain that correlated with the altered mate preference behavior in males and females. Statistically significant correlations between gene clusters or modules and specific mate preference behaviors were identified. This novel systems biology approach identified gene networks (bionetworks) involved in sex-specific mate preference behavior. Observations demonstrate the ability of environmental factors to promote the epigenetic transgenerational inheritance of this altered evolutionary biology determinant. CONCLUSIONS Combined observations elucidate the potential molecular control of mate preference behavior and suggest that environmental epigenetics can have a role in evolutionary biology.
Affiliation(s)
- Michael K Skinner
- Center for Reproductive Biology, School of Biological Sciences, Washington State University, Pullman, WA 99164-4236 USA
- Marina I Savenkova
- Center for Reproductive Biology, School of Biological Sciences, Washington State University, Pullman, WA 99164-4236 USA
- Bin Zhang
- Department of Genetics & Genomic Sciences, Institute of Genomics and Multiscale Biology, Mount Sinai School of Medicine, New York, NY 10029 USA
- David Crews
- Section of Integrative Biology, University of Texas at Austin, Austin, TX 78712 USA
32
Efroni S, Meerzaman D, Schaefer CF, Greenblum S, Soo-Lyu M, Hu Y, Cultraro C, Meshorer E, Buetow KH. Systems analysis utilising pathway interactions identifies sonic hedgehog pathway as a primary biomarker and oncogenic target in hepatocellular carcinoma. IET Syst Biol 2014; 7:243-51. [PMID: 24712101 DOI: 10.1049/iet-syb.2010.0078]
Abstract
The development and progression of cancer are associated with disruption of biological networks. Historically, studies have identified sets of signature genes involved in events ultimately leading to the development of cancer. Identification of such sets does not indicate which biologic processes are oncogenic drivers and makes it difficult to identify key networks to target for interventions. Using a comprehensive, integrated computational approach, the authors identify the sonic hedgehog (SHH) pathway as the gene network that most significantly distinguishes tumour and tumour-adjacent samples in human hepatocellular carcinoma (HCC). The analysis reveals that the SHH pathway is commonly activated in the tumour samples and that its activity most significantly differentiates tumour from non-tumour samples. The authors experimentally validate these in silico findings in the same biologic material using Western blot analysis. This analysis reveals that the expression levels of SHH, phosphorylated cyclin B1, and CDK7 are much higher in most tumour tissues than in normal tissue. It is also shown that siRNA-mediated silencing of SHH gene expression resulted in a significant reduction of cell proliferation in a liver cancer cell line, SNU449, indicating that SHH plays a major role in promoting cell proliferation in liver cancer. The SHH pathway is a key network underpinning HCC aetiology, which may guide the development of interventions for this most common form of human liver cancer.
33
Lu X, Kensche PR, Huynen MA, Notebaart RA. Genome evolution predicts genetic interactions in protein complexes and reveals cancer drug targets. Nat Commun 2014; 4:2124. [PMID: 23851603 PMCID: PMC3717498 DOI: 10.1038/ncomms3124] [Received: 10/24/2012] [Accepted: 06/07/2013]
Abstract
Genetic interactions reveal insights into cellular function and can be used to identify drug targets. Here we construct a new model to predict negative genetic interactions in protein complexes by exploiting the evolutionary history of genes in parallel converging pathways in metabolism. We evaluate our model with protein complexes of Saccharomyces cerevisiae and show that the predicted protein pairs more frequently have a negative genetic interaction than random proteins from the same complex. Furthermore, we apply our model to human protein complexes to predict novel cancer drug targets, and identify 20 candidate targets with empirical support and 10 novel targets amenable to further experimental validation. Our study illustrates that negative genetic interactions can be predicted by systematically exploring genome evolution, and that this is useful to identify novel anti-cancer drug targets. Genetic interactions can reveal insights into cellular functions. Here, Lu et al. show that negative genetic interactions in protein complexes can be predicted by systematically exploring the evolutionary history of genes, which may be useful for the identification of novel targets for anti-cancer drugs.
Affiliation(s)
- Xiaowen Lu
- Department of Bioinformatics, Centre for Molecular Life Sciences, Radboud University Medical Centre, 6525GA Nijmegen, The Netherlands
34
Farris SP, Mayfield RD. RNA-Seq reveals novel transcriptional reorganization in human alcoholic brain. Int Rev Neurobiol 2014; 116:275-300. [PMID: 25172479 DOI: 10.1016/b978-0-12-801105-8.00011-4]
Abstract
DNA microarrays have been used for over a decade to profile gene expression on a genomic scale. While this technology has advanced our understanding of complex cellular function, the reliance of microarrays on hybridization kinetics results in several technical limitations. For example, knowledge of the sequences being probed is required, distinguishing similar sequences is difficult because of cross-hybridization, and the relatively narrow dynamic range of the signal limits sensitivity. Recently, new technologies have been introduced that are based on novel sequencing methodologies. These next-generation sequencing methods do not have the limitations inherent to microarrays. Next-generation sequencing is unique since it allows the detection of all known and novel RNAs present in biological samples without bias toward known transcripts. In addition, the expression of coding and noncoding RNAs, alternative splicing events, and expressed single nucleotide polymorphisms (SNPs) can be identified in a single experiment. Furthermore, this technology allows for remarkably higher throughput while lowering sequencing costs. This significant shift in throughput and pricing makes low-cost access to whole genomes possible and more importantly expands sequencing applications far beyond traditional uses (Morozova & Marra, 2008) to include sequencing the transcriptome (RNA-Seq), providing detail on gene structure, alternative splicing events, expressed SNPs, and transcript size (Mane et al., 2009; Tang et al., 2009; Walter et al., 2009), in a single experiment, while also quantifying the absolute abundance of genes, all with greater sensitivity and dynamic range than the competing cDNA microarray technology (Mortazavi, Williams, McCue, Schaeffer, & Wold, 2008).
Affiliation(s)
- Sean P Farris
- Waggoner Center for Alcohol and Addiction Research, The University of Texas at Austin, Austin, TX 78712
- R Dayne Mayfield
- Waggoner Center for Alcohol and Addiction Research, The University of Texas at Austin, Austin, TX 78712.
35
Schrynemackers M, Küffner R, Geurts P. On protocols and measures for the validation of supervised methods for the inference of biological networks. Front Genet 2013; 4:262. [PMID: 24348517 PMCID: PMC3848415 DOI: 10.3389/fgene.2013.00262] [Received: 09/15/2013] [Accepted: 11/13/2013]
Abstract
Networks provide a natural representation of molecular biology knowledge, in particular to model relationships between biological entities such as genes, proteins, drugs, or diseases. Because of the effort, the cost, or the lack of the experiments necessary for the elucidation of these networks, computational approaches for network inference have been frequently investigated in the literature. In this paper, we examine the assessment of supervised network inference. Supervised inference is based on machine learning techniques that infer the network from a training sample of known interacting and possibly non-interacting entities and additional measurement data. While these methods are very effective, their reliable validation in silico poses a challenge, since both prediction and validation need to be performed on the basis of the same partially known network. Cross-validation techniques need to be specifically adapted to classification problems on pairs of objects. We perform a critical review and assessment of protocols and measures proposed in the literature and derive specific guidelines on how best to exploit and evaluate machine learning techniques for network inference. Through theoretical considerations and in silico experiments, we analyze in depth how important factors influence the outcome of performance estimation. These factors include the amount of information available for the interacting entities, the sparsity and topology of biological networks, and the lack of experimentally verified non-interacting pairs.
Affiliation(s)
- Marie Schrynemackers
- Systems and Modeling, Department of Electrical Engineering and Computer Science and GIGA-R, University of Liège, Liège, Belgium
- Robert Küffner
- Institute for Practical Informatics and Bioinformatics, Ludwig-Maximilians-University Munich, Germany
- Pierre Geurts
- Systems and Modeling, Department of Electrical Engineering and Computer Science and GIGA-R, University of Liège, Liège, Belgium
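The abstract's point that cross-validation must be adapted to pairs of objects can be made concrete. If nodes, rather than pairs, are split into training and test sets, each candidate pair falls into one of three groups depending on how many of its two entities were seen during training, and performance is usually easier to achieve on pairs involving already-seen entities. A minimal sketch, assuming a homogeneous network with comparable node identifiers; the group names are our own shorthand, not necessarily the paper's notation.

```python
import random
from itertools import combinations


def node_split_pairs(nodes, test_fraction=0.3, seed=0):
    """Hold out a fraction of nodes, then group every unordered pair by how many nodes are held out."""
    rng = random.Random(seed)
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    n_test = max(1, round(test_fraction * len(shuffled)))
    test_nodes = set(shuffled[:n_test])
    groups = {"train-train": [], "train-test": [], "test-test": []}
    for a, b in combinations(sorted(nodes), 2):
        held_out = (a in test_nodes) + (b in test_nodes)  # 0, 1 or 2 held-out endpoints
        key = ("train-train", "train-test", "test-test")[held_out]
        groups[key].append((a, b))
    return test_nodes, groups


# Hypothetical network of ten genes, 30% of them held out.
test_nodes, groups = node_split_pairs(range(10), test_fraction=0.3)
```

A model would be trained only on "train-train" pairs, and its scores reported separately for "train-test" and "test-test" pairs rather than pooled.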
36
Discovering disease-disease associations by fusing systems-level molecular data. Sci Rep 2013; 3:3202. [PMID: 24232732 PMCID: PMC3828568 DOI: 10.1038/srep03202] [Received: 09/10/2013] [Accepted: 10/23/2013]
Abstract
The advent of genome-scale genetic and genomic studies allows new insight into disease classification. Recently, a shift was made from linking diseases simply based on their shared genes towards systems-level integration of molecular data. Here, we aim to find relationships between diseases based on evidence from fusing all available molecular interaction and ontology data. We propose a multi-level hierarchy of disease classes that significantly overlaps with existing disease classification. In it, we find 14 disease-disease associations currently not present in Disease Ontology and provide evidence for their relationships through comorbidity data and literature curation. Interestingly, even though the number of known human genetic interactions is currently very small, we find they are the most important predictor of a link between diseases. Finally, we show that omission of any one of the included data sources reduces prediction quality, further highlighting the importance of the paradigm shift towards systems-level data fusion.
37
Wang K, Sun J, Zhou S, Wan C, Qin S, Li C, He L, Yang L. Prediction of drug-target interactions for drug repositioning only based on genomic expression similarity. PLoS Comput Biol 2013; 9:e1003315. [PMID: 24244130 PMCID: PMC3820513 DOI: 10.1371/journal.pcbi.1003315] [Received: 02/13/2013] [Accepted: 09/19/2013]
Abstract
Small drug molecules usually bind to multiple protein targets or even unintended off-targets. Such drug promiscuity has often led to unwanted or unexplained drug reactions, resulting in side effects or drug repositioning opportunities. Identifying potential drug-target interactions (DTI) is therefore an important issue in pharmacology. However, DTI discovery by experiment remains a challenging task, owing to the high expense of time and resources. Many computational methods have therefore been developed to predict DTI from high-throughput biological and clinical data. Here, we demonstrate that on-target and off-target effects could be characterized by drug-induced in vitro genomic expression changes, e.g., the data in the Connectivity Map (CMap). Thus, unknown ligands of a certain target can be found among the compounds showing high gene-expression similarity to the known ligands. Then, to clarify the detailed practice of CMap-based DTI prediction, we objectively evaluate how well each target is characterized by CMap. The results suggest that (1) some targets are better characterized than others, so prediction models specific to these well-characterized targets would be more accurate and reliable; (2) in some cases, a family of ligands for the same target tends to interact with common off-targets, which may help increase the efficiency of DTI discovery and explain the mechanisms of complicated drug actions. In the present study, CMap expression similarity is proposed as a novel indicator of drug-target interactions. Detailed strategies for improving data quality by reducing the batch effect and for building prediction models are also established. We believe the success in CMap can be further translated into other public and commercial genomic expression data, thus increasing research productivity towards valid drug repositioning and minimal side effects.
Small drug molecules usually bind to unintended off-targets, leading to unexpected drug responses such as side effects or drug repositioning opportunities. Thus, identifying unintended drug-target interactions (DTI) is particularly important for understanding complicated drug actions. Experimentally determining DTI remains expensive, so various computational methods have been developed. In this study, we demonstrated that target binding is directly correlated with drug-induced genomic expression profiles in the Connectivity Map (CMap). By improving the data quality of CMap, we illustrated three important facts: (1) Drugs binding to common targets show higher gene-expression similarity than random compounds, indicating that upstream ligand binding could be characterized by downstream gene-expression change. (2) Some targets are better characterized by CMap than others; to ensure efficient DTI discovery, prediction models should be built specifically for those well-characterized targets. (3) It is broadly observed in the predicted DTI that ligands for the same target may collectively interact with a common off-target. This observation is consistent with published experimental evidence and can help illustrate the mechanisms of unexplained drug reactions. Based on CMap, our work established an efficient pipeline for identifying potential DTI. By extending the success in CMap to other genomic data sources, we believe more DTI will be discovered.
Affiliation(s)
- Kejian Wang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Shanghai Jiao Tong University, Shanghai, China
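The central idea here, ranking candidate compounds by how similar their drug-induced expression profiles are to those of a target's known ligands, can be sketched with Pearson correlation. A minimal sketch under stated assumptions: toy profiles, Pearson correlation as the similarity, and the maximum over known ligands as the aggregate. The authors' CMap preprocessing (such as batch-effect reduction) and their exact similarity measure are not reproduced, and all compound names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles; 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank_candidates(profiles, known_ligands):
    """Score each non-ligand compound by its highest correlation with any known ligand."""
    scores = {}
    for name, profile in profiles.items():
        if name in known_ligands:
            continue
        scores[name] = max(pearson(profile, profiles[lig]) for lig in known_ligands)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical expression changes (four genes) induced by one known ligand and two candidates.
profiles = {
    "ligand_A": [2.0, -1.0, 0.5, -0.5],
    "cand_1": [1.8, -0.9, 0.6, -0.4],
    "cand_2": [-2.0, 1.0, -0.5, 0.5],
}
ranked = rank_candidates(profiles, known_ligands={"ligand_A"})
```

"cand_1" mirrors the ligand's profile and ranks first; "cand_2" induces the opposite changes and scores -1.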
38
Abstract
Proteins are not monolithic entities; rather, they can contain multiple domains that mediate distinct interactions, and their functionality can be regulated through post-translational modifications at multiple distinct sites. Traditionally, network biology has ignored such properties of proteins and has instead examined either the physical interactions of whole proteins or the consequences of removing entire genes. In this Review, we discuss experimental and computational methods to increase the resolution of protein-protein, genetic and drug-gene interaction studies to the domain and residue levels. Such work will be crucial for using interaction networks to connect sequence and structural information, and to understand the biological consequences of disease-associated mutations, which will hopefully lead to more effective therapeutic strategies.
39
Szczurek E, Misra N, Vingron M. Synthetic sickness or lethality points at candidate combination therapy targets in glioblastoma. Int J Cancer 2013; 133:2123-32. [PMID: 23629686 DOI: 10.1002/ijc.28235] [Received: 11/13/2012] [Accepted: 04/11/2013]
Abstract
Synthetic lethal interactions in cancer hold the potential for successful combined therapies, which would avoid the difficulties of single molecule-targeted treatment. Identification of interactions that are specific to human tumors is an open problem in cancer research. This work aims to decipher synthetic sick or lethal interactions directly from somatic alteration, expression and survival data of cancer patients. To this end, we look for pairs of genes and their alterations or expression levels that are "avoided" by tumors and "beneficial" for patients. Thus, candidates for synthetic sickness or lethality (SSL) interaction are identified as gene pairs whose combination of states is under-represented in the data. Our main methodological contribution is a quantitative score that allows ranking of the candidate SSL interactions according to evidence found in patient survival. Applying this analysis to glioblastoma data, we collect 1,956 synthetic sick or lethal partners for 85 abundantly altered genes, most of which show extensive copy number variation across the patient cohort. We rediscover and interpret the known interaction between TP53 and PLK1, and provide insight into the mechanism by which EGFR interacts with AKT2 but not with AKT1 or AKT3. Cox model analysis identifies 274 of the identified interactions as having a significant impact on overall survival in glioblastoma, which is more informative than a standard survival predictor based on patient age.
Affiliation(s)
- Ewa Szczurek
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, 14195, Berlin, Germany.
40
Gautier L, Taboureau O, Audouze K. The effect of network biology on drug toxicology. Expert Opin Drug Metab Toxicol 2013; 9:1409-18. [PMID: 23937336 DOI: 10.1517/17425255.2013.820704] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 12/19/2022]
Abstract
INTRODUCTION: The high failure rate of drug candidates due to toxicity during clinical trials is a critical issue in drug discovery. Network biology, which combines the increasingly large amount of available biological and chemical data with bioinformatics, has become a promising approach in this regard. With this approach, chemical safety can be assessed across multiple scales of complexity, from the molecular to the cellular and system levels of human health. Network biology can be used at several levels of complexity. AREAS COVERED: This review describes the strengths and limitations of network biology. The authors specifically assess this approach across different biological scales when it is applied to toxicity. EXPERT OPINION: Much progress has been made in the amount of data generated by various omics technologies. With this large amount of useful data, network biology has the opportunity to contribute to a better understanding of a drug's safety profile. The authors believe that considering a drug's action and a protein's function in a global physiological environment may improve our understanding of the impact some chemicals have on human health and toxicity. The next step for network biology will be to better integrate differential and quantitative data.
Affiliation(s)
- Laurent Gautier
- Technical University of Denmark, Center for Biological Sequence Analysis, Department of Systems Biology, Lyngby, Denmark
41
Alanis-Lobato G, Cannistraci CV, Ravasi T. Exploitation of genetic interaction network topology for the prediction of epistatic behavior. Genomics 2013; 102:202-8. [PMID: 23892246 DOI: 10.1016/j.ygeno.2013.07.010] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 06/25/2012] [Revised: 06/24/2013] [Accepted: 07/17/2013] [Indexed: 11/30/2022]
Abstract
Genetic interaction (GI) detection impacts the understanding of human disease and the ability to design personalized treatment. The mapping of every GI in most organisms is far from complete due to the combinatorial number of gene deletions and knockdowns required. Computational techniques to predict new interactions based only on network topology have been developed in network science but never applied to GI networks. We show that topological prediction of GIs is possible with high precision and propose a graph dissimilarity index that provides robust prediction in both dense and sparse networks. Computational prediction of GIs is a strong tool to aid high-throughput GI determination. The dissimilarity index we propose in this article attains precise predictions that reduce the universe of candidate GIs to test in the lab.
Affiliation(s)
- Gregorio Alanis-Lobato
- Integrative Systems Biology Lab, Biological and Environmental Sciences and Engineering Division, Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia; Division of Medical Genetics, Department of Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093 USA
42
Li S, Nakaya HI, Kazmin DA, Oh JZ, Pulendran B. Systems biological approaches to measure and understand vaccine immunity in humans. Semin Immunol 2013; 25:209-18. [PMID: 23796714 DOI: 10.1016/j.smim.2013.05.003] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Received: 02/18/2013] [Accepted: 05/09/2013] [Indexed: 02/01/2023]
Abstract
Recent studies have demonstrated the utility of using systems approaches to identify molecular signatures that can be used to predict vaccine immunity in humans. Such approaches are now being used extensively in vaccinology, and are beginning to yield novel insights about the molecular networks driving vaccine immunity. In this review, we present a broad review of the methodologies involved in these studies, and discuss the promise and challenges involved in this emerging field of "systems vaccinology."
Affiliation(s)
- Shuzhao Li
- Emory Vaccine Center, Yerkes National Primate Research Center, 954 Gatewood Road, Atlanta, GA 30329, USA
43
Csermely P, Korcsmáros T, Kiss HJM, London G, Nussinov R. Structure and dynamics of molecular networks: a novel paradigm of drug discovery: a comprehensive review. Pharmacol Ther 2013; 138:333-408. [PMID: 23384594 PMCID: PMC3647006 DOI: 10.1016/j.pharmthera.2013.01.016] [Citation(s) in RCA: 511] [Impact Index Per Article: 46.5] [Received: 01/22/2013] [Accepted: 01/22/2013] [Indexed: 02/02/2023]
Abstract
Despite considerable progress in genome- and proteome-based high-throughput screening methods and in rational drug design, the increase in approved drugs in the past decade did not match the increase in drug development costs. Network description and analysis not only give a systems-level understanding of drug action and disease complexity, but can also help to improve the efficiency of drug design. We give a comprehensive assessment of the analytical tools of network topology and dynamics. The state-of-the-art use of chemical similarity, protein structure, protein-protein interaction, signaling, genetic interaction and metabolic networks in the discovery of drug targets is summarized. We propose that network targeting follows two basic strategies. The "central hit strategy" selectively targets central nodes/edges of the flexible networks of infectious agents or cancer cells to kill them. The "network influence strategy" works against other diseases, where an efficient reconfiguration of rigid networks needs to be achieved by targeting the neighbors of central nodes/edges. It is shown how network techniques can help in the identification of single-target, edgetic, multi-target and allo-network drug target candidates. We review the recent boom in network methods that help hit identification and lead selection, optimizing drug efficacy as well as minimizing side effects and drug toxicity. Successful network-based drug development strategies are shown through the examples of infections, cancer, metabolic diseases, neurodegenerative diseases and aging. Summarizing more than 1,200 references, we suggest an optimized protocol of network-aided drug development and provide a list of systems-level hallmarks of drug quality. Finally, we highlight network-related drug development trends that help achieve these hallmarks through a cohesive, global approach.
Affiliation(s)
- Peter Csermely
- Department of Medical Chemistry, Semmelweis University, P.O. Box 260, H-1444 Budapest 8, Hungary.
44
Pandey G, Zhang B, Jian L. Predicting submicron air pollution indicators: a machine learning approach. Environ Sci Process Impacts 2013; 15:996-1005. [PMID: 23535697 DOI: 10.1039/c3em30890a] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 05/21/2023]
Abstract
The regulation of air pollutant levels is rapidly becoming one of the most important tasks for the governments of developing countries, especially China. Submicron particles, such as ultrafine particles (UFP, aerodynamic diameter ≤ 100 nm) and particulate matter ≤ 1.0 μm (PM1.0), are an unregulated, emerging threat to human health, but the relationships between the concentrations of these particles and meteorological and traffic factors are poorly understood. To shed some light on these connections, we employed a range of machine learning techniques to predict UFP and PM1.0 levels based on a dataset of weather and traffic observations recorded at a busy roadside in Hangzhou, China. Based on a thorough examination of over twenty-five classifiers used for this task, we find that PM1.0 and UFP levels can be predicted reasonably accurately and that tree-based classification models (Alternating Decision Tree and Random Forests) perform best for both types of particles. In addition, weather variables show a stronger relationship with PM1.0 and UFP levels and thus cannot be ignored when predicting submicron particle levels. Overall, this study demonstrates the potential value of systematically collecting and analysing datasets with machine learning techniques to predict submicron ambient air pollutants.
Affiliation(s)
- Gaurav Pandey
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, NY 10029, USA
45
Rider AK, Johnson RA, Davis DA, Hoens TR, Chawla NV. Classifier evaluation with missing negative class labels. Advances in Intelligent Data Analysis XII 2013. [DOI: 10.1007/978-3-642-41398-8_33] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/28/2023]
46
Systems genetics in "-omics" era: current and future development. Theory Biosci 2012; 132:1-16. [PMID: 23138757 DOI: 10.1007/s12064-012-0168-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 03/26/2012] [Accepted: 10/25/2012] [Indexed: 02/06/2023]
Abstract
Systems genetics is an emerging discipline that integrates high-throughput expression profiling technology and systems biology approaches to reveal the molecular mechanisms of complex traits, and it will improve our understanding of gene functions in biochemical pathways and of genetic interactions between biological molecules. With the rapid advances in microarray analysis technologies, bioinformatics is extensively used in studies of gene functions, SNP-SNP genetic interactions, LD block-block interactions, miRNA-mRNA interactions, DNA-protein interactions, protein-protein interactions, and functional mapping of LD blocks. Based on bioinformatics tools that can integrate "-omics" datasets to extract systems knowledge and useful information for explaining the molecular mechanisms of complex traits, systems genetics aims to enhance our understanding of biological processes. Systems biology has provided a systems-level understanding of various biological phenomena and established the scientific background for the development of systems genetics. In addition, next-generation sequencing technology and post-genome-wide association studies enable the discovery of new genes and rare variants. The integration of different strategies will help to propose novel hypotheses and refine the theoretical framework of systems genetics, which will contribute to the future development of systems genetics and open up a whole new area of genetics.
47
Leveraging models of cell regulation and GWAS data in integrative network-based association studies. Nat Genet 2012; 44:841-7. [PMID: 22836096 DOI: 10.1038/ng.2355] [Citation(s) in RCA: 190] [Impact Index Per Article: 15.8] [Indexed: 12/16/2022]
48
Genome-wide essential gene identification in Streptococcus sanguinis. Sci Rep 2011; 1:125. [PMID: 22355642 PMCID: PMC3216606 DOI: 10.1038/srep00125] [Citation(s) in RCA: 93] [Impact Index Per Article: 7.2] [Received: 08/11/2011] [Accepted: 09/21/2011] [Indexed: 12/29/2022]
Abstract
A clear understanding of gene essentiality in bacterial pathogens is pivotal for identifying drug targets to combat the emergence of new pathogens and antibiotic-resistant bacteria, for synthetic biology, and for understanding the origins of life. We have constructed a comprehensive set of deletion mutants and systematically identified a clearly defined set of essential genes for Streptococcus sanguinis. Our results were confirmed by growing S. sanguinis in minimal medium and by double knockout of paralogous or isozyme genes. Careful examination revealed that these essential genes were associated with only three basic categories of biological functions: maintenance of the cell envelope, energy production, and processing of genetic information. Our finding was subsequently validated in two other pathogenic streptococcal species, Streptococcus pneumoniae and Streptococcus mutans, and in two other gram-positive pathogens, Bacillus subtilis and Staphylococcus aureus. Our analysis has thus led to a simplified model that permits reliable prediction of gene essentiality.
49