1
Sarsani V, Brotman SM, Xianyong Y, Fernandes Silva L, Laakso M, Spracklen CN. A cross-ancestry genome-wide meta-analysis, fine-mapping, and gene prioritization approach to characterize the genetic architecture of adiponectin. HGG ADVANCES 2024; 5:100252. [PMID: 37859345 PMCID: PMC10652123 DOI: 10.1016/j.xhgg.2023.100252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 10/16/2023] [Accepted: 10/16/2023] [Indexed: 10/21/2023] Open
Abstract
Previous genome-wide association studies (GWASs) for adiponectin, a complex trait linked to type 2 diabetes and obesity, identified >20 associated loci. However, most loci were identified in populations of European ancestry, and many of the target genes underlying the associations remain unknown. We conducted a cross-ancestry adiponectin GWAS meta-analysis in ≤46,434 individuals from the Metabolic Syndrome in Men (METSIM) cohort and the ADIPOGen and AGEN consortiums. We combined study-specific association summary statistics using a fixed-effects, inverse variance-weighted approach. We identified 22 loci associated with adiponectin (p < 5 × 10⁻⁸), including 15 known and seven previously unreported loci. Among individuals of European ancestry, Genome-wide Complex Traits Analysis joint conditional analysis (GCTA-COJO) identified 14 additional distinct signals at the ADIPOQ, CDH13, HCAR1, and ZNF664 loci. Leveraging the cross-ancestry data, FINEMAP + SuSiE identified 45 causal variants (PP > 0.9), which also exhibited potential pleiotropy for cardiometabolic traits. To prioritize target genes at associated loci, we propose a combinatorial likelihood scoring formalism (Gene Priority Score [GPScore]) based on measures derived from 11 gene prioritization strategies and the physical distance to the transcription start site. With GPScore, we prioritize the 30 most probable target genes underlying the adiponectin-associated variants in the cross-ancestry analysis, including well-known causal genes (e.g., ADIPOQ, CDH13) and additional genes (e.g., CSF1, RGS17). Functional association networks revealed complex interactions of prioritized genes, their functionally connected genes, and their underlying pathways centered around insulin and adiponectin signaling, indicating an essential role in regulating energy balance in the body, inflammation, coagulation, fibrinolysis, insulin resistance, and diabetes.
Overall, our analyses identify and characterize adiponectin association signals and inform experimental interrogation of target genes for adiponectin.
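To illustrate the fixed-effects, inverse variance-weighted combination named in the abstract above, here is a minimal Python sketch. The function name `ivw_fixed_effects` and the input numbers are hypothetical, and this is the textbook estimator for a single variant, not the authors' pipeline, which the abstract does not describe.

```python
import math


def ivw_fixed_effects(betas, ses):
    """Combine per-study effect estimates for one variant.

    betas: per-study effect sizes (hypothetical inputs)
    ses:   per-study standard errors, same order as betas
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se ** 2) for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal null.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


if __name__ == "__main__":
    # Made-up summary statistics from three studies, for illustration only.
    beta, se, z, p = ivw_fixed_effects([0.12, 0.08, 0.10], [0.03, 0.05, 0.04])
    print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.3g}")
```

Studies with smaller standard errors dominate the pooled estimate, which is why a fixed-effects analysis assumes that all studies estimate the same underlying effect.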
Affiliation(s)
- Vishal Sarsani
  - Department of Mathematics and Statistics, University of Massachusetts Amherst, Amherst, MA, USA
- Sarah M Brotman
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Yin Xianyong
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
- Lillian Fernandes Silva
  - Institute of Clinical Medicine, Internal Medicine, University of Eastern Finland, Kuopio, Finland
- Markku Laakso
  - Institute of Clinical Medicine, Internal Medicine, University of Eastern Finland, Kuopio, Finland
- Cassandra N Spracklen
  - Department of Biostatistics and Epidemiology, University of Massachusetts Amherst, Amherst, MA, USA
2
Woicik A, Zhang M, Xu H, Mostafavi S, Wang S. Gemini: memory-efficient integration of hundreds of gene networks with high-order pooling. Bioinformatics 2023; 39:i504-i512. [PMID: 37387142 DOI: 10.1093/bioinformatics/btad247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION The exponential growth of genomic sequencing data has created ever-expanding repositories of gene networks. Unsupervised network integration methods are critical to learn informative representations for each gene, which are later used as features for downstream applications. However, these network integration methods must be scalable to account for the increasing number of networks and robust to an uneven distribution of network types within hundreds of gene networks. RESULTS To address these needs, we present Gemini, a novel network integration method that uses memory-efficient high-order pooling to represent and weight each network according to its uniqueness. Gemini then mitigates the uneven network distribution through mixing up existing networks to create many new networks. We find that Gemini leads to more than a 10% improvement in F1 score, 15% improvement in micro-AUPRC, and 63% improvement in macro-AUPRC for human protein function prediction by integrating hundreds of networks from BioGRID, and that Gemini's performance significantly improves when more networks are added to the input network collection, while Mashup and BIONIC embeddings' performance deteriorates. Gemini thereby enables memory-efficient and informative network integration for large gene networks and can be used to massively integrate and analyze networks in other domains. AVAILABILITY AND IMPLEMENTATION Gemini can be accessed at: https://github.com/MinxZ/Gemini.
Affiliation(s)
- Addie Woicik
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Mingxin Zhang
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Hanwen Xu
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Sara Mostafavi
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Sheng Wang
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
3
Wu Z, Guo M, Jin X, Chen J, Liu B. CFAGO: cross-fusion of network and attributes based on attention mechanism for protein function prediction. Bioinformatics 2023; 39:7072461. [PMID: 36883697 PMCID: PMC10032634 DOI: 10.1093/bioinformatics/btad123] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/30/2022] [Revised: 02/28/2023] [Accepted: 03/05/2023] [Indexed: 03/09/2023] Open
Abstract
MOTIVATION Protein function annotation is fundamental to understanding biological mechanisms. The abundant genome-scale protein-protein interaction (PPI) networks, together with other protein biological attributes, provide rich information for annotating protein functions. As PPI networks and biological attributes describe protein functions from different perspectives, it is highly challenging to cross-fuse them for protein function prediction. Recently, several methods combine the PPI networks and protein attributes via the graph neural networks (GNNs). However, GNNs may inherit or even magnify the bias caused by noisy edges in PPI networks. Besides, GNNs with stacking of many layers may cause the over-smoothing problem of node representations. RESULTS We develop a novel protein function prediction method, CFAGO, to integrate single-species PPI networks and protein biological attributes via a multi-head attention mechanism. CFAGO is first pre-trained with an encoder-decoder architecture to capture the universal protein representation of the two sources. It is then fine-tuned to learn more effective protein representations for protein function prediction. Benchmark experiments on human and mouse datasets show CFAGO outperforms state-of-the-art single-species network-based methods by at least 7.59%, 6.90%, 11.68% in terms of m-AUPR, M-AUPR, and Fmax, respectively, demonstrating cross-fusion by multi-head attention mechanism can greatly improve the protein function prediction. We further evaluate the quality of captured protein representations in terms of Davies Bouldin Score, whose results show that cross-fused protein representations by multi-head attention mechanism are at least 2.7% better than that of original and concatenated representations. We believe CFAGO is an effective tool for protein function prediction. AVAILABILITY AND IMPLEMENTATION The source code of CFAGO and experiments data are available at: http://bliulab.net/CFAGO/.
Affiliation(s)
- Zhourun Wu
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China
- Mingyue Guo
  - School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 101408, China
- Xiaopeng Jin
  - College of Big Data and Internet, Shenzhen Technology University, Shenzhen, Guangdong 518118, China
- Junjie Chen
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China
- Bin Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
  - Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
4
Niranjan V, Uttarkar A, Kaul A, Varghese M. A Machine Learning-Based Approach Using Multi-omics Data to Predict Metabolic Pathways. Methods Mol Biol 2023; 2553:441-452. [PMID: 36227554 DOI: 10.1007/978-1-0716-2617-7_19] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/16/2023]
Abstract
The integrative method approaches are continuously evolving to provide accurate insights from the data that is received through experimentation on various biological systems. Multi-omics data can be integrated with predictive machine learning algorithms in order to provide results with high accuracy. This protocol chapter defines the steps required for the ML-multi-omics integration methods that are applied on biological datasets for its analysis and the visual interpretation of the results thus obtained.
Affiliation(s)
- Vidya Niranjan
  - Department of Biotechnology, R V College of Engineering, Mysuru Road, Kengeri, Bengaluru, India
- Akshay Uttarkar
  - Department of Biotechnology, R V College of Engineering, Mysuru Road, Kengeri, Bengaluru, India
- Aakaanksha Kaul
  - Department of Biotechnology, R V College of Engineering, Mysuru Road, Kengeri, Bengaluru, India
- Maryanne Varghese
  - Department of Biotechnology, R V College of Engineering, Mysuru Road, Kengeri, Bengaluru, India
5
Li W, Shao C, Zhou H, Du H, Chen H, Wan H, He Y. Multi-omics research strategies in ischemic stroke: A multidimensional perspective. Ageing Res Rev 2022; 81:101730. [PMID: 36087702 DOI: 10.1016/j.arr.2022.101730] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 06/19/2022] [Revised: 08/23/2022] [Accepted: 09/03/2022] [Indexed: 01/31/2023]
Abstract
Ischemic stroke (IS) is a multifactorial and heterogeneous neurological disorder with high rate of death and long-term impairment. Despite years of studies, there are still no stroke biomarkers for clinical practice, and the molecular mechanisms of stroke remain largely unclear. The high-throughput omics approach provides new avenues for discovering biomarkers of IS and explaining its pathological mechanisms. However, single-omics approaches only provide a limited understanding of the biological pathways of diseases. The integration of multiple omics data means the simultaneous analysis of thousands of genes, RNAs, proteins and metabolites, revealing networks of interactions between multiple molecular levels. Integrated analysis of multi-omics approaches will provide helpful insights into stroke pathogenesis, therapeutic target identification and biomarker discovery. Here, we consider advances in genomics, transcriptomics, proteomics and metabolomics and outline their use in discovering the biomarkers and pathological mechanisms of IS. We then delineate strategies for achieving integration at the multi-omics level and discuss how integrative omics and systems biology can contribute to our understanding and management of IS.
Affiliation(s)
- Wentao Li
  - School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, China
- Chongyu Shao
  - School of Life Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, China
- Huifen Zhou
  - School of Life Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, China
- Haixia Du
  - School of Life Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, China
- Haiyang Chen
  - School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, China
- Haitong Wan
  - School of Life Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, China
- Yu He
  - School of Pharmaceutical Sciences, Zhejiang Chinese Medical University, Hangzhou 310053, China
6
Li W, Zhang H, Li M, Han M, Yin Y. MGEGFP: a multi-view graph embedding method for gene function prediction based on adaptive estimation with GCN. Brief Bioinform 2022; 23:6659744. [PMID: 35947989 DOI: 10.1093/bib/bbac333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Revised: 07/02/2022] [Accepted: 07/21/2022] [Indexed: 11/14/2022] Open
Abstract
In recent years, a number of computational approaches have been proposed to effectively integrate multiple heterogeneous biological networks, and have shown impressive performance for inferring gene function. However, the previous methods do not fully represent the critical neighborhood relationship between genes during the feature learning process. Furthermore, it is difficult to accurately estimate the contributions of different views for multi-view integration. In this paper, we propose MGEGFP, a multi-view graph embedding method based on adaptive estimation with Graph Convolutional Network (GCN), to learn high-quality gene representations among multiple interaction networks for function prediction. First, we design a dual-channel GCN encoder to disentangle the view-specific information and the consensus pattern across diverse networks. By the aid of disentangled representations, we develop a multi-gate module to adaptively estimate the contributions of different views during each reconstruction process and make full use of the multiplexity advantages, where a diversity preservation constraint is designed to prevent the over-fitting problem. To validate the effectiveness of our model, we conduct experiments on networks from the STRING database for both yeast and human datasets, and compare the performance with seven state-of-the-art methods in five evaluation metrics. Moreover, the ablation study manifests the important contribution of the designed dual-channel encoder, multi-gate module and the diversity preservation constraint in MGEGFP. The experimental results confirm the superiority of our proposed method and suggest that MGEGFP can be a useful tool for gene function prediction.
Affiliation(s)
- Wei Li
  - College of Artificial Intelligence, Nankai University, Tongyan Road, 300350, Tianjin, China
- Han Zhang
  - College of Artificial Intelligence, Nankai University, Tongyan Road, 300350, Tianjin, China
- Minghe Li
  - College of Artificial Intelligence, Nankai University, Tongyan Road, 300350, Tianjin, China
- Mingjing Han
  - College of Artificial Intelligence, Nankai University, Tongyan Road, 300350, Tianjin, China
- Yanbin Yin
  - Department of Food Science and Technology, University of Nebraska - Lincoln, 1400 R Street, 68588, Nebraska, USA
7
James K, Alsobhe A, Cockell SJ, Wipat A, Pocock M. Integration of probabilistic functional networks without an external Gold Standard. BMC Bioinformatics 2022; 23:302. [PMID: 35879662 PMCID: PMC9316706 DOI: 10.1186/s12859-022-04834-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2021] [Accepted: 07/11/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Probabilistic functional integrated networks (PFINs) are designed to aid our understanding of cellular biology and can be used to generate testable hypotheses about protein function. PFINs are generally created by scoring the quality of interaction datasets against a Gold Standard dataset, usually chosen from a separate high-quality data source, prior to their integration. Use of an external Gold Standard has several drawbacks, including data redundancy, data loss and the need for identifier mapping, which can complicate the network build and impact on PFIN performance. Additionally, there typically are no Gold Standard data for non-model organisms. RESULTS We describe the development of an integration technique, ssNet, that scores and integrates both high-throughput and low-throughput data from a single source database in a consistent manner without the need for an external Gold Standard dataset. Using data from Saccharomyces cerevisiae we show that ssNet is easier and faster, overcoming the challenges of data redundancy, Gold Standard bias and ID mapping. In addition ssNet results in less loss of data and produces a more complete network. CONCLUSIONS The ssNet method allows PFINs to be built successfully from a single database, while producing comparable network performance to networks scored using an external Gold Standard source and with reduced data loss.
Affiliation(s)
- Katherine James
  - Department of Applied Sciences, Northumbria University, Sandyford Rd, Newcastle upon Tyne, NE1 8ST, UK
  - Interdisciplinary Computing and Complex BioSystems Group, Newcastle University, Science Square, Newcastle upon Tyne, NE4 5TG, UK
- Aoesha Alsobhe
  - Interdisciplinary Computing and Complex BioSystems Group, Newcastle University, Science Square, Newcastle upon Tyne, NE4 5TG, UK
  - Saudi Electronic University, Abi Bakr As Siddiq Branch Rd, Riyadh, 1332, Saudi Arabia
- Simon J Cockell
  - School of Biomedical, Nutritional and Sports Science, Faculty of Medical Sciences, Newcastle University, Framlington Place, Newcastle upon Tyne, NE2 4HH, UK
- Anil Wipat
  - Interdisciplinary Computing and Complex BioSystems Group, Newcastle University, Science Square, Newcastle upon Tyne, NE4 5TG, UK
- Matthew Pocock
  - Interdisciplinary Computing and Complex BioSystems Group, Newcastle University, Science Square, Newcastle upon Tyne, NE4 5TG, UK
8
Hesami M, Alizadeh M, Jones AMP, Torkamaneh D. Machine learning: its challenges and opportunities in plant system biology. Appl Microbiol Biotechnol 2022; 106:3507-3530. [PMID: 35575915 DOI: 10.1007/s00253-022-11963-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 03/14/2022] [Revised: 03/14/2022] [Accepted: 05/07/2022] [Indexed: 12/25/2022]
Abstract
Sequencing technologies are evolving at a rapid pace, enabling the generation of massive amounts of data in multiple dimensions (e.g., genomics, epigenomics, transcriptomics, metabolomics, proteomics, and single-cell omics) in plants. To provide comprehensive insights into the complexity of plant biological systems, it is important to integrate different omics datasets. Although recent advances in computational analytical pipelines have enabled efficient and high-quality exploration and exploitation of single omics data, the integration of multidimensional, heterogeneous, and large datasets (i.e., multi-omics) remains a challenge. In this regard, machine learning (ML) offers promising approaches to integrate large datasets and to recognize fine-grained patterns and relationships. Nevertheless, they require rigorous optimizations to process multi-omics-derived datasets. In this review, we discuss the main concepts of machine learning as well as the key challenges and solutions related to the big data derived from plant system biology. We also provide in-depth insight into the principles of data integration using ML, as well as challenges and opportunities in different contexts including multi-omics, single-cell omics, protein function, and protein-protein interaction. KEY POINTS: • The key challenges and solutions related to the big data derived from plant system biology have been highlighted. • Different methods of data integration have been discussed. • Challenges and opportunities of the application of machine learning in plant system biology have been highlighted and discussed.
Affiliation(s)
- Mohsen Hesami
  - Department of Plant Agriculture, University of Guelph, Guelph, ON, N1G 2W1, Canada
- Milad Alizadeh
  - Department of Botany, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Davoud Torkamaneh
  - Département de Phytologie, Université Laval, Québec City, QC, G1V 0A6, Canada
  - Institut de Biologie Intégrative Et Des Systèmes (IBIS), Université Laval, Québec City, QC, G1V 0A6, Canada
9
Liu L, Mamitsuka H, Zhu S. HPODNets: deep graph convolutional networks for predicting human protein-phenotype associations. Bioinformatics 2022; 38:799-808. [PMID: 34672333 DOI: 10.1093/bioinformatics/btab729] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/21/2021] [Revised: 09/18/2021] [Accepted: 10/18/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Deciphering the relationship between human genes/proteins and abnormal phenotypes is of great importance in the prevention, diagnosis and treatment against diseases. The Human Phenotype Ontology (HPO) is a standardized vocabulary that describes the phenotype abnormalities encountered in human disorders. However, the current HPO annotations are still incomplete. Thus, it is necessary to computationally predict human protein-phenotype associations. In terms of current, cutting-edge computational methods for annotating proteins (such as functional annotation), three important features are (i) multiple network input, (ii) semi-supervised learning and (iii) deep graph convolutional network (GCN), whereas there are no methods with all these features for predicting HPO annotations of human protein. RESULTS We develop HPODNets with all above three features for predicting human protein-phenotype associations. HPODNets adopts a deep GCN with eight layers which allows to capture high-order topological information from multiple interaction networks. Empirical results with both cross-validation and temporal validation demonstrate that HPODNets outperforms seven competing state-of-the-art methods for protein function prediction. HPODNets with the architecture of deep GCNs is confirmed to be effective for predicting HPO annotations of human protein and, more generally, node label ranking problem with multiple biomolecular networks input in bioinformatics. AVAILABILITY AND IMPLEMENTATION https://github.com/liulizhi1996/HPODNets. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lizhi Liu
  - School of Computer Science, Fudan University, Shanghai 200433, China
- Hiroshi Mamitsuka
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto Prefecture 611-0011, Japan
  - Department of Computer Science, Aalto University, Espoo 02150, Finland
- Shanfeng Zhu
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
  - Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai 200433, China
  - MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China
  - Zhangjiang Fudan International Innovation Center, Shanghai 200433, China
  - Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai 200433, China
  - Institute of Artificial Intelligence Biomedicine, Nanjing University, Nanjing 210032, China
10
Network Approaches for Precision Oncology. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2022; 1361:199-213. [DOI: 10.1007/978-3-030-91836-1_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
11
Wang W, Han R, Zhang M, Wang Y, Wang T, Wang Y, Shang X, Peng J. A network-based method for brain disease gene prediction by integrating brain connectome and molecular network. Brief Bioinform 2021; 23:6415315. [PMID: 34727570 DOI: 10.1093/bib/bbab459] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/30/2021] [Revised: 09/18/2021] [Accepted: 10/07/2021] [Indexed: 12/27/2022] Open
Abstract
Brain disease gene identification is critical for revealing the biological mechanism and developing drugs for brain diseases. To enhance the identification of brain disease genes, similarity-based computational methods, especially network-based methods, have been adopted for narrowing down the searching space. However, these network-based methods only use molecular networks, ignoring brain connectome data, which have been widely used in many brain-related studies. In our study, we propose a novel framework, named brainMI, for integrating brain connectome data and molecular-based gene association networks to predict brain disease genes. For the consistent representation of molecular-based network data and brain connectome data, brainMI first constructs a novel gene network, called brain functional connectivity (BFC)-based gene network, based on resting-state functional magnetic resonance imaging data and brain region-specific gene expression data. Then, a multiple network integration method is proposed to learn low-dimensional features of genes by integrating the BFC-based gene network and existing protein-protein interaction networks. Finally, these features are utilized to predict brain disease genes based on a support vector machine-based model. We evaluate brainMI on four brain diseases, including Alzheimer's disease, Parkinson's disease, major depressive disorder and autism. brainMI achieves of 0.761, 0.729, 0.728 and 0.744 using the BFC-based gene network alone and enhances the molecular network-based performance by 6.3% on average. In addition, the results show that brainMI achieves higher performance in predicting brain disease genes compared to the existing three state-of-the-art methods.
Affiliation(s)
- Wei Wang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Ruijiang Han
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Menghan Zhang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Yuxian Wang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Tao Wang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Yongtian Wang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Xuequn Shang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Jiajie Peng
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
  - Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
12
Liu L, Zhu S. Computational Methods for Prediction of Human Protein-Phenotype Associations: A Review. PHENOMICS (CHAM, SWITZERLAND) 2021; 1:171-185. [PMID: 36939789 PMCID: PMC9590544 DOI: 10.1007/s43657-021-00019-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/23/2021] [Revised: 06/05/2021] [Accepted: 06/16/2021] [Indexed: 12/01/2022]
Abstract
Deciphering the relationship between human proteins (genes) and phenotypes is one of the fundamental tasks in phenomics research. The Human Phenotype Ontology (HPO) builds upon a standardized logical vocabulary to describe the abnormal phenotypes encountered in human diseases and paves the way towards the computational analysis of their genetic causes. To date, many computational methods have been proposed to predict the HPO annotations of proteins. In this paper, we conduct a comprehensive review of the existing approaches to predicting HPO annotations of novel proteins, identifying missing HPO annotations, and prioritizing candidate proteins with respect to a certain HPO term. For each topic, we first give the formalized description of the problem, and then systematically revisit the published literatures highlighting their advantages and disadvantages, followed by the discussion on the challenges and promising future directions. In addition, we point out several potential topics to be worthy of exploration including the selection of negative HPO annotations and detecting HPO misannotations. We believe that this review will provide insight to the researchers in the field of computational phenotype analyses in terms of comprehending and developing novel prediction algorithms.
Affiliation(s)
- Lizhi Liu
  - School of Computer Science, Fudan University, Shanghai, 200433 China
- Shanfeng Zhu
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, 200433 China
  - Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, 200433 China
  - MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433 China
  - Zhangjiang Fudan International Innovation Center, Shanghai, 200433 China
  - Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, 200433 China
13
|
Reel PS, Reel S, Pearson E, Trucco E, Jefferson E. Using machine learning approaches for multi-omics data analysis: A review. Biotechnol Adv 2021; 49:107739. [PMID: 33794304 DOI: 10.1016/j.biotechadv.2021.107739] [Citation(s) in RCA: 287] [Impact Index Per Article: 95.7] [Received: 12/15/2020] [Revised: 03/01/2021] [Accepted: 03/25/2021] [Indexed: 02/06/2023]
Abstract
With the development of modern high-throughput omic measurement platforms, it has become essential for biomedical studies to undertake an integrative (combined) approach to fully utilise these data to gain insights into biological systems. Data from various omics sources such as genetics, proteomics, and metabolomics can be integrated to unravel the intricate working of systems biology using machine learning-based predictive algorithms. Machine learning methods offer novel techniques to integrate and analyse the various omics data enabling the discovery of new biomarkers. These biomarkers have the potential to help in accurate disease prediction, patient stratification and delivery of precision medicine. This review paper explores different integrative machine learning methods which have been used to provide an in-depth understanding of biological systems during normal physiological functioning and in the presence of a disease. It provides insight and recommendations for interdisciplinary professionals who envisage employing machine learning skills in multi-omics studies.
Affiliation(s)
- Parminder S Reel
  - Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, United Kingdom
- Smarti Reel
  - Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, United Kingdom
- Ewan Pearson
  - Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, United Kingdom
- Emanuele Trucco
  - VAMPIRE project, Computing, School of Science and Engineering, University of Dundee, Dundee, United Kingdom
- Emily Jefferson
  - Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, United Kingdom
|
14
|
Zhao Y, Wang J, Guo M, Zhang X, Yu G. Cross-Species Protein Function Prediction with Asynchronous-Random Walk. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1439-1450. [PMID: 31562099 DOI: 10.1109/tcbb.2019.2943342] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 06/10/2023]
Abstract
Protein function prediction is a fundamental task in the post-genomic era. Available functional annotations of proteins are incomplete and the annotations of two homologous species are complementary to each other. However, how to effectively leverage mutually complementary annotations of different species to further boost the prediction performance is still not well studied. In this paper, we propose a cross-species protein function prediction approach by performing Asynchronous Random Walk on a heterogeneous network (AsyRW). AsyRW first constructs a heterogeneous network to integrate multiple functional association networks derived from different biological data, established homology-relationships between proteins from different species, known annotations of proteins and Gene Ontology (GO). To account for the intrinsic structures of intra- and inter-species of proteins and that of GO, AsyRW quantifies the individual walk lengths of each network node using the gravity-like theory, and then performs asynchronous-random walk with the individual length to predict associations between proteins and GO terms. Experiments on annotations archived in different years show that individual walk length and asynchronous-random walk can effectively leverage the complementary annotations of different species, and that AsyRW significantly outperforms other related and competitive methods. The code for AsyRW is available at: http://mlda.swu.edu.cn/codes.php?name=AsyRW.
|
15
|
Zhao Y, Wang J, Chen J, Zhang X, Guo M, Yu G. A Literature Review of Gene Function Prediction by Modeling Gene Ontology. Front Genet 2020; 11:400. [PMID: 32391061 PMCID: PMC7193026 DOI: 10.3389/fgene.2020.00400] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 01/02/2020] [Accepted: 03/30/2020] [Indexed: 12/14/2022] Open
Abstract
Annotating the functional properties of gene products, i.e., RNAs and proteins, is a fundamental task in biology. The Gene Ontology database (GO) was developed to systematically describe the functional properties of gene products across species, and to facilitate the computational prediction of gene function. As GO is routinely updated, it serves as the gold standard and main knowledge source in functional genomics. Many gene function prediction methods making use of GO have been proposed. But no literature review has summarized these methods and the possibilities for future efforts from the perspective of GO. To bridge this gap, we review the existing methods with an emphasis on recent solutions. First, we introduce the conventions of GO and the widely adopted evaluation metrics for gene function prediction. Next, we summarize current methods of gene function prediction that apply GO in different ways, such as using hierarchical or flat inter-relationships between GO terms, compressing massive GO terms and quantifying semantic similarities. Although many efforts have improved performance by harnessing GO, we conclude that there remain many largely overlooked but important topics for future research.
Affiliation(s)
- Yingwen Zhao
  - College of Computer and Information Science, Southwest University, Chongqing, China
- Jun Wang
  - College of Computer and Information Science, Southwest University, Chongqing, China
- Jian Chen
  - State Key Laboratory of Agrobiotechnology and National Maize Improvement Center, China Agricultural University, Beijing, China
- Xiangliang Zhang
  - CBRC, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Maozu Guo
  - School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
- Guoxian Yu
  - College of Computer and Information Science, Southwest University, Chongqing, China
  - CBRC, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
|
16
|
Peng J, Xue H, Wei Z, Tuncali I, Hao J, Shang X. Integrating multi-network topology for gene function prediction using deep neural networks. Brief Bioinform 2020; 22:2096-2105. [PMID: 32249297 DOI: 10.1093/bib/bbaa036] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 12/02/2019] [Revised: 02/09/2020] [Accepted: 02/25/2020] [Indexed: 01/18/2023] Open
Abstract
MOTIVATION The emergence of abundant biological networks, which benefit from the development of advanced high-throughput techniques, contributes to describing and modeling complex internal interactions among biological entities such as genes and proteins. Multiple networks provide rich information for inferring the function of genes or proteins. To extract functional patterns of genes based on multiple heterogeneous networks, network embedding-based methods, aiming to capture non-linear and low-dimensional feature representation based on network biology, have recently achieved remarkable performance in gene function prediction. However, existing methods do not consider the shared information among different networks during the feature learning process. RESULTS Taking the correlation among the networks into account, we design a novel semi-supervised autoencoder method to integrate multiple networks and generate a low-dimensional feature representation. Then we utilize a convolutional neural network based on the integrated feature embedding to annotate unlabeled gene functions. We test our method on both yeast and human datasets and compare with three state-of-the-art methods. The results demonstrate the superior performance of our method. We not only provide a comprehensive analysis of the performance of the newly proposed algorithm but also provide a tool for extracting features of genes based on multiple networks, which can be used in the downstream machine learning task. AVAILABILITY DeepMNE-CNN is freely available at https://github.com/xuehansheng/DeepMNE-CNN. CONTACT jiajiepeng@nwpu.edu.cn; shang@nwpu.edu.cn; jianye.hao@tju.edu.cn.
Affiliation(s)
- Jiajie Peng
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
- Hansheng Xue
  - Key Laboratory of Big Data Storage and Management, Ministry of Industry and Information Technology, Northwestern Polytechnical University, Xi'an, 710072, China
- Zhongyu Wei
  - Research School of Computer Science, Australian National University, Canberra, 2601, Australia
- Idil Tuncali
  - School of Data Science, Fudan University, Shanghai, 200433, China
|
17
|
Frasca M, Bianchi NC. Multitask Protein Function Prediction through Task Dissimilarity. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:1550-1560. [PMID: 28328509 DOI: 10.1109/tcbb.2017.2684127] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 06/06/2023]
Abstract
Automated protein function prediction is a challenging problem with distinctive features, such as the hierarchical organization of protein functions and the scarcity of annotated proteins for most biological functions. We propose a multitask learning algorithm addressing both issues. Unlike standard multitask algorithms, which use task (protein functions) similarity information as a bias to speed up learning, we show that dissimilarity information enforces separation of rare class labels from frequent class labels, and for this reason is better suited for solving unbalanced protein function prediction problems. We support our claim by showing that a multitask extension of the label propagation algorithm empirically works best when the task relatedness information is represented using a dissimilarity matrix as opposed to a similarity matrix. Moreover, the experimental comparison carried out on three model organisms shows that our method has a more stable performance in both "protein-centric" and "function-centric" evaluation settings.
|
18
|
Perlasca P, Frasca M, Ba CT, Notaro M, Petrini A, Casiraghi E, Grossi G, Gliozzo J, Valentini G, Mesiti M. UNIPred-Web: a web tool for the integration and visualization of biomolecular networks for protein function prediction. BMC Bioinformatics 2019; 20:422. [PMID: 31412768 PMCID: PMC6694573 DOI: 10.1186/s12859-019-2959-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 03/08/2019] [Accepted: 06/18/2019] [Indexed: 01/06/2023] Open
Abstract
Background One of the main issues in the automated protein function prediction (AFP) problem is the integration of multiple networked data sources. The UNIPred algorithm was thereby proposed to efficiently integrate the protein networks in a function-specific fashion, by taking into account the imbalance that characterizes protein annotations, and to subsequently predict novel hypotheses about unannotated proteins. UNIPred is publicly available as R code, which may limit its use by non-expert users. Moreover, its application requires effort in the acquisition and preparation of the networks to be integrated. Finally, the UNIPred source code does not handle the visualization of the resulting consensus network, whereas suitable views of the network topology are necessary to explore and interpret existing protein relationships. Results We address the aforementioned issues by proposing UNIPred-Web, a user-friendly Web tool for the application of the UNIPred algorithm to a variety of biomolecular networks, already supplied by the system, and for the visualization and exploration of protein networks. We support different organisms and different types of networks (e.g., co-expression, shared domains and physical interaction networks). Users are supported in the different phases of the process, ranging from the selection of the networks and the protein function to be predicted, to the navigation of the integrated network. The system also supports the upload of user-defined protein networks. The vertex-centric and highly interactive approach of UNIPred-Web allows a narrow exploration of specific proteins, and an interactive analysis of large sub-networks with only a few mouse clicks. Conclusions UNIPred-Web offers practical and intuitive (visual) guidance to biologists interested in gaining insights into protein biomolecular functions. UNIPred-Web provides facilities for the integration of networks, and supplies a framework for the imbalance-aware protein network integration of nine organisms, the prediction of thousands of GO protein functions, and an easy-to-use graphical interface for the visual analysis, navigation and interpretation of the integrated networks and of the functional predictions.
Affiliation(s)
- Paolo Perlasca
  - Department of Computer Science, Università degli Studi di Milano, Via Celoria 18, Milano, 20133, Italy
- Marco Frasca
  - Department of Computer Science, Università degli Studi di Milano, Via Celoria 18, Milano, 20133, Italy
- Cheick Tidiane Ba
  - Department of Computer Science, Università degli Studi di Milano, Via Celoria 18, Milano, 20133, Italy
- Marco Notaro
  - Department of Computer Science, Università degli Studi di Milano, Via Celoria 18, Milano, 20133, Italy
- Alessandro Petrini
  - Department of Computer Science, Università degli Studi di Milano, Via Celoria 18, Milano, 20133, Italy
- Elena Casiraghi
  - Department of Computer Science, Università degli Studi di Milano, Via Celoria 18, Milano, 20133, Italy
- Giuliano Grossi
  - Department of Computer Science, Università degli Studi di Milano, Via Celoria 18, Milano, 20133, Italy
- Jessica Gliozzo
  - Department of Computer Science, Università degli Studi di Milano, Via Celoria 18, Milano, 20133, Italy
  - Fondazione IRCCS Ca' Granda - Ospedale Maggiore Policlinico, Università degli Studi di Milano, Via della Commenda 10, Milano, 20122, Italy
- Giorgio Valentini
  - Department of Computer Science, Università degli Studi di Milano, Via Celoria 18, Milano, 20133, Italy
- Marco Mesiti
  - Department of Computer Science, Università degli Studi di Milano, Via Celoria 18, Milano, 20133, Italy
|
19
|
Franz M, Rodriguez H, Lopes C, Zuberi K, Montojo J, Bader GD, Morris Q. GeneMANIA update 2018. Nucleic Acids Res 2019; 46:W60-W64. [PMID: 29912392 PMCID: PMC6030815 DOI: 10.1093/nar/gky311] [Citation(s) in RCA: 710] [Impact Index Per Article: 142.0] [Received: 01/31/2018] [Accepted: 06/13/2018] [Indexed: 01/11/2023] Open
Abstract
GeneMANIA (http://genemania.org) is a flexible user-friendly web site for generating hypotheses about gene function, analyzing gene lists and prioritizing genes for functional assays. Given a query gene list, GeneMANIA finds functionally similar genes using a wealth of genomics and proteomics data. In this mode, it weights each functional genomic dataset according to its predictive value for the query. Another use of GeneMANIA is gene function prediction. Given a single query gene, GeneMANIA finds genes likely to share function with it based on their interactions with it. Enriched Gene Ontology categories among this set can point to the function of the gene. Nine organisms are currently supported (Arabidopsis thaliana, Caenorhabditis elegans, Danio rerio, Drosophila melanogaster, Escherichia coli, Homo sapiens, Mus musculus, Rattus norvegicus and Saccharomyces cerevisiae). Hundreds of data sets and hundreds of millions of interactions have been collected from GEO, BioGRID, IRefIndex and I2D, as well as organism-specific functional genomics data sets. Users can customize their search by selecting specific data sets to query and by uploading their own data sets to analyze. We have recently updated the user interface to GeneMANIA to make it more intuitive and make more efficient use of visual space. GeneMANIA can now be used effectively on a variety of devices.
Affiliation(s)
- Max Franz
  - The Donnelly Centre, University of Toronto, Ontario, Canada
- Khalid Zuberi
  - The Donnelly Centre, University of Toronto, Ontario, Canada
- Jason Montojo
  - The Donnelly Centre, University of Toronto, Ontario, Canada
- Gary D Bader
  - The Donnelly Centre, University of Toronto, Ontario, Canada
  - Department of Computer Science, University of Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Ontario, Canada
- Quaid Morris
  - The Donnelly Centre, University of Toronto, Ontario, Canada
  - Department of Computer Science, University of Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Ontario, Canada
  - Department of Electrical and Computer Engineering, University of Toronto, Ontario, Canada
|
20
|
Gligorijevic V, Panagakis Y, Zafeiriou S. Non-Negative Matrix Factorizations for Multiplex Network Analysis. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2019; 41:928-940. [PMID: 29993651 DOI: 10.1109/tpami.2018.2821146] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 06/08/2023]
Abstract
Networks have been a general tool for representing, analyzing, and modeling relational data arising in several domains. One of the most important aspects of network analysis is community detection or network clustering. Until recently, the major focus has been on discovering community structure in single (i.e., monoplex) networks. However, with the advent of relational data with multiple modalities, multiplex networks, i.e., networks composed of multiple layers representing different aspects of relations, have emerged. Consequently, community detection in multiplex networks, i.e., detecting clusters of nodes shared by all layers, has become a new challenge. In this paper, we propose Network Fusion for Composite Community Extraction (NF-CCE), a new class of algorithms, based on four different non-negative matrix factorization models, capable of extracting composite communities in multiplex networks. Each algorithm works in two steps: first, it finds a non-negative, low-dimensional feature representation of each network layer; then, it fuses the feature representations of the layers into a common non-negative, low-dimensional feature representation via collective factorization. The composite clusters are extracted from the common feature representation. We demonstrate the superior performance of our algorithms over the state-of-the-art methods on various types of multiplex networks, including biological, social, economic, citation, phone communication, and brain multiplex networks.
|
21
|
Ruan Y, Li Y, Liu Y, Zhou J, Wang X, Zhang W. Investigation of optimal pathways for preeclampsia using network-based guilt by association algorithm. Exp Ther Med 2019; 17:4139-4143. [PMID: 30988790 PMCID: PMC6447911 DOI: 10.3892/etm.2019.7410] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/02/2018] [Accepted: 02/22/2019] [Indexed: 12/13/2022] Open
Abstract
This study investigated optimal pathways for preeclampsia (PE) utilizing the network-based guilt by association (GBA) algorithm. The inference method consisted of four steps: preparing differentially expressed genes (DEGs) between PE patients and normal controls from gene expression data; constructing a co-expression network (CEN) for the DEGs using Spearman's correlation coefficient (SCC); and predicting optimal pathways with the network-based GBA algorithm, for which the area under the receiver operating characteristics curve (AUROC) was computed for each pathway. There were 351 DEGs and 61,425 edges in the CEN for PE. Subsequently, 53 pathways were obtained with a good classification performance (AUROC >0.5). Nine pathways had AUROC >0.9 and were defined as optimal pathways, notably microRNAs in cancer (AUROC=0.9966), gap junction (AUROC=0.9922), and pathogenic Escherichia coli infection (AUROC=0.9888). Nine optimal pathways were identified through comprehensive analysis of data from PE patients, which might shed new light on the molecular and pathological mechanisms of PE.
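The pipeline described in this abstract (Spearman co-expression network, guilt-by-association scoring, AUROC per pathway) can be illustrated with a minimal sketch. It assumes a neighbor-voting form of GBA, which is one common variant and not necessarily the scoring used by the authors. Note that 351 genes give 351 × 350 / 2 = 61,425 gene pairs, so the reported network appears to be a complete weighted graph; the sketch follows that reading. The toy expression values, gene labels, and all function names are hypothetical.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def coexpression_network(expr):
    """Complete weighted network: edge weight is |Spearman rho| per gene pair."""
    weights = {}
    for g, h in combinations(sorted(expr), 2):
        w = abs(spearman(expr[g], expr[h]))
        weights[(g, h)] = weights[(h, g)] = w
    return weights


def gba_scores(genes, weights, pathway, held_out=frozenset()):
    """Neighbor voting: share of a gene's edge weight that leads to known members.

    Genes in `held_out` are treated as unlabeled, as in cross-validation.
    """
    known = set(pathway) - set(held_out)
    scores = {}
    for g in genes:
        total = sum(weights[(g, h)] for h in genes if h != g)
        to_known = sum(weights[(g, h)] for h in known if h != g)
        scores[g] = to_known / total if total else 0.0
    return scores


def auroc(scores, positives):
    """Probability that a random positive outscores a random negative."""
    pos = [s for g, s in scores.items() if g in positives]
    neg = [s for g, s in scores.items() if g not in positives]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six genes measured in five samples.
expr = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],
    "C": [1, 3, 2, 5, 4],
    "D": [3, 5, 1, 4, 2],
    "E": [6, 10, 2, 8, 4],
    "F": [3, 4, 1, 5, 2],
}
genes = sorted(expr)
weights = coexpression_network(expr)
pathway = {"A", "B", "C"}  # hypothetical pathway membership

full = gba_scores(genes, weights, pathway)
print("AUROC with all labels visible:", auroc(full, pathway))

# Hide gene C's label and check whether its neighbors still point to the pathway.
loo = gba_scores(genes, weights, pathway, held_out={"C"})
print("held-out C outranks non-members:", loo["C"] > max(loo[g] for g in "DEF"))
```

Scoring with every label visible is optimistic; hiding labels, as in the held-out call, is closer to how a GBA predictor would normally be evaluated, and a full analysis would repeat it over cross-validation folds.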
Affiliation(s)
- Yan Ruan
  - Department of Obstetrics, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing 100026, P.R. China
- Yuan Li
  - Department of Obstetrics, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing 100026, P.R. China
- Yingping Liu
  - Department of Obstetrics, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing 100026, P.R. China
- Jianxin Zhou
  - Department of Obstetrics, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing 100026, P.R. China
- Xin Wang
  - Department of Obstetrics, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing 100026, P.R. China
- Weiyuan Zhang
  - Department of Obstetrics, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing 100026, P.R. China
|
22
|
Morales-Martinez M, Valencia-Hipolito A, Vega GG, Neri N, Nambo MJ, Alvarado I, Cuadra I, Duran-Padilla MA, Martinez-Maza O, Huerta-Yepez S, Vega MI. Regulation of Krüppel-Like Factor 4 (KLF4) expression through the transcription factor Yin-Yang 1 (YY1) in non-Hodgkin B-cell lymphoma. Oncotarget 2019; 10:2173-2188. [PMID: 31040909 PMCID: PMC6481341 DOI: 10.18632/oncotarget.26745] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 08/20/2018] [Accepted: 02/15/2019] [Indexed: 12/21/2022] Open
Abstract
Krüppel-Like Factor 4 (KLF4) is a member of the KLF transcription factor family, and evidence suggests that KLF4 is either an oncogene or a tumor suppressor. The regulatory mechanism underlying KLF4 expression in cancer, and specifically in lymphoma, is still not understood. Bioinformatics analysis revealed two YY1 putative binding sites in the KLF4 promoter region (-950 bp and -105 bp). Here, the potential regulation of KLF4 by YY1 in NHL was analyzed. Mutation of the putative YY1 binding sites in a previously reported system containing the KLF4 promoter region and CHIP analysis confirmed that these binding sites are important for KLF4 regulation. B-NHL cell lines showed that both KLF4 and YY1 are co-expressed, and transfection with siRNA-YY1 resulted in significant inhibition of KLF4. The clinical implications of YY1 in the transcriptional regulation of KLF4 were investigated by IHC in a TMA with 43 samples of subtypes DLBCL and FL, and all tumor tissues expressing YY1 demonstrated a correlation with KLF4 expression, which was consistent with bioinformatics analyses in several databases. Our findings demonstrated that KLF4 can be transcriptionally regulated by YY1 in B-NHL, and a correlation between YY1 expression and KLF4 was found in clinical samples. Hence, both YY1 and KLF4 may be possible therapeutic biomarkers of NHL.
Affiliation(s)
- Mario Morales-Martinez
  - Molecular Signal Pathway in Cancer Laboratory, UIMEO, Oncology Hospital, Siglo XXI National Medical Center, IMSS, México City, México
  - Unidad de Posgrado, Facultad de Medicina Universidad Nacional Autónoma de México, México City, México
- Alberto Valencia-Hipolito
  - Molecular Signal Pathway in Cancer Laboratory, UIMEO, Oncology Hospital, Siglo XXI National Medical Center, IMSS, México City, México
- Gabriel G Vega
  - Molecular Signal Pathway in Cancer Laboratory, UIMEO, Oncology Hospital, Siglo XXI National Medical Center, IMSS, México City, México
  - Unidad de Posgrado, Facultad de Medicina Universidad Nacional Autónoma de México, México City, México
- Natividad Neri
  - Department of Hematology, Oncology Hospital, National Medical Center, IMSS, México City, México
- Maria J Nambo
  - Department of Hematology, Oncology Hospital, National Medical Center, IMSS, México City, México
- Isabel Alvarado
  - Servicio de Anatomía Patológica, Hospital de Oncología, Centro Médico Nacional Siglo XXI, IMSS, México City, México
- Ivonne Cuadra
  - Servicio de Anatomía Patológica, Hospital de Oncología, Centro Médico Nacional Siglo XXI, IMSS, México City, México
- Marco A Duran-Padilla
  - Servicio de Patología, Hospital General de México "Eduardo Liceaga", Facultad de Medicina de la UNAM, México City, México
- Otoniel Martinez-Maza
  - Department of Obstetrics and Gynecology, Jonsson Comprehensive Cancer Center, UCLA AIDS Institute, David Geffen School of Medicine, University of California, Los Angeles, California, USA
  - Department of Microbiology, Immunology, and Molecular Genetics, Jonsson Comprehensive Cancer Center, UCLA AIDS Institute, David Geffen School of Medicine, University of California, Los Angeles, California, USA
- Sara Huerta-Yepez
  - Unidad de Investigación en Enfermedades Oncológicas, Hospital Infantil de México "Federico Gómez" S.S.A, México City, México
- Mario I Vega
  - Molecular Signal Pathway in Cancer Laboratory, UIMEO, Oncology Hospital, Siglo XXI National Medical Center, IMSS, México City, México
  - Department of Medicine, Hematology-Oncology Division, Greater Los Angeles VA Healthcare Center, UCLA Medical Center, Jonsson Comprehensive Cancer Center, Los Angeles, California, USA
|
23
|
Integrating Multiple Interaction Networks for Gene Function Inference. Molecules 2018; 24:molecules24010030. [PMID: 30577643 PMCID: PMC6337127 DOI: 10.3390/molecules24010030] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/21/2018] [Revised: 12/19/2018] [Accepted: 12/20/2018] [Indexed: 01/17/2023] Open
Abstract
In the past few decades, the number and variety of genomic and proteomic data available have increased dramatically. Molecular or functional interaction networks are usually constructed from high-throughput data, and the topological structure of these interaction networks provides a wealth of information for inferring the function of genes or proteins. Analyzing these association networks is a widely used way to mine functional information about genes or proteins. However, how to combine multiple heterogeneous networks to achieve more accurate predictions remains an urgent, unresolved challenge. In this paper, we present a method named ReprsentConcat to improve function inference by integrating multiple interaction networks. The low-dimensional representation of each node in each network is extracted, then these representations from multiple networks are concatenated and fed to gcForest, which augments feature vectors by cascading and automatically determines the number of cascade levels. We experimentally compare ReprsentConcat with a state-of-the-art method, showing that it achieves competitive results on the yeast and human datasets. Moreover, it is robust to the hyperparameters, including the number of dimensions.
|
24
|
Boldi P, Frasca M, Malchiodi D. Evaluating the impact of topological protein features on the negative examples selection. BMC Bioinformatics 2018; 19:417. [PMID: 30453879 PMCID: PMC6245585 DOI: 10.1186/s12859-018-2385-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Supervised machine learning methods, when applied to the problem of automated protein-function prediction (AFP), require the availability of both positive examples (i.e., proteins which are known to possess a given protein function) and negative examples (corresponding to proteins not associated with that function). Unfortunately, publicly available proteome and genome data sources such as the Gene Ontology rarely store the functions not possessed by a protein. Thus negative selection, which consists of identifying informative negative examples, is currently a central and challenging problem in AFP. Several heuristics have been proposed through the years to solve this problem; nevertheless, despite their effectiveness, to the best of our knowledge no previous work has studied which protein features are more relevant to this task, that is, which protein features help more in discriminating reliable and unreliable negatives. RESULTS The present work analyses the impact of several features on the selection of negative proteins for the Gene Ontology (GO) terms. The analysis is network-based: it exploits the fact that proteins can be naturally structured in a network, considering the pairwise relationships coming from several sources of data, such as protein-protein and genetic interactions. Overall, the proposed protein features, including local and global graph centrality measures and protein multifunctionality, can be term-aware (i.e., depending on the GO term) and term-unaware (i.e., invariant across the GO terms). We validated the informativeness of each feature utilizing a temporal holdout in three different experiments on yeast, mouse and human proteomes: (i) feature selection to detect which protein features are more helpful for the negative selection; (ii) protein function prediction to verify whether the features considered are also useful to predict GO terms; (iii) negative selection by applying two different negative selection algorithms on proteins represented through the proposed features. CONCLUSIONS Term-aware features (with some exceptions) proved more informative for problem (i), together with node betweenness, which is the most relevant among term-unaware features. The node positive neighborhood instead is the most predictive feature for the AFP problem, while experiment (iii) showed that the proposed features allow negative selection algorithms to effectively select negative instances in the temporal holdout setting, with better results when nonlinear combinations of features are also exploited.
Affiliation(s)
- Paolo Boldi
  - Department of Computer Science, Università degli Studi di Milano, Via Comelico 39, Milano, 20135, Italy
- Marco Frasca
  - Department of Computer Science, Università degli Studi di Milano, Via Comelico 39, Milano, 20135, Italy
- Dario Malchiodi
  - Department of Computer Science, Università degli Studi di Milano, Via Comelico 39, Milano, 20135, Italy
|
25
|
Vidulin V, Šmuc T, Džeroski S, Supek F. The evolutionary signal in metagenome phyletic profiles predicts many gene functions. Microbiome 2018; 6:129. [PMID: 29991352 PMCID: PMC6040064 DOI: 10.1186/s40168-018-0506-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/07/2017] [Accepted: 06/19/2018] [Indexed: 06/08/2023]
Abstract
BACKGROUND The function of many genes is still not known even in model organisms. An increasing availability of microbiome DNA sequencing data provides an opportunity to infer gene function in a systematic manner. RESULTS We evaluated if the evolutionary signal contained in metagenome phyletic profiles (MPP) is predictive of a broad array of gene functions. The MPPs are an encoding of environmental DNA sequencing data that consists of relative abundances of gene families across metagenomes. We find that such MPPs can accurately predict 826 Gene Ontology functional categories, while drawing on human gut microbiomes, ocean metagenomes, and DNA sequences from various other engineered and natural environments. Overall, in this task, the MPPs are highly accurate, and moreover they provide coverage for a set of Gene Ontology terms largely complementary to standard phylogenetic profiles, derived from fully sequenced genomes. We also find that metagenomes approximated from taxon relative abundance obtained via 16S rRNA gene sequencing may provide surprisingly useful predictive models. Crucially, the MPPs derived from different types of environments can infer distinct, non-overlapping sets of gene functions and therefore complement each other. Consistently, simulations on > 5000 metagenomes indicate that the amount of data is not in itself critical for maximizing predictive accuracy, while the diversity of sampled environments appears to be the critical factor for obtaining robust models. CONCLUSIONS In past work, metagenomics has provided invaluable insight into ecology of various habitats, into diversity of microbial life and also into human health and disease mechanisms. We propose that environmental DNA sequencing additionally constitutes a useful tool to predict biological roles of genes, yielding inferences out of reach for existing comparative genomics approaches.
Affiliation(s)
- Vedrana Vidulin
  - Faculty of Information Studies, 8000 Novo Mesto, Slovenia
  - Division of Electronics, Rudjer Boskovic Institute, 10000 Zagreb, Croatia
  - Department of Knowledge Technologies, Jozef Stefan Institute, 1000 Ljubljana, Slovenia
- Tomislav Šmuc
  - Division of Electronics, Rudjer Boskovic Institute, 10000 Zagreb, Croatia
- Sašo Džeroski
  - Department of Knowledge Technologies, Jozef Stefan Institute, 1000 Ljubljana, Slovenia
- Fran Supek
  - Genome Data Science, Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, 08028 Barcelona, Spain
|
26
|
Exploring the interactions of the RAS family in the human protein network and their potential implications in RAS-directed therapies. Oncotarget 2018; 7:75810-75826. [PMID: 27713118 PMCID: PMC5342780 DOI: 10.18632/oncotarget.12416] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 05/12/2016] [Accepted: 09/15/2016] [Indexed: 12/14/2022] Open
Abstract
RAS proteins are the founding members of the RAS superfamily of GTPases. They are involved in key signaling pathways regulating essential cellular functions such as cell growth and differentiation. As a result, their deregulation by inactivating mutations often results in aberrant cell proliferation and cancer. With the exception of the relatively well-known KRAS, HRAS and NRAS proteins, little is known about how the interactions of the other RAS human paralogs affect cancer evolution and response to treatment. In this study we performed a comprehensive analysis of the relationship between the phylogeny of RAS proteins and their location in the protein interaction network. This analysis was integrated with the structural analysis of conserved positions in available 3D structures of RAS complexes. Our results show that many RAS proteins with divergent sequences are found close together in the human interactome. We found specific conserved amino acid positions in this group that map to the binding sites of RAS with many of their signaling effectors, suggesting that these pairs could share interacting partners. These results underscore the potential relevance of cross-talking in the RAS signaling network, which should be taken into account when considering the inhibitory activity of drugs targeting specific RAS oncoproteins. This study broadens our understanding of the human RAS signaling network and stresses the importance of considering its potential cross-talk in future therapies.
|
27
|
Multilayer network modeling of integrated biological systems: Comment on "Network science of biological systems at different scales: A review" by Gosak et al. Phys Life Rev 2017; 24:149-152. [PMID: 29305153 DOI: 10.1016/j.plrev.2017.12.006] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 12/15/2017] [Accepted: 12/21/2017] [Indexed: 11/21/2022]
|
28
|
Yan KK, Zhao H, Pang H. A comparison of graph- and kernel-based -omics data integration algorithms for classifying complex traits. BMC Bioinformatics 2017; 18:539. [PMID: 29212468 PMCID: PMC6389230 DOI: 10.1186/s12859-017-1982-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 06/05/2017] [Accepted: 11/26/2017] [Indexed: 02/01/2023] Open
Abstract
BACKGROUND High-throughput sequencing data are widely collected and analyzed in the study of complex diseases in the quest to improve human health. Well-studied algorithms mostly deal with a single data source and cannot fully utilize the potential of these multi-omics data sources. In order to provide a holistic understanding of human health and diseases, it is necessary to integrate multiple data sources. Several algorithms have been proposed so far; however, a comprehensive comparison of data integration algorithms for classification of binary traits is currently lacking. RESULTS In this paper, we focus on two common classes of integration algorithms: graph-based algorithms, which depict relationships with subjects denoted by nodes and relationships denoted by edges, and kernel-based algorithms, which can generate a classifier in feature space. Our paper provides a comprehensive comparison of their performance in terms of various measurements of classification accuracy and computation time. Seven different integration algorithms, including graph-based semi-supervised learning, graph sharpening integration, composite association network, Bayesian network, semi-definite programming-support vector machine (SDP-SVM), relevance vector machine (RVM) and Ada-boost relevance vector machine, are compared and evaluated with hypertension and two cancer data sets in our study. In general, kernel-based algorithms create more complex models and require longer computation time, but they tend to perform better than graph-based algorithms. Graph-based algorithms have the advantage of being faster computationally. CONCLUSIONS The empirical results demonstrate that composite association network, relevance vector machine, and Ada-boost RVM are the better performers. We provide recommendations on how to choose an appropriate algorithm for integrating data from multiple sources.
Affiliation(s)
- Kang K Yan
  - School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Hongyu Zhao
  - Department of Biostatistics, Yale University, New Haven, CT, USA
- Herbert Pang
  - School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
|
29
|
Abstract
Multiple biological, behavioural and genetic determinants or correlates of obesity have been identified to date. Genome-wide association studies (GWAS) have contributed to the identification of more than 100 obesity-associated genetic variants, but their roles in causal processes leading to obesity remain largely unknown. Most variants are likely to have tissue-specific regulatory roles through joint contributions to biological pathways and networks, through changes in gene expression that influence quantitative traits, or through the regulation of the epigenome. The recent availability of large-scale functional genomics resources provides an opportunity to re-examine obesity GWAS data to begin elucidating the function of genetic variants. Interrogation of knockout mouse phenotype resources provides a further avenue to test for evidence of convergence between genetic variation and biological or behavioural determinants of obesity.
|
30
|
Bhattacharyya M, Madden P, Henning N, Gregory S, Aid M, Martinot AJ, Barouch DH, Penaloza-MacMaster P. Regulation of CD4 T cells and their effects on immunopathological inflammation following viral infection. Immunology 2017; 152:328-343. [PMID: 28582800 DOI: 10.1111/imm.12771] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 04/07/2017] [Revised: 05/09/2017] [Accepted: 05/22/2017] [Indexed: 12/12/2022] Open
Abstract
CD4 T cells help immune responses, but knowledge of how memory CD4 T cells are regulated and how they regulate adaptive immune responses and induce immunopathology is limited. Using adoptive transfer of virus-specific CD4 T cells, we show that naive CD4 T cells undergo substantial expansion following infection, but can induce lethal T helper type 1-driven inflammation. In contrast, memory CD4 T cells exhibit a biased proliferation of T follicular helper cell subsets and were able to improve adaptive immune responses in the context of minimal tissue damage. Our analyses revealed that type I interferon regulates the expansion of primary CD4 T cells, but does not seem to play a critical role in regulating the expansion of secondary CD4 T cells. Strikingly, blockade of type I interferon abrogated lethal inflammation by primary CD4 T cells following viral infection, despite that this treatment increased the numbers of primary CD4 T-cell responses. Altogether, these data demonstrate important aspects of how primary and secondary CD4 T cells are regulated in vivo, and how they contribute to immune protection and immunopathology. These findings are important for rational vaccine design and for improving adoptive T-cell therapies against persistent antigens.
Affiliation(s)
- Mitra Bhattacharyya
  - Department of Microbiology-Immunology, Feinberg School of Medicine, Northwestern University, Chicago, IL
- Patrick Madden
  - Department of Microbiology-Immunology, Feinberg School of Medicine, Northwestern University, Chicago, IL
- Nathan Henning
  - Department of Microbiology-Immunology, Feinberg School of Medicine, Northwestern University, Chicago, IL
- Shana Gregory
  - Department of Microbiology-Immunology, Feinberg School of Medicine, Northwestern University, Chicago, IL
- Malika Aid
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Boston, MA
- Amanda J Martinot
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Boston, MA
- Dan H Barouch
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Boston, MA
  - Ragon Institute of MGH, MIT, and Harvard, Boston, MA, USA
- Pablo Penaloza-MacMaster
  - Department of Microbiology-Immunology, Feinberg School of Medicine, Northwestern University, Chicago, IL
|
31
|
Jiang B, Kloster K, Gleich DF, Gribskov M. AptRank: an adaptive PageRank model for protein function prediction on bi-relational graphs. Bioinformatics 2017; 33:1829-1836. [PMID: 28200073 DOI: 10.1093/bioinformatics/btx029] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 05/23/2016] [Accepted: 02/14/2017] [Indexed: 11/15/2022] Open
Affiliation(s)
- Biaobin Jiang
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Kyle Kloster
  - Department of Mathematics, Purdue University, West Lafayette, IN, USA
- David F Gleich
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Michael Gribskov
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
|
32
|
Wang S, Qu M, Peng J. ProSNet: integrating homology with molecular networks for protein function prediction. Pacific Symposium on Biocomputing 2017; 22:27-38. [PMID: 27896959 PMCID: PMC5319591 DOI: 10.1142/9789813207813_0004] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 12/11/2022]
Abstract
Automated annotation of protein function has become a critical task in the post-genomic era. Network-based approaches and homology-based approaches have been widely used and recently tested in large-scale community-wide assessment experiments. It is natural to integrate network data with homology information to further improve the predictive performance. However, integrating these two heterogeneous, high-dimensional and noisy datasets is non-trivial. In this work, we introduce a novel protein function prediction algorithm ProSNet. An integrated heterogeneous network is first built to include molecular networks of multiple species and link together homologous proteins across multiple species. Based on this integrated network, a dimensionality reduction algorithm is introduced to obtain compact low-dimensional vectors to encode proteins in the network. Finally, we develop machine learning classification algorithms that take the vectors as input and make predictions by transferring annotations both within each species and across different species. Extensive experiments on five major species demonstrate that our integration of homology with molecular networks substantially improves the predictive performance over existing approaches.
Affiliation(s)
- Sheng Wang
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL, USA
|
33
|
Abstract
Molecular profiling of proteins and phosphoproteins using a reverse phase protein array (RPPA) platform, with a panel of target-specific antibodies, enables the parallel, quantitative proteomic analysis of many biological samples in a microarray format. Hence, RPPA analysis can generate a high volume of multidimensional data that must be effectively interrogated and interpreted. A range of computational techniques for data mining can be applied to detect and explore data structure and to form functional predictions from large datasets. Here, two approaches for the computational analysis of RPPA data are detailed: the identification of similar patterns of protein expression by hierarchical cluster analysis and the modeling of protein interactions and signaling relationships by network analysis. The protocols use freely available, cross-platform software, are easy to implement, and do not require any programming expertise. Serving as data-driven starting points for further in-depth analysis, validation, and biological experimentation, these and related bioinformatic approaches can accelerate the functional interpretation of RPPA data.
Affiliation(s)
- Adam Byron
  - Cancer Research UK Edinburgh Centre, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XR, UK
|
34
|
Abstract
BACKGROUND Gene Ontology (GO) is a collaborative project that maintains and develops a controlled vocabulary (or terms) to describe the molecular function, biological roles and cellular location of gene products in a hierarchical ontology. GO also provides GO annotations that associate genes with GO terms. Members of the GO Consortium independently and collaboratively annotate gene products with terms, mainly for the model organisms (or species) they are interested in. Owing to experimental ethics, the research interests of biologists and resource limitations, homologous genes from different species are currently annotated with different terms. These differences can be attributed more to incomplete annotations of genes than to functional differences between them. RESULTS Semantic similarity between genes is derived from the GO hierarchy and the annotations of genes. It is positively correlated with the similarity derived from various types of biological data and has been applied to predict gene function. In this paper, we investigate whether it is possible to replenish annotations of incompletely annotated genes by using semantic similarity between genes from two species with homology. For this investigation, we utilize three representative semantic similarity metrics to compute similarity between genes from two species. Next, we determine the k nearest neighbor genes from the two species based on the chosen metric and then use terms annotated to the k neighbors of a gene to replenish annotations of that gene. We perform experiments on archived (from Jan-2014 to Jan-2016) GO annotations of four species (Human, Mouse, Danio rerio and Arabidopsis thaliana) to assess the contribution of semantic similarity between genes from different species. The experimental results demonstrate that: (1) semantic similarity between genes from homologous species contributes much more to improved accuracy (by 53.22%) than semantic similarity between genes from a single species alone or from two species with low homology; (2) GO annotations of genes from homologous species are complementary to each other. CONCLUSIONS Our study shows that semantic similarity based interspecies gene function annotation from homologous species is more effective than traditional intraspecies approaches. This work can promote more research on semantic similarity based function prediction across species.
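The k-nearest-neighbor replenishment step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: Jaccard overlap of annotation sets stands in for the GO semantic similarity metrics used in the paper, and the gene names, term labels, and the function names `jaccard` and `replenish` are hypothetical.

```python
def jaccard(a, b):
    """Set overlap, used here as a simple stand-in for GO semantic similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def replenish(target, annotations, k=2):
    """Suggest terms for `target` taken from its k most similar genes."""
    own = annotations[target]
    # Rank every other gene (from either species) by similarity to the target.
    ranked = sorted(
        (g for g in annotations if g != target),
        key=lambda g: jaccard(own, annotations[g]),
        reverse=True,
    )
    votes = {}
    for neighbor in ranked[:k]:
        # Only terms the target does not already carry are candidates.
        for term in annotations[neighbor] - own:
            votes[term] = votes.get(term, 0) + 1
    # Most frequently suggested terms first; ties broken alphabetically.
    return sorted(votes, key=lambda t: (-votes[t], t))


# Hypothetical toy data: one human gene and three mouse genes.
annotations = {
    "human_A": {"T1", "T2"},
    "mouse_A": {"T1", "T2", "T3"},
    "mouse_B": {"T1", "T4"},
    "mouse_C": {"T9"},
}
suggested = replenish("human_A", annotations, k=2)
```

With this toy data the two nearest neighbors of `human_A` are `mouse_A` and `mouse_B`, so the candidate terms are those they carry and `human_A` lacks.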
Affiliation(s)
- Guoxian Yu
  - College of Computer and Information Sciences, Southwest University, Chongqing, China
- Wei Luo
  - College of Computer and Information Sciences, Southwest University, Chongqing, China
- Guangyuan Fu
  - College of Computer and Information Sciences, Southwest University, Chongqing, China
- Jun Wang
  - College of Computer and Information Sciences, Southwest University, Chongqing, China
|
35
|
Cho H, Berger B, Peng J. Compact Integration of Multi-Network Topology for Functional Analysis of Genes. Cell Syst 2016; 3:540-548.e5. [PMID: 27889536 DOI: 10.1016/j.cels.2016.10.017] [Citation(s) in RCA: 141] [Impact Index Per Article: 17.6] [Received: 06/09/2016] [Revised: 08/14/2016] [Accepted: 10/19/2016] [Indexed: 01/18/2023]
Abstract
The topological landscape of molecular or functional interaction networks provides a rich source of information for inferring functional patterns of genes or proteins. However, a pressing yet-unsolved challenge is how to combine multiple heterogeneous networks, each having different connectivity patterns, to achieve more accurate inference. Here, we describe the Mashup framework for scalable and robust network integration. In Mashup, the diffusion in each network is first analyzed to characterize the topological context of each node. Next, the high-dimensional topological patterns in individual networks are canonically represented using low-dimensional vectors, one per gene or protein. These vectors can then be plugged into off-the-shelf machine learning methods to derive functional insights about genes or proteins. We present tools based on Mashup that achieve state-of-the-art performance in three diverse functional inference tasks: protein function prediction, gene ontology reconstruction, and genetic interaction prediction. Mashup enables deeper insights into the structure of rapidly accumulating and diverse biological network data and can be broadly applied to other network science domains.
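The first step named in this abstract, analyzing diffusion in a network to characterize each node's topological context, can be sketched with a random walk with restart. This is an illustrative assumption about the kind of diffusion involved, not the Mashup code; the later low-dimensional embedding step is not shown, and the function name, the restart probability of 0.5, and the toy path graph are all hypothetical.

```python
def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Return the stationary visit distribution of a walk that restarts at `seed`."""
    n = len(adj)
    # Column-normalise so trans[i][j] is the probability of stepping from j to i.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    trans = [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    start = [1.0 if i == seed else 0.0 for i in range(n)]
    state = start[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(trans[i][j] * state[j] for j in range(n))
            + restart * start[i]
            for i in range(n)
        ]
        # Stop once the distribution no longer changes (L1 distance below tol).
        if sum(abs(a - b) for a, b in zip(nxt, state)) < tol:
            return nxt
        state = nxt
    return state


# Toy undirected path graph 0 - 1 - 2 - 3 (hypothetical example network).
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
diffusion = random_walk_with_restart(path, seed=0)
```

The resulting vector is one node's diffusion profile: most of the probability mass stays near the seed and decays with network distance, which is what makes such profiles usable as per-node topological features.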
Affiliation(s)
- Hyunghoon Cho
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA
- Bonnie Berger
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA
  - Department of Mathematics, MIT, Cambridge, MA 02139, USA
- Jian Peng
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL 61801, USA
|
36
|
Liu J, Ghneim K, Sok D, Bosche WJ, Li Y, Chipriano E, Berkemeier B, Oswald K, Borducchi E, Cabral C, Peter L, Brinkman A, Shetty M, Jimenez J, Mondesir J, Lee B, Giglio P, Chandrashekar A, Abbink P, Colantonio A, Gittens C, Baker C, Wagner W, Lewis MG, Li W, Sekaly RP, Lifson JD, Burton DR, Barouch DH. Antibody-mediated protection against SHIV challenge includes systemic clearance of distal virus. Science 2016; 353:1045-1049. [PMID: 27540005 PMCID: PMC5237379 DOI: 10.1126/science.aag0491] [Citation(s) in RCA: 114] [Impact Index Per Article: 14.3] [Received: 05/04/2016] [Accepted: 08/10/2016] [Indexed: 12/13/2022]
Abstract
HIV-1-specific broadly neutralizing antibodies (bNAbs) can protect rhesus monkeys against simian-human immunodeficiency virus (SHIV) challenge. However, the site of antibody interception of virus and the mechanism of antibody-mediated protection remain unclear. We administered a fully protective dose of the bNAb PGT121 to rhesus monkeys and challenged them intravaginally with SHIV-SF162P3. In PGT121-treated animals, we detected low levels of viral RNA and viral DNA in distal tissues for seven days following challenge. Viral RNA-positive tissues showed transcriptomic changes indicative of innate immune activation, and cells from these tissues initiated infection after adoptive transfer into naïve hosts. These data demonstrate that bNAb-mediated protection against a mucosal virus challenge can involve clearance of infectious virus in distal tissues.
Affiliation(s)
- Jinyan Liu
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Boston, MA 02215, USA
- Khader Ghneim
  - Case Western Reserve University, Cleveland, OH 44106, USA
- Devin Sok
  - The Scripps Research Institute, La Jolla, CA 92037, USA
- William J Bosche
  - AIDS and Cancer Virus Program, Leidos Biomedical Research, Frederick National Laboratory, Frederick, MD 21702, USA
- Yuan Li
  - AIDS and Cancer Virus Program, Leidos Biomedical Research, Frederick National Laboratory, Frederick, MD 21702, USA
- Elizabeth Chipriano
  - AIDS and Cancer Virus Program, Leidos Biomedical Research, Frederick National Laboratory, Frederick, MD 21702, USA
- Brian Berkemeier
  - AIDS and Cancer Virus Program, Leidos Biomedical Research, Frederick National Laboratory, Frederick, MD 21702, USA
- Kelli Oswald
  - AIDS and Cancer Virus Program, Leidos Biomedical Research, Frederick National Laboratory, Frederick, MD 21702, USA
- Erica Borducchi
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Boston, MA 02215, USA
- Crystal Cabral
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Boston, MA 02215, USA
- Lauren Peter
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Boston, MA 02215, USA
- Amanda Brinkman
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Boston, MA 02215, USA
- Mayuri Shetty
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Boston, MA 02215, USA
- Jessica Jimenez
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Boston, MA 02215, USA
- Jade Mondesir
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Boston, MA 02215, USA
- Benjamin Lee
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Boston, MA 02215, USA
- Patricia Giglio
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Boston, MA 02215, USA
- Abishek Chandrashekar
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Boston, MA 02215, USA
- Peter Abbink
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Boston, MA 02215, USA
- Wenjun Li
  - University of Massachusetts Medical School, Worcester, MA 01655, USA
- Jeffrey D Lifson
  - AIDS and Cancer Virus Program, Leidos Biomedical Research, Frederick National Laboratory, Frederick, MD 21702, USA
- Dennis R Burton
  - The Scripps Research Institute, La Jolla, CA 92037, USA
  - Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology, and Harvard, Cambridge, MA 02139, USA
- Dan H Barouch
  - Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Boston, MA 02215, USA
  - Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology, and Harvard, Cambridge, MA 02139, USA
|
37
|
Vidulin V, Šmuc T, Supek F. Extensive complementarity between gene function prediction methods. Bioinformatics 2016; 32:3645-3653. [PMID: 27522084 DOI: 10.1093/bioinformatics/btw532] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 04/26/2016] [Revised: 07/11/2016] [Accepted: 08/09/2016] [Indexed: 12/22/2022] Open
Abstract
MOTIVATION The number of sequenced genomes rises steadily but we still lack knowledge about the biological roles of many genes. Automated function prediction (AFP) is thus a necessity. We hypothesized that AFP approaches that draw on distinct genome features may be useful for predicting different types of gene functions, motivating a systematic analysis of the benefits gained by obtaining and integrating such predictions. RESULTS Our pipeline amalgamates 5 133 543 genes from 2071 genomes in a single massive analysis that evaluates five established genomic AFP methodologies. While 1227 Gene Ontology (GO) terms yielded reliable predictions, the majority of these functions were accessible to only one or two of the methods. Moreover, different methods tend to assign a GO term to non-overlapping sets of genes. Thus, inferences made by diverse genomic AFP methods display a striking complementarity, both gene-wise and function-wise. Because of this, a viable integration strategy is to rely on a single most-confident prediction per gene/function, rather than enforcing agreement across multiple AFP methods. Using an information-theoretic approach, we estimate that current databases contain 29.2 bits/gene of known Escherichia coli gene functions. This can be increased by up to 5.5 bits/gene using individual AFP methods or by 11 additional bits/gene upon integration, thereby providing a highly-ranking predictor on the Critical Assessment of Function Annotation 2 community benchmark. Availability of more sequenced genomes boosts the predictive accuracy of AFP approaches and also the benefit from integrating them. AVAILABILITY AND IMPLEMENTATION The individual and integrated GO predictions for the complete set of genes are available from http://gorbi.irb.hr/ CONTACT fran.supek@irb.hr SUPPLEMENTARY INFORMATION Supplementary materials are available at Bioinformatics online.
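The integration strategy named in this abstract, keeping the single most-confident prediction per gene/function rather than requiring agreement between methods, can be sketched as follows. This is a minimal illustration only: the method labels, gene and term names, confidence values, and the function name `integrate_max_confidence` are hypothetical, and it assumes the confidence scores of different methods are already on a comparable scale.

```python
def integrate_max_confidence(predictions):
    """Keep, for each (gene, term) pair, the highest confidence and its source method.

    predictions: {method_name: {(gene, term): confidence}}
    returns:     {(gene, term): (confidence, method_name)}
    """
    best = {}
    for method, scores in predictions.items():
        for pair, confidence in scores.items():
            # A pair predicted by only one method is kept as-is; no agreement is required.
            if pair not in best or confidence > best[pair][0]:
                best[pair] = (confidence, method)
    return best


# Hypothetical predictions from two AFP methods.
predictions = {
    "method_1": {("geneA", "term1"): 0.9, ("geneB", "term2"): 0.4},
    "method_2": {
        ("geneA", "term1"): 0.6,
        ("geneB", "term2"): 0.7,
        ("geneC", "term3"): 0.8,
    },
}
integrated = integrate_max_confidence(predictions)
```

Because methods tend to cover non-overlapping genes and functions, a maximum rule of this kind preserves predictions that an agreement-based rule would discard.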
Affiliation(s)
- Vedrana Vidulin
  - Division of Electronics, Ruđer Bošković Institute, Bijenička cesta 54, Zagreb 10000, Croatia
- Tomislav Šmuc
  - Division of Electronics, Ruđer Bošković Institute, Bijenička cesta 54, Zagreb 10000, Croatia
- Fran Supek
  - Division of Electronics, Ruđer Bošković Institute, Bijenička cesta 54, Zagreb 10000, Croatia
  - EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology and UPF, Dr. Aiguader 88, Barcelona 08003, Spain
|
38
|
Fu G, Wang J, Yang B, Yu G. NegGOA: negative GO annotations selection using ontology structure. Bioinformatics 2016; 32:2996-3004. [PMID: 27318205 DOI: 10.1093/bioinformatics/btw366] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 02/25/2016] [Accepted: 06/01/2016] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Predicting the biological functions of proteins is one of the key challenges in the post-genomic era. Computational models have demonstrated the utility of applying machine learning methods to predict protein function. Most prediction methods explicitly require a set of negative examples, that is, proteins known not to carry out a particular function. However, Gene Ontology (GO) almost always records only that a protein carries out a particular function, and the functional annotations of proteins are incomplete. GO structurally organizes tens of thousands of GO terms, and a protein is annotated with several (or dozens) of these terms. For these reasons, negative examples for a protein can greatly help in distinguishing its true positive examples within such a large candidate GO space. RESULTS In this paper, we present a novel approach (called NegGOA) to select negative examples. Specifically, NegGOA takes advantage of the ontology structure, the available annotations and the potential for additional annotations of a protein to choose negative examples for that protein. We compare NegGOA with other negative example selection algorithms and find that NegGOA produces far fewer false negatives. We incorporate the selected negative examples into an efficient function prediction model to predict the functions of proteins in Yeast, Human, Mouse and Fly. NegGOA also achieves better accuracy than the compared algorithms across various evaluation metrics. In addition, NegGOA suffers less from incomplete protein annotations than the compared methods. AVAILABILITY AND IMPLEMENTATION The Matlab and R codes are available at https://sites.google.com/site/guoxian85/neggoa CONTACT gxyu@swu.edu.cn SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Guangyuan Fu
  - College of Computer and Information Science, Southwest University, Chongqing 400715, China
- Jun Wang
  - College of Computer and Information Science, Southwest University, Chongqing 400715, China
- Bo Yang
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Guoxian Yu
  - College of Computer and Information Science, Southwest University, Chongqing 400715, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
|
39
|
Mostafavi S, Yoshida H, Moodley D, LeBoité H, Rothamel K, Raj T, Ye CJ, Chevrier N, Zhang SY, Feng T, Lee M, Casanova JL, Clark JD, Hegen M, Telliez JB, Hacohen N, De Jager PL, Regev A, Mathis D, Benoist C. Parsing the Interferon Transcriptional Network and Its Disease Associations. Cell 2016; 164:564-78. [PMID: 26824662 DOI: 10.1016/j.cell.2015.12.032] [Citation(s) in RCA: 207] [Impact Index Per Article: 25.9] [Received: 07/02/2015] [Revised: 10/22/2015] [Accepted: 12/21/2015] [Indexed: 12/17/2022]
Abstract
Type 1 interferon (IFN) is a key mediator of organismal responses to pathogens, eliciting prototypical "interferon signature genes" that encode antiviral and inflammatory mediators. For a global view of IFN signatures and regulatory pathways, we performed gene expression and chromatin analyses of the IFN-induced response across a range of immunocyte lineages. These distinguished ISGs by cell-type specificity, kinetics, and sensitivity to tonic IFN and revealed underlying changes in chromatin configuration. We combined 1,398 human and mouse datasets to computationally infer ISG modules and their regulators, validated by genetic analysis in both species. Some ISGs are controlled by Stat1/2 and Irf9 and the ISRE DNA motif, but others appeared dependent on non-canonical factors. This regulatory framework helped to interpret JAK1 blockade pharmacology, different clusters being affected under tonic or IFN-stimulated conditions, and the IFN signatures previously associated with human diseases, revealing unrecognized subtleties in disease footprints, as affected by human ancestry.
Affiliation(s)
- Sara Mostafavi
- Division of Immunology, Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA; Department of Statistics and Department Medical Genetics, University of British Columbia, Vancouver, BC V6H 3N1, Canada
- Hideyuki Yoshida
- Division of Immunology, Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
- Devapregasan Moodley
- Division of Immunology, Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
- Hugo LeBoité
- Division of Immunology, Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
- Katherine Rothamel
- Division of Immunology, Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
- Towfique Raj
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Program in Translational NeuroPsychiatric Genomics, Departments of Neurology and Psychiatry, Brigham and Women's Hospital, Boston, MA 02115, USA
- Chun Jimmie Ye
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Nicolas Chevrier
- FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA
- Shen-Ying Zhang
- St. Giles Laboratory of Human Genetics of Infectious Diseases, The Rockefeller University, New York, NY 10065, USA
- Ting Feng
- Division of Immunology, Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
- Mark Lee
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Jean-Laurent Casanova
- St. Giles Laboratory of Human Genetics of Infectious Diseases, The Rockefeller University, New York, NY 10065, USA
- Nir Hacohen
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Philip L De Jager
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Program in Translational NeuroPsychiatric Genomics, Departments of Neurology and Psychiatry, Brigham and Women's Hospital, Boston, MA 02115, USA
- Aviv Regev
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Diane Mathis
- Division of Immunology, Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
- Christophe Benoist
- Division of Immunology, Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA 02115, USA
|
40
|
Barouch DH, Ghneim K, Bosche WJ, Li Y, Berkemeier B, Hull M, Bhattacharyya S, Cameron M, Liu J, Smith K, Borducchi E, Cabral C, Peter L, Brinkman A, Shetty M, Li H, Gittens C, Baker C, Wagner W, Lewis MG, Colantonio A, Kang HJ, Li W, Lifson JD, Piatak M, Sekaly RP. Rapid Inflammasome Activation following Mucosal SIV Infection of Rhesus Monkeys. Cell 2016; 165:656-67. [PMID: 27085913 PMCID: PMC4842119 DOI: 10.1016/j.cell.2016.03.021] [Citation(s) in RCA: 109] [Impact Index Per Article: 13.6] [Received: 10/29/2015] [Revised: 01/31/2016] [Accepted: 03/14/2016] [Indexed: 01/10/2023]
Abstract
The earliest events following mucosal HIV-1 infection, prior to measurable viremia, remain poorly understood. Here, by detailed necropsy studies, we show that the virus can rapidly disseminate following mucosal SIV infection of rhesus monkeys and trigger components of the inflammasome, both at the site of inoculation and at early sites of distal virus spread. By 24 hr following inoculation, a proinflammatory signature that lacked antiviral restriction factors was observed in viral RNA-positive tissues. The early innate response included expression of NLRX1, which inhibits antiviral responses, and activation of the TGF-β pathway, which negatively regulates adaptive immune responses. These data suggest a model in which the virus triggers specific host mechanisms that suppress the generation of antiviral innate and adaptive immune responses in the first few days of infection, thus facilitating its own replication. These findings have important implications for the development of vaccines and other strategies to prevent infection.
Affiliation(s)
- Dan H Barouch
- Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA; Ragon Institute of MGH, MIT, and Harvard, Cambridge, MA 02139, USA.
- Khader Ghneim
- Case Western Reserve University, Cleveland, OH 44106, USA
- William J Bosche
- AIDS and Cancer Virus Program, Leidos Biomedical Research, Frederick National Laboratory, Frederick, MD 21702, USA
- Yuan Li
- AIDS and Cancer Virus Program, Leidos Biomedical Research, Frederick National Laboratory, Frederick, MD 21702, USA
- Brian Berkemeier
- AIDS and Cancer Virus Program, Leidos Biomedical Research, Frederick National Laboratory, Frederick, MD 21702, USA
- Michael Hull
- AIDS and Cancer Virus Program, Leidos Biomedical Research, Frederick National Laboratory, Frederick, MD 21702, USA
- Mark Cameron
- Case Western Reserve University, Cleveland, OH 44106, USA
- Jinyan Liu
- Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA
- Kaitlin Smith
- Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA
- Erica Borducchi
- Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA
- Crystal Cabral
- Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA
- Lauren Peter
- Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA
- Amanda Brinkman
- Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA
- Mayuri Shetty
- Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA
- Hualin Li
- Center for Virology and Vaccine Research, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA
- Hyung-Joo Kang
- University of Massachusetts Medical School, Worcester, MA 01605, USA
- Wenjun Li
- University of Massachusetts Medical School, Worcester, MA 01605, USA
- Jeffrey D Lifson
- AIDS and Cancer Virus Program, Leidos Biomedical Research, Frederick National Laboratory, Frederick, MD 21702, USA
- Michael Piatak
- AIDS and Cancer Virus Program, Leidos Biomedical Research, Frederick National Laboratory, Frederick, MD 21702, USA
|
41
|
Pennisi M, Russo G, Di Salvatore V, Candido S, Libra M, Pappalardo F. Computational modeling in melanoma for novel drug discovery. Expert Opin Drug Discov 2016; 11:609-21. [PMID: 27046143 DOI: 10.1080/17460441.2016.1174688] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 12/18/2022]
Abstract
INTRODUCTION There is a growing body of evidence highlighting the applications of computational modeling in the field of biomedicine. It has recently been applied to the in silico analysis of cancer dynamics. In the era of precision medicine, this analysis may allow the discovery of new molecular targets useful for the design of novel therapies and for overcoming resistance to anticancer drugs. According to its molecular behavior, melanoma represents an interesting tumor model in which computational modeling can be applied. Melanoma is an aggressive tumor of the skin with a poor prognosis for patients with advanced disease as it is resistant to current therapeutic approaches. AREAS COVERED This review discusses the basics of computational modeling in melanoma drug discovery and development. Discussion includes the in silico discovery of novel molecular drug targets, the optimization of immunotherapies and personalized medicine trials. EXPERT OPINION Mathematical and computational models are gradually being used to help understand biomedical data produced by high-throughput analysis. The use of advanced computer models allowing the simulation of complex biological processes provides hypotheses and supports experimental design. The research in fighting aggressive cancers, such as melanoma, is making great strides. Computational models represent the key component to complement these efforts. Due to the combinatorial complexity of new drug discovery, a systematic approach based only on experimentation is not possible. Computational and mathematical models are necessary for bringing cancer drug discovery into the era of omics, big data and personalized medicine.
Affiliation(s)
- Marzio Pennisi
- Department of Mathematics and Computer Science, University of Catania, Catania, Italy
- Giulia Russo
- Department of Biomedical and Biotechnological Sciences, University of Catania, Catania, Italy
- Valentina Di Salvatore
- Researcher at National Research Council, Institute of Neurological Sciences, Catania, Italy
- Saverio Candido
- Department of Biomedical and Biotechnological Sciences, University of Catania, Catania, Italy
- Massimo Libra
- Department of Biomedical and Biotechnological Sciences, University of Catania, Catania, Italy
|
42
|
Meng J, Wekesa JS, Shi GL, Luan YS. Protein function prediction based on data fusion and functional interrelationship. Math Biosci 2016; 274:25-32. [DOI: 10.1016/j.mbs.2016.02.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 09/22/2015] [Revised: 01/08/2016] [Accepted: 02/01/2016] [Indexed: 10/22/2022]
|
43
|
Yu G, Fu G, Wang J, Zhu H. Predicting Protein Function via Semantic Integration of Multiple Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:220-232. [PMID: 26800544 DOI: 10.1109/tcbb.2015.2459713] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Indexed: 06/05/2023]
Abstract
Determining the biological functions of proteins is one of the key challenges in the post-genomic era. The rapid accumulation of large volumes of proteomic and genomic data drives the development of computational models for automatically predicting protein function at large scale. Recent approaches focus on integrating multiple heterogeneous data sources, and they often get better results than methods that use a single data source alone. In this paper, we investigate how to integrate multiple biological data sources with biological knowledge, i.e., the Gene Ontology (GO), for protein function prediction. We propose a method, called SimNet, to Semantically integrate multiple functional association Networks derived from heterogeneous data sources. SimNet first uses GO annotations of proteins to capture the semantic similarity between proteins and introduces a semantic kernel based on the similarity. Next, SimNet constructs a composite network, obtained as a weighted summation of individual networks, and aligns the network with the kernel to get the weights assigned to individual networks. Then, it applies a network-based classifier on the composite network to predict protein function. Experimental results on heterogeneous proteomic data sources of Yeast, Human, Mouse, and Fly show that SimNet not only achieves better (or comparable) results than other related competitive approaches, but also takes much less time. The Matlab codes of SimNet are available at https://sites.google.com/site/guoxian85/simnet.
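The step of weighting individual networks by their agreement with a target kernel can be sketched as follows. This is a simplified illustration rather than SimNet's actual optimization: it assumes each network and the target kernel are symmetric matrices over the same proteins, scores each network by its normalized Frobenius alignment with the kernel, and uses the normalized non-negative scores as weights. The function names and the toy matrices are hypothetical.

```python
import numpy as np


def alignment(a, b):
    """Kernel alignment: Frobenius inner product divided by the Frobenius norms."""
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def composite_network(networks, target_kernel):
    """Weight each network by its non-negative alignment with the target kernel."""
    scores = np.array([max(alignment(w, target_kernel), 0.0) for w in networks])
    if scores.sum() == 0:
        # No network agrees with the target: fall back to uniform weights.
        weights = np.full(len(networks), 1.0 / len(networks))
    else:
        weights = scores / scores.sum()
    composite = sum(wt * net for wt, net in zip(weights, networks))
    return weights, composite


# Hypothetical toy data: three proteins, one target kernel, two networks.
target = np.array([[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
net_a = target.copy()                      # perfectly aligned with the target
net_b = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]])
weights, composite = composite_network([net_a, net_b], target)
print(weights)
```

In this toy case the network that matches the target kernel receives the larger weight, so it dominates the composite.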
|
44
|
Xu Y, Min H, Song H, Wu Q. Multi-instance multi-label distance metric learning for genome-wide protein function prediction. Comput Biol Chem 2016; 63:30-40. [PMID: 26923212 DOI: 10.1016/j.compbiolchem.2016.02.011] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 01/20/2016] [Accepted: 02/01/2016] [Indexed: 11/24/2022]
Abstract
Multi-instance multi-label (MIML) learning has been proven to be effective for genome-wide protein function prediction problems, where each training example is associated with not only multiple instances but also multiple class labels. To find an appropriate MIML learning method for genome-wide protein function prediction, many studies in the literature attempted to optimize objective functions in which dissimilarity between instances is measured using the Euclidean distance. But in many real applications, Euclidean distance may be unable to capture the intrinsic similarity/dissimilarity in feature space and label space. Unlike previous approaches, in this paper we propose a multi-instance multi-label distance metric learning framework (MIMLDML) for genome-wide protein function prediction. Specifically, we learn a Mahalanobis distance to preserve and utilize the intrinsic geometric information of both feature space and label space for MIML learning. In addition, we try to deal with the sparsely labeled data by giving weight to the labeled data. Extensive experiments on seven real-world organisms covering the biological three-domain system (i.e., archaea, bacteria, and eukaryote; Woese et al., 1990) show that the MIMLDML algorithm is superior to most state-of-the-art MIML learning algorithms.
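The contrast between Euclidean and Mahalanobis distance can be made concrete with a small sketch. It only evaluates a Mahalanobis distance for a given matrix; how MIMLDML learns that matrix is not reproduced here. The matrix `M` below is a hand-picked, hypothetical example, written as `L.T @ L` so that it is positive semidefinite.

```python
import numpy as np


def mahalanobis(x, y, M):
    """Distance sqrt((x - y)^T M (x - y)) for a positive semidefinite matrix M."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ M @ diff))


# Hypothetical metric: stretch the first feature, shrink the second.
L = np.array([[2.0, 0.0], [0.0, 0.5]])
M = L.T @ L  # equals diag(4, 0.25)

origin = [0.0, 0.0]
print(mahalanobis([1.0, 0.0], origin, np.eye(2)))  # identity matrix gives Euclidean distance
print(mahalanobis([1.0, 0.0], origin, M))          # first feature counts more
print(mahalanobis([0.0, 1.0], origin, M))          # second feature counts less
```

Two points that are equally far apart under the Euclidean distance end up at different distances once the metric reweights the features, which is the flexibility a learned metric exploits.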
Affiliation(s)
- Yonghui Xu
- School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China
- Huaqing Min
- School of Software Engineering, South China University of Technology, Guangzhou 510006, China
- Hengjie Song
- School of Software Engineering, South China University of Technology, Guangzhou 510006, China
- Qingyao Wu
- School of Software Engineering, South China University of Technology, Guangzhou 510006, China
|
45
|
Glass K, Girvan M. Finding New Order in Biological Functions from the Network Structure of Gene Annotations. PLoS Comput Biol 2015; 11:e1004565. [PMID: 26588252 PMCID: PMC4654495 DOI: 10.1371/journal.pcbi.1004565] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 01/06/2015] [Accepted: 09/23/2015] [Indexed: 11/19/2022] Open
Abstract
The Gene Ontology (GO) provides biologists with a controlled terminology that describes how genes are associated with functions and how functional terms are related to one another. These term-term relationships encode how scientists conceive the organization of biological functions, and they take the form of a directed acyclic graph (DAG). Here, we propose that the network structure of gene-term annotations made using GO can be employed to establish an alternative approach for grouping functional terms that captures intrinsic functional relationships that are not evident in the hierarchical structure established in the GO DAG. Instead of relying on an externally defined organization for biological functions, our approach connects biological functions together if they are performed by the same genes, as indicated in a compendium of gene annotation data from numerous different sources. We show that grouping terms by this alternate scheme provides a new framework with which to describe and predict the functions of experimentally identified sets of genes.
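Connecting functional terms when they are performed by the same genes can be illustrated with a binary gene-by-term annotation matrix. The sketch below is a minimal assumption-laden example, not the authors' pipeline: it counts shared genes for every pair of terms as the product of the transposed annotation matrix with itself, and it omits any weighting or grouping step the paper may apply. The function name and toy matrix are hypothetical.

```python
import numpy as np


def term_cooccurrence(annotations):
    """Count, for every pair of terms, the genes annotated to both.

    `annotations` is a binary matrix with one row per gene and one column per term.
    """
    b = np.asarray(annotations, dtype=int)
    shared = b.T @ b               # entry (i, j) = number of genes with terms i and j
    np.fill_diagonal(shared, 0)    # ignore self-links
    return shared


# Hypothetical toy data: three genes (rows) annotated to three terms (columns).
toy_annotations = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
]
shared_genes = term_cooccurrence(toy_annotations)
print(shared_genes)
```

Terms 0 and 1 share two genes and are strongly linked, while terms 0 and 2 share none and stay unconnected, regardless of where they sit in the GO hierarchy.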
Affiliation(s)
- Kimberly Glass
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Physics Department, University of Maryland, College Park, Maryland, United States of America
- Michelle Girvan
- Physics Department, University of Maryland, College Park, Maryland, United States of America
- Institute for Physical Science and Technology, University of Maryland, College Park, Maryland, United States of America
- Santa Fe Institute, Santa Fe, New Mexico, United States of America
|
46
|
Frasca M, Bertoni A, Valentini G. UNIPred: Unbalance-Aware Network Integration and Prediction of Protein Functions. J Comput Biol 2015; 22:1057-74. [PMID: 26402488 DOI: 10.1089/cmb.2014.0110] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 12/29/2022] Open
Abstract
The proper integration of multiple sources of data and the unbalance between annotated and unannotated proteins represent two of the main issues of the automated function prediction (AFP) problem. Most supervised and semisupervised learning algorithms for AFP proposed in the literature do not jointly consider these issues, with a negative impact on both sensitivity and precision, due to the unbalance between annotated and unannotated proteins that characterizes the majority of functional classes and to the specific and complementary information content embedded in each available source of data. We propose UNIPred (unbalance-aware network integration and prediction of protein functions), an algorithm that properly combines different biomolecular networks and predicts protein functions using parametric semisupervised neural models. The algorithm explicitly takes into account the unbalance between unannotated and annotated proteins both to construct the integrated network and to predict protein annotations for each functional class. Full-genome and ontology-wide experiments with three eukaryotic model organisms show that the proposed method compares favorably with state-of-the-art learning algorithms for AFP.
Affiliation(s)
- Marco Frasca
- DI - Department of Computer Science, University of Milan, Milan, Italy
- Alberto Bertoni
- DI - Department of Computer Science, University of Milan, Milan, Italy
- Giorgio Valentini
- DI - Department of Computer Science, University of Milan, Milan, Italy
|
47
|
Frasca M. Automated gene function prediction through gene multifunctionality in biological networks. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2015.04.007] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 12/14/2022]
|
48
|
Frasca M, Bassis S, Valentini G. Learning node labels with multi-category Hopfield networks. Neural Comput Appl 2015. [DOI: 10.1007/s00521-015-1965-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/02/2023]
|
49
|
Wang S, Cho H, Zhai C, Berger B, Peng J. Exploiting ontology graph for predicting sparsely annotated gene function. Bioinformatics 2015; 31:i357-64. [PMID: 26072504 PMCID: PMC4542782 DOI: 10.1093/bioinformatics/btv260] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.8] [Indexed: 01/16/2023] Open
Abstract
MOTIVATION Systematically predicting gene (or protein) function based on molecular interaction networks has become an important tool in refining and enhancing the existing annotation catalogs, such as the Gene Ontology (GO) database. However, functional labels with only a few (<10) annotated genes, which constitute about half of the GO terms in yeast, mouse and human, pose a unique challenge in that any prediction algorithm that independently considers each label faces a paucity of information and thus is prone to capture non-generalizable patterns in the data, resulting in poor predictive performance. There exist a variety of algorithms for function prediction, but none properly address this 'overfitting' issue of sparsely annotated functions, or do so in a manner scalable to tens of thousands of functions in the human catalog. RESULTS We propose a novel function prediction algorithm, clusDCA, which transfers information between similar functional labels to alleviate the overfitting problem for sparsely annotated functions. Our method is scalable to datasets with a large number of annotations. In a cross-validation experiment in yeast, mouse and human, our method greatly outperformed previous state-of-the-art function prediction algorithms in predicting sparsely annotated functions, without sacrificing the performance on labels with sufficient information. Furthermore, we show that our method can accurately predict genes that will be assigned a functional label that has no known annotations, based only on the ontology graph structure and genes associated with other labels, which further suggests that our method effectively utilizes the similarity between gene functions. AVAILABILITY AND IMPLEMENTATION https://github.com/wangshenguiuc/clusDCA.
Affiliation(s)
- Sheng Wang
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA and Department of Mathematics, MIT, Cambridge, MA, USA
- Hyunghoon Cho
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA and Department of Mathematics, MIT, Cambridge, MA, USA
- ChengXiang Zhai
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA and Department of Mathematics, MIT, Cambridge, MA, USA
- Bonnie Berger
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA and Department of Mathematics, MIT, Cambridge, MA, USA
- Jian Peng
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA and Department of Mathematics, MIT, Cambridge, MA, USA
|
50
|
Yu G, Zhu H, Domeniconi C, Guo M. Integrating multiple networks for protein function prediction. BMC SYSTEMS BIOLOGY 2015; 9 Suppl 1:S3. [PMID: 25707434 PMCID: PMC4331678 DOI: 10.1186/1752-0509-9-s1-s3] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 01/11/2023]
Abstract
Background High throughput techniques produce multiple functional association networks. Integrating these networks can enhance the accuracy of protein function prediction. Many algorithms have been introduced to generate a composite network, which is obtained as a weighted sum of individual networks. The weight assigned to an individual network reflects its benefit towards the protein functional annotation inference. A classifier is then trained on the composite network for predicting protein functions. However, since these techniques model the optimization of the composite network and the prediction tasks as separate objectives, the resulting composite network is not necessarily optimal for the follow-up protein function prediction. Results We address this issue by modeling the optimization of the composite network and the prediction problems within a unified objective function. In particular, we use a kernel target alignment technique and the loss function of a network based classifier to jointly adjust the weights assigned to the individual networks. We show that the proposed method, called MNet, can achieve a performance that is superior (with respect to different evaluation criteria) to related techniques using the multiple networks of four example species (yeast, human, mouse, and fly) annotated with thousands (or hundreds) of GO terms. Conclusion MNet can effectively integrate multiple networks for protein function prediction and is robust to the input parameters. Supplementary data is available at https://sites.google.com/site/guoxian85/home/mnet. The Matlab code of MNet is available upon request.
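The two-stage pipeline that the abstract describes as the baseline (build a composite network, then run a network-based classifier on it) can be sketched in a few lines. This is not MNet: the network weights are fixed by hand instead of being optimized jointly with the classifier, and the classifier is a standard label propagation with symmetric normalization, chosen here only for illustration. All names, the weights, and the toy networks are hypothetical.

```python
import numpy as np


def propagate_labels(w, y, alpha=0.5):
    """Label propagation: solve (I - alpha * S) f = y with S = D^-1/2 W D^-1/2."""
    w = np.asarray(w, dtype=float)
    y = np.asarray(y, dtype=float)
    degree = w.sum(axis=1)
    inv_sqrt = np.zeros_like(degree)
    nonzero = degree > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(degree[nonzero])  # isolated nodes keep weight 0
    s = inv_sqrt[:, None] * w * inv_sqrt[None, :]
    return np.linalg.solve(np.eye(w.shape[0]) - alpha * s, y)


# Hypothetical toy data: four proteins, two association networks.
net_a = np.zeros((4, 4))
net_a[0, 1] = net_a[1, 0] = 1.0   # edge 0-1
net_a[2, 3] = net_a[3, 2] = 1.0   # edge 2-3
net_b = np.zeros((4, 4))
net_b[1, 2] = net_b[2, 1] = 1.0   # edge 1-2

# Hand-picked weights (an assumption); the composite is the chain 0-1-2-3.
composite = 0.5 * net_a + 0.5 * net_b

# Only protein 0 is known to carry the function of interest.
labels = np.array([1.0, 0.0, 0.0, 0.0])
scores = propagate_labels(composite, labels)
print(scores)
```

Neither network alone connects protein 0 to protein 3, but the composite does, so the score decays along the chain instead of stopping at protein 1.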
|