1
Xiang J, Zhang J, Zhao Y, Wu FX, Li M. Biomedical data, computational methods and tools for evaluating disease-disease associations. Brief Bioinform 2022; 23:6522999. [PMID: 35136949 DOI: 10.1093/bib/bbac006] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/19/2021] [Revised: 01/04/2022] [Accepted: 01/05/2022] [Indexed: 12/12/2022]
Abstract
In recent decades, exploring potential relationships between diseases has been an active research field. With the rapid accumulation of disease-related biomedical data, a lot of computational methods and tools/platforms have been developed to reveal intrinsic relationship between diseases, which can provide useful insights to the study of complex diseases, e.g. understanding molecular mechanisms of diseases and discovering new treatment of diseases. Human complex diseases involve both external phenotypic abnormalities and complex internal molecular mechanisms in organisms. Computational methods with different types of biomedical data from phenotype to genotype can evaluate disease-disease associations at different levels, providing a comprehensive perspective for understanding diseases. In this review, available biomedical data and databases for evaluating disease-disease associations are first summarized. Then, existing computational methods for disease-disease associations are reviewed and classified into five groups in terms of the usages of biomedical data, including disease semantic-based, phenotype-based, function-based, representation learning-based and text mining-based methods. Further, we summarize software tools/platforms for computation and analysis of disease-disease associations. Finally, we give a discussion and summary on the research of disease-disease associations. This review provides a systematic overview for current disease association research, which could promote the development and applications of computational methods and tools/platforms for disease-disease associations.
Affiliation(s)
- Ju Xiang
  - School of Computer Science and Engineering, Central South University, China
- Jiashuai Zhang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Yichao Zhao
  - School of Computer Science and Engineering, Central South University, China
- Fang-Xiang Wu
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Min Li
  - Division of Biomedical Engineering and Department of Mechanical Engineering at University of Saskatchewan, Saskatoon, Canada
2
Le DH. A network-based method for predicting disease-associated enhancers. PLoS One 2021; 16:e0260432. [PMID: 34879086 PMCID: PMC8654176 DOI: 10.1371/journal.pone.0260432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2021] [Accepted: 11/09/2021] [Indexed: 11/18/2022]
Abstract
Background Enhancers regulate transcription of target genes, causing a change in expression level. Thus, the aberrant activity of enhancers can lead to diseases. To date, a large number of enhancers have been identified, yet a small portion of them have been found to be associated with diseases. This raises a pressing need to develop computational methods to predict associations between diseases and enhancers. Results In this study, we assumed that enhancers sharing target genes could be associated with similar diseases to predict the association. Thus, we built an enhancer functional interaction network by connecting enhancers significantly sharing target genes, then developed a network diffusion method RWDisEnh, based on a random walk with restart algorithm, on networks of diseases and enhancers to globally measure the degree of the association between diseases and enhancers. RWDisEnh performed best when the disease similarities are integrated with the enhancer functional interaction network by known disease-enhancer associations in the form of a heterogeneous network of diseases and enhancers. It was also superior to another network diffusion method, i.e., PageRank with Priors, and a neighborhood-based one, i.e., MaxLink, which simply chooses the closest neighbors of known disease-associated enhancers. Finally, we showed that RWDisEnh could predict novel enhancers, which are either directly or indirectly associated with diseases. Conclusions Taken together, RWDisEnh could be a potential method for predicting disease-enhancer associations.
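The random walk with restart idea named in this abstract can be sketched as follows. This is a minimal illustration on a single toy network, not the RWDisEnh implementation: the function name, the `restart` value, and the node labels `e1` to `e4` are assumptions made for the example, and the heterogeneous disease-enhancer network described in the abstract is not modelled.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Score every node by its proximity to the seed nodes.

    adj   : dict mapping node -> dict of neighbour -> edge weight (symmetric)
    seeds : set of nodes with known associations (hypothetical input)
    """
    nodes = sorted(adj)
    out_weight = {u: sum(adj[u].values()) for u in nodes}
    # Restart distribution: uniform over the seed nodes.
    p0 = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to a seed.
        nxt = {u: restart * p0[u] for u in nodes}
        for u in nodes:
            mass = (1.0 - restart) * p[u]
            if out_weight[u] > 0:
                # Otherwise it moves to a neighbour, proportionally to edge weight.
                for v, w in adj[u].items():
                    nxt[v] += mass * w / out_weight[u]
            else:
                # Isolated node: send its mass back to the seeds.
                for s in nodes:
                    nxt[s] += mass * p0[s]
        change = max(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy path network e1 - e2 - e3 - e4 with e1 as the only known (seed) node.
toy = {
    "e1": {"e2": 1.0},
    "e2": {"e1": 1.0, "e3": 1.0},
    "e3": {"e2": 1.0, "e4": 1.0},
    "e4": {"e3": 1.0},
}
scores = random_walk_with_restart(toy, {"e1"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

On this toy path the scores fall off with distance from the seed, which is the "global" proximity measure that diffusion methods rely on when ranking candidates.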
Affiliation(s)
- Duc-Hau Le
  - School of Computer Science and Engineering, Thuyloi University, Hanoi, Vietnam
3
Charmpi K, Chokkalingam M, Johnen R, Beyer A. Optimizing network propagation for multi-omics data integration. PLoS Comput Biol 2021; 17:e1009161. [PMID: 34762640 PMCID: PMC8664198 DOI: 10.1371/journal.pcbi.1009161] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/30/2021] [Revised: 12/10/2021] [Accepted: 10/12/2021] [Indexed: 01/11/2023]
Abstract
Network propagation refers to a class of algorithms that integrate information from input data across connected nodes in a given network. These algorithms have wide applications in systems biology, protein function prediction, inferring condition-specifically altered sub-networks, and prioritizing disease genes. Despite the popularity of network propagation, there is a lack of comparative analyses of different algorithms on real data and little guidance on how to select and parameterize the various algorithms. Here, we address this problem by analyzing different combinations of network normalization and propagation methods and by demonstrating schemes for the identification of optimal parameter settings on real proteome and transcriptome data. Our work highlights the risk of a ‘topology bias’ caused by the incorrect use of network normalization approaches. Capitalizing on the fact that network propagation is a regularization approach, we show that minimizing the bias-variance tradeoff can be utilized for selecting optimal parameters. The application to real multi-omics data demonstrated that optimal parameters could also be obtained by either maximizing the agreement between different omics layers (e.g. proteome and transcriptome) or by maximizing the consistency between biological replicates. Furthermore, we exemplified the utility and robustness of network propagation on multi-omics datasets for identifying ageing-associated genes in brain and liver tissues of rats and for elucidating molecular mechanisms underlying prostate cancer progression. Overall, this work compares different network propagation approaches and it presents strategies for how to use network propagation algorithms to optimally address a specific research question at hand. Modern technologies enable the simultaneous measurement of tens of thousands of molecules in biological samples. 
Algorithms called network propagation or network smoothing are frequently used to integrate such data with already known molecular interaction data, such as protein and gene interaction networks. These methods distribute the information on molecular perturbations within the network and help identifying network regions that are enriched for many perturbed (affected) molecules. Despite the popularity of these methods, there is a lack of guidance on how to optimally use them. Here, we highlight possible pitfalls when using incorrect network normalization methods. Further, we present different ways for optimizing the smoothing parameters used during network smoothing: the first approach maximizes the consistency between replicate measurements within a dataset; the second one maximizes the consistency between different types of ‘omics’ measurements, such as proteomics and transcriptomics. Using two multi-omics datasets, one from a cohort of prostate cancer patients, the other one from an ageing study on rat brain and liver tissues, we exemplify the effects of these strategies on real data.
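To make the role of network normalization and the smoothing parameter concrete, the sketch below propagates node scores under two common normalizations. It is an illustrative assumption, not the authors' code: the function names, the `mode` labels, the update rule `f = alpha * W' f + (1 - alpha) * y`, and the toy star network are all chosen for the example.

```python
import math


def normalize(adj, mode="symmetric"):
    """Return a normalized copy of a weighted adjacency dict.

    "symmetric": w_uv / sqrt(deg_u * deg_v)  (degree-symmetric scaling)
    "row"      : w_uv / deg_u                (average over neighbours)
    """
    deg = {u: sum(nb.values()) for u, nb in adj.items()}
    norm = {}
    for u, nb in adj.items():
        norm[u] = {}
        for v, w in nb.items():
            if mode == "symmetric":
                norm[u][v] = w / math.sqrt(deg[u] * deg[v])
            elif mode == "row":
                norm[u][v] = w / deg[u]
            else:
                raise ValueError("unknown mode: " + mode)
    return norm


def propagate(adj, y, alpha=0.5, mode="symmetric", n_iter=200):
    """Iterate f = alpha * W' f + (1 - alpha) * y for a fixed number of steps.

    alpha is the smoothing parameter: 0 returns the input unchanged,
    values close to 1 let the network dominate.
    """
    w = normalize(adj, mode)
    f = dict(y)
    for _ in range(n_iter):
        f = {
            u: alpha * sum(w[u][v] * f[v] for v in w[u]) + (1.0 - alpha) * y[u]
            for u in adj
        }
    return f


# Toy star network: hub "h" with leaves "a", "b", "c"; only "a" is perturbed.
star = {
    "h": {"a": 1.0, "b": 1.0, "c": 1.0},
    "a": {"h": 1.0},
    "b": {"h": 1.0},
    "c": {"h": 1.0},
}
signal = {"h": 0.0, "a": 1.0, "b": 0.0, "c": 0.0}
smoothed_sym = propagate(star, signal, mode="symmetric")
smoothed_row = propagate(star, signal, mode="row")
```

Comparing `smoothed_sym` and `smoothed_row` on the hub shows how the choice of normalization changes the weight given to high-degree nodes, which is the kind of effect the abstract refers to as topology bias.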
Affiliation(s)
- Konstantina Charmpi
  - CECAD Cologne Excellence Cluster on Cellular Stress Responses in Aging Associated Diseases, Cologne, Germany
- Manopriya Chokkalingam
  - CECAD Cologne Excellence Cluster on Cellular Stress Responses in Aging Associated Diseases, Cologne, Germany
- Ronja Johnen
  - CECAD Cologne Excellence Cluster on Cellular Stress Responses in Aging Associated Diseases, Cologne, Germany
- Andreas Beyer
  - CECAD Cologne Excellence Cluster on Cellular Stress Responses in Aging Associated Diseases, Cologne, Germany
  - Center for Molecular Medicine Cologne (CMMC), Medical Faculty, University of Cologne, Cologne, Germany
  - Institute for Genetics, Faculty of Mathematics and Natural Sciences, University of Cologne, Cologne, Germany
4
Fa R, Cozzetto D, Wan C, Jones DT. Predicting human protein function with multi-task deep neural networks. PLoS One 2018; 13:e0198216. [PMID: 29889900 PMCID: PMC5995439 DOI: 10.1371/journal.pone.0198216] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 02/01/2018] [Accepted: 05/15/2018] [Indexed: 11/19/2022]
Abstract
Machine learning methods for protein function prediction are urgently needed, especially now that a substantial fraction of known sequences remains unannotated despite the extensive use of functional assignments based on sequence similarity. One major bottleneck supervised learning faces in protein function prediction is the structured, multi-label nature of the problem, because biological roles are represented by lists of terms from hierarchically organised controlled vocabularies such as the Gene Ontology. In this work, we build on recent developments in the area of deep learning and investigate the usefulness of multi-task deep neural networks (MTDNN), which consist of upstream shared layers upon which are stacked in parallel as many independent modules (additional hidden layers with their own output units) as the number of output GO terms (the tasks). MTDNN learns individual tasks partially using shared representations and partially from task-specific characteristics. When no close homologues with experimentally validated functions can be identified, MTDNN gives more accurate predictions than baseline methods based on annotation frequencies in public databases or homology transfers. More importantly, the results show that MTDNN binary classification accuracy is higher than alternative machine learning-based methods that do not exploit commonalities and differences among prediction tasks. Interestingly, compared with a single-task predictor, the performance improvement is not linearly correlated with the number of tasks in MTDNN, but medium size models provide more improvement in our case. One of advantages of MTDNN is that given a set of features, there is no requirement for MTDNN to have a bootstrap feature selection procedure as what traditional machine learning algorithms do. Overall, the results indicate that the proposed MTDNN algorithm improves the performance of protein function prediction. 
On the other hand, there is still large room for deep learning techniques to further enhance prediction ability.
Affiliation(s)
- Rui Fa
  - The Francis Crick Institute, London, United Kingdom
  - Computer Science Department, University College London, London, United Kingdom
- Domenico Cozzetto
  - The Francis Crick Institute, London, United Kingdom
  - Computer Science Department, University College London, London, United Kingdom
- Cen Wan
  - The Francis Crick Institute, London, United Kingdom
  - Computer Science Department, University College London, London, United Kingdom
- David T. Jones
  - The Francis Crick Institute, London, United Kingdom
  - Computer Science Department, University College London, London, United Kingdom
5
Robinson S, Nevalainen J, Pinna G, Campalans A, Radicella JP, Guyon L. Incorporating interaction networks into the determination of functionally related hit genes in genomic experiments with Markov random fields. Bioinformatics 2018; 33:i170-i179. [PMID: 28881978 PMCID: PMC5870666 DOI: 10.1093/bioinformatics/btx244] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 12/24/2022]
Abstract
Motivation Incorporating gene interaction data into the identification of ‘hit’ genes in genomic experiments is a well-established approach leveraging the ‘guilt by association’ assumption to obtain a network based hit list of functionally related genes. We aim to develop a method to allow for multivariate gene scores and multiple hit labels in order to extend the analysis of genomic screening data within such an approach. Results We propose a Markov random field-based method to achieve our aim and show that the particular advantages of our method compared with those currently used lead to new insights in previously analysed data as well as for our own motivating data. Our method additionally achieves the best performance in an independent simulation experiment. The real data applications we consider comprise of a survival analysis and differential expression experiment and a cell-based RNA interference functional screen. Availability and implementation We provide all of the data and code related to the results in the paper. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sean Robinson
  - CEA, BIG, Biologie à Grande Echelle, F-38054 Grenoble, France
  - Université Grenoble-Alpes, F-38000 Grenoble, France
  - INSERM, U1038, F-38054 Grenoble, France
  - Department of Mathematics and Statistics, University of Turku, Turku, Finland
  - Industrial Biotechnology, VTT Technical Research Centre of Finland, Turku, Finland
- Jaakko Nevalainen
  - Department of Mathematics and Statistics, University of Turku, Turku, Finland
  - School of Health Sciences, University of Tampere, Tampere, Finland
- Guillaume Pinna
  - Plateforme ARN Interférence (PArI), DSV/ISVFJ/SBIGEM/UMR 9198 I2BC, CEA Saclay, Gif-sur-Yvette, France
- Anna Campalans
  - Institute of Molecular and Cellular Radiobiology, CEA, Fontenay-aux-Roses, France
  - INSERM, U967, Fontenay-aux-Roses, France
  - Université Paris Diderot, U967, Fontenay-aux-Roses, France
  - Université Paris Sud, U967, Fontenay-aux-Roses, France
- J Pablo Radicella
  - Institute of Molecular and Cellular Radiobiology, CEA, Fontenay-aux-Roses, France
  - INSERM, U967, Fontenay-aux-Roses, France
  - Université Paris Diderot, U967, Fontenay-aux-Roses, France
  - Université Paris Sud, U967, Fontenay-aux-Roses, France
- Laurent Guyon
  - CEA, BIG, Biologie à Grande Echelle, F-38054 Grenoble, France
  - Université Grenoble-Alpes, F-38000 Grenoble, France
  - INSERM, U1038, F-38054 Grenoble, France
6
You R, Zhang Z, Xiong Y, Sun F, Mamitsuka H, Zhu S. GOLabeler: improving sequence-based large-scale protein function prediction by learning to rank. Bioinformatics 2018. [DOI: 10.1093/bioinformatics/bty130] [Citation(s) in RCA: 81] [Impact Index Per Article: 13.5] [Indexed: 01/06/2023]
Affiliation(s)
- Ronghui You
  - School of Computer Science and Shanghai Key Lab of Intelligent Information Processing
  - Center for Computational System Biology, ISTBI, Fudan University, Shanghai, China
- Zihan Zhang
  - School of Computer Science and Shanghai Key Lab of Intelligent Information Processing
  - Center for Computational System Biology, ISTBI, Fudan University, Shanghai, China
- Yi Xiong
  - Department of Bioinformatics and Biostatistics, Shanghai Jiaotong University, Shanghai, China
- Fengzhu Sun
  - Center for Computational System Biology, ISTBI, Fudan University, Shanghai, China
  - Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, USA
- Hiroshi Mamitsuka
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto Prefecture, Japan
  - Department of Computer Science, Aalto University, Helsinki, Finland
- Shanfeng Zhu
  - School of Computer Science and Shanghai Key Lab of Intelligent Information Processing
  - Center for Computational System Biology, ISTBI, Fudan University, Shanghai, China
7
Chen JY, Pandey R, Nguyen TM. HAPPI-2: a Comprehensive and High-quality Map of Human Annotated and Predicted Protein Interactions. BMC Genomics 2017; 18:182. [PMID: 28212602 PMCID: PMC5314692 DOI: 10.1186/s12864-017-3512-1] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 05/08/2015] [Accepted: 01/24/2017] [Indexed: 01/07/2023]
Abstract
BACKGROUND Human protein-protein interaction (PPI) data is essential to network and systems biology studies. PPI data can help biochemists hypothesize how proteins form complexes by binding to each other, how extracellular signals propagate through post-translational modification of de-activated signaling molecules, and how chemical reactions are coupled by enzymes involved in a complex biological process. Our capability to develop good public database resources for human PPI data has a direct impact on the quality of future research on genome biology and medicine. RESULTS The database of Human Annotated and Predicted Protein Interactions (HAPPI) version 2.0 is a major update to the original HAPPI 1.0 database. It contains 2,922,202 unique protein-protein interactions (PPI) linked by 23,060 human proteins, making it the most comprehensive database covering human PPI data today. These PPIs contain both physical/direct interactions and high-quality functional/indirect interactions. Compared with the HAPPI 1.0 database release, HAPPI database version 2.0 (HAPPI-2) represents a 485% of human PPI data coverage increase and a 73% protein coverage increase. The revamped HAPPI web portal provides users with a friendly search, curation, and data retrieval interface, allowing them to retrieve human PPIs and available annotation information on the interaction type, interaction quality, interacting partner drug targeting data, and disease information. The updated HAPPI-2 can be freely accessed by Academic users at http://discovery.informatics.uab.edu/HAPPI . CONCLUSIONS While the underlying data for HAPPI-2 are integrated from a diverse data sources, the new HAPPI-2 release represents a good balance between data coverage and data quality of human PPIs, making it ideally suited for network biology.
Affiliation(s)
- Jake Y Chen
  - Wenzhou Medical University First Affiliate Hospital, Wenzhou, Zhejiang Province, China
  - Medeolinx, LLC, Indianapolis, IN, 46280, USA
  - The Informatics Institute, University of Alabama at Birmingham School of Medicine, Birmingham, AL, 35294, USA
  - Indiana Center for Systems Biology and Personalized Medicine, Indiana University School of Informatics and Computing, Indianapolis, IN, 46202, USA
- Thanh M Nguyen
  - Indiana Center for Systems Biology and Personalized Medicine, Indiana University School of Informatics and Computing, Indianapolis, IN, 46202, USA
8
Abstract
Surveys of public sequence resources show that experimentally supported functional information is still completely missing for a considerable fraction of known proteins and is clearly incomplete for an even larger portion. Bioinformatics methods have long made use of very diverse data sources alone or in combination to predict protein function, with the understanding that different data types help elucidate complementary biological roles. This chapter focuses on methods accepting amino acid sequences as input and producing GO term assignments directly as outputs; the relevant biological and computational concepts are presented along with the advantages and limitations of individual approaches.
Affiliation(s)
- Domenico Cozzetto
  - Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
- David T Jones
  - Bioinformatics Group, Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
9
Computational Methods for Integration of Biological Data. Per Med 2016. [DOI: 10.1007/978-3-319-39349-0_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
10
Gligorijević V, Pržulj N. Methods for biological data integration: perspectives and challenges. J R Soc Interface 2015; 12:20150571. [PMID: 26490630 PMCID: PMC4685837 DOI: 10.1098/rsif.2015.0571] [Citation(s) in RCA: 157] [Impact Index Per Article: 17.4] [Received: 06/26/2015] [Accepted: 09/25/2015] [Indexed: 12/17/2022]
Abstract
Rapid technological advances have led to the production of different types of biological data and enabled construction of complex networks with various types of interactions between diverse biological entities. Standard network data analysis methods were shown to be limited in dealing with such heterogeneous networked data and consequently, new methods for integrative data analyses have been proposed. The integrative methods can collectively mine multiple types of biological data and produce more holistic, systems-level biological insights. We survey recent methods for collective mining (integration) of various types of networked biological data. We compare different state-of-the-art methods for data integration and highlight their advantages and disadvantages in addressing important biological problems. We identify the important computational challenges of these methods and provide a general guideline for which methods are suited for specific biological problems, or specific data types. Moreover, we propose that recent non-negative matrix factorization-based approaches may become the integration methodology of choice, as they are well suited and accurate in dealing with heterogeneous data and have many opportunities for further development.
Affiliation(s)
- Nataša Pržulj
  - Department of Computing, Imperial College London, London SW7 2AZ, UK
11
Jiang P, Wang H, Li W, Zang C, Li B, Wong YJ, Meyer C, Liu JS, Aster JC, Liu XS. Network analysis of gene essentiality in functional genomics experiments. Genome Biol 2015; 16:239. [PMID: 26518695 PMCID: PMC4627418 DOI: 10.1186/s13059-015-0808-9] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Received: 08/13/2015] [Accepted: 10/20/2015] [Indexed: 12/18/2022]
Abstract
Many genomic techniques have been developed to study gene essentiality genome-wide, such as CRISPR and shRNA screens. Our analyses of public CRISPR screens suggest protein interaction networks, when integrated with gene expression or histone marks, are highly predictive of gene essentiality. Meanwhile, the quality of CRISPR and shRNA screen results can be significantly enhanced through network neighbor information. We also found network neighbor information to be very informative on prioritizing ChIP-seq target genes and survival indicator genes from tumor profiling. Thus, our study provides a general method for gene essentiality analysis in functional genomic experiments ( http://nest.dfci.harvard.edu ).
Affiliation(s)
- Peng Jiang
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard T.H. Chan School of Public Health, Boston, MA, 02215, USA
- Hongfang Wang
  - Department of Pathology, Brigham and Women's Hospital, Boston, MA, 02115, USA
- Wei Li
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard T.H. Chan School of Public Health, Boston, MA, 02215, USA
- Chongzhi Zang
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard T.H. Chan School of Public Health, Boston, MA, 02215, USA
- Bo Li
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard T.H. Chan School of Public Health, Boston, MA, 02215, USA
- Yinling J Wong
  - Department of Pathology, Brigham and Women's Hospital, Boston, MA, 02115, USA
- Cliff Meyer
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard T.H. Chan School of Public Health, Boston, MA, 02215, USA
- Jun S Liu
  - Department of Statistics, Harvard University, Cambridge, MA, 02138, USA
- Jon C Aster
  - Department of Pathology, Brigham and Women's Hospital, Boston, MA, 02115, USA
- X Shirley Liu
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard T.H. Chan School of Public Health, Boston, MA, 02215, USA
  - School of Life Science and Technology, Tongji University, Shanghai, 200092, China
12
Theofilatos KA, Likothanassis S, Mavroudi S. Quo vadis computational analysis of PPI data or why the future isn't here yet. Front Genet 2015; 6:289. [PMID: 26442107 PMCID: PMC4584938 DOI: 10.3389/fgene.2015.00289] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/02/2015] [Accepted: 08/31/2015] [Indexed: 11/13/2022]
Affiliation(s)
- Spiros Likothanassis
  - InSyBio Ltd., London, UK
  - Pattern Recognition Laboratory, Department of Computer Engineering and Informatics, University of Patras, Patras, Greece
- Seferina Mavroudi
  - InSyBio Ltd., London, UK
  - Pattern Recognition Laboratory, Department of Computer Engineering and Informatics, University of Patras, Patras, Greece
  - Department of Social Work, School of Sciences of Health and Care, Technological Educational Institute of Western Greece, Patras, Greece
13
Wang W, Zhou X, Liu Z, Sun F. Network tuned multiple rank aggregation and applications to gene ranking. BMC Bioinformatics 2015; 16 Suppl 1:S6. [PMID: 25708095 PMCID: PMC4331705 DOI: 10.1186/1471-2105-16-s1-s6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/12/2022]
Abstract
With the development of various high throughput technologies and analysis methods, researchers can study different aspects of a biological phenomenon simultaneously or one aspect repeatedly with different experimental techniques and analysis methods. The output from each study is a rank list of components of interest. Aggregation of the rank lists of components, such as proteins, genes and single nucleotide variants (SNV), produced by these experiments has been proven to be helpful in both filtering the noise and bringing forth a more complete understanding of the biological problems. Current available rank aggregation methods do not consider the network information that has been observed to provide vital contributions in many data integration studies. We developed network tuned rank aggregation methods incorporating network information and demonstrated its superior performance over aggregation methods without network information. The methods are tested on predicting the Gene Ontology function of yeast proteins. We validate the methods using combinations of three gene expression data sets and three protein interaction networks as well as an integrated network by combining the three networks. Results show that the aggregated rank lists are more meaningful if protein interaction network is incorporated. Among the methods compared, CGI_RRA and CGI_Endeavour, which integrate rank lists with networks using CGI [1] followed by rank aggregation using either robust rank aggregation (RRA) [2] or Endeavour [3] perform the best. Finally, we use the methods to locate target genes of transcription factors.
14
Chen B, Li M, Wang J, Wu FX. Disease gene identification by using graph kernels and Markov random fields. Sci China Life Sci 2014; 57:1054-1063. [PMID: 25326067 DOI: 10.1007/s11427-014-4745-8] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 05/21/2014] [Accepted: 07/14/2014] [Indexed: 01/05/2023]
Abstract
Genes associated with similar diseases are often functionally related. This principle is largely supported by many biological data sources, such as disease phenotype similarities, protein complexes, protein-protein interactions, pathways and gene expression profiles. Integrating multiple types of biological data is an effective method to identify disease genes for many genetic diseases. To capture the gene-disease associations based on biological networks, a kernel-based MRF method is proposed by combining graph kernels and the Markov random field (MRF) method. In the proposed method, three kinds of kernels are employed to describe the overall relationships of vertices in five biological networks, respectively, and a novel weighted MRF method is developed to integrate those data. In addition, an improved Gibbs sampling procedure and a novel parameter estimation method are proposed to generate predictions from the kernel-based MRF method. Numerical experiments are carried out by integrating known gene-disease associations, protein complexes, protein-protein interactions, pathways and gene expression profiles. The proposed kernel-based MRF method is evaluated by the leave-one-out cross validation paradigm, achieving an AUC score of 0.771 when integrating all those biological data in our experiments, which indicates that our proposed method is very promising compared with many existing methods.
Affiliation(s)
- BoLin Chen
  - Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, S7N 5A9, Canada
15
Dhal PK, Barman RK, Saha S, Das S. Dynamic modularity of host protein interaction networks in Salmonella Typhi infection. PLoS One 2014; 9:e104911. [PMID: 25144185 PMCID: PMC4140748 DOI: 10.1371/journal.pone.0104911] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 05/02/2014] [Accepted: 07/17/2014] [Indexed: 01/08/2023]
Abstract
Background Salmonella Typhi is a human-restricted pathogen, which causes typhoid fever and remains a global health problem in the developing countries. Although previously reported host expression datasets had identified putative biomarkers and therapeutic targets of typhoid fever, the underlying molecular mechanism of pathogenesis remains incompletely understood. Methods We used five gene expression datasets of human peripheral blood from patients suffering from S. Typhi or other bacteremic infections or non-infectious disease like leukemia. The expression datasets were merged into human protein interaction network (PIN) and the expression correlation between the hubs and their interacting proteins was measured by calculating Pearson Correlation Coefficient (PCC) values. The differences in the average PCC for each hub between the disease states and their respective controls were calculated for studied datasets. The individual hubs and their interactors with expression, PCC and average PCC values were treated as dynamic subnetworks. The hubs that showed unique trends of alterations specific to S. Typhi infection were identified. Results We identified S. Typhi infection-specific dynamic subnetworks of the host, which involve 81 hubs and 1343 interactions. The major enriched GO biological process terms in the identified subnetworks were regulation of apoptosis and biological adhesions, while the enriched pathways include cytokine signalling in the immune system and downstream TCR signalling. The dynamic nature of the hubs CCR1, IRS2 and PRKCA with their interactors was studied in detail. The difference in the dynamics of the subnetworks specific to S. Typhi infection suggests a potential molecular model of typhoid fever. Conclusions Hubs and their interactors of the S. Typhi infection-specific dynamic subnetworks carrying distinct PCC values compared with the non-typhoid and other disease conditions reveal new insight into the pathogenesis of S. Typhi.
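The hub-level statistic described in the Methods can be sketched as below: the Pearson correlation coefficient (PCC) between a hub and each interactor, averaged per hub, then differenced between disease and control. This is a minimal illustration under assumptions: the function names, the placeholder identifiers `HUB`, `G1`, `G2`, and the expression values are invented for the example and are not data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_pcc(hub, interactors, expr):
    """Mean PCC between a hub and its interactors in one condition.

    expr maps a gene name to its expression values across samples.
    """
    return sum(pearson(expr[hub], expr[g]) for g in interactors) / len(interactors)


def delta_average_pcc(hub, interactors, expr_disease, expr_control):
    """Difference in average PCC between the disease state and its control."""
    return average_pcc(hub, interactors, expr_disease) - average_pcc(
        hub, interactors, expr_control
    )


# Placeholder expression profiles (four samples per condition).
control = {"HUB": [1, 2, 3, 4], "G1": [2, 4, 6, 8], "G2": [1, 3, 5, 7]}
disease = {"HUB": [1, 2, 3, 4], "G1": [8, 6, 4, 2], "G2": [1, 3, 5, 7]}
delta = delta_average_pcc("HUB", ["G1", "G2"], disease, control)
```

In this toy case the hub loses its co-expression with one interactor in the disease condition, so the average PCC drops relative to the control; a hub whose average PCC shifts in only one disease would be a candidate for a condition-specific subnetwork.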
Affiliation(s)
- Paltu Kumar Dhal
  - Biomedical Informatics Centre, National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
- Ranjan Kumar Barman
  - Biomedical Informatics Centre, National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
- Sudipto Saha
  - Bioinformatics Centre, Bose Institute, Kolkata, West Bengal, India
- Santasabuj Das
  - Biomedical Informatics Centre, National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India
  - Division of Clinical Medicine, National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India