1
Wei PJ, Ma W, Li Y, Su Y. Disease biomarker identification based on sample network optimization. Methods 2023; 213:42-49. [PMID: 37001685 DOI: 10.1016/j.ymeth.2023.03.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2023] [Revised: 03/12/2023] [Accepted: 03/16/2023] [Indexed: 03/31/2023]
Abstract
A large amount of evidence shows that biomarkers are discriminant features related to disease development. The identification of disease biomarkers has therefore become a basic problem in the analysis of complex diseases, with applications in disease staging, diagnosis and treatment. Network-based research has become one of the most popular approaches. Several network-based algorithms have been proposed to identify biomarkers; however, the gene or molecule networks they use ignore the similarities and associations among the samples. It is essential to further understand how to construct and optimize the networks so that the identified biomarkers are more accurate. On this basis, more effective strategies can be developed to improve the performance of biomarker identification. In this study, a multi-objective evolutionary algorithm based on sample similarity networks is proposed for disease biomarker identification. Specifically, we design sample similarity networks to extract the structural characteristic information among samples, which is used to calculate the influence of each sample on each class. In addition, based on the networks and the group of biomarkers chosen in every iteration, we divide the samples into classes according to their importance for each class. Then, during the population iteration of the evolutionary algorithm, we develop an elite guidance strategy and a fusion selection strategy to select the biomarkers that make the sample classification more accurate. Experimental results on five gene expression datasets suggest that the proposed algorithm is superior to some state-of-the-art disease biomarker identification methods.
Affiliation(s)
- Pi-Jing Wei
  - Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, 230601 Hefei, Anhui, China
- Wenwen Ma
  - Key Laboratory of Intelligent Computing and Signal Processing, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, 230601 Hefei, China
- Yanxin Li
  - Department of Cardiology, The Third Hospital of Xingtai, Xingtai 054000, Hebei, China
- Yansen Su
  - Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, 5089 Wangjiang West Road, 230088 Hefei, China; School of Artificial Intelligence, Anhui University, 111 Jiulong Road, 230601 Hefei, China.
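
The sample-similarity idea in the abstract of entry 1 can be illustrated with a small sketch. This is not the authors' algorithm: the Pearson similarity, the `threshold` cut-off, and the helper names (`similarity_network`, `class_influence`, `classify`) are assumptions chosen for illustration. It builds a sample network over a candidate gene set and scores each sample's influence toward each class by summing the edge weights to its labelled neighbours.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def similarity_network(samples, genes, threshold=0.5):
    """Weighted adjacency dict over samples, using only the chosen genes.

    The threshold is a hypothetical parameter, not taken from the paper.
    """
    net = {i: {} for i in range(len(samples))}
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            w = pearson([samples[i][g] for g in genes],
                        [samples[j][g] for g in genes])
            if w >= threshold:
                net[i][j] = net[j][i] = w
    return net

def class_influence(net, labels, i):
    """Sum of edge weights from sample i to its neighbours, per class."""
    score = {}
    for j, w in net[i].items():
        score[labels[j]] = score.get(labels[j], 0.0) + w
    return score

def classify(net, labels, i):
    """Assign sample i to the class with the largest summed influence."""
    score = class_influence(net, labels, i)
    return max(score, key=score.get) if score else None

# Toy expression matrix: rows are samples, columns are genes.
samples = [
    [1.0, 2.0, 3.0, 0.5],
    [1.1, 2.1, 2.9, 0.4],
    [3.0, 2.0, 1.0, 0.6],
    [2.9, 2.2, 1.1, 0.5],
]
labels = ["control", "control", "disease", "disease"]
net = similarity_network(samples, genes=[0, 1, 2])
```

In an evolutionary search, a candidate biomarker set would be passed as `genes`, and the resulting classification accuracy could serve as one objective.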
2
Liu J, Zhu H, Qiu J. Locally Adjust Networks Based on Connectivity and Semantic Similarities for Disease Module Detection. Front Genet 2021; 12:726596. [PMID: 34759955 PMCID: PMC8575408 DOI: 10.3389/fgene.2021.726596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2021] [Accepted: 09/22/2021] [Indexed: 11/13/2022]
Abstract
To study the pathogenesis of complex diseases, it is important to identify disease modules at the system level. Because protein-protein interaction (PPI) networks contain many missing and incorrect interactions, most existing methods leave many disease proteins isolated from the disease modules. In this paper, we propose an effective disease module identification method, IDMCSS, in which the human PPI networks used are obtained by adding some potentially missing interactions to existing PPI networks and removing some potentially incorrect ones. In IDMCSS, a network adjustment strategy is developed to add or remove links around disease proteins based on both topological and semantic information. Next, the neighboring proteins of disease proteins are prioritized according to a suggested similarity between each of them and the disease proteins, and the protein with the largest similarity to the disease proteins is added to a candidate disease protein set, one at a time. The stopping criterion is set to the boundary of the disease proteins. Finally, the connected subnetwork containing the largest number of disease proteins is selected as the disease module. Experimental results on asthma demonstrate the effectiveness of the method in comparison with existing algorithms for disease module identification. It is also shown that IDMCSS obtains disease modules involving crucial biological processes of asthma, and that 12 targets for drug intervention can be predicted.
Affiliation(s)
- Jia Liu
  - State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing, China
- Huole Zhu
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei, China
  - Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Artificial Intelligence, Anhui University, Hefei, China
- Jianfeng Qiu
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei, China
  - Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Artificial Intelligence, Anhui University, Hefei, China
3
Tian Y, Su X, Su Y, Zhang X. EMODMI: A Multi-Objective Optimization Based Method to Identify Disease Modules. IEEE Transactions on Emerging Topics in Computational Intelligence 2021. [DOI: 10.1109/tetci.2020.3014923] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 11/09/2022]
4
Schulc K, Nagy ZT, Kamp S, Molnár J, Veres DV, Csermely P, Kovács BM. Modular Reorganization of Signaling Networks during the Development of Colon Adenoma and Carcinoma. J Phys Chem B 2021; 125:1716-1726. [PMID: 33562960 PMCID: PMC8023713 DOI: 10.1021/acs.jpcb.0c09307] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/09/2022]
Abstract
Network science is an emerging tool in systems biology and oncology, providing novel, system-level insight into the development of cancer. The aim of this project was to study the signaling networks in the process of oncogenesis to explore the adaptive mechanisms taking part in the cancerous transformation of healthy cells. For this purpose, colon cancer proved to be an excellent candidate, as its preliminary phase, adenoma, has a long evolution time. In our work, transcriptomic data were collected from normal colon, colon adenoma, and colon cancer samples to calculate link (i.e., network edge) weights as approximate proxies for protein abundances, and the link weights were included in the Human Cancer Signaling Network. Here we show that the adenoma phase clearly differs from the normal and cancer states in terms of a more scattered link weight distribution and enlarged network diameter. Modular analysis shows the rearrangement of the apoptosis- and the cell-cycle-related modules, whose pathway enrichment analysis supports the relevance of targeted therapy. Our work enriches the system-wide assessment of cancer development, showing specific changes for the adenoma state.
Affiliation(s)
- Klára Schulc
  - Department of Molecular Biology, Semmelweis University, Budapest 1085, Hungary
- Zsolt T Nagy
  - Department of Molecular Biology, Semmelweis University, Budapest 1085, Hungary
- Daniel V Veres
  - Department of Molecular Biology, Semmelweis University, Budapest 1085, Hungary
  - Turbine Ltd, Budapest, Hungary
- Peter Csermely
  - Department of Molecular Biology, Semmelweis University, Budapest 1085, Hungary
- Borbála M Kovács
  - Department of Molecular Biology, Semmelweis University, Budapest 1085, Hungary
5
Su Y, Su X, Wang Q, Zhang L. A multi-objective optimization method for identification of module biomarkers for disease diagnosis. Methods 2020; 192:35-45. [PMID: 32949693 DOI: 10.1016/j.ymeth.2020.09.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/24/2020] [Revised: 07/03/2020] [Accepted: 09/07/2020] [Indexed: 01/14/2023]
Abstract
Biomarker identification aims at finding a set of biological indicators that best discriminate biological samples of different phenotypes. In this paper, we take the module containing the significant disease-related genes and their interactions from biological networks as a module biomarker, and propose an evolutionary multi-objective optimization method to identify module biomarkers for disease diagnosis. Specifically, we take the classification accuracy on control and disease samples, the association with disease, and the intra-link density in the module as the optimization objectives. To achieve the best performance, a novel population initialization strategy is tailored to generate densely connected initial solutions, and a specific population update strategy is employed to direct the evolution towards the global optima while maintaining diversity. Experimental results show that our method outperforms the previous state-of-the-art disease diagnosis methods. Moreover, the detected biomarker module reflects basic and significant biological functions and correlates strongly with the disease phenotype.
Affiliation(s)
- Yansen Su
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei 230601, China
- Xiaochun Su
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei 230601, China
- Qijun Wang
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei 230601, China.
- Lejun Zhang
  - Yangzhou University, Yangzhou 225009, China.
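
The three objectives named in the abstract of entry 5 (classification accuracy, disease association, intra-link density) are typically traded off through Pareto dominance. The following is a minimal sketch of that general concept only, not the paper's population update strategy. The candidate names and objective values are invented for illustration, and all objectives are assumed to be maximised.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and better on at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(candidates):
    """Names of candidates that no other candidate dominates."""
    return [
        name for name, obj in candidates.items()
        if not any(dominates(other, obj)
                   for other_name, other in candidates.items()
                   if other_name != name)
    ]

# Hypothetical modules scored as (accuracy, disease association, link density).
candidates = {
    "module_a": (0.90, 0.60, 0.40),
    "module_b": (0.85, 0.70, 0.50),
    "module_c": (0.80, 0.55, 0.35),
    "module_d": (0.90, 0.60, 0.45),
}
front = pareto_front(candidates)
```

Here `module_d` dominates `module_a` (equal on two objectives, better on density), and `module_c` is dominated by `module_a`, `module_b` and `module_d`, so only `module_b` and `module_d` remain as mutually non-dominated trade-offs.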
6
Si Z, Hu K. Identification of osteosarcoma driver genes using a network method. Oncol Lett 2020; 19:1215-1222. [PMID: 31966051 PMCID: PMC6956419 DOI: 10.3892/ol.2019.11212] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/12/2019] [Accepted: 11/07/2019] [Indexed: 02/05/2023]
Abstract
Osteosarcoma (OS) is a severe disease that is generally caused by genetic alterations. Systematic identification of driver genes may be used to increase the understanding of the mechanisms underlying the disease. The present study identified a framework to predict driver genes, with the hypothesis that driver genes operate through a number of connected functional genes. OS-related genes were extracted from the Catalogue Of Somatic Mutations In Cancer and subsequently ranked by virtue of their effect on a set of functional genes using a network-based algorithm. This revealed the driver genes associated with dysregulated networks. In addition, compared with the Mutations For Functional Impact on Network Neighbors algorithm, the results obtained using the aforementioned network-based algorithm revealed that the proposed method is effective. Gene functional analysis demonstrated that the potential OS driver genes were involved in OS-associated pathways. Through the validation of the 15 candidate OS driver genes, the classifier constructed in the present study revealed that the identified driver genes were able to distinguish 184 cancer samples from controls. Therefore, the present study provided insights into the identification of driver genes from a vast amount of sequencing data.
Affiliation(s)
- Zebing Si
  - Department of Orthopedics, The Affiliated Yuebei People's Hospital of Shantou University Medical College, Wujiang, Shaoguan 512026, P.R. China
- Konghe Hu
  - Department of Orthopedics, The Affiliated Yuebei People's Hospital of Shantou University Medical College, Wujiang, Shaoguan 512026, P.R. China
  - Correspondence to: Dr Konghe Hu, Department of Orthopedics, The Affiliated Yuebei People's Hospital of Shantou University Medical College, 133 Shaoguan Huimin South Avenue, Wujiang, Shaoguan 512026, P.R. China, E-mail:
7
Clarke R, Tyson JJ, Tan M, Baumann WT, Jin L, Xuan J, Wang Y. Systems biology: perspectives on multiscale modeling in research on endocrine-related cancers. Endocr Relat Cancer 2019; 26:R345-R368. [PMID: 30965282 PMCID: PMC7045974 DOI: 10.1530/erc-18-0309] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/25/2019] [Accepted: 04/08/2019] [Indexed: 12/12/2022]
Abstract
Drawing on concepts from experimental biology, computer science, informatics, mathematics and statistics, systems biologists integrate data across diverse platforms and scales of time and space to create computational and mathematical models of the integrative, holistic functions of living systems. Endocrine-related cancers are well suited to study from a systems perspective because of the signaling complexities arising from the roles of growth factors, hormones and their receptors as critical regulators of cancer cell biology and from the interactions among cancer cells, normal cells and signaling molecules in the tumor microenvironment. Moreover, growth factors, hormones and their receptors are often effective targets for therapeutic intervention, such as estrogen biosynthesis, estrogen receptors or HER2 in breast cancer and androgen receptors in prostate cancer. Given the complexity underlying the molecular control networks in these cancers, a simple, intuitive understanding of how endocrine-related cancers respond to therapeutic protocols has proved incomplete and unsatisfactory. Systems biology offers an alternative paradigm for understanding these cancers and their treatment. To correctly interpret the results of systems-based studies requires some knowledge of how in silico models are built, and how they are used to describe a system and to predict the effects of perturbations on system function. In this review, we provide a general perspective on the field of cancer systems biology, and we explore some of the advantages, limitations and pitfalls associated with using predictive multiscale modeling to study endocrine-related cancers.
Affiliation(s)
- Robert Clarke
  - Department of Oncology, Georgetown University Medical Center, Washington, District of Columbia, USA
- John J Tyson
  - Department of Biological Sciences, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, USA
- Ming Tan
  - Department of Biostatistics, Bioinformatics & Biomathematics, Georgetown University Medical Center, Washington, District of Columbia, USA
- William T Baumann
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, USA
- Lu Jin
  - Department of Oncology, Georgetown University Medical Center, Washington, District of Columbia, USA
- Jianhua Xuan
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, Virginia, USA
- Yue Wang
  - Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, Virginia, USA
8
Deng SP, Hu W, Calhoun VD, Wang YP. Integrating Imaging Genomic Data in the Quest for Biomarkers of Schizophrenia Disease. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:1480-1491. [PMID: 28880187 PMCID: PMC6207076 DOI: 10.1109/tcbb.2017.2748944] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 05/06/2023]
Abstract
It's increasingly important but difficult to determine potential biomarkers of schizophrenia (SCZ) disease, owing to the complex pathophysiology of this disease. In this study, a network-fusion based framework was proposed to identify genetic biomarkers of the SCZ disease. A three-step feature selection was applied to single nucleotide polymorphisms (SNPs), DNA methylation, and functional magnetic resonance imaging (fMRI) data to select important features, which were then used to construct two gene networks in different states for the SNPs and DNA methylation data, respectively. Two health networks (one is for SNP data and the other is for DNA methylation data) were combined into one health network from which health minimum spanning trees (MSTs) were extracted. Two disease networks also followed the same procedures. Those genes with significant changes were determined as SCZ biomarkers by comparing MSTs in two different states and they were finally validated from five aspects. The effectiveness of the proposed discovery framework was also demonstrated by comparing with other network-based discovery methods. In summary, our approach provides a general framework for discovering gene biomarkers of the complex diseases by integrating imaging genomic data, which can be applied to the diagnosis of the complex diseases in the future.
Affiliation(s)
- Su-Ping Deng
  - Department of Biomedical Engineering, School of Science and Engineering, Tulane University, New Orleans, LA 70118, USA
- Wenxing Hu
  - Department of Biomedical Engineering, School of Science and Engineering, Tulane University, New Orleans, LA 70118, USA
- Yu-Ping Wang
  - Department of Biomedical Engineering, School of Science and Engineering, Tulane University, New Orleans, LA 70118, USA. Telephone: (504)865-5867, Fax: (504)862-8779
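
The abstract of entry 8 compares minimum spanning trees (MSTs) extracted from health and disease networks. A rough sketch of that comparison step is given below, using Kruskal's algorithm. The edge weights, the gene names, and the rule in `changed_genes` (flagging genes on edges present in only one tree) are illustrative assumptions, not the paper's scoring procedure.

```python
def minimum_spanning_tree(nodes, edges):
    """Kruskal's algorithm. Edges are (weight, u, v); returns a set of frozenset edges."""
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = set()
    for _, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two components, so keep it
            parent[root_u] = root_v
            tree.add(frozenset((u, v)))
    return tree

def changed_genes(tree_a, tree_b):
    """Genes on edges that appear in exactly one of the two trees."""
    return sorted(set().union(*(tree_a ^ tree_b)))

genes = ["G1", "G2", "G3", "G4"]
# Hypothetical weights, e.g. a distance derived from co-expression strength.
health_edges = [(1, "G1", "G2"), (2, "G2", "G3"), (3, "G3", "G4"), (9, "G1", "G4")]
disease_edges = [(1, "G1", "G2"), (2, "G2", "G3"), (9, "G3", "G4"), (3, "G1", "G4")]

health_mst = minimum_spanning_tree(genes, health_edges)
disease_mst = minimum_spanning_tree(genes, disease_edges)
candidates = changed_genes(health_mst, disease_mst)
```

In this toy case the G3-G4 link is replaced by G1-G4 in the disease tree, so G1, G3 and G4 are flagged as candidates while G2 is not.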
9
MGOGP: a gene module-based heuristic algorithm for cancer-related gene prioritization. BMC Bioinformatics 2018; 19:215. [PMID: 29871590 PMCID: PMC5989416 DOI: 10.1186/s12859-018-2216-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 03/13/2017] [Accepted: 05/23/2018] [Indexed: 01/13/2023]
Abstract
Background: Prioritizing genes according to their associations with a cancer allows researchers to explore genes in more informed ways. So far, gene-centric and network-centric gene prioritization methods have predominated. Genes and their protein products carry out cellular processes in the context of functional modules. Dysfunctional gene modules have been previously reported to have associations with cancer. However, gene module information has seldom been considered in cancer-related gene prioritization. Results: In this study, we propose a novel method, MGOGP (Module and Gene Ontology-based Gene Prioritization), for cancer-related gene prioritization. Unlike other methods, MGOGP ranks genes using information on both individual genes and their affiliated modules, and uses a Gene Ontology (GO) based fuzzy measure value as well as known cancer-related genes as heuristics. The performance of the proposed method is comprehensively validated on both breast cancer and prostate cancer datasets, and by comparison with other methods. Results show that MGOGP outperforms other methods and successfully prioritizes more genes with literature-confirmed evidence. Conclusions: This work will aid researchers in understanding the genetic architecture of complex diseases, and improve the accuracy of diagnosis and the effectiveness of therapy.
10
Incorporating biological prior knowledge for Bayesian learning via maximal knowledge-driven information priors. BMC Bioinformatics 2017; 18:552. [PMID: 29297278 PMCID: PMC5751802 DOI: 10.1186/s12859-017-1893-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Indexed: 01/04/2023]
Abstract
Background Phenotypic classification is problematic because small samples are ubiquitous; and, for these, use of prior knowledge is critical. If knowledge concerning the feature-label distribution – for instance, genetic pathways – is available, then it can be used in learning. Optimal Bayesian classification provides optimal classification under model uncertainty. It differs from classical Bayesian methods in which a classification model is assumed and prior distributions are placed on model parameters. With optimal Bayesian classification, uncertainty is treated directly on the feature-label distribution, which assures full utilization of prior knowledge and is guaranteed to outperform classical methods. Results The salient problem confronting optimal Bayesian classification is prior construction. In this paper, we propose a new prior construction methodology based on a general framework of constraints in the form of conditional probability statements. We call this prior the maximal knowledge-driven information prior (MKDIP). The new constraint framework is more flexible than our previous methods as it naturally handles the potential inconsistency in archived regulatory relationships and conditioning can be augmented by other knowledge, such as population statistics. We also extend the application of prior construction to a multinomial mixture model when labels are unknown, which often occurs in practice. The performance of the proposed methods is examined on two important pathway families, the mammalian cell-cycle and a set of p53-related pathways, and also on a publicly available gene expression dataset of non-small cell lung cancer when combined with the existing prior knowledge on relevant signaling pathways. Conclusion The new proposed general prior construction framework extends the prior construction methodology to a more flexible framework that results in better inference when proper prior knowledge exists. 
Moreover, the extension of optimal Bayesian classification to multinomial mixtures where data sets are both small and unlabeled, enables superior classifier design using small, unstructured data sets. We have demonstrated the effectiveness of our approach using pathway information and available knowledge of gene regulating functions; however, the underlying theory can be applied to a wide variety of knowledge types, and other applications when there are small samples.
11
Mohammed A, Biegert G, Adamec J, Helikar T. Identification of potential tissue-specific cancer biomarkers and development of cancer versus normal genomic classifiers. Oncotarget 2017; 8:85692-85715. [PMID: 29156751 PMCID: PMC5689641 DOI: 10.18632/oncotarget.21127] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 06/08/2017] [Accepted: 09/05/2017] [Indexed: 01/15/2023]
Abstract
Machine learning techniques for cancer prediction and biomarker discovery can hasten cancer detection and significantly improve prognosis. Recent “OMICS” studies which include a variety of cancer and normal tissue samples along with machine learning approaches have the potential to further accelerate such discovery. To demonstrate this potential, 2,175 gene expression samples from nine tissue types were obtained to identify gene sets whose expression is characteristic of each cancer class. Using random forests classification and ten-fold cross-validation, we developed nine single-tissue classifiers, two multi-tissue cancer-versus-normal classifiers, and one multi-tissue normal classifier. Given a sample of a specified tissue type, the single-tissue models classified samples as cancer or normal with a testing accuracy between 85.29% and 100%. Given a sample of non-specific tissue type, the multi-tissue bi-class model classified the sample as cancer versus normal with a testing accuracy of 97.89%. Given a sample of non-specific tissue type, the multi-tissue multi-class model classified the sample as cancer versus normal and as a specific tissue type with a testing accuracy of 97.43%. Given a normal sample of any of the nine tissue types, the multi-tissue normal model classified the sample as a particular tissue type with a testing accuracy of 97.35%. The machine learning classifiers developed in this study identify potential cancer biomarkers with sensitivity and specificity that exceed those of existing biomarkers and pointed to pathways that are critical to tissue-specific tumor development. This study demonstrates the feasibility of predicting the tissue origin of carcinoma in the context of multiple cancer classes.
Affiliation(s)
- Akram Mohammed
  - Department of Biochemistry, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
- Greyson Biegert
  - Department of Biochemistry, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
- Jiri Adamec
  - Department of Biochemistry, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
- Tomáš Helikar
  - Department of Biochemistry, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
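
The ten-fold cross-validation protocol mentioned in the abstract of entry 11 can be sketched in plain Python. To keep the example dependency-free, a nearest-centroid classifier stands in for the random forest used in the study. The toy data, the function names, and the `seed` parameter are assumptions for illustration only.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nearest_centroid_fit(X, y):
    """Per-class mean vector (a stand-in model, not a random forest)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for d, value in enumerate(row):
            acc[d] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def nearest_centroid_predict(centroids, row):
    """Label of the centroid closest to row in squared Euclidean distance."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], row)))

def cross_validated_accuracy(X, y, k=10, seed=0):
    """Fraction of samples predicted correctly while held out of training."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)

# Toy data: two well-separated groups of ten samples each.
X = [[i * 0.1, 1.0] for i in range(10)] + [[5 + i * 0.1, 1.0] for i in range(10)]
y = ["normal"] * 10 + ["cancer"] * 10
accuracy = cross_validated_accuracy(X, y)
```

Every sample is scored exactly once by a model that never saw it during fitting, which is what makes the reported figure a testing accuracy rather than a training accuracy.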
12
Li B, Liu J, Yu Y, Wang P, Zhang Y, Ni X, Liu Q, Zhang X, Wang Z, Wang Y. Network-Wide Screen Identifies Variation of Novel Precise On-Module Targets Using Conformational Modudaoism. CPT: Pharmacometrics & Systems Pharmacology 2017; 7:16-25. [PMID: 28925077 PMCID: PMC5784734 DOI: 10.1002/psp4.12253] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/13/2017] [Revised: 08/30/2017] [Accepted: 09/13/2017] [Indexed: 01/30/2023]
Abstract
Modular targeting is promising in drug research at the network level, but it is challenging to quantitatively identify the precise on-modules. Based on a proposed Modudaoism (MD), we defined conserved MD (MDc) and varied MD (MDv) to quantitatively evaluate the conformational and energy variations of modules, and thereby identify the conserved and discrepant allosteric modules (AMs). Compared with Zsummary, MDc/MDv yielded an optimized module preservation ratio and modular structure. In the mouse anti-ischemic networks, 3, 5, and 1 conserved AMs as well as 4, 1, and 3 on-modules of baicalin (BA), jasminoidin (JA), and ursodeoxycholic acid (UA), respectively, were identified by MDc and MDv, and 5 unique AMs and their characteristic actions were revealed. In addition, co-immunoprecipitation (Co-IP) experiments validated the representative modular structure. The MDc/MDv method can quantitatively define the conformational variations of modules and screen the precise on-modules network-wide, which may provide a promising strategy for drug discovery.
Affiliation(s)
- Bing Li
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
  - Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing, China
  - ShanXi Buchang Pharmaceutical Co., Ltd., Heze, China
- Jun Liu
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Yanan Yu
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Pengqian Wang
  - Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
- Yingying Zhang
  - Dongzhimen Hospital, Beijing University of Chinese Medicine, Beijing, China
- Xumin Ni
  - Department of Mathematics, School of Science, Beijing Jiaotong University, Beijing, China
- Qiong Liu
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Xiaoxu Zhang
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Zhong Wang
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Yongyan Wang
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
13
Zhang C, Liu J, Shi Q, Zeng T, Chen L. Comparative network stratification analysis for identifying functional interpretable network biomarkers. BMC Bioinformatics 2017; 18:48. [PMID: 28361683 PMCID: PMC5374559 DOI: 10.1186/s12859-017-1462-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/25/2022]
Abstract
BACKGROUND: A major challenge of bioinformatics in the era of precision medicine is to identify molecular biomarkers for complex diseases. These biomarkers or signatures are generally expected to have not only strong discriminative ability but also readable interpretations in a biological sense. Conventional expression-based or network-based methods mainly capture differential genes or differential networks as biomarkers; however, such biomarkers focus only on phenotypic discrimination and usually offer little biological or functional interpretation. Meanwhile, conventional function-based methods consider biomarkers corresponding to certain biological functions or pathways but ignore the differential information of genes, i.e., they disregard how active particular genes are in particular functions, which results in weaker discriminative ability on phenotypes. Hence, there is a strong need for computational methods that directly identify functional network biomarkers with both discriminative power on disease states and readable interpretation of biological functions. RESULTS: In this paper, we present a new computational framework based on an integer programming model, named Comparative Network Stratification (CNS), to extract functional or interpretable network biomarkers that have both strong discriminative power on disease states and readable interpretation of biological functions. In addition, CNS can not only recognize the pathogenic biological functions disregarded by traditional expression-based and network-based methods, but also uncover the active network structures underlying the dysregulated functions that are underestimated by traditional function-based methods. To validate its effectiveness, we compared CNS with five state-of-the-art methods, i.e., GSVA, Pathifier, stSVM, frSVM and AEP, on four datasets of different complex diseases.
The results show that CNS can enhance the discriminative power of network biomarkers, and further provide biologically interpretable information or disease pathogenic mechanism of these biomarkers. A case study on type 1 diabetes (T1D) demonstrates that CNS can identify many dysfunctional genes and networks previously disregarded by conventional approaches. CONCLUSION Therefore, CNS is actually a powerful bioinformatics tool, which can identify functional or interpretable network biomarkers with both discriminative power on disease states and readable interpretation on biological functions. CNS was implemented as a Matlab package, which is available at http://www.sysbio.ac.cn/cb/chenlab/images/CNSpackage_0.1.rar .
Affiliation(s)
- Chuanchao Zhang
  - State Key Laboratory of Software Engineering, School of Computer, Wuhan University, Wuhan, 430072, China
  - Key Laboratory of Systems Biology, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Juan Liu
  - State Key Laboratory of Software Engineering, School of Computer, Wuhan University, Wuhan, 430072, China
- Qianqian Shi
  - Key Laboratory of Systems Biology, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Tao Zeng
  - Key Laboratory of Systems Biology, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Luonan Chen
  - Key Laboratory of Systems Biology, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
|
14
|
Li B, Liu J, Zhang YY, Wang PQ, Yu YN, Kang RX, Wu HL, Zhang XX, Wang Z, Wang YY. Quantitative Identification of Compound-Dependent On-Modules and Differential Allosteric Modules From Homologous Ischemic Networks. CPT Pharmacometrics Syst Pharmacol 2016; 5:575-584. [PMID: 27758049 PMCID: PMC5080653 DOI: 10.1002/psp4.12127] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 05/26/2016] [Revised: 07/28/2016] [Accepted: 08/22/2016] [Indexed: 12/13/2022]
Abstract
Module‐based methods have made much progress in deconstructing biological networks. However, it remains a great challenge to quantitatively compare the topological structural variations of modules (allosteric modules [AMs]) under different conditions. A total of 23, 42, and 15 coexpression modules were identified for baicalin (BA), jasminoidin (JA), and ursodeoxycholic acid (UA), respectively, in a global anti‐ischemic mouse network. We then integrated the module‐based consensus ratio (MCR) and a modified Zsummary module statistic to validate 12 BA, 22 JA, and 8 UA on‐modules by comparison with vehicle. The MCRs for pairwise comparisons were 1.55% (BA vs. JA), 1.45% (BA vs. UA), and 1.27% (JA vs. UA). Five conserved allosteric modules (CAMs) and 17 unique allosteric modules (UAMs) were identified among these groups. In conclusion, module‐centric analysis may provide a unique approach to understanding the multiple pharmacological mechanisms associated with differential phenotypes in the era of modular pharmacology.
Affiliation(s)
- B Li
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
  - Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- J Liu
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Y Y Zhang
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- P Q Wang
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Y N Yu
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- R X Kang
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- H L Wu
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- X X Zhang
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Z Wang
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
- Y Y Wang
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, Beijing, China
|
15
|
Systematic tracking of coordinated differential network motifs identifies novel disease-related genes by integrating multiple data. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2015.12.120] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]
|
16
|
Liu ZP. Identifying network-based biomarkers of complex diseases from high-throughput data. Biomark Med 2016; 10:633-50. [DOI: 10.2217/bmm-2015-0035] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Indexed: 12/16/2022] Open
Abstract
In this work, we review the main available computational methods for identifying biomarkers of complex diseases from high-throughput data. Emerging omics techniques provide powerful alternatives for measuring thousands of molecules in cells in parallel. The generated genomic, transcriptomic, proteomic, metabolomic and phenomic data provide comprehensive molecular and cellular information for detecting critical signals that serve as biomarkers by classifying disease phenotypic states. Networks are often employed to organize these profiles when identifying biomarkers for the diagnosis, prognosis and therapy of complex diseases, as well as for deciphering mechanisms from a systematic perspective. Here, we summarize some representative network-based bioinformatics methods in order to highlight the importance of computational strategies in biomarker discovery.
Affiliation(s)
- Zhi-Ping Liu
  - Department of Biomedical Engineering, School of Control Science & Engineering, Shandong University, Jinan, Shandong 250061, China
|
17
|
Efficient and biologically relevant consensus strategy for Parkinson's disease gene prioritization. BMC Med Genomics 2016; 9:12. [PMID: 26961748 PMCID: PMC4784386 DOI: 10.1186/s12920-016-0173-x] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 09/03/2015] [Accepted: 03/01/2016] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND The systemic information enclosed in microarray data encodes relevant clues to overcome the poorly understood combination of genetic and environmental factors in Parkinson's disease (PD), which represents the major obstacle to understanding its pathogenesis and to developing disease-modifying therapeutics. While several gene prioritization approaches have been proposed, none dominates the rest. Instead, hybrid approaches seem to outperform individual approaches. METHODS A consensus strategy is proposed for PD-related gene prioritization from mRNA microarray data based on the combination of three independent prioritization approaches: Limma, machine learning, and weighted gene co-expression networks. RESULTS The consensus strategy outperformed the individual approaches in terms of statistical significance, overall enrichment and early recognition ability. In addition to significant biological relevance, the set of 50 prioritized genes exhibited an excellent early recognition ability (6 of the top 10 genes are directly associated with PD). 40% of the prioritized genes were previously associated with PD, including well-known PD-related genes such as SLC18A2, TH and DRD2. Eight genes (CCNH, DLK1, PCDH8, SLIT1, DLD, PBX1, INSM1, and BMI1) were found to be significantly associated with biological processes affected in PD, representing potentially novel PD biomarkers or therapeutic targets. Additionally, several metrics in standard use in chemoinformatics are proposed to evaluate the early recognition ability of gene prioritization tools. CONCLUSIONS The proposed consensus strategy represents an efficient and biologically relevant approach for gene prioritization tasks, providing a valuable decision-making tool for the study of PD pathogenesis and the development of disease-modifying PD therapeutics.
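As a rough illustration of what combining several prioritization approaches can look like, the sketch below merges ranked gene lists by mean rank. The abstract does not say how the three approaches are combined, so the mean-rank rule, the function name `consensus_ranking` and the toy rankings are all assumptions made for illustration; only the gene symbols are taken from the abstract.

```python
from statistics import mean

def consensus_ranking(rankings):
    """Order genes by their mean rank across several ranked lists.

    `rankings` is a list of lists; each inner list orders the same genes
    from most to least relevant according to one prioritization approach.
    Mean-rank aggregation is an assumed rule, not the published strategy.
    """
    genes = set(rankings[0])
    # This sketch requires every approach to rank the same set of genes.
    if any(set(r) != genes for r in rankings):
        raise ValueError("all rankings must cover the same genes")
    # Rank positions are 1-based: the first gene in a list has rank 1.
    positions = [{g: i + 1 for i, g in enumerate(r)} for r in rankings]
    mean_rank = {g: mean(p[g] for p in positions) for g in genes}
    # Sort by mean rank; break ties alphabetically so the result is stable.
    return sorted(genes, key=lambda g: (mean_rank[g], g))

# Hypothetical toy rankings standing in for the three approaches.
limma_rank = ["TH", "DRD2", "SLC18A2", "PBX1"]
ml_rank = ["DRD2", "TH", "PBX1", "SLC18A2"]
wgcna_rank = ["TH", "SLC18A2", "DRD2", "PBX1"]

consensus = consensus_ranking([limma_rank, ml_rank, wgcna_rank])
print(consensus)
```

A gene placed highly by most approaches ends up near the top even if one approach ranks it lower, which is the intuition behind hybrid prioritization.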
|
18
|
Yue H, Yang BO, Yang F, Hu XL, Kong FB. Co-expression network-based analysis of hippocampal expression data associated with Alzheimer's disease using a novel algorithm. Exp Ther Med 2016; 11:1707-1715. [PMID: 27168792 PMCID: PMC4840697 DOI: 10.3892/etm.2016.3131] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 12/29/2014] [Accepted: 01/07/2016] [Indexed: 12/15/2022] Open
Abstract
Recent progress in bioinformatics has facilitated the clarification of biological processes associated with complex diseases. Numerous methods of co-expression analysis have been proposed for use in the study of pairwise relationships among genes. In the present study, a combined network based on gene pairs was constructed following the conversion and combination of gene pair score values using a novel algorithm across multiple approaches. Three hippocampal expression profiles of patients with Alzheimer's disease (AD) and normal controls were extracted from the ArrayExpress database, and a total of 144 differentially expressed (DE) genes across multiple studies were identified by a rank product (RP) method. Five groups of co-expression gene pairs and five networks were identified and constructed using four existing methods [weighted gene co-expression network analysis (WGCNA), empirical Bayesian (EB), differentially co-expressed genes and links (DCGL), search tool for the retrieval of interacting genes/proteins database (STRING)] and a novel rank-based algorithm with combined score, respectively. Topological analysis indicated that the co-expression network constructed by the WGCNA method tended to exhibit small-world characteristics, and the combined co-expression network was confirmed to be a scale-free network. Functional analysis of the co-expression gene pairs was conducted by Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. The co-expression gene pairs were mostly enriched in five pathways, namely proteasome, oxidative phosphorylation, Parkinson's disease, Huntington's disease and AD. This study provides a new perspective on co-expression analysis. Since different methods of analysis often perform differently, the novel combination algorithm may provide a more credible and robust outcome, and could be used to complement traditional co-expression analysis.
Affiliation(s)
- Hong Yue
  - Department of Neurology (No. 2), Rizhao People's Hospital, Rizhao, Shandong 276826, P.R. China
- B O Yang
  - Department of Neurology (No. 2), Rizhao People's Hospital, Rizhao, Shandong 276826, P.R. China
- Fang Yang
  - Department of Neurology (No. 2), Rizhao People's Hospital, Rizhao, Shandong 276826, P.R. China
- Xiao-Li Hu
  - Department of Neurology (No. 2), Rizhao People's Hospital, Rizhao, Shandong 276826, P.R. China
- Fan-Bin Kong
  - Department of Neurology (No. 2), Rizhao People's Hospital, Rizhao, Shandong 276826, P.R. China
|
19
|
Zhang C, Wang J, Zhang C, Liu J, Xu D, Chen L. Network stratification analysis for identifying function-specific network layers. Mol Biosyst 2016; 12:1232-40. [PMID: 26879865 DOI: 10.1039/c5mb00782h] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 02/04/2023]
Abstract
A major challenge of systems biology is to capture the rewiring of biological functions (e.g. signaling pathways) in a molecular network. To address this problem, we proposed a novel computational framework, namely network stratification analysis (NetSA), to stratify the whole biological network into various function-specific network layers corresponding to particular functions (e.g. KEGG pathways), which transforms the network analysis from the gene level to the functional level by integrating expression data, the gene/protein network and gene ontology information. The application of NetSA in yeast and its comparison with a traditional network partition both suggest that NetSA can more effectively reveal the functional implications of network rewiring and extract significant phenotype-related biological processes. Furthermore, for time-series or stage-wise data, the function-specific network layer obtained by NetSA is also shown to be able to characterize disease progression in a dynamic manner. In particular, when applying NetSA to hepatocellular carcinoma and type 1 diabetes, we can derive functional spectra regarding the progression of the disease, and capture active biological functions (i.e. active pathways) in different disease stages. The additional comparison between NetSA and SPIA again illustrates that NetSA could discover more complete biological functions during disease progression. Overall, NetSA provides a general framework to stratify a network into various layers of function-specific sub-networks, which can not only analyze a biological network on the functional level but also investigate gene rewiring patterns in biological processes.
|
20
|
Thingholm LB, Andersen L, Makalic E, Southey MC, Thomassen M, Hansen LL. Strategies for Integrated Analysis of Genetic, Epigenetic, and Gene Expression Variation in Cancer: Addressing the Challenges. Front Genet 2016; 7:2. [PMID: 26870081 PMCID: PMC4740898 DOI: 10.3389/fgene.2016.00002] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 06/07/2015] [Accepted: 01/11/2016] [Indexed: 12/15/2022] Open
Abstract
The development and progression of cancer, a collection of diseases with complex genetic architectures, is facilitated by the interplay of multiple etiological factors. This complexity challenges the traditional single-platform study design and calls for an integrated approach to data analysis. However, integration of heterogeneous measurements of biological variation is a non-trivial exercise due to the diversity of the human genome and the variety of output data formats and genome coverage obtained from the commonly used molecular platforms. This review article provides an introduction to integration strategies used for analyzing genetic risk factors for cancer. We critically examine the ability of these strategies to handle the complexity of the human genome and to accommodate information about the biological and functional interactions between the elements that have been measured, making the assessment of disease risk against a composite genomic factor possible. The focus of this review is to provide an overview of and introduction to the main strategies and to discuss where there is a need for further development.
Affiliation(s)
- Louise B Thingholm
  - Department of Pathology, The University of Melbourne, Melbourne, VIC, Australia
  - Department of Biomedicine, The University of Aarhus, Aarhus, Denmark
- Lars Andersen
  - Department of Clinical Genetics, Odense University Hospital, Odense, Denmark
- Enes Makalic
  - Centre for Epidemiology and Biostatistics, The University of Melbourne, Melbourne, VIC, Australia
- Melissa C Southey
  - Department of Pathology, The University of Melbourne, Melbourne, VIC, Australia
- Mads Thomassen
  - Department of Clinical Genetics, Odense University Hospital, Odense, Denmark
|
21
|
Han D, Wang S, Jiang C, Jiang X, Kim HE, Sun J, Ohno-Machado L. Trends in biomedical informatics: automated topic analysis of JAMIA articles. J Am Med Inform Assoc 2015; 22:1153-63. [PMID: 26555018 PMCID: PMC5009912 DOI: 10.1093/jamia/ocv157] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 08/31/2015] [Revised: 09/08/2015] [Accepted: 09/14/2015] [Indexed: 01/26/2023] Open
Abstract
Biomedical Informatics is a growing interdisciplinary field in which research topics and citation trends have been evolving rapidly in recent years. To analyze these data in a fast, reproducible manner, automation of certain processes is needed. JAMIA is a "generalist" journal for biomedical informatics. Its articles reflect the wide range of topics in informatics. In this study, we retrieved Medical Subject Headings (MeSH) terms and citations of JAMIA articles published between 2009 and 2014. We used tensors (i.e., multidimensional arrays) to represent the interaction among topics, time and citations, and applied tensor decomposition to automate the analysis. The trends represented by tensors were then carefully interpreted and the results were compared with previous findings based on manual topic analysis. A list of the most cited JAMIA articles, their topics, and publication trends over recent years is presented. The analyses confirmed previous studies and showed that, from 2012 to 2014, articles related to the MeSH terms Methods, Organization & Administration, and Algorithms increased significantly in both number of publications and citations. Citation trends varied widely by topic, with Natural Language Processing having a large number of citations in particular years, and Medical Record Systems, Computerized remaining a very popular topic in all years.
Affiliation(s)
- Dong Han
  - Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
  - School of Electrical and Computer Engineering, University of Oklahoma, Tulsa, OK, 74135, USA
- Shuang Wang
  - Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
- Chao Jiang
  - Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
  - School of Electrical and Computer Engineering, University of Oklahoma, Tulsa, OK, 74135, USA
- Xiaoqian Jiang
  - Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
- Hyeon-Eui Kim
  - Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
- Jimeng Sun
  - School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, S30313, USA
- Lucila Ohno-Machado
  - Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
|
22
|
Quantitative assessment of gene expression network module-validation methods. Sci Rep 2015; 5:15258. [PMID: 26470848 PMCID: PMC4607977 DOI: 10.1038/srep15258] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 03/20/2015] [Accepted: 09/21/2015] [Indexed: 02/01/2023] Open
Abstract
Validation of pluripotent modules in diverse networks holds enormous potential for systems biology and network pharmacology. An arising challenge is how to assess the accuracy of discovering all potential modules from multi-omic networks and validating their architectural characteristics based on innovative computational methods beyond function enrichment and biological validation. To display the framework progress in this domain, we systematically divided the existing Computational Validation Approaches based on Modular Architecture (CVAMA) into topology-based approaches (TBA) and statistics-based approaches (SBA). We compared the available module validation methods based on 11 gene expression datasets, and partially consistent results in the form of homogeneous models were obtained with each individual approach, whereas discrepant contradictory results were found between TBA and SBA. The TBA of the Zsummary value had a higher Validation Success Ratio (VSR) (51%) and a higher Fluctuation Ratio (FR) (80.92%), whereas the SBA of the approximately unbiased (AU) p-value had a lower VSR (12.3%) and a lower FR (45.84%). The Gray area simulated study revealed a consistent result for these two models and indicated a lower Variation Ratio (VR) (8.10%) of TBA at 6 simulated levels. Despite facing many novel challenges and evidence limitations, CVAMA may offer novel insights into modular networks.
|
23
|
Dai Y, Jiang JB, Wang YL, Jin ZT, Hu SY. Functional and protein‑protein interaction network analysis of colorectal cancer induced by ulcerative colitis. Mol Med Rep 2015; 12:4947-58. [PMID: 26239378 PMCID: PMC4581825 DOI: 10.3892/mmr.2015.4102] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/28/2014] [Accepted: 06/17/2015] [Indexed: 12/18/2022] Open
Abstract
Colorectal cancer (CRC) is a well‑recognized complication of ulcerative colitis (UC), and patients with UC have a higher incidence of CRC compared with the general population. However, the properties of CRC induced by UC have not been clarified using an interaction network to analyze and compare gene sets. In the present study, six microarray datasets of CRC and UC were extracted from the ArrayExpress database, and gene signatures were identified using the genome‑wide relative significance (GWRS) method. Functional analysis was performed based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Prediction of the genes and microRNA was performed using a hypergeometric method. A protein‑protein interaction (PPI) network was constructed using the Search Tool for the Retrieval of Interacting Genes/Proteins, and clusters were obtained through the Molecular Complex Detection algorithm. Topological centrality and a novel analysis method, based on the rank value of GWGS, were used to characterize the biological importance of the clusters. A total of 217 differentially expressed (DE) genes were identified in CRC, 341 DE genes were identified in UC, and 62 genes were common to the two. Several KEGG pathways were the same in CRC and UC. Collagenase, progesterone, heparin, urokinase, NADH and adenosine drugs demonstrated potential for use in the treatment of CRC and UC. In the PPI network of CRC, 210 nodes and 752 edges were observed, whereas 314 nodes and 882 edges were identified in UC. Cluster 3 in UC had the highest GWGS, while it had the lowest topological degree and betweenness centrality. PPI network analysis provided an effective way to estimate and understand the likelihood of potential connections between proteins/genes. The results obtained using GWGS to analyze differences between clusters did not agree with the topological degree and betweenness centrality, which indicated that the gene fold change-based GWGS was inconsistent with degree in CRC and UC.
Affiliation(s)
- Yong Dai
  - Department of General Surgery, Qilu Hospital of Shandong University, Jinan, Shandong 250012, P.R. China
- Jin-Bo Jiang
  - Department of General Surgery, Qilu Hospital of Shandong University, Jinan, Shandong 250012, P.R. China
- Yan-Lei Wang
  - Department of General Surgery, Qilu Hospital of Shandong University, Jinan, Shandong 250012, P.R. China
- Zu-Tao Jin
  - Department of General Surgery, Qilu Hospital of Shandong University, Jinan, Shandong 250012, P.R. China
- San-Yuan Hu
  - Department of General Surgery, Qilu Hospital of Shandong University, Jinan, Shandong 250012, P.R. China
|
24
|
She Y, Zhang X, Wang Q, Wu Q. The potential relationship discovery model based on result fusion for biomedical medicine research. J Inf Sci 2015. [DOI: 10.1177/0165551515578395] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/11/2022]
Abstract
With the amount of biomedical data growing explosively, medical scientists use many datasets in research for new medicine. However, the amount of biomedical data is growing too fast to abstract hidden information. At the same time, with the development of data storage diversification, scientists prefer to have data fusion based on heterogeneous data sources as opposed to a single data source, and ultimately to achieve knowledge and discovery across heterogeneous databases. Our study focuses on extending the application of latent semantic analysis methodologies into the area of biomedical research. Our purpose is to develop a model for discovering potential relationships between medicines and diseases based on biomedical latent semantic analysis. This model could be used in constructing link maps for biomedical entities, and provide a theoretical basis and practical support for biomedical scientists in their study of the disease–medicine relationship. In detail, we discuss the study of the integration of the latent semantic analysis model and data fusion methodologies. Our result fusion solution combines scientific literature repositories and a biomedical database based on context and the ABC model, and is supervised by a semi-supervised learning algorithm and data fusion algorithms. The expectation is that fused data could represent multilevel potential relationships between biological entities and related emotional relationship expression. The model is validated by experience and proven to be feasible and effective.
Affiliation(s)
- Qian Wang
  - Software School, Xiamen University, China
|
25
|
Xin J, Ren X, Chen L, Wang Y. Identifying network biomarkers based on protein-protein interactions and expression data. BMC Med Genomics 2015; 8 Suppl 2:S11. [PMID: 26044366 PMCID: PMC4460625 DOI: 10.1186/1755-8794-8-s2-s11] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 11/10/2022] Open
Abstract
Identifying effective biomarkers to battle complex diseases is an important but challenging task in biomedical research today. Molecular data on complex diseases are increasingly abundant due to the rapid advance of high-throughput technologies. However, a great gap remains in relating the massive molecular data to phenotypic changes, in particular at a network level; that is, a novel method for identifying network biomarkers is urgently needed to accurately classify and diagnose diseases from molecular data and to shed light on the mechanisms of disease pathogenesis. Rather than seeking differential genes at an individual-molecule level, here we propose a novel method for identifying network biomarkers based on protein-protein interaction affinity (PPIA), which identifies the differential interactions at a network level. Specifically, we first define PPIAs by estimating the concentrations of protein complexes based on the law of mass action applied to gene expression data. Then we select a small and non-redundant group of protein-protein interactions and single proteins according to the PPIAs that maximizes the ability to discern cases from controls. This method is mathematically formulated as a linear program, which can be solved efficiently and guarantees a globally optimal solution. Extensive results on experimental breast cancer data demonstrate the effectiveness and efficiency of the proposed method for identifying network biomarkers, which not only accurately distinguish the phenotypes but also provide significant biological insights at a network or pathway level. In addition, our method provides a new way to integrate static protein-protein interaction information with dynamic gene expression data.
Collapse
|
26
|
A novel mixed integer programming for multi-biomarker panel identification by distinguishing malignant from benign colorectal tumors. Methods 2015; 83:3-17. [PMID: 25980368 DOI: 10.1016/j.ymeth.2015.05.011] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 03/31/2015] [Revised: 05/07/2015] [Accepted: 05/08/2015] [Indexed: 01/20/2023] Open
Abstract
Multi-biomarker panels can capture the nonlinear synergy among biomarkers, and they are important aids in the early diagnosis of, and ultimately the battle against, complex diseases. However, identifying these multi-biomarker panels from case and control data is challenging. For example, the exhaustive search method is computationally infeasible when the data dimension is high. Here, we propose a novel method, MILP_k, to identify a serum-based multi-biomarker panel that distinguishes colorectal cancers (CRC) from benign colorectal tumors. Specifically, the multi-biomarker panel detection problem is modeled as a mixed integer program that maximizes the classification accuracy. We then measured the serum profiling data for 101 CRC patients and 95 benign patients. The 61 biomarkers were analyzed individually and then in combination by our method. We discovered 4 biomarkers as the optimal small multi-biomarker panel, including the known CRC biomarkers CEA and IL-10 as well as the novel biomarkers IMA and NSE. This multi-biomarker panel achieves a leave-one-out cross-validation (LOOCV) accuracy of 0.7857 with a nearest centroid classifier. An independent test of this panel by support vector machine (SVM) with threefold cross-validation gives an AUC of 0.8438. This improves the predictive accuracy by 20% over the single best biomarker. Further extension of this 4-biomarker panel to a larger 13-biomarker panel improves the LOOCV accuracy to 0.8673, with an independent AUC of 0.8437. Comparison with the exhaustive search method shows that our method reduces the search time by 1000-fold. Experiments on the early cancer stage samples reveal two panels of biomarkers and show promising accuracy. The proposed method allows us to select the subset of biomarkers with the best accuracy for distinguishing case and control samples, given the number of selected biomarkers. Both the receiver operating characteristic curve and the precision-recall curve show our method's consistent performance gain in accuracy. Our method also shows its advantage in capturing synergy among selected biomarkers. The multi-biomarker panel far outperforms the simple combination of the best single features. Close investigation of the multi-biomarker panel illustrates that our method is able to remove redundancy and reveal complementary biomarker combinations. In addition, our method is efficient and can select multi-biomarker panels with more than 5 biomarkers, for which the exhaustive methods fail. In conclusion, we propose a promising model to improve clinical data interpretability and to serve as a useful tool for other complex disease studies. Our small multi-biomarker panel, CEA, IL-10, IMA, and NSE, may provide insights into the disease status of colorectal diseases. The implementation of our method in MATLAB is available via the website: http://doc.aporc.org/wiki/MILP_k.
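The remark that exhaustive search becomes infeasible can be made concrete by counting candidate panels. The sketch below is only a counting exercise, not the MILP_k model; the helper name `n_panels` is hypothetical, while the figures 61, 4 and 13 are the biomarker count and panel sizes quoted in the abstract.

```python
from math import comb  # available in Python 3.8+

def n_panels(n_biomarkers, panel_size):
    """Number of distinct panels of `panel_size` drawn from the candidates."""
    return comb(n_biomarkers, panel_size)

# 61 candidate biomarkers; 4 and 13 are the two panel sizes reported.
small = n_panels(61, 4)
large = n_panels(61, 13)

print(small)           # candidate 4-biomarker panels
print(large > 10**12)  # whether 13-biomarker panels exceed a trillion
```

Each candidate panel would need its own cross-validated classifier under exhaustive search, so the cost grows with these counts; an optimization model avoids enumerating them one by one.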
|
27
|
Das J, Gayvert KM, Yu H. Predicting cancer prognosis using functional genomics data sets. Cancer Inform 2014; 13:85-8. [PMID: 25392695 PMCID: PMC4218897 DOI: 10.4137/cin.s14064] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 06/29/2014] [Revised: 09/17/2014] [Accepted: 09/19/2014] [Indexed: 11/06/2022] Open
Abstract
Elucidating the molecular basis of human cancers is an extremely complex and challenging task. A wide variety of computational tools and experimental techniques have been used to address different aspects of this characterization. One major hurdle faced by both clinicians and researchers has been to pinpoint the mechanistic basis underlying a wide range of prognostic outcomes for the same type of cancer. Here, we provide an overview of various computational methods that have leveraged different functional genomics data sets to identify molecular signatures that can be used to predict prognostic outcome for various human cancers. Furthermore, we outline challenges that remain and future directions that may be explored to address them.
Affiliation(s)
- Jishnu Das: Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
- Kaitlyn M Gayvert: Tri-Institutional Training Program in Computational Biology and Medicine, New York, NY, USA
- Haiyuan Yu: Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
|
28
|
Ohno-Machado L. Disseminating informatics knowledge and training the next generation of leaders. J Am Med Inform Assoc 2014; 21:954-6. [DOI: 10.1136/amiajnl-2014-noveditorial] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022] Open
|
29
|
Integrated exon level expression analysis of driver genes explain their role in colorectal cancer. PLoS One 2014; 9:e110134. [PMID: 25335079 PMCID: PMC4204855 DOI: 10.1371/journal.pone.0110134] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 02/20/2014] [Accepted: 09/16/2014] [Indexed: 12/14/2022] Open
Abstract
Integrated analysis of genomic and transcriptomic level changes holds promise for a better understanding of colorectal cancer (CRC) biology. There is a pertinent need to explain the functional effect of genome-level changes by integrating information at the transcript level. Using a high-resolution cytogenetics array, we had earlier identified driver genes by ‘Genomic Identification of Significant Targets In Cancer (GISTIC)’ analysis of paired tumour-normal samples from colorectal cancer patients. In this study, we analyze these driver genes at three levels using exon array data: gene, exon and network. Gene-level analysis revealed that a small subset is differentially expressed. These results were reinforced by carrying out separate differential expression analyses (SAM and LIMMA). ATP8B1 was found to be the novel gene associated with CRC that shows changes at cytogenetic, gene and exon levels. The splice index of 29 exons corresponding to 13 genes was found to be significantly altered in tumour samples. Driver genes were used to construct regulatory networks for tumour and normal groups. There were rearrangements in transcription factor genes, suggesting the presence of regulatory switching. The regulatory pattern of the AHR gene was found to have the most significant alteration. Our results integrate data with a focus on driver genes, resulting in highly enriched novel molecules that need further study to establish their role in CRC.
|
30
|
Zeng T, Zhang W, Yu X, Liu X, Li M, Liu R, Chen L. Edge biomarkers for classification and prediction of phenotypes. SCIENCE CHINA-LIFE SCIENCES 2014; 57:1103-14. [PMID: 25326072 DOI: 10.1007/s11427-014-4757-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 06/09/2014] [Accepted: 08/07/2014] [Indexed: 12/19/2022]
Abstract
In general, a disease manifests not from malfunction of individual molecules but from failure of the relevant system or network, which can be considered as a set of interactions or edges among molecules. Thus, instead of individual molecules, networks or edges are stable forms to reliably characterize complex diseases. This paper reviews both traditional node biomarkers and edge biomarkers, which have been newly proposed. These biomarkers are classified in terms of their contained information. In particular, we show that edge and network biomarkers provide novel ways of stably and reliably diagnosing the disease state of a sample. First, we categorize the biomarkers based on the information used in the learning and prediction steps. We then briefly introduce conventional node biomarkers, or molecular biomarkers without network information, and their computational approaches. The main focus of this paper is edge and network biomarkers, which exploit network information to improve the accuracy of diagnosis and prognosis. Moreover, by extracting both network and dynamic information from the data, we can develop dynamical network and edge biomarkers. These biomarkers not only diagnose the immediate pre-disease state but also detect the critical molecules or networks by which the biological system progresses from the healthy to the disease state. The identified critical molecules can be used as drug targets, and the critical state indicates the critical point of disease control. The paper also discusses representative biomarker-based methods.
Affiliation(s)
- Tao Zeng: Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
|
31
|
Li Y, Chen L. Big biological data: challenges and opportunities. GENOMICS PROTEOMICS & BIOINFORMATICS 2014; 12:187-9. [PMID: 25462151 PMCID: PMC4411415 DOI: 10.1016/j.gpb.2014.10.001] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Received: 09/01/2014] [Revised: 10/01/2014] [Accepted: 10/01/2014] [Indexed: 11/17/2022]
Affiliation(s)
- Yixue Li: Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Luonan Chen: Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
|
32
|
Qu X, Xie R, Chen L, Feng C, Zhou Y, Li W, Huang H, Jia X, Lv J, He Y, Du Y, Li W, Shi Y, He W. Identifying colon cancer risk modules with better classification performance based on human signaling network. Genomics 2014; 104:242-8. [DOI: 10.1016/j.ygeno.2013.11.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 01/31/2013] [Revised: 09/29/2013] [Accepted: 11/01/2013] [Indexed: 11/26/2022]
|
33
|
Dominietto M, Tsinoremas N, Capobianco E. Integrative analysis of cancer imaging readouts by networks. Mol Oncol 2014; 9:1-16. [PMID: 25263240 PMCID: PMC5528685 DOI: 10.1016/j.molonc.2014.08.013] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 04/26/2014] [Revised: 08/27/2014] [Accepted: 08/27/2014] [Indexed: 02/01/2023] Open
Abstract
Cancer is a multifactorial and heterogeneous disease. The corresponding complexity appears at multiple levels: from the molecular and the cellular constitution to the macroscopic phenotype, and at the diagnostic and therapeutic management stages. The overall complexity can be approximated to a certain extent, e.g. characterized by a set of quantitative phenotypic observables recorded in time‐space resolved dimensions by using multimodal imaging approaches. The transition from measures to data can be made effective through various computational inference methods, including networks, which are inherently capable of mapping variables and data to node‐ and/or edge‐valued topological properties, dynamic modularity configurations, and functional motifs. We illustrate how networks can integrate imaging data to explain cancer complexity, and assess potential pre‐clinical and clinical impact. Computational Multiplexing Imaging merges imaging and networks. Networks show signatures of tumor heterogeneity and phenotypic profiles observed in‐vivo. A profile ensemble establishes a tumor fingerprint, and this constitutes a novel type of marker. Personalized treatment is embedded in a systems medicine approach.
Affiliation(s)
- Marco Dominietto: Biomaterial Science Center, University of Basel, Basel, Switzerland; Institute for Biomedical Engineering, ETH and University of Zurich, Zurich, Switzerland
- Enrico Capobianco: Center for Computational Science, University of Miami, Miami, FL, USA; Laboratory of Integrative Systems Medicine, Institute of Clinical Physiology, CNR, Pisa, Italy
|
34
|
Wen Z, Zhang W, Zeng T, Chen L. MCentridFS: a tool for identifying module biomarkers for multi-phenotypes from high-throughput data. MOLECULAR BIOSYSTEMS 2014; 10:2870-5. [DOI: 10.1039/c4mb00325j] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 12/27/2022]
|
35
|
Zhang W, Zeng T, Chen L. EdgeMarker: Identifying differentially correlated molecule pairs as edge-biomarkers. J Theor Biol 2014; 362:35-43. [PMID: 24931676 DOI: 10.1016/j.jtbi.2014.05.041] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Received: 01/15/2014] [Revised: 05/22/2014] [Accepted: 05/27/2014] [Indexed: 10/25/2022]
Abstract
Biomarker discovery is one of the major topics in translational biomedicine based on high-throughput biological data analysis. Traditional methods focus on differentially expressed genes (or node-biomarkers) but ignore non-differentially expressed ones. However, non-differentially expressed genes also play important roles in biological processes, and the rewired interactions (edges) among non-differential genes may reveal fundamental differences between conditions. Therefore, it is necessary to identify relevant interactions or gene pairs to elucidate the molecular mechanism of complex biological phenomena, e.g. to distinguish different phenotypes. To address this issue, we proposed a new method based on a new vector representation of an edge, EdgeMarker, to (1) identify edge-biomarkers, i.e. the differentially correlated molecular pairs (e.g., gene pairs) with optimal classification ability, and (2) transform the 'node expression' data in node space into 'edge expression' data in edge space and classify the phenotype of each single sample in edge space, which generally cannot be achieved with traditional methods. Unlike traditional methods, which analyze the node space (i.e. molecular expression space) or a higher-dimensional space using arbitrary kernel methods, this study provides a mathematical model to explore the edge space (i.e. correlation space) for classification of a single sample. In this work, we show that the identified edge-biomarkers indeed have a strong ability to distinguish normal and disease samples even when none of the involved genes is significantly differentially expressed. The analysis of a human cholangiocarcinoma dataset and a diabetes dataset also suggested that the identified edge-biomarkers may provide new biological insights into the pathogenesis of complex human diseases.
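To make the node-space to edge-space transformation in this abstract more concrete, here is a minimal sketch. It assumes that the edge feature of a single sample for a gene pair is the product of the two genes' z-scores computed against a reference group, which is one natural per-sample decomposition of the Pearson correlation. The abstract does not give the exact vector representation used by EdgeMarker, so this is an illustration rather than the published method; the function names and toy data are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical reference (e.g. normal) samples: rows are samples, columns are genes.
# Gene 1 rises with gene 0; gene 2 falls as gene 0 rises.
REFERENCE = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 2.0],
    [3.0, 6.0, 1.0],
]


def reference_stats(reference):
    """Per-gene mean and population standard deviation of the reference samples."""
    columns = list(zip(*reference))
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


def edge_features(sample, means, stds):
    """Map one sample from node space (genes) to edge space (gene pairs).

    Assumption: edge value = z_i * z_j, so averaging it over the reference
    samples recovers the reference Pearson correlation of genes i and j.
    Genes with zero reference variance would need to be filtered out first.
    """
    z = [(x - m) / s for x, m, s in zip(sample, means, stds)]
    return {(i, j): z[i] * z[j] for i, j in combinations(range(len(z)), 2)}


MEANS, STDS = reference_stats(REFERENCE)

if __name__ == "__main__":
    # A hypothetical new sample in which the gene 0 / gene 1 relation is reversed.
    rewired = edge_features([3.0, 2.0, 1.0], MEANS, STDS)
    for pair, value in rewired.items():
        print(pair, round(value, 3))
```

Under this assumption a sample whose gene pair moves against the reference correlation gets a negative edge value for that pair, even if each gene on its own stays within its usual range, which is the kind of signal the abstract attributes to edge-biomarkers.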
Affiliation(s)
- Wanwei Zhang: Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Tao Zeng: Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Luonan Chen: Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China; Collaborative Research Center for Innovative Mathematical Modelling, Institute of Industrial Science, The University of Tokyo, Tokyo 153-8505, Japan
|
36
|
Deciphering early development of complex diseases by progressive module network. Methods 2014; 67:334-43. [DOI: 10.1016/j.ymeth.2014.01.021] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 11/01/2013] [Revised: 01/20/2014] [Accepted: 01/23/2014] [Indexed: 11/23/2022] Open
|
37
|
Gender-specific DNA methylome analysis of a Han Chinese longevity population. BIOMED RESEARCH INTERNATIONAL 2014; 2014:396727. [PMID: 24822201 PMCID: PMC4009103 DOI: 10.1155/2014/396727] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 11/28/2013] [Accepted: 02/28/2014] [Indexed: 01/06/2023]
Abstract
Human longevity has long been a hot topic in biology, and much effort has been devoted to identifying genes and genetic variations associated with longer lives. Most demographic studies have highlighted that females have a longer life span than males. The reasons for this are not entirely clear. In this study, we carried out a pool-based, epigenome-wide investigation of DNA methylation profiles in male and female nonagenarians/centenarians using the Illumina 450 K Methylation Beadchip assays. Although no significant difference was detected in the average methylation levels of examined CpGs (or probes) between male and female samples, a significant number of differentially methylated probes (DMPs) were identified, which appeared to be enriched in certain chromosome regions and certain parts of genes. Further analysis of DMP-containing genes (named DMGs) revealed that almost all of them are solely hypermethylated or hypomethylated. Functional enrichment analysis of these DMGs indicated that DNA hypermethylation and hypomethylation may regulate genes involved in different biological processes, such as hormone regulation, neuron projection, and disease-related pathways. This is the first effort to explore the gender-based methylome difference in nonagenarians/centenarians, which may provide new insights into the complex mechanism underlying the gender gap in human longevity.
|
38
|
Esfahani MS, Dougherty ER. Incorporation of Biological Pathway Knowledge in the Construction of Priors for Optimal Bayesian Classification. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2014; 11:202-218. [PMID: 26355519 DOI: 10.1109/tcbb.2013.143] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 06/05/2023]
Abstract
Small samples are commonplace in genomic/proteomic classification, the result being inadequate classifier design and poor error estimation. The problem has recently been addressed by utilizing prior knowledge in the form of a prior distribution on an uncertainty class of feature-label distributions. A critical issue remains: how to incorporate biological knowledge into the prior distribution. For genomics/proteomics, the most common kind of knowledge is in the form of signaling pathways. Thus, it behooves us to find methods of transforming pathway knowledge into knowledge of the feature-label distribution governing the classification problem. In this paper, we address the problem of prior probability construction by proposing a series of optimization paradigms that utilize the incomplete prior information contained in pathways (both topological and regulatory). The optimization paradigms employ the marginal log-likelihood, established using a small number of feature-label realizations (sample points) regularized with the prior pathway information about the variables. In the special case of a Normal-Wishart prior distribution on the mean and inverse covariance matrix (precision matrix) of a Gaussian distribution, these optimization problems become convex. Companion website: gsp.tamu.edu/Publications/supplementary/shahrokh13a.
|
39
|
Boland MR, Hripcsak G, Shen Y, Chung WK, Weng C. Defining a comprehensive verotype using electronic health records for personalized medicine. J Am Med Inform Assoc 2013; 20:e232-8. [PMID: 24001516 PMCID: PMC3861934 DOI: 10.1136/amiajnl-2013-001932] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.6] [Received: 04/15/2013] [Accepted: 08/12/2013] [Indexed: 11/04/2022] Open
Abstract
The burgeoning adoption of electronic health records (EHR) introduces a golden opportunity for studying individual manifestations of myriad diseases, which is called 'EHR phenotyping'. In this paper, we break down this concept by: relating it to phenotype definitions from Johannsen; comparing it to cohort identification and disease subtyping; introducing a new concept called 'verotype' (Latin: vere = true, actually) to represent the 'true' population of similar patients for treatment purposes through the integration of genotype, phenotype, and disease subtype (eg, specific glucose value pattern in patients with diabetes) information; analyzing the value of the 'verotype' concept for personalized medicine; and outlining the potential for using network-based approaches to reverse engineer clinical disease subtypes.
Affiliation(s)
- Mary Regina Boland: Department of Biomedical Informatics, Columbia University, New York, New York, USA
- George Hripcsak: Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Yufeng Shen: Department of Biomedical Informatics, Columbia University, New York, New York, USA; Department of Systems Biology, Columbia University, New York, New York, USA
- Wendy K Chung: Department of Pediatrics, Columbia University, New York, New York, USA; Department of Medicine, Columbia University, New York, New York, USA; The Irving Institute for Clinical and Translational Research, Columbia University, New York, New York, USA
- Chunhua Weng: Department of Biomedical Informatics, Columbia University, New York, New York, USA; The Irving Institute for Clinical and Translational Research, Columbia University, New York, New York, USA
|
40
|
Jiang X, Tse K, Wang S, Doan S, Kim H, Ohno-Machado L. Recent trends in biomedical informatics: a study based on JAMIA articles. J Am Med Inform Assoc 2013; 20:e198-205. [PMID: 24214018 PMCID: PMC3861936 DOI: 10.1136/amiajnl-2013-002429] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 12/24/2022] Open
Abstract
In a growing interdisciplinary field like biomedical informatics, information dissemination and citation trends are changing rapidly due to many factors. To understand these factors better, we analyzed the evolution of the number of articles per major biomedical informatics topic, download/online view frequencies, and citation patterns (using Web of Science) for articles published from 2009 to 2012 in JAMIA. The number of articles published in JAMIA increased significantly from 2009 to 2012, and there were some topic differences in the last 4 years. Medical Record Systems, Algorithms, and Methods are topic categories that are growing fast in several publications. We observed a significant correlation between download frequencies and the number of citations per month since publication for a given article. Earlier free availability of articles to non-subscribers was associated with a higher number of downloads and showed a trend towards a higher number of citations. This trend will need to be verified as more data accumulate in coming years.
Affiliation(s)
- Xiaoqian Jiang: Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, California, USA
|
41
|
Zeng T, Sun SY, Wang Y, Zhu H, Chen L. Network biomarkers reveal dysfunctional gene regulations during disease progression. FEBS J 2013; 280:5682-95. [PMID: 24107168 DOI: 10.1111/febs.12536] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Received: 03/01/2013] [Revised: 08/30/2013] [Accepted: 09/09/2013] [Indexed: 12/13/2022]
Abstract
Extensive studies have been conducted on gene biomarkers by exploring the increasingly accumulated gene expression and sequence data generated from high-throughput technology. Here, we briefly report on the state-of-the-art research and application of biomarkers from single genes (i.e. gene biomarkers) to gene sets (i.e. group or set biomarkers), gene networks (i.e. network biomarkers) and dynamical gene networks (i.e. dynamical network biomarkers). In particular, differential and dynamical network biomarkers are used as representative examples to demonstrate their effectiveness in both detecting early signals for complex diseases and revealing essential mechanisms on disease initiation and progression at a network level.
Affiliation(s)
- Tao Zeng: Key Laboratory of Systems Biology, SIBS-Novo Nordisk Translational Research Centre for PreDiabetes, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
|
42
|
Yang Z, Zhang Y, Chen L. Investigation of anti-cancer mechanisms by comparative analysis of naked mole rat and rat. BMC SYSTEMS BIOLOGY 2013; 7 Suppl 2:S5. [PMID: 24565050 PMCID: PMC3852044 DOI: 10.1186/1752-0509-7-s2-s5] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 01/10/2023]
Abstract
Background: Naked mole rats (NMRs) are small underground rodents with many unusual traits. Their life expectancy can be up to thirty years, more than seven times longer than that of the laboratory rat. Furthermore, they are resistant to both congenital and experimentally induced cancer genesis. These peculiar physiological and pathological characteristics make them a suitable model for cancer and aging research. Results: In this paper, we carried out a genome-wide comparative analysis of rat and NMR using the recently published genome sequence of NMR. First, we identified all the rat-NMR orthologous genes and the genes specific to each species. Protein families that are expanded or contracted in NMR relative to rat were also analyzed. Seven cancer-related protein families appeared to be significantly expanded, whereas several receptor families were found to be contracted in NMR. We then selected the rat genes that are absent in NMR and used the KEGG pathway database to investigate the metabolic processes in which their proteins may be involved. These genes were significantly enriched in two rat cancer pathways, "Pathway in cancer" and "Bladder cancer". In the rat "Pathway in cancer", 9 out of 14 paths leading to evading apoptosis appeared to be affected in NMR. In addition, a significant number of other NMR-missing genes enriched in several cancer-related pathways are known to be related to a variety of cancers, implying that many of them may also be related to tumorigenesis in mammals. Finally, investigation of sequence variations among orthologous proteins between rat and NMR revealed that significant fragment insertions/deletions within important functional domains were present in some NMR proteins, which might lead to expressional and/or functional changes of these genes in different species.
Conclusions: Overall, this study provides insights into the possible anti-cancer mechanisms of NMR and supports the search for new cancer-related candidate genes.
|
43
|
Wu Z, Wang Y, Chen L. Drug repositioning framework by incorporating functional information. IET Syst Biol 2013; 7:188-94. [DOI: 10.1049/iet-syb.2012.0064] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/20/2022] Open
Affiliation(s)
- Zikai Wu: Business School, University of Shanghai for Science and Technology, Shanghai 200093, People's Republic of China
- Yong Wang: Academy of Mathematics and Systems Science, National Centre for Mathematics and Interdisciplinary Sciences, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
- Luonan Chen: Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, People's Republic of China; Collaborative Research Center for Innovative Mathematical Modeling, Institute of Industrial Science, the University of Tokyo, Japan
|
44
|
Liu X, Liu R, Zhao XM, Chen L. Detecting early-warning signals of type 1 diabetes and its leading biomolecular networks by dynamical network biomarkers. BMC Med Genomics 2013; 6 Suppl 2:S8. [PMID: 23819540 PMCID: PMC3654886 DOI: 10.1186/1755-8794-6-s2-s8] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND: Type 1 diabetes (T1D) is a complex disease that is harmful to human health, and most existing biomarkers mainly measure the disease phenotype after disease onset (or drastic deterioration). To date, there is no effective biomarker that can predict the upcoming disease (or pre-disease state) before disease onset or deterioration. Further, the detailed molecular mechanism of such deterioration, e.g., the driver genes or causal network of the disease, is still unclear. METHODS: In this study, we detected early-warning signals of T1D and its leading biomolecular networks based on serial gene expression profiles of NOD (non-obese diabetic) mice by identifying a new type of biomarker, i.e., the dynamical network biomarker (DNB), which forms a specific module marking the time period just before the drastic deterioration of T1D. RESULTS: Two dynamical network biomarkers were obtained that signal the emergence of two critical deteriorations of the disease and could be used to predict upcoming sudden changes during disease progression. We found that the two critical transitions led to peri-insulitis and hyperglycemia in NOD mice, which is consistent with other independent experimental results from the literature. CONCLUSIONS: The identified dynamical network biomarkers can be used to detect the early-warning signals of T1D and predict upcoming disease onset before the drastic deterioration. In addition, we demonstrated that the leading biomolecular networks are causally related to the initiation and progression of T1D, and provided biological insight into the molecular mechanism of T1D. Experimental data from the literature and functional analysis of the DNBs validated the computational results.
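This abstract does not state how a candidate DNB module is scored. A criterion widely used in the DNB literature combines three statistics near a critical transition: the expression variability of module genes rises, the correlation among module genes rises, and the correlation between module genes and other genes falls. The sketch below computes a composite index of the form SD_in × PCC_in / PCC_out under that assumption. The function names, the toy expression values, and the gene labels are hypothetical, and the study's actual procedure may differ.

```python
from itertools import combinations
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    norm = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return cov / norm


def composite_index(expression, module):
    """Assumed DNB-style score for one time point.

    expression: dict mapping gene name -> expression values across samples.
    module: set of candidate genes (at least two, with at least one gene outside).
    A zero average outside correlation would make the ratio undefined.
    """
    inside = sorted(module)
    outside = sorted(set(expression) - set(module))
    sd_in = mean(pstdev(expression[g]) for g in inside)
    pcc_in = mean(abs(pearson(expression[a], expression[b]))
                  for a, b in combinations(inside, 2))
    pcc_out = mean(abs(pearson(expression[a], expression[b]))
                   for a in inside for b in outside)
    return sd_in * pcc_in / pcc_out


# Hypothetical toy profiles: genes A and B form the candidate module, C is outside.
BASELINE = {
    "A": [1.0, 1.1, 0.9, 1.0],
    "B": [2.0, 1.9, 2.0, 2.1],
    "C": [3.0, 3.1, 2.9, 3.0],
}
PRE_TRANSITION = {
    "A": [1.0, 2.0, 0.0, 1.0],
    "B": [2.0, 3.0, 1.0, 2.0],
    "C": [3.0, 3.0, 2.9, 3.1],
}

if __name__ == "__main__":
    print(composite_index(BASELINE, {"A", "B"}))
    print(composite_index(PRE_TRANSITION, {"A", "B"}))
```

In the toy data the module genes fluctuate more and move together in the second profile, so the index is larger there than at baseline; a sharp rise of such an index over time is the kind of early-warning signal the abstract refers to.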
Affiliation(s)
- Xiaoping Liu: Key Laboratory of Systems Biology, SIBS-Novo Nordisk Translational Research Centre for PreDiabetes, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
|
45
|
Wu Z, Wang Y, Chen L. Network-based drug repositioning. MOLECULAR BIOSYSTEMS 2013; 9:1268-81. [PMID: 23493874 DOI: 10.1039/c3mb25382a] [Citation(s) in RCA: 113] [Impact Index Per Article: 10.3] [Indexed: 12/23/2022]
Abstract
Network-based computational biology, with its emphasis on biomolecular interactions and omics-data integration, has had success in drug development and created new directions such as drug repositioning and drug combination. Drug repositioning, i.e., revealing a drug's new roles, is attracting increasing attention from the pharmaceutical community as a way to tackle the high failure rate and long development time of drug discovery. Drug combination (or drug cocktails), i.e., combining multiple drugs against diseases, mainly aims to alleviate the recurrent emergence of drug resistance and to reveal synergistic effects. In this paper, we unify the two topics to reveal new roles of drug interactions from a network perspective by treating drug combination as another form of drug repositioning. In particular, first, we emphasize that rationally repositioning drugs on a large scale is driven by the accumulation of various high-throughput genome-wide data. These data can be utilized to capture the interplay among targets and biological molecules, uncover the resulting network structures, and further bridge molecular profiles and phenotypes. This motivates many network-based computational methods on these topics. Second, we organize these existing methods into two categories, i.e., single drug repositioning and drug combination, and further depict their main features by three data sources. Finally, we discuss the merits and shortcomings of these methods and pinpoint some future topics in this promising field.
Affiliation(s)
- Zikai Wu: Business School, University of Shanghai for Science and Technology, Shanghai, China
|
46
|
Abstract
Deregulation of gene expression, a hallmark of cancer, is caused by both genetic and epigenetic mechanisms. The rapid accumulation of epigenome maps of various cancers suggests a new avenue of research, namely integrating epigenomic data with other types of omic data for cancer diagnosis, prognosis, and biomarker discovery. We introduce the MAPIT algorithm (Multi Analyte Pathway Inference Tool) to enable principled integration of epigenomic, transcriptomic, and protein interactome data. As a proof of principle, we apply MAPIT to glioblastoma multiforme (GBM), the most common and aggressive form of brain tumor. Few predictive markers have been reported for the prognosis of GBM patients. By integrating the mRNA transcriptome, promoter DNA methylome and protein-protein physical interactome, we find ten expression-based and three methylation-based network markers, involving 118 genes. When tested on additional GBM patient samples, the prognostic accuracy of the multi-analyte network markers (73.5%) is 9.7% and 8.6% higher than that of previous prognostic signatures built on gene expression or DNA methylation alone. Our results highlight the critical role of two novel pathways in the prognosis of GBM patients, small GTPase-mediated protein trafficking and ubiquitination-dependent protein degradation. A better understanding of these two pathways could lead to personalized therapies for subgroups of GBM patients. Our study demonstrates that integrating epigenomic, transcriptomic, and interactomic data can improve the accuracy of network-based prognostic markers and lead to a novel mechanistic understanding of cancer.
Collapse
Affiliation(s)
- Jongkwang Kim
- Department of Internal Medicine, University of Iowa, Iowa City, Iowa, United States of America
| | - Long Gao
- Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa, United States of America
- Kai Tan
- Department of Internal Medicine, University of Iowa, Iowa City, Iowa, United States of America
- Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa, United States of America
|
47
|
Ren X, Wang Y, Chen L, Zhang XS, Jin Q. ellipsoidFN: a tool for identifying a heterogeneous set of cancer biomarkers based on gene expressions. Nucleic Acids Res 2012; 41:e53. [PMID: 23262226 PMCID: PMC3575836 DOI: 10.1093/nar/gks1288] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Indexed: 12/31/2022]
Abstract
Computationally identifying effective biomarkers for cancers from gene expression profiles is an important and challenging task. The challenge lies in the complicated pathogenesis of cancers, which often involves the dysfunction of many genes and regulatory interactions. Thus, a sophisticated classification model is urgently needed. In this study, we propose an efficient approach, called ellipsoidFN (ellipsoid Feature Net), to model the disease complexity by ellipsoids and seek a set of heterogeneous biomarkers. Our approach achieves a non-linear classification scheme for the mixed samples through the ellipsoid concept, and at the same time uses a linear programming framework to efficiently select biomarkers from a high-dimensional space. ellipsoidFN reduces the redundancy and improves the complementarity among the identified biomarkers, thus significantly enhancing the distinction between cancer and normal samples, and even between cancer types. Numerical evaluation on real prostate cancer, breast cancer and leukemia gene expression datasets suggests that ellipsoidFN outperforms state-of-the-art biomarker identification methods and that it can serve as a useful tool for cancer biomarker identification in the future. The Matlab code of ellipsoidFN is freely available from http://doc.aporc.org/wiki/EllipsoidFN.
Affiliation(s)
- Xianwen Ren
- MOH Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100730, China
|