1
Ma Y, Shi Y, Chen X, Zhang B, Wu H, Gao J. NFMCLDA: Predicting miRNA-based lncRNA-disease associations by network fusion and matrix completion. Comput Biol Med 2024; 174:108403. [PMID: 38582002 DOI: 10.1016/j.compbiomed.2024.108403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 03/28/2024] [Accepted: 04/01/2024] [Indexed: 04/08/2024]
Abstract
In recent years, emerging evidence has revealed a strong association between dysregulation of long non-coding RNAs (lncRNAs) and complex human diseases. Biological experiments can identify such associations, but they are costly and time-consuming. Therefore, developing high-quality computational methods is a challenging and urgent task in bioinformatics. This paper proposes NFMCLDA (Network Fusion and Matrix Completion lncRNA-Disease Association), a new lncRNA-disease association inference approach that can effectively integrate multi-source association data. In this approach, miRNA information serves as the transition path, and an unbalanced random walk on a three-layer heterogeneous network is adopted during preprocessing. As a result, more useful information can be mined across networks and the sparsity of the association matrix can be addressed. Finally, matrix completion is used to predict associations accurately. The results show that NFMCLDA provides more accurate lncRNA-disease associations than state-of-the-art methods, with areas under the receiver operating characteristic curve (AUCs) of 0.9648 and 0.9713 under 5-fold and 10-fold cross-validation, respectively. Published case studies on four diseases (lung cancer, osteosarcoma, cervical cancer, and colon cancer) confirm the reliable predictive potential of the NFMCLDA model.
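The abstract names matrix completion as NFMCLDA's final step but not the solver. As a minimal sketch of one generic technique (not NFMCLDA's own algorithm), the following Python fills unobserved entries of an association matrix with a low-rank factorization fitted by ridge-regularized alternating least squares (ALS). The function names `als_complete` and `solve`, the rank, the regularization `lam`, and the iteration count are all hypothetical choices.

```python
import random


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def als_complete(R, mask, rank=2, lam=1e-3, iters=200, seed=0):
    """Fill unobserved entries of R (mask == 0) with a rank-`rank` factorization U V^T."""
    rng = random.Random(seed)
    n, m = len(R), len(R[0])
    U = [[rng.random() for _ in range(rank)] for _ in range(n)]
    V = [[rng.random() for _ in range(rank)] for _ in range(m)]

    def update(F, G, observed):
        # Ridge-regularized least squares for each row of F, holding G fixed.
        for i in range(len(F)):
            A = [[lam if p == q else 0.0 for q in range(rank)] for p in range(rank)]
            b = [0.0] * rank
            for j, val in observed(i):
                for p in range(rank):
                    b[p] += G[j][p] * val
                    for q in range(rank):
                        A[p][q] += G[j][p] * G[j][q]
            F[i] = solve(A, b)

    for _ in range(iters):
        update(U, V, lambda i: [(j, R[i][j]) for j in range(m) if mask[i][j]])
        update(V, U, lambda j: [(i, R[i][j]) for i in range(n) if mask[i][j]])
    return [[sum(U[i][k] * V[j][k] for k in range(rank)) for j in range(m)] for i in range(n)]
```

In a real pipeline the observed entries would come from the random-walk-preprocessed lncRNA-disease matrix; ALS is used here only because it is short to write without external libraries.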
Affiliation(s)
- Yibing Ma
- School of Science, Jiangnan University, Wuxi, Jiangsu, 214122, China
- Yongle Shi
- School of Science, Jiangnan University, Wuxi, Jiangsu, 214122, China
- Xiang Chen
- School of Science, Jiangnan University, Wuxi, Jiangsu, 214122, China
- Bai Zhang
- School of Science, Jiangnan University, Wuxi, Jiangsu, 214122, China
- Hanwen Wu
- School of Science, Jiangnan University, Wuxi, Jiangsu, 214122, China
- Jie Gao
- School of Science, Jiangnan University, Wuxi, Jiangsu, 214122, China
2
Kang D, Yun D, Cho KH, Baek SS, Jeon J. Profiling emerging micropollutants in urban stormwater runoff using suspect and non-target screening via high-resolution mass spectrometry. Chemosphere 2024; 352:141402. [PMID: 38346509 DOI: 10.1016/j.chemosphere.2024.141402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 01/31/2024] [Accepted: 02/05/2024] [Indexed: 02/24/2024]
Abstract
Urban surface runoff contains chemicals that can negatively affect water quality. Urban runoff studies have determined the transport dynamics of many legacy pollutants. However, less attention has been paid to determining the first-flush effects (FFE) of emerging micropollutants using suspect and non-target screening (SNTS). Therefore, this study employed suspect and non-target analyses using liquid chromatography-high resolution mass spectrometry to detect emerging pollutants in urban receiving waters during stormwater events. Time-interval sampling was used to determine occurrence trends during stormwater events. Suspect screening tentatively identified 65 substances, whose occurrence trends were then grouped using correlation analysis. Non-target peaks were prioritized through hierarchical cluster analysis, focusing on peaks concentrated in the first flush. This approach revealed 38 substances through in silico identification. In parallel, substances identified through homologous series observation were evaluated for their trends in individual events using network analysis. The SNTS results were normalized to internal standards to assess the FFE, and most of the tentatively identified substances exhibited an FFE. Our findings suggest that diverse pollutants that could not be covered by target screening alone entered urban water through stormwater runoff during the first flush. This study demonstrates the applicability of SNTS for evaluating the FFE of urban pollutants, offering insights for first-flush stormwater monitoring and management.
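The abstract assesses the FFE from internal-standard-normalized SNTS results but does not state the criterion. One common way to visualize a first flush in stormwater studies is the normalized cumulative mass versus cumulative volume, M(V), curve: a curve lying above the diagonal indicates that pollutant load is delivered disproportionately early in the event. The sketch below is hypothetical; it assumes that the normalized peak area of each time-interval sample is proportional to concentration, and the function names are illustrative only.

```python
def mass_volume_curve(peak_area, istd_area, volume):
    """Cumulative (volume fraction, mass fraction) points for one substance in one event.

    Assumes each sample's internal-standard-normalized peak area is proportional
    to concentration, so relative load = normalized area x runoff volume.
    """
    norm = [a / s for a, s in zip(peak_area, istd_area)]  # internal-standard normalization
    load = [c * v for c, v in zip(norm, volume)]
    total_load, total_vol = sum(load), sum(volume)
    points = [(0.0, 0.0)]
    cum_load = cum_vol = 0.0
    for m, v in zip(load, volume):
        cum_load += m
        cum_vol += v
        points.append((cum_vol / total_vol, cum_load / total_load))
    return points


def shows_first_flush(points):
    """True if the M(V) curve stays on or above the diagonal and rises above it somewhere."""
    interior = points[1:-1]
    return all(m >= v for v, m in interior) and any(m > v for v, m in interior)
```

Requiring the whole curve to stay above the diagonal is a simple, strict reading; studies often instead compare the mass fraction carried by a fixed early fraction of runoff volume.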
Affiliation(s)
- Daeho Kang
- Department of Environmental Engineering, Changwon National University, Changwon, Gyeongsangnamdo, 51140, South Korea
- Daeun Yun
- Civil Urban Earth and Environmental Engineering, Ulsan National Institute of Science and Technology, 50 UNIST-gil, Eonyang-eup, Ulju-gun, Ulsan, 44919, South Korea
- Kyung Hwa Cho
- School of Civil, Environmental and Architectural Engineering, Korea University, Seoul, 02841, South Korea
- Sang-Soo Baek
- Department of Environmental Engineering, Yeungnam University, 280 Daehak-Ro, Gyeongsan-Si, Gyeongbuk, 38541, South Korea
- Junho Jeon
- Department of Environmental Engineering, Changwon National University, Changwon, Gyeongsangnamdo, 51140, South Korea; School of Smart and Green Engineering, Changwon National University, Changwon, Gyeongsangnamdo, 51140, South Korea.
3
Wu X, Cao S, Zou Y, Wu F. Traditional Chinese Medicine studies for Alzheimer's disease via network pharmacology based on entropy and random walk. PLoS One 2023; 18:e0294772. [PMID: 38019798 PMCID: PMC10686466 DOI: 10.1371/journal.pone.0294772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2023] [Accepted: 11/08/2023] [Indexed: 12/01/2023] [Open Access]
Abstract
Alzheimer's disease (AD) is a common neurodegenerative disease with complex pathogenesis, and approved drugs can only alleviate its symptoms for a limited time. Traditional Chinese medicine (TCM) contains multiple active ingredients that can act on multiple targets simultaneously. In this paper, a novel algorithm based on entropy and random walk with restart on a heterogeneous network (RWRHE) is proposed to predict active ingredients for AD and to screen out effective TCMs for AD. First, six TCM compounds containing 20 herbs were collected from AD drug reviews in CNKI (China National Knowledge Infrastructure), and their active ingredients and targets were retrieved from different databases. Then, comprehensive similarity networks of active ingredients and of targets were constructed from different aspects of similarity using entropy weights. A comprehensive heterogeneous network was constructed by integrating the known active ingredient-target associations with the two comprehensive similarity networks. Subsequently, bi-random walks were applied on the heterogeneous network to predict active ingredient-target associations. With AD-related targets as seed nodes, a random walk was carried out on the target similarity network to predict AD-target associations, from which AD-active ingredient associations were inferred and scored. Effective herbs and compounds for AD were then screened out based on the scores of their active ingredients. Evaluations using machine learning and bioinformatics show that the RWRHE algorithm achieves better prediction accuracy and that the top 15 active ingredients may act as multi-target agents in the prevention and treatment of AD; Danshen, Gouteng, and Chaihu are recommended as effective TCMs for AD, and Yiqitongyutang is recommended as an effective compound for AD.
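The abstract combines several similarity aspects into comprehensive networks using entropy weights. Assuming the standard entropy weight method, in which each similarity matrix is treated as one criterion and its flattened entries as the scored items, a minimal Python sketch looks like this. How RWRHE actually arranges its inputs is not stated, and `entropy_weights` and `fuse_similarities` are hypothetical names.

```python
import math


def entropy_weights(criteria):
    """Entropy weight method: one weight per criterion (a list of non-negative scores)."""
    n = len(criteria[0])
    k = 1.0 / math.log(n)
    divergence = []
    for col in criteria:
        total = sum(col)
        # Shannon entropy of the normalized column, scaled to [0, 1] by 1 / ln(n)
        e = -k * sum((x / total) * math.log(x / total) for x in col if x > 0)
        divergence.append(1.0 - e)  # lower entropy (more discriminative) -> larger weight
    s = sum(divergence)
    return [d / s for d in divergence]


def fuse_similarities(matrices):
    """Weighted sum of same-sized similarity matrices using entropy weights."""
    flat = [[x for row in mat for x in row] for mat in matrices]
    w = entropy_weights(flat)
    size = len(matrices[0])
    fused = [[sum(wk * mat[i][j] for wk, mat in zip(w, matrices)) for j in range(size)]
             for i in range(size)]
    return w, fused
```

A matrix whose entries are all equal carries no discriminative information, so its entropy is 1 and its weight falls to about 0. The sketch divides by the total divergence and therefore fails if every input matrix is uniform.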
Affiliation(s)
- Xiaolu Wu
- School of Mathematical Sciences, Tiangong University, Tianjin, China
- Shujuan Cao
- School of Mathematical Sciences, Tiangong University, Tianjin, China
- Yongming Zou
- Department of Neurology, Tianjin Huanhu Hospital, Tianjin, China
- Fangxiang Wu
- Division of Biomedical Engineering, Department of Mechanical Engineering and Department of Computer Science, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
4
Ratajczak F, Joblin M, Hildebrandt M, Ringsquandl M, Falter-Braun P, Heinig M. Speos: an ensemble graph representation learning framework to predict core gene candidates for complex diseases. Nat Commun 2023; 14:7206. [PMID: 37938585 PMCID: PMC10632370 DOI: 10.1038/s41467-023-42975-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 10/27/2023] [Indexed: 11/09/2023] [Open Access]
Abstract
Understanding phenotype-to-genotype relationships is a grand challenge of 21st-century biology with translational implications. The recently proposed "omnigenic" model postulates that the effects of genetic variation on traits are mediated by core genes and proteins whose activities mechanistically influence the phenotype, whereas peripheral genes encode a regulatory network that indirectly affects phenotypes via core gene products. Here, we develop a positive-unlabeled graph representation learning ensemble approach based on nested cross-validation to predict core-like genes for diverse diseases, using Mendelian disorder genes for training. Employing mouse knockout phenotypes for external validation, we demonstrate that core-like genes display several key properties of core genes: mouse knockouts of genes corresponding to our most confident predictions give rise to relevant mouse phenotypes at rates on par with the Mendelian disorder genes, and all candidates exhibit core gene properties such as transcriptional deregulation in disease and loss-of-function intolerance. Moreover, as predicted for core genes, our candidates are enriched for drug targets and druggable proteins. In contrast to Mendelian disorder genes, the new core-like genes are enriched for druggable yet untargeted gene products, which are therefore attractive targets for drug development. Interpretation of the underlying deep learning model suggests plausible explanations for our core gene predictions in the form of molecular mechanisms and physical interactions. Our results demonstrate the potential of graph representation learning for interpreting biological complexity and pave the way for studying core gene properties and for future drug development.
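Speos itself is an ensemble of graph neural networks, which is far beyond a short example. To illustrate only the positive-unlabeled (PU) idea named in the abstract, the following hypothetical sketch uses PU bagging with a nearest-centroid classifier in place of a graph model: each bag treats a random subset of unlabeled genes as pseudo-negatives, and each unlabeled gene is scored only by bags it was left out of. Feature vectors, bag counts, and function names are assumptions.

```python
import random


def centroid(vectors):
    return [sum(xs) / len(xs) for xs in zip(*vectors)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pu_bagging(positives, unlabeled, n_bags=50, bag_size=None, seed=0):
    """Bagged PU scoring: higher average out-of-bag score = more positive-like.

    Each bag treats a random subset of unlabeled items as pseudo-negatives and
    uses a nearest-centroid classifier; requires bag_size < len(unlabeled).
    """
    rng = random.Random(seed)
    bag_size = bag_size or len(positives)
    pos_c = centroid(positives)
    sums = [0.0] * len(unlabeled)
    counts = [0] * len(unlabeled)
    for _ in range(n_bags):
        bag = set(rng.sample(range(len(unlabeled)), bag_size))
        neg_c = centroid([unlabeled[i] for i in bag])
        for i, x in enumerate(unlabeled):
            if i in bag:
                continue  # only score items this bag's classifier did not train on
            # Positive when x is closer to the positive centroid than to the negative one
            sums[i] += sq_dist(x, neg_c) - sq_dist(x, pos_c)
            counts[i] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]
```

Averaging out-of-bag scores plays a role loosely similar to held-out predictions in nested cross-validation: no gene is scored by a model that was trained with it as a negative.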
Affiliation(s)
- Florin Ratajczak
- Institute of Network Biology (INET), Molecular Targets and Therapeutics Center (MTTC), Helmholtz Munich, Neuherberg, Germany
- Pascal Falter-Braun
- Institute of Network Biology (INET), Molecular Targets and Therapeutics Center (MTTC), Helmholtz Munich, Neuherberg, Germany.
- Microbe-Host Interactions, Faculty of Biology, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany.
- Matthias Heinig
- Institute of Computational Biology (ICB), Helmholtz Munich, Neuherberg, Germany.
- Department of Computer Science, TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany.
- German Centre for Cardiovascular Research (DZHK), Munich Heart Association, Partner Site Munich, Berlin, Germany.
5
Zhang L, Lu D, Bi X, Zhao K, Yu G, Quan N. Predicting disease genes based on multi-head attention fusion. BMC Bioinformatics 2023; 24:162. [PMID: 37085750 PMCID: PMC10122338 DOI: 10.1186/s12859-023-05285-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Accepted: 04/12/2023] [Indexed: 04/23/2023] [Open Access]
Abstract
BACKGROUND The identification of disease-related genes is of great significance for the diagnosis and treatment of human disease. Most studies have focused on developing efficient and accurate computational methods to predict disease-causing genes. Owing to the sparsity and complexity of biomedical data, developing an effective multi-feature fusion model to identify disease genes remains a challenge. RESULTS This paper proposes an approach to predict pathogenic genes based on multi-head attention fusion (MHAGP). First, heterogeneous biological information networks of disease genes are constructed by integrating multiple biomedical knowledge databases. Second, two graph representation learning algorithms are used to capture feature vectors of gene-disease pairs from the network, and these features are fused by multi-head attention. Finally, a multi-layer perceptron model is used to predict gene-disease associations. CONCLUSIONS The MHAGP model outperforms all other methods in comparative experiments. Case studies also show that MHAGP is able to predict genes potentially associated with diseases. In the future, more biological entity association data, such as gene-drug and disease phenotype-gene ontology associations, could be added to expand the information in heterogeneous biological networks and achieve more accurate predictions. In addition, owing to its strong extensibility, MHAGP can be applied to related tasks such as gene-drug and drug-disease association prediction.
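MHAGP fuses the two graph-embedding feature vectors of a gene-disease pair with multi-head attention. The sketch below shows only the mechanics of that kind of fusion, under stated assumptions: the two vectors are treated as a two-token sequence, identity query/key/value projections stand in for learned weights, and the attended tokens are mean-pooled. `multi_head_fuse` is a hypothetical name, and this is not MHAGP's published architecture.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]  # shift by max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_fuse(features, n_heads):
    """Fuse equal-length feature vectors (one per embedding method) by multi-head self-attention.

    Identity Q/K/V projections are used purely for illustration; a trained
    model would learn separate projection matrices for every head.
    """
    d = len(features[0])
    if d % n_heads:
        raise ValueError("feature size must be divisible by n_heads")
    hd = d // n_heads
    attended = [[] for _ in features]
    for h in range(n_heads):
        chunk = [f[h * hd:(h + 1) * hd] for f in features]  # this head's slice of each token
        for t, q in enumerate(chunk):
            scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(hd) for k in chunk]
            weights = softmax(scores)  # scaled dot-product attention over the tokens
            attended[t].extend(sum(w * v[i] for w, v in zip(weights, chunk)) for i in range(hd))
    # Mean-pool the attended tokens into one gene-disease pair representation
    return [sum(col) / len(col) for col in zip(*attended)]
```

In a trained model, each head would have its own projection matrices, and the fused vector would be passed to the multi-layer perceptron classifier.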
Affiliation(s)
- Linlin Zhang
- College of Software Engineering, Xinjiang University, Urumqi, China.
- Dianrong Lu
- College of Information Science and Engineering, Xinjiang University, Urumqi, China
- Xuehua Bi
- Medical Engineering and Technology College, Xinjiang Medical University, Urumqi, China
- Kai Zhao
- College of Information Science and Engineering, Xinjiang University, Urumqi, China
- Guanglei Yu
- Medical Engineering and Technology College, Xinjiang Medical University, Urumqi, China
- Na Quan
- College of Information Science and Engineering, Xinjiang University, Urumqi, China
6
Nguyen T, Yue Z, Slominski R, Welner R, Zhang J, Chen JY. WINNER: A network biology tool for biomolecular characterization and prioritization. Front Big Data 2022; 5:1016606. [DOI: 10.3389/fdata.2022.1016606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Accepted: 10/14/2022] [Indexed: 11/06/2022] [Open Access]
Abstract
BACKGROUND AND CONTRIBUTION In network biology, molecular functions can be characterized by network-based inference, or “guilt by association.” PageRank-like tools have been applied to biomolecular interaction networks to further obtain the relative significance of all molecules in the network. However, widely accessible data sets of gene-to-gene associations and protein-protein interactions contain a great deal of inherent noise. How to develop robust tests to expand, filter, and rank molecular entities in disease-specific networks remains an ad hoc data analysis process. RESULTS We describe a new biomolecular characterization and prioritization tool called Weighted In-Network Node Expansion and Ranking (WINNER). It takes any molecular interaction network as input and generates an optionally expanded network in which all nodes are ranked according to their relevance to one another in the network. To help users assess the robustness of results, WINNER provides two types of statistics. The first is a node-expansion p-value, which helps evaluate the statistical significance of adding “non-seed” molecules to the original biomolecular interaction network consisting of “seed” molecules and molecular interactions. The second is a node-ranking p-value, which helps evaluate the relative statistical significance of each node's contribution to the overall network architecture. We validated the robustness of WINNER in ranking top molecules by spiking noise into several network permutation experiments. We found that degree-preserving randomization of the gene network produced normally distributed ranking scores, outperforming other gene network randomization techniques. Furthermore, we validated that a larger proportion of WINNER-ranked genes was associated with disease biology than with existing methods such as PageRank.
We demonstrated the performance of WINNER with several case studies, including Alzheimer's disease, breast cancer, myocardial infarction, and triple-negative breast cancer (TNBC). In all these case studies, the expanded and top-ranked genes identified by WINNER reveal disease biology more significantly than those identified by other gene prioritization tools, including Ingenuity Pathway Analysis (IPA) and DIAMOnD. CONCLUSION WINNER rankings correlate strongly with other ranking methods when the network covers sufficient node and edge information, indicating high network quality. Users can apply WINNER to robustly evaluate a list of candidate genes, proteins, or metabolites produced by high-throughput biology experiments, as long as gene/protein/metabolic network information is available.
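The abstract reports that degree-preserving randomization gave the best-behaved null distribution of ranking scores. A common way to build such null networks is repeated double-edge swaps, which rewire edges a-b and c-d into a-d and c-b so that every node keeps its degree. The following minimal Python sketch, together with a one-sided empirical p-value, illustrates that idea; it is not WINNER's code, and all function names are hypothetical.

```python
import random


def degree_preserving_shuffle(edges, n_swaps, seed=0):
    """Randomize an undirected simple graph while keeping every node's degree."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]  # work on a copy
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # choose one of the two possible rewirings
        # Rewire a-b, c-d into a-d, c-b; reject self-loops and multi-edges
        if len({a, b, c, d}) < 4:
            continue
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


def degrees(edges):
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


def empirical_p(observed, null_scores):
    """One-sided empirical p-value with the usual +1 correction."""
    return (1 + sum(s >= observed for s in null_scores)) / (1 + len(null_scores))
```

A node-ranking p-value in this style would recompute the node's score on many shuffled networks and pass those null scores to `empirical_p`.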
7
Identification of Warning Transition Points from Hepatitis B to Hepatocellular Carcinoma Based on Mutation Accumulation for the Early Diagnosis and Potential Drug Treatment of HBV-HCC. Oxid Med Cell Longev 2022; 2022:3472179. [PMID: 36105485 PMCID: PMC9467738 DOI: 10.1155/2022/3472179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Accepted: 08/09/2022] [Indexed: 11/17/2022]
Abstract
The accumulation of multiple genetic mutations is essential to the occurrence and development of hepatitis B-induced hepatocellular carcinoma (HBV-HCC), but understanding their cooperative effects and identifying the warning transition point from hepatitis B to HCC remain challenging. In a genomic analysis of somatic mutations from patients with HBV-HCC in patient-specific protein-protein interaction (ps-PPI) networks, we found that mutation influence can propagate along the ps-PPI network. We therefore defined the mutation cluster as a new unit of analysis, using the random walk with restart (RWR) algorithm to delineate the effective boundary of each mutation's influence. Connections between mutation clusters lead to dysregulation of HCC-related signaling pathways; these dysregulated pathways accumulate gradually and undergo a shift from quantitative to qualitative change, marked by a critical mutation cluster called the transition point (TP) from hepatitis B to HCC. Moreover, two subtypes of HCC patients with different prognoses and distinct biological and clinical characteristics were identified according to the TP. The poor-prognosis HCC subtype was associated with significant metabolic pathway dysregulation and lower immune cell infiltration, and we also identified several preventive drugs to block the transformation of hepatitis B into hepatocellular carcinoma. This network-level study, which integrated multiomics data, not only revealed the order of multiple somatic mutations and their cooperative effects but also identified the warning transition point in HCC tumorigenesis for each patient. Our study provides new insight into the cooperative molecular mechanisms of chronic inflammatory malignancy in the liver and lays a foundation for new approaches to the early prediction and diagnosis of hepatocellular carcinoma and to personalized targeted therapy.
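The study uses random walk with restart (RWR) to bound how far a mutation's influence spreads in the ps-PPI network. A minimal sketch follows, assuming an unweighted network, a column-normalized transition matrix, a hypothetical restart probability of 0.7, and a hypothetical score threshold for cluster membership; the paper's actual settings are not given in the abstract, and `rwr` and `mutation_cluster` are illustrative names.

```python
def rwr(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Random walk with restart on an undirected graph {node: set(neighbours)}.

    p_{t+1} = restart * p0 + (1 - restart) * W p_t, with W column-normalized.
    """
    nodes = list(adj)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for v in nodes:
            if adj[v]:
                share = (1.0 - restart) * p[v] / len(adj[v])  # spread v's mass evenly
                for u in adj[v]:
                    nxt[u] += share
        if sum(abs(nxt[v] - p[v]) for v in nodes) < tol:
            return nxt
        p = nxt
    return p


def mutation_cluster(adj, mutated_gene, threshold, restart=0.7):
    """Genes whose steady-state RWR score from the mutated gene reaches the threshold."""
    scores = rwr(adj, {mutated_gene}, restart)
    return {v for v, s in scores.items() if s >= threshold}
```

A higher restart probability keeps the walk near the mutated gene and yields tighter clusters; a lower one lets influence spread further through the network.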
8
Baek SS, Yun D, Pyo J, Kang D, Cho KH, Jeon J. Analysis of micropollutants in a marine outfall using network analysis and decision tree. Sci Total Environ 2022; 806:150938. [PMID: 34655621 DOI: 10.1016/j.scitotenv.2021.150938] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/27/2021] [Revised: 10/08/2021] [Accepted: 10/08/2021] [Indexed: 06/13/2023]
Abstract
The presence of micropollutants (MPs), including pharmaceutical, industrial, and pesticidal compounds, threatens both human health and aquatic ecosystems. The development and extensive use of new chemicals have also inevitably led to the accumulation of MPs in aquatic environments. Recreational beaches are especially vulnerable to contamination, which affects humans and aquatic animals through the absorption of MPs from water during marine activities (e.g., swimming, sailing, and windsurfing). Additionally, marine outfalls in urbanized coastal cities can cause serious chemical and microbial pollution on recreational beaches, increasing adverse effects on public health and the ecological system. Therefore, the aim of this study was to use network and decision tree analyses to identify the features and factors that influence changes in MP concentrations in a marine outfall. These analyses were conducted to examine the relationship between each MP and its hierarchical structure, as well as hydrometeorological variables. Additionally, a risk analysis was conducted in which the MPs were prioritized based on their optimized risk quotient values. During monitoring of MP concentrations over time at the marine outfall, high concentrations of pharmaceutical and industrial compounds were detected when the tide level was low after rainfall. Furthermore, the risk analysis and prioritization revealed that a total of 18 substances identified in this study posed a risk to the ecosystem, including major ecotoxicologically hazardous substances such as telmisartan, mevinphos, and methiocarb. The network analysis revealed distinct trends for pharmaceutical and industrial substances, whereas trends for pesticide compounds were irregular. Additionally, the hierarchical structures for most MPs consisted of rainfall, tide level, and antecedent dry hours, implying that these factors influence MP dynamics.
These findings will be helpful for establishing chemical contamination management plans for recreational beaches in the future.
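The abstract prioritizes MPs by an optimized risk quotient without defining the optimization. For orientation only, the sketch below computes the basic risk quotient, RQ = MEC / PNEC (measured environmental concentration divided by predicted no-effect concentration, in the same units), and assigns the high/medium/low bands often used in the ecotoxicology literature. The substance names and concentrations in the test are placeholders, not data from the study, and the study's own optimized RQ may differ.

```python
def risk_quotient(mec, pnec):
    """Basic risk quotient: measured environmental concentration / PNEC (same units)."""
    return mec / pnec


def risk_class(rq):
    # Commonly used bands; the study's optimized RQ and cut-offs may differ.
    if rq >= 1:
        return "high"
    if rq >= 0.1:
        return "medium"
    if rq >= 0.01:
        return "low"
    return "negligible"


def prioritize(samples):
    """samples: {substance: (MEC, PNEC)} -> [(substance, RQ, class)], highest RQ first."""
    ranked = [(name, risk_quotient(mec, pnec)) for name, (mec, pnec) in samples.items()]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return [(name, rq, risk_class(rq)) for name, rq in ranked]
```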
Affiliation(s)
- Sang-Soo Baek
- School of Urban and Environmental Engineering, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea
- Daeun Yun
- School of Urban and Environmental Engineering, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea
- JongCheol Pyo
- Center for Environmental Data Strategy, Korea Environment Institute, Sejong 30147, Republic of Korea
- Daeho Kang
- Department of Environmental Engineering, Changwon National University, Changwondaehak-ro 20, Uichang-gu, Changwon-si, Gyeongsangnam-do 51140, Republic of Korea
- Kyung Hwa Cho
- School of Urban and Environmental Engineering, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea
- Junho Jeon
- Department of Environmental Engineering, Changwon National University, Changwondaehak-ro 20, Uichang-gu, Changwon-si, Gyeongsangnam-do 51140, Republic of Korea; School of Civil, Environmental and Chemical Engineering, Changwon National University, Changwon, Gyeongsangnamdo 51140, Republic of Korea.
9
Wang L, Shang M, Dai Q, He PA. Prediction of lncRNA-disease association based on a Laplace normalized random walk with restart algorithm on heterogeneous networks. BMC Bioinformatics 2022; 23:5. [PMID: 34983367 PMCID: PMC8729064 DOI: 10.1186/s12859-021-04538-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 10/08/2021] [Accepted: 12/15/2021] [Indexed: 12/23/2022] [Open Access]
Abstract
BACKGROUND Increasing evidence shows that long non-coding RNAs (lncRNAs) play important roles in the development and progression of complex human diseases. Therefore, predicting human lncRNA-disease associations is a challenging and urgent task in bioinformatics research on complex human diseases. RESULTS In this work, a universal global network-based computational framework called LRWRHLDA is proposed. First, four homogeneous networks were constructed: lncRNA similarity, disease similarity, gene similarity, and miRNA similarity networks. Then, six heterogeneous networks of known lncRNA-disease, lncRNA-gene, lncRNA-miRNA, disease-gene, disease-miRNA, and gene-miRNA associations were used to design a multi-layer network. Finally, a Laplace normalized random walk with restart algorithm is applied on this global network to predict associations between lncRNAs and diseases. CONCLUSIONS Ten-fold cross-validation was used to evaluate the performance of LRWRHLDA. LRWRHLDA achieves an AUC of 0.98402, which is higher than those of the compared methods. Furthermore, LRWRHLDA can predict lncRNAs related to isolated diseases and diseases related to isolated lncRNAs. The results for colorectal cancer, lung adenocarcinoma, stomach cancer, and breast cancer have been verified by other studies. These case studies indicate that the method is effective.
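The abstract names a Laplace normalized random walk with restart but gives no formula. A common form, assumed here rather than taken from the paper, symmetrically normalizes the (multi-layer) adjacency matrix W with its degree matrix D and iterates from a seed vector p^(0) with restart probability γ; the last line is the fixed point of the iteration.

```latex
\tilde{W} = D^{-1/2}\, W \, D^{-1/2}, \qquad D_{ii} = \sum_{j} W_{ij}

\mathbf{p}^{(t+1)} = (1-\gamma)\,\tilde{W}\,\mathbf{p}^{(t)} + \gamma\,\mathbf{p}^{(0)}

\mathbf{p}^{*} = \gamma\,\bigl(I - (1-\gamma)\,\tilde{W}\bigr)^{-1}\,\mathbf{p}^{(0)}
```

Because the eigenvalues of the symmetrically normalized matrix lie in [-1, 1], the iteration converges for any γ in (0, 1]. Unlike column-normalized RWR, this form does not conserve total probability, so the scores serve for ranking rather than as probabilities; nodes with zero degree must be removed or given a self-loop before normalizing.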
Affiliation(s)
- Liugen Wang
- School of Science, Zhejiang Sci-Tech University, Hangzhou, 310018, China
- Min Shang
- School of Science, Zhejiang Sci-Tech University, Hangzhou, 310018, China
- Qi Dai
- College of Life Science, Zhejiang Sci-Tech University, Hangzhou, 310018, China
- Ping-An He
- School of Science, Zhejiang Sci-Tech University, Hangzhou, 310018, China.
10
Ding P, Ouyang W, Luo J, Kwoh CK. Heterogeneous information network and its application to human health and disease. Brief Bioinform 2021; 21:1327-1346. [PMID: 31566212 DOI: 10.1093/bib/bbz091] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 05/06/2019] [Revised: 06/29/2019] [Accepted: 06/30/2019] [Indexed: 12/11/2022] [Open Access]
Abstract
Molecular components with functional interdependencies in human cells form complex biological networks. Diseases are mostly caused by perturbations of multiple interacting biomolecules rather than by an abnormality of a single biomolecule. Furthermore, new biological functions and processes could be revealed by discovering novel relationships between biological entities. Hence, more and more biologists focus on studying complex biological systems instead of individual biological components. The emergence of heterogeneous information networks (HINs) offers a promising way to systematically explore complicated and heterogeneous relationships between various molecules underlying apparently distinct phenotypes. In this review, we first present the basic definition of HIN and describe how a biological system can be considered a complex HIN. Then, we discuss the topological properties of HINs and how these can be applied to detect network motifs and functional modules. Afterwards, methodologies for discovering relationships between diseases and biomolecules are presented. Useful insights are provided into how HINs aid drug development and the exploration of the human interactome. Finally, we analyze the challenges and opportunities for uncovering combinatorial patterns in pharmacogenomics and for cell-type detection based on single-cell genomic data.
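Meta-path based measures are one standard way to exploit the typed relations of a HIN. As an illustration (the review is not claimed to use this exact measure), the sketch below computes PathSim for a symmetric meta-path such as lncRNA-miRNA-lncRNA from a binary lncRNA-by-miRNA association matrix A: the commuting matrix M = A A^T counts path instances, and s(x, y) = 2 M_xy / (M_xx + M_yy). The function names are hypothetical.

```python
def commuting_matrix(a):
    """Path-instance counts for the symmetric meta-path X-Y-X, i.e. M = A A^T."""
    return [[sum(x * y for x, y in zip(ri, rj)) for rj in a] for ri in a]


def pathsim(a):
    """PathSim: s(x, y) = 2 M_xy / (M_xx + M_yy); 0 when neither node has any path."""
    m = commuting_matrix(a)
    n = len(m)
    return [[2 * m[i][j] / (m[i][i] + m[j][j]) if m[i][i] + m[j][j] else 0.0
             for j in range(n)] for i in range(n)]
```

Normalizing by the self-path counts keeps highly connected nodes from dominating, which is the main reason PathSim is preferred over raw path counts.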
Affiliation(s)
- Pingjian Ding
- School of Computer Science, University of South China, Hengyang, China
- Wenjue Ouyang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Chee-Keong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
11
Zhang Y, Xiang J, Tang L, Li J, Lu Q, Tian G, He BS, Yang J. Identifying Breast Cancer-Related Genes Based on a Novel Computational Framework Involving KEGG Pathways and PPI Network Modularity. Front Genet 2021; 12:596794. [PMID: 34484285 PMCID: PMC8415302 DOI: 10.3389/fgene.2021.596794] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 08/24/2020] [Accepted: 05/05/2021] [Indexed: 01/04/2023] [Open Access]
Abstract
Complex diseases, such as breast cancer, are often caused by mutations in multiple functional genes. Identifying disease-related genes is a critical and challenging task for unveiling the biological mechanisms behind these diseases. In this study, we develop a novel computational framework to analyze the network properties of known breast cancer–associated genes, on the basis of which we develop a random-walk-with-restart (RCRWR) algorithm to predict novel disease genes. Specifically, we first curated a set of breast cancer–associated genes from the Genome-Wide Association Studies catalog and the Online Mendelian Inheritance in Man database, and then studied the distribution of these genes on an integrated protein–protein interaction (PPI) network. We found that the breast cancer–associated genes are significantly closer to each other than expected at random, which confirms the modularity of disease genes in PPI networks reported by previous studies. We then retrieved PPI subnetworks spanning the top breast cancer–associated KEGG pathways and found that the distribution of these genes on the subnetworks is non-random, suggesting that these KEGG pathways are activated non-uniformly. Taking advantage of this non-random distribution of breast cancer–associated genes, we developed an improved RCRWR algorithm to predict novel cancer genes, which integrates network reconstruction based on local random walk dynamics with subnetworks spanning KEGG pathways. Compared with disease gene prediction that does not use KEGG pathway information, this method performs better at inferring breast cancer–associated genes, and the top predicted genes are more strongly enriched in known breast cancer–associated gene ontology terms. Finally, a literature search on the top predicted novel genes showed that most of them are supported at least by wet-lab experiments on cell lines.
In summary, we propose a robust computational framework to prioritize novel breast cancer–associated genes, which could be used for further in vitro and in vivo experimental validation.
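The framework's first finding is that known breast cancer-associated genes lie significantly closer to each other on the PPI network than random genes do. The abstract gives neither the statistic nor the null model; the following sketch assumes mean pairwise shortest-path length as the statistic and uniformly sampled gene sets of the same size as the null, and returns an empirical p-value. Function names and defaults are hypothetical.

```python
import random
from collections import deque


def bfs_distances(adj, src):
    """Unweighted shortest-path lengths from src in {node: set(neighbours)}."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def mean_pairwise_distance(adj, genes):
    """Average shortest-path length over all gene pairs (assumes a connected graph)."""
    total, pairs = 0, 0
    for i, a in enumerate(genes):
        dist = bfs_distances(adj, a)  # one BFS per source gene
        for b in genes[i + 1:]:
            total += dist[b]
            pairs += 1
    return total / pairs


def closeness_test(adj, genes, n_perm=1000, seed=0):
    """Observed mean distance and empirical p-value against random equal-size gene sets."""
    rng = random.Random(seed)
    observed = mean_pairwise_distance(adj, genes)
    nodes = list(adj)
    hits = sum(
        mean_pairwise_distance(adj, rng.sample(nodes, len(genes))) <= observed
        for _ in range(n_perm)
    )
    return observed, (1 + hits) / (1 + n_perm)
```

Sampling nodes uniformly ignores degree bias; a stricter null would match the degree distribution of the disease genes.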
Affiliation(s)
- Yan Zhang
- School of Computer Science and Engineering, Central South University, Changsha, China; School of Information Science and Engineering, Changsha Medical University, Changsha, China; Academician Workstation, Changsha Medical University, Changsha, China
- Ju Xiang
- School of Computer Science and Engineering, Central South University, Changsha, China; Academician Workstation, Changsha Medical University, Changsha, China; Neuroscience Research Center & Department of Basic Medical Sciences, Changsha Medical University, Changsha, China
- Liang Tang
- Neuroscience Research Center & Department of Basic Medical Sciences, Changsha Medical University, Changsha, China
- Jianming Li
- Neuroscience Research Center & Department of Basic Medical Sciences, Changsha Medical University, Changsha, China
- Qingqing Lu
- Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China; Geneis Beijing Co., Ltd., Beijing, China
- Geng Tian
- Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China; Geneis Beijing Co., Ltd., Beijing, China
- Bin-Sheng He
- Academician Workstation, Changsha Medical University, Changsha, China; Neuroscience Research Center & Department of Basic Medical Sciences, Changsha Medical University, Changsha, China
- Jialiang Yang
- Academician Workstation, Changsha Medical University, Changsha, China; Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China; Geneis Beijing Co., Ltd., Beijing, China
12
Zhang Y, Xiang J, Tang L, Li J, Lu Q, Tian G, He BS, Yang J. Identifying Breast Cancer-Related Genes Based on a Novel Computational Framework Involving KEGG Pathways and PPI Network Modularity. Front Genet 2021; 12:596794. [PMID: 34484285 DOI: 10.3389/fgene.2021.596794] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/24/2020] [Accepted: 05/05/2021] [Indexed: 05/28/2023]
Abstract
Complex diseases, such as breast cancer, are often caused by mutations of multiple functional genes. Identifying disease-related genes is a critical and challenging task for unveiling the biological mechanisms behind these diseases. In this study, we develop a novel computational framework to analyze the network properties of the known breast cancer-associated genes, based on which we develop a random-walk-with-restart (RCRWR) algorithm to predict novel disease genes. Specifically, we first curated a set of breast cancer-associated genes from the Genome-Wide Association Studies catalog and Online Mendelian Inheritance in Man database and then studied the distribution of these genes on an integrated protein-protein interaction (PPI) network. We found that the breast cancer-associated genes are significantly closer to each other than expected by chance, which confirms the modularity property of disease genes in a PPI network as revealed by previous studies. We then retrieved PPI subnetworks spanning top breast cancer-associated KEGG pathways and found that the distribution of these genes on the subnetworks is non-random, suggesting that these KEGG pathways are activated non-uniformly. Taking advantage of the non-random distribution of breast cancer-associated genes, we developed an improved RCRWR algorithm to predict novel cancer genes, which integrates network reconstruction based on local random walk dynamics and subnetworks spanning KEGG pathways. Compared with disease gene prediction that does not use KEGG pathway information, this method performs better at inferring breast cancer-associated genes, and the top predicted genes are more strongly enriched in known breast cancer-associated Gene Ontology terms. Finally, we performed a literature search on the top predicted novel genes and found that most of them are supported at least by wet-lab experiments on cell lines.
In summary, we propose a robust computational framework to prioritize novel breast cancer-associated genes, which could be used for further in vitro and in vivo experimental validation.
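Random walk with restart (RWR), the basis of the RCRWR algorithm described above, can be illustrated as the power iteration p <- (1 - r) * W * p + r * p0, where W spreads each node's probability evenly over its neighbours and p0 places the restart mass on the known disease genes. The sketch below is a plain RWR under assumed settings (restart probability 0.7 and a hypothetical five-node toy PPI graph); it omits the authors' network reconstruction and KEGG-subnetwork weighting, so it is background for the idea rather than RCRWR itself.

```python
def rwr(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Plain random walk with restart on an undirected graph.

    adj: dict mapping each node to the set of its neighbours (every node is
    assumed to have at least one neighbour, so no probability mass is lost).
    seeds: known disease genes; restarts are spread uniformly over them.
    Returns a dict of steady-state visiting probabilities.
    """
    nodes = sorted(adj)
    seeds = set(seeds)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        spread = {n: 0.0 for n in nodes}
        # One walk step: each node passes its probability evenly to its neighbours.
        for n in nodes:
            share = p[n] / len(adj[n])
            for m in adj[n]:
                spread[m] += share
        # With probability `restart`, jump back to the seed distribution.
        nxt = {n: (1 - restart) * spread[n] + restart * p0[n] for n in nodes}
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy PPI network with one known disease gene, "A".
ppi = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
scores = rwr(ppi, seeds={"A"})
# Rank the non-seed genes as candidate disease genes.
ranked = sorted((n for n in ppi if n != "A"), key=scores.get, reverse=True)
print(ranked)  # ['C', 'B', 'D', 'E']
```

The seed keeps the largest share because of the restart term; among the remaining genes, those closer to the seed in the network score higher, which is the modularity intuition the abstract builds on.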
Affiliation(s)
- Yan Zhang
- School of Computer Science and Engineering, Central South University, Changsha, China
- School of Information Science and Engineering, Changsha Medical University, Changsha, China
- Academician Workstation, Changsha Medical University, Changsha, China
- Ju Xiang
- School of Computer Science and Engineering, Central South University, Changsha, China
- Academician Workstation, Changsha Medical University, Changsha, China
- Neuroscience Research Center & Department of Basic Medical Sciences, Changsha Medical University, Changsha, China
- Liang Tang
- Neuroscience Research Center & Department of Basic Medical Sciences, Changsha Medical University, Changsha, China
- Jianming Li
- Neuroscience Research Center & Department of Basic Medical Sciences, Changsha Medical University, Changsha, China
- Qingqing Lu
- Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Geneis Beijing Co., Ltd., Beijing, China
- Geng Tian
- Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Geneis Beijing Co., Ltd., Beijing, China
- Bin-Sheng He
- Academician Workstation, Changsha Medical University, Changsha, China
- Neuroscience Research Center & Department of Basic Medical Sciences, Changsha Medical University, Changsha, China
- Jialiang Yang
- Academician Workstation, Changsha Medical University, Changsha, China
- Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Geneis Beijing Co., Ltd., Beijing, China
13
Harikumar H, Quinn TP, Rana S, Gupta S, Venkatesh S. Personalized single-cell networks: a framework to predict the response of any gene to any drug for any patient. BioData Min 2021; 14:37. [PMID: 34353329 PMCID: PMC8340371 DOI: 10.1186/s13040-021-00263-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2021] [Accepted: 05/10/2021] [Indexed: 11/15/2022]
Abstract
BACKGROUND The last decade has seen a major increase in the availability of genomic data. This includes expert-curated databases that describe the biological activity of genes, as well as high-throughput assays that measure gene expression in bulk tissue and single cells. Integrating these heterogeneous data sources can generate new hypotheses about biological systems. Our primary objective is to combine population-level drug-response data with patient-level single-cell expression data to predict how any gene will respond to any drug for any patient. METHODS We take 2 approaches to benchmarking a "dual-channel" random walk with restart (RWR) for data integration. First, we evaluate how well RWR can predict known gene functions from single-cell gene co-expression networks. Second, we evaluate how well RWR can predict known drug responses from individual cell networks. We then present two exploratory applications. In the first application, we combine the Gene Ontology database with glioblastoma single cells from 5 individual patients to identify genes whose functions differ between cancers. In the second application, we combine the LINCS drug-response database with the same glioblastoma data to identify genes that may exhibit patient-specific drug responses. CONCLUSIONS Our manuscript introduces two innovations to the integration of heterogeneous biological data. First, we use a "dual-channel" method to predict up-regulation and down-regulation separately. Second, we use individualized single-cell gene co-expression networks to make personalized predictions. These innovations let us predict gene function and drug response for individual patients. Taken together, our work shows promise that single-cell co-expression data could be combined in heterogeneous information networks to facilitate precision medicine.
Affiliation(s)
- Haripriya Harikumar
- Applied Artificial Intelligence Institute, Deakin University, Geelong, Australia
- Institute for Health Transformation, Deakin University, Geelong, Australia
- Thomas P Quinn
- Applied Artificial Intelligence Institute, Deakin University, Geelong, Australia
- Santu Rana
- Applied Artificial Intelligence Institute, Deakin University, Geelong, Australia
- Sunil Gupta
- Applied Artificial Intelligence Institute, Deakin University, Geelong, Australia
- Svetha Venkatesh
- Applied Artificial Intelligence Institute, Deakin University, Geelong, Australia
14
Liu L, Shao Z, Lv J, Xu F, Ren S, Jin Q, Yang J, Ma W, Xie H, Zhang D, Chen X. Identification of Early Warning Signals at the Critical Transition Point of Colorectal Cancer Based on Dynamic Network Analysis. Front Bioeng Biotechnol 2020; 8:530. [PMID: 32548109 PMCID: PMC7272579 DOI: 10.3389/fbioe.2020.00530] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/13/2019] [Accepted: 05/04/2020] [Indexed: 12/22/2022]
Abstract
Colorectal cancer (CRC) is one of the leading causes of cancer-related death worldwide. Because early diagnostic methods and early warning signals for CRC are lacking and the disease is highly heterogeneous, determining accurate treatments for CRC and identifying specific early warning signals remain urgent problems for researchers. In this study, the expression profiles of cancer tissues and of tumor-adjacent tissues from 28 CRC patients were combined with a human protein–protein interaction (PPI) network to construct a specific network for each patient. A network propagation method was used to obtain a mutant giant cluster (GC) containing more than 90% of the mutation information of a given patient. Next, mutation selection rules were applied to the GC to mine the mutation sequence of driver genes in each CRC patient. The mutation sequences from patients with the same type of CRC were integrated to obtain the driver-gene mutation sequences of different CRC types, which provide a reference for diagnosing clinical CRC progression. Finally, dynamic network analysis was used to mine dynamic network biomarkers (DNBs) in CRC patients. These DNBs were verified against clinical staging data to identify the critical transition point between the pre-disease state and the disease state in tumor progression. Twelve known drug targets were found among the DNBs, and 6 of them are already used as targets of anticancer drugs in clinical treatment. This study provides important information for the prognosis, diagnosis and treatment of CRC, especially for pre-emptive treatment. It is of great significance for reducing the incidence and mortality of CRC.
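The abstract does not give the scoring rule used to mine DNBs. A criterion widely used in the DNB literature is the composite index CI = SD_in * PCC_in / PCC_out, where SD_in is the mean standard deviation of the candidate module's genes, PCC_in the mean absolute Pearson correlation within the module, and PCC_out the mean absolute correlation between module genes and all other genes; a sharp rise in CI between stages is read as an early warning signal. The sketch below assumes that criterion (this study's exact procedure may differ) and uses hypothetical toy expression values.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length samples (assumes non-zero variance)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def dnb_composite_index(expr, module):
    """Composite index SD_in * PCC_in / PCC_out for one disease stage.

    expr: dict gene -> expression values across the samples of that stage.
    module: candidate DNB genes (at least two); every other gene in expr
    is treated as outside the module.
    """
    inside = sorted(module)
    outside = sorted(g for g in expr if g not in module)
    sd_in = mean(pstdev(expr[g]) for g in inside)
    pcc_in = mean(abs(pearson(expr[a], expr[b]))
                  for i, a in enumerate(inside) for b in inside[i + 1:])
    pcc_out = mean(abs(pearson(expr[a], expr[b])) for a in inside for b in outside)
    return sd_in * pcc_in / pcc_out


# Hypothetical expression of genes g1, g2 (candidate DNB) and g3 at two stages.
normal = {"g1": [1.0, 1.2, 0.9, 1.1], "g2": [2.0, 2.1, 2.2, 1.9], "g3": [0.5, 0.7, 0.6, 0.4]}
critical = {"g1": [1.0, 3.0, 0.2, 2.5], "g2": [2.0, 4.1, 1.1, 3.6], "g3": [0.5, 0.7, 0.6, 0.4]}
ci_normal = dnb_composite_index(normal, {"g1", "g2"})
ci_critical = dnb_composite_index(critical, {"g1", "g2"})
print(round(ci_normal, 3), round(ci_critical, 3))
```

In this toy case the module's genes become both more variable and more tightly co-varying at the "critical" stage while decoupling from g3, so the index rises sharply.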
Affiliation(s)
- Lei Liu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Zhuo Shao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jiaxuan Lv
- School of Stomatology, Harbin Medical University, Harbin, China
- Fei Xu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Sibo Ren
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Qing Jin
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Jingbo Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Weifang Ma
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Hongbo Xie
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Denan Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Xiujie Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
15
Phylogenetic Analysis of HIV-1 Genomes Based on the Position-Weighted K-mers Method. Entropy 2020; 22:e22020255. [PMID: 33286029 PMCID: PMC7516702 DOI: 10.3390/e22020255] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/17/2020] [Revised: 02/07/2020] [Accepted: 02/20/2020] [Indexed: 12/31/2022]
Abstract
HIV-1 viruses, which are predominant in the family of HIV viruses, have strong pathogenicity and infectivity. They can evolve into many different variants in a very short time. In this study, we propose a new and effective alignment-free method for the phylogenetic analysis of HIV-1 viruses using complete genome sequences. Our method combines the position distribution information and the counts of the k-mers together. We also propose a metric to determine the optimal k value. We name our method the Position-Weighted k-mers (PWkmer) method. Validation and comparison with the Robinson-Foulds distance method and the modified bootstrap method on a benchmark dataset show that our method is reliable for the phylogenetic analysis of HIV-1 viruses. PWkmer can resolve within-group variations for different known subtypes of Group M of HIV-1 viruses. This method is simple and computationally fast for whole genome phylogenetic analysis.
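The abstract states that PWkmer combines k-mer counts with their positional distribution and selects k with a dedicated metric, but gives neither formula. To show why positional information can help, the sketch below builds a simpler, assumed feature vector (a hypothetical `kmer_profile`: normalised frequency plus mean relative start position for each k-mer over ACGT) and compares sequences by Euclidean distance. It is not the published PWkmer weighting; a distance matrix built this way would then be passed to a tree-building method such as neighbour joining.

```python
from itertools import product
from math import sqrt


def kmer_profile(seq, k):
    """Frequency and mean relative start position of every ACGT k-mer.

    Assumes len(seq) >= k; k-mers containing other symbols (e.g. N) are
    counted in the window total but do not get their own feature.
    """
    n_windows = len(seq) - k + 1
    positions = {}
    for i in range(n_windows):
        # Relative start position in [0, 1] so genomes of different length compare.
        positions.setdefault(seq[i:i + k], []).append(i / max(n_windows - 1, 1))
    profile = []
    for kmer in ("".join(p) for p in product("ACGT", repeat=k)):
        hits = positions.get(kmer, [])
        profile.append(len(hits) / n_windows)                   # count feature
        profile.append(sum(hits) / len(hits) if hits else 0.0)  # position feature
    return profile


def profile_distance(a, b, k=3):
    """Euclidean distance between the k-mer profiles of two sequences."""
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    return sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))


# Same base composition, different arrangement: counts alone cannot tell
# these apart, but the position features can.
print(profile_distance("AAACCC", "CCCAAA", k=1))
```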
16
Yu L, Shen X, Zhong D, Yang J. Three-Layer Heterogeneous Network Combined With Unbalanced Random Walk for miRNA-Disease Association Prediction. Front Genet 2020; 10:1316. [PMID: 31998371 PMCID: PMC6967737 DOI: 10.3389/fgene.2019.01316] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 09/28/2019] [Accepted: 12/02/2019] [Indexed: 12/19/2022]
Abstract
miRNAs play an important role in many biological processes, and increasing evidence shows that miRNAs are closely related to human diseases. Most existing miRNA-disease association prediction methods are based only on data related to miRNAs and diseases and fail to make effective use of other available biological data. However, experimentally verified miRNA-disease associations are limited, and there are complex correlations among biological data. Therefore, we propose a novel Three-layer heterogeneous network Combined with unbalanced Random Walk for MiRNA-Disease Association prediction algorithm (TCRWMDA), which can effectively integrate multi-source association data. TCRWMDA uses not only the known miRNA-disease associations but also new prior information (lncRNA-miRNA and lncRNA-disease associations) to build a three-layer heterogeneous network, in which lncRNA is added as an intermediate transition path to mine more effective information between networks. TCRWMDA obtains an AUC of 0.9209 under 5-fold cross-validation, better than other models based on the same similarity calculation method. When TCRWMDA was applied to the analysis of four types of cancer, the results showed that it is an effective tool for predicting potential miRNA-disease associations. The source code and dataset of TCRWMDA are available at: https://github.com/ylm0505/TCRWMDA.
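The abstract does not specify the unbalanced random walk at the core of TCRWMDA. One common formulation of an unbalanced bi-random walk, assumed here, lets the walker take `left_steps` steps on the miRNA similarity network and a different number, `right_steps`, on the disease similarity network; each step mixes the propagated scores with the known associations via a weight `alpha`, and the two sides are averaged while both are active. The stdlib-only sketch below applies that update to a hypothetical two-layer miRNA-disease example; the lncRNA transition layer that TCRWMDA adds is left out for brevity.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def row_normalise(s):
    """Divide each row by its sum (rows summing to zero are left as zeros)."""
    out = []
    for row in s:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def blend(propagated, assoc, alpha):
    """alpha * propagated + (1 - alpha) * known associations."""
    return [[alpha * p + (1 - alpha) * a for p, a in zip(prow, arow)]
            for prow, arow in zip(propagated, assoc)]


def unbalanced_bi_random_walk(assoc, mirna_sim, disease_sim,
                              left_steps=2, right_steps=3, alpha=0.8):
    """Score miRNA-disease pairs (rows: miRNAs, columns: diseases)."""
    m_norm = row_normalise(mirna_sim)
    # Transpose of the row-normalised (symmetric) disease similarity, i.e. its
    # column-normalised form, so R @ d_norm averages over similar diseases.
    d_norm = [list(col) for col in zip(*row_normalise(disease_sim))]
    r = [row[:] for row in assoc]
    for t in range(1, max(left_steps, right_steps) + 1):
        updates = []
        if t <= left_steps:   # walk one step on the miRNA side
            updates.append(blend(matmul(m_norm, r), assoc, alpha))
        if t <= right_steps:  # walk one step on the disease side
            updates.append(blend(matmul(r, d_norm), assoc, alpha))
        r = [[sum(u[i][j] for u in updates) / len(updates)
              for j in range(len(assoc[0]))] for i in range(len(assoc))]
    return r


# Hypothetical toy data: miR-2 is very similar to miR-1, which is linked to disease 1.
assoc = [[1, 0], [0, 0], [0, 1]]
mirna_sim = [[1, 0.9, 0.1], [0.9, 1, 0.1], [0.1, 0.1, 1]]
disease_sim = [[1, 0.2], [0.2, 1]]
scores = unbalanced_bi_random_walk(assoc, mirna_sim, disease_sim)
print([round(v, 3) for v in scores[1]])  # candidate scores for miR-2
```

Choosing different values for `left_steps` and `right_steps` is what makes the walk "unbalanced": the two similarity networks, which usually differ in size and topology, each contribute their own number of propagation steps.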
Affiliation(s)
- Limin Yu
- School of Computer, Central China Normal University, Wuhan, China
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, China
- Xianjun Shen
- School of Computer, Central China Normal University, Wuhan, China
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, China
- Duo Zhong
- School of Computer, Central China Normal University, Wuhan, China
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, China
- Jincai Yang
- School of Computer, Central China Normal University, Wuhan, China
17
Wen Y, Han G, Anh VV. Laplacian normalization and bi-random walks on heterogeneous networks for predicting lncRNA-disease associations. BMC Syst Biol 2018; 12:122. [PMID: 30598088 PMCID: PMC6311918 DOI: 10.1186/s12918-018-0660-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 01/26/2023]
Abstract
BACKGROUND Evidence increasingly indicates that lncRNAs (long non-coding RNAs) are deeply involved in important biological regulatory processes leading to various complex human diseases. Experimental investigation of these disease-associated lncRNAs is slow and costly. Computational methods that infer potential associations between lncRNAs and diseases have become an effective way to prioritize candidates before experimental verification. RESULTS In this study, we develop a novel method for the prediction of lncRNA-disease associations using bi-random walks on a network merging the similarities of lncRNAs and diseases. Specifically, this method applies a Laplacian technique to normalize the lncRNA similarity matrix and the disease similarity matrix before constructing the lncRNA similarity network and the disease similarity network. The two networks are then connected via existing lncRNA-disease associations. After that, bi-random walks are applied on the heterogeneous network to predict potential associations between lncRNAs and diseases. Experimental results demonstrate that the performance of our method is highly comparable to or better than that of state-of-the-art methods for predicting lncRNA-disease associations. Our analyses on three cancer data sets (breast cancer, lung cancer, and liver cancer) also indicate the usefulness of our method in practical applications. CONCLUSIONS Our proposed method, including the construction of the lncRNA similarity network and disease similarity network and the bi-random walk algorithm on the heterogeneous network, could be used to predict potential associations between lncRNAs and diseases.
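The Laplacian normalization step mentioned in the abstract is usually the symmetric form S_L = D^(-1/2) S D^(-1/2), where D is the diagonal matrix of the row sums of the similarity matrix S. Assuming that form (the abstract does not spell it out), it can be written without external libraries as below. The result stays symmetric, and because it is similar to the row-stochastic matrix D^(-1) S, its eigenvalues lie in [-1, 1] for non-negative S, which keeps the subsequent random-walk iterations bounded.

```python
from math import sqrt


def laplacian_normalise(sim):
    """Return D^(-1/2) S D^(-1/2) for a square similarity matrix S.

    D holds the row sums of S; rows that sum to zero are left as zeros.
    """
    inv_sqrt = [1 / sqrt(sum(row)) if sum(row) > 0 else 0.0 for row in sim]
    n = len(sim)
    return [[inv_sqrt[i] * sim[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]


# Hypothetical lncRNA similarity matrix.
s = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.2],
     [0.0, 0.2, 1.0]]
s_l = laplacian_normalise(s)
for row in s_l:
    print([round(v, 3) for v in row])
```

The same function would be applied separately to the lncRNA and disease similarity matrices before they are linked by the known associations.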
Affiliation(s)
- Yaping Wen
- School of Mathematics and Computational Science, Xiangtan University, Hunan, 411105, China
- Guosheng Han
- School of Mathematics and Computational Science, Xiangtan University, Hunan, 411105, China
- Vo V Anh
- School of Mathematics and Computational Science, Xiangtan University, Hunan, 411105, China
- Department of Mathematics, Swinburne University of Technology, PO Box 218, Hawthorn, Vic 3122, Australia
18
19
Yoo S, Kim K, Nam H, Lee D. Discovering Health Benefits of Phytochemicals with Integrated Analysis of the Molecular Network, Chemical Properties and Ethnopharmacological Evidence. Nutrients 2018; 10:nu10081042. [PMID: 30096807 PMCID: PMC6115900 DOI: 10.3390/nu10081042] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 07/06/2018] [Revised: 08/03/2018] [Accepted: 08/06/2018] [Indexed: 12/18/2022]
Abstract
Identifying the health benefits of phytochemicals is an essential step in drug and functional food development. While many in vitro screening methods have been developed to identify the health effects of phytochemicals, there is still room for improvement because of their high cost and low productivity. Therefore, researchers have proposed in silico alternatives, which fall primarily into three types of approaches: those utilizing molecular, chemical, or ethnopharmacological information. Although each approach has its own strength in analyzing the characteristics of phytochemicals, previous studies have not considered them all together. Here, we apply an integrated in silico analysis to identify the potential health benefits of phytochemicals based on molecular analysis and chemical properties as well as ethnopharmacological evidence. From the molecular analysis, we found an average of 415.6 health effects per phytochemical across 591 phytochemicals. We further investigated ethnopharmacological evidence for these phytochemicals and found that, on average, 129.1 (31%) of the predicted health effects had ethnopharmacological evidence. Lastly, we investigated chemical properties to confirm whether the phytochemicals are orally bioavailable, drug available or effective on certain tissues. The evaluation results indicate that health effects can be predicted more accurately by jointly considering the molecular analysis, chemical properties and ethnopharmacological evidence.
Affiliation(s)
- Sunyong Yoo
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Bio-Synergy Research Center, Daejeon 34141, Korea
- Kwansoo Kim
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Bio-Synergy Research Center, Daejeon 34141, Korea
- Hojung Nam
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology (GIST), Gwangju 61005, Korea
- Doheon Lee
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Bio-Synergy Research Center, Daejeon 34141, Korea
20
Valdeolivas A, Tichit L, Navarro C, Perrin S, Odelin G, Levy N, Cau P, Remy E, Baudot A. Random walk with restart on multiplex and heterogeneous biological networks. Bioinformatics 2018; 35:497-505. [DOI: 10.1093/bioinformatics/bty637] [Citation(s) in RCA: 111] [Impact Index Per Article: 18.5] [Received: 09/20/2017] [Accepted: 07/16/2018] [Indexed: 01/04/2023]
Affiliation(s)
- Alberto Valdeolivas
- Aix Marseille Univ, CNRS, Centrale Marseille, I2M, Marseille, France
- ProGeLife, Marseille
- Laurent Tichit
- Aix Marseille Univ, CNRS, Centrale Marseille, I2M, Marseille, France
- Claire Navarro
- ProGeLife, Marseille
- Aix Marseille Univ, INSERM, MMG, Marseille, France
- Sophie Perrin
- ProGeLife, Marseille
- Aix Marseille Univ, INSERM, MMG, Marseille, France
- Gaëlle Odelin
- ProGeLife, Marseille
- Aix Marseille Univ, INSERM, MMG, Marseille, France
- Nicolas Levy
- Aix Marseille Univ, INSERM, MMG, Marseille, France
- Pierre Cau
- ProGeLife, Marseille
- Aix Marseille Univ, INSERM, MMG, Marseille, France
- Elisabeth Remy
- Aix Marseille Univ, CNRS, Centrale Marseille, I2M, Marseille, France
- Anaïs Baudot
- Aix Marseille Univ, CNRS, Centrale Marseille, I2M, Marseille, France
21
Genome-wide predicting disease-related protein complexes by walking on the heterogeneous network based on data integration and laplacian normalization. Comput Biol Chem 2017; 69:41-47. [DOI: 10.1016/j.compbiolchem.2017.04.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/23/2016] [Revised: 04/08/2017] [Accepted: 04/12/2017] [Indexed: 11/20/2022]
22
Liu Z, Hu J. Mislocalization-related disease gene discovery using gene expression based computational protein localization prediction. Methods 2015; 93:119-27. [PMID: 26416496 DOI: 10.1016/j.ymeth.2015.09.022] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 06/02/2015] [Revised: 09/17/2015] [Accepted: 09/21/2015] [Indexed: 01/09/2023]
Abstract
Protein sorting is an important mechanism for transporting proteins to their target subcellular locations after synthesis. Mutations in genes may disrupt the well-regulated protein sorting process, leading to a variety of mislocalization-related diseases. This paper proposes a methodology to discover such disease genes based on gene expression data and computational protein localization prediction. A kernel logistic regression based algorithm is used to successfully identify several candidate cancer genes that may cause cancers through their mislocalization within the cell. Our results also showed that, compared with a gene co-expression network defined by Pearson correlation coefficients, a co-expression network based on the nonlinear maximal information coefficient (MIC) gives better results for subcellular localization prediction.
Affiliation(s)
- Zhonghao Liu
- Department of Computer Science & Engineering, University of South Carolina, 301 Main Street, Columbia, SC 29208, United States
- Jianjun Hu
- Department of Computer Science & Engineering, University of South Carolina, 301 Main Street, Columbia, SC 29208, United States