1
Zhang F, Zhang Y, Zhu X, Chen X, Lu F, Zhang X. DeepSG2PPI: A Protein-Protein Interaction Prediction Method Based on Deep Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:2907-2919. [PMID: 37079417 DOI: 10.1109/tcbb.2023.3268661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Protein-protein interaction (PPI) plays an important role in almost all life activities. Many protein interaction sites have been confirmed by biological experiments, but these PPI site identification methods are time-consuming and expensive. In this study, a deep learning-based PPI prediction method, named DeepSG2PPI, is developed. First, the protein sequence information is retrieved and the local context information of each amino acid residue is calculated. A two-dimensional convolutional neural network (2D-CNN) model is employed to extract features from a two-channel coding structure, in which an attention mechanism is embedded to assign higher weights to key features. Second, the global statistical information of each amino acid residue and the relationship graph between the protein and GO (Gene Ontology) function annotation are built, and the graph embedding vector is constructed to represent the biological features of the protein. Finally, a 2D-CNN model and two 1D-CNN models are combined for PPI prediction. The comparison analysis with existing algorithms shows that the DeepSG2PPI method has better performance. It provides more accurate and effective PPI site prediction, which will be helpful in reducing the cost and failure rate of biological experiments.
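The "local context information of each amino acid residue" mentioned in the abstract can be pictured as a sliding window over the sequence. The following is a minimal sketch under stated assumptions: the function name `local_windows`, the window half-width, and the padding symbol `X` are illustrative choices, not details taken from DeepSG2PPI.

```python
def local_windows(sequence: str, half_width: int = 2, pad: str = "X") -> list:
    """Return one fixed-length window per residue, centred on that residue.

    half_width and pad are hypothetical parameters chosen for illustration.
    """
    if half_width < 0:
        raise ValueError("half_width must be non-negative")
    # Pad both ends so that residues near the termini still get full windows.
    padded = pad * half_width + sequence + pad * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sequence))]


# Toy sequence, not a real protein.
windows = local_windows("MKTAY", half_width=1)
print(windows)
```

Each window could then be encoded numerically (for example one-hot) before being passed to a convolutional model; that encoding step is left out here.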
2
Li H, Wang S, Liu B, Fang M, Cao R, He B, Liu S, Hu C, Dong D, Wang X, Wang H, Tian J. A multi-view co-training network for semi-supervised medical image-based prognostic prediction. Neural Netw 2023; 164:455-463. [PMID: 37182347 DOI: 10.1016/j.neunet.2023.04.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2022] [Revised: 03/07/2023] [Accepted: 04/18/2023] [Indexed: 05/16/2023]
Abstract
Prognostic prediction has long been a hotspot in disease analysis and management, and the development of image-based prognostic prediction models has significant clinical implications for current personalized treatment strategies. The main challenge in prognostic prediction is to model a regression problem based on censored observations, and semi-supervised learning has the potential to play an important role in improving the utilization efficiency of censored data. However, few effective semi-supervised paradigms are available so far. In this paper, we propose a semi-supervised co-training deep neural network incorporating a support vector regression layer for survival time estimation (Co-DeepSVS) that improves the efficiency in utilizing censored data for prognostic prediction. First, we introduce a support vector regression (SVR) layer in deep neural networks to deal with censored data and directly predict survival time, and more importantly to calculate the labeling confidence of each case. Then, we apply a semi-supervised multi-view co-training framework to achieve accurate prognostic prediction, where labeling confidence estimation with prior knowledge of pseudo time is conducted for each view. Experimental results demonstrate that the proposed Co-DeepSVS has a promising prognostic ability and surpasses most widely used methods on a multi-phase CT dataset. In addition, the introduction of the SVR layer makes the model more robust in the presence of follow-up bias.
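To make the idea of regressing survival time on censored observations concrete, here is a minimal sketch of one common formulation: an epsilon-insensitive loss that penalizes errors on both sides when the event was observed, but only penalizes predictions shorter than the follow-up time when the case is censored. This is an assumption made for illustration; the abstract does not specify the loss used in the Co-DeepSVS layer, and the name `censored_eps_loss` is hypothetical.

```python
def censored_eps_loss(pred: float, time: float, event_observed: bool, eps: float = 0.0) -> float:
    """Epsilon-insensitive loss adapted to right-censored survival times.

    Assumed formulation (not taken from the paper):
    - event observed: penalize predictions that are too short or too long;
    - censored: the true time is at least `time`, so only penalize
      predictions that fall below it.
    """
    too_short = max(0.0, (time - pred) - eps)
    too_long = max(0.0, (pred - time) - eps)
    if event_observed:
        return too_short + too_long
    return too_short


# A censored case followed for 10 months: predicting 15 is not penalized,
# predicting 5 is.
print(censored_eps_loss(15.0, 10.0, event_observed=False))
print(censored_eps_loss(5.0, 10.0, event_observed=False))
```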
Affiliation(s)
- Hailin Li
  - Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, School of Engineering Medicine, Beihang University, Beijing, 100191, China; CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
- Siwen Wang
  - CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China
- Bo Liu
  - Lanzhou University Second Hospital, Lanzhou, 730050, Gansu, China; Department of Radiology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Shandong University, Jinan, 250021, Shandong, China
- Mengjie Fang
  - Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, School of Engineering Medicine, Beihang University, Beijing, 100191, China; CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
- Runnan Cao
  - CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China
- Bingxi He
  - Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, School of Engineering Medicine, Beihang University, Beijing, 100191, China; CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
- Shengyuan Liu
  - CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China
- Chaoen Hu
  - CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
- Di Dong
  - CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100049, China
- Ximing Wang
  - Department of Radiology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Shandong University, Jinan, 250021, Shandong, China
- Hexiang Wang
  - Department of Radiology, The Affiliated Hospital of Qingdao University, Qingdao, 266000, Shandong, China
- Jie Tian
  - Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, School of Engineering Medicine, Beihang University, Beijing, 100191, China; CAS Key Laboratory of Molecular Imaging, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
3
Moulick D, Bhutia KL, Sarkar S, Roy A, Mishra UN, Pramanick B, Maitra S, Shankar T, Hazra S, Skalicky M, Brestic M, Barek V, Hossain A. The intertwining of Zn-finger motifs and abiotic stress tolerance in plants: Current status and future prospects. Frontiers in Plant Science 2023; 13:1083960. [PMID: 36684752 PMCID: PMC9846276 DOI: 10.3389/fpls.2022.1083960] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2022] [Accepted: 11/22/2022] [Indexed: 06/17/2023]
Abstract
Environmental stresses such as drought, high salinity, and low temperature can adversely affect field crop performance by altering the morphological, physiological, and biochemical processes of the plants. It is estimated that about 50% or more of the productivity of several crops is limited by various types of abiotic stress, occurring alone or in combination. Plants can survive these abiotic stresses in two ways: a) through management practices and b) through adaptive mechanisms that confer tolerance. These adaptive mechanisms of tolerant plants are mostly linked to their signal transduction pathways, which trigger the action of plant transcription factors and control the expression of various stress-regulated genes. Several recent studies have found that Zn-finger motifs have a significant function during the abiotic stress response in plants, since they regulate the function of stress-responsive genes. A wide range of Zn-binding motifs has been recognized and termed Zn-fingers. The Zn-finger was first reported as a repeated Zn-binding motif, comprising conserved cysteine (Cys) and histidine (His) ligands, in transcription factor IIIA (TFIIIA) from Xenopus laevis oocytes. In these proteins, Zn2+ is mainly attached to amino acid residues, adopting a tetrahedral coordination geometry. The physical nature of Zn-proteins, which determines their affinity for Zn2+, is crucial for an in-depth understanding of how Zn2+ facilitates their characteristic function and how proteins control its mobility (intra- and intercellular) as well as its cellular availability. The current review summarizes the concept, importance, and mechanisms of Zn-finger motifs during the abiotic stress response in plants.
Affiliation(s)
- Debojyoti Moulick
  - Department of Environmental Science, University of Kalyani, Nadia, West Bengal, India
- Karma Landup Bhutia
  - Department of Agricultural Biotechnology & Molecular Breeding, College of Basic Science and Humanities, Dr. Rajendra Prasad Central Agricultural University, Samastipur, India
- Sukamal Sarkar
  - School of Agriculture and Rural Development, Faculty Centre for Integrated Rural Development and Management (IRDM), Ramakrishna Mission Vivekananda Educational and Research Institute, Ramakrishna Mission Ashrama, Narendrapur, Kolkata, India
- Anirban Roy
  - School of Agriculture and Rural Development, Faculty Centre for Integrated Rural Development and Management (IRDM), Ramakrishna Mission Vivekananda Educational and Research Institute, Ramakrishna Mission Ashrama, Narendrapur, Kolkata, India
- Udit Nandan Mishra
  - Department of Crop Physiology and Biochemistry, Sri University, Cuttack, Odisha, India
- Biswajit Pramanick
  - Department of Agronomy, Dr. Rajendra Prasad Central Agricultural University, PUSA, Samastipur, Bihar, India
  - Department of Agronomy and Horticulture, University of Nebraska Lincoln, Scottsbluff, NE, United States
- Sagar Maitra
  - Department of Agronomy and Agroforestry, Centurion University of Technology and Management, Paralakhemundi, Odisha, India
- Tanmoy Shankar
  - Department of Agronomy and Agroforestry, Centurion University of Technology and Management, Paralakhemundi, Odisha, India
- Swati Hazra
  - School of Agricultural Sciences, Sharda University, Greater Noida, Uttar Pradesh, India
- Milan Skalicky
  - Department of Botany and Plant Physiology, Faculty of Agrobiology, Food, and Natural Resources, Czech University of Life Sciences Prague, Prague, Czechia
- Marian Brestic
  - Department of Botany and Plant Physiology, Faculty of Agrobiology, Food, and Natural Resources, Czech University of Life Sciences Prague, Prague, Czechia
  - Institute of Plant and Environmental Sciences, Slovak University of Agriculture, Nitra, Slovakia
- Viliam Barek
  - Department of Water Resources and Environmental Engineering, Faculty of Horticulture and Landscape Engineering, Slovak University of Agriculture, Nitra, Slovakia
- Akbar Hossain
  - Division of Agronomy, Bangladesh Wheat and Maize Research Institute, Dinajpur, Bangladesh
4
Kirk D, Kok E, Tufano M, Tekinerdogan B, Feskens EJM, Camps G. Machine Learning in Nutrition Research. Adv Nutr 2022; 13:2573-2589. [PMID: 36166846 PMCID: PMC9776646 DOI: 10.1093/advances/nmac103] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 06/11/2022] [Revised: 08/02/2022] [Accepted: 09/22/2022] [Indexed: 01/29/2023]
Abstract
Data currently generated in the field of nutrition are becoming increasingly complex and high-dimensional, bringing with them new methods of data analysis. The characteristics of machine learning (ML) make it suitable for such analysis and thus lend itself as an alternative tool to deal with data of this nature. ML has already been applied in important problem areas in nutrition, such as obesity, metabolic health, and malnutrition. Despite this, experts in nutrition are often without an understanding of ML, which limits its application and therefore potential to solve currently open questions. The current article aims to bridge this knowledge gap by supplying nutrition researchers with a resource to facilitate the use of ML in their research. ML is first explained and distinguished from existing solutions, with key examples of applications in the nutrition literature provided. Two case studies of domains in which ML is particularly applicable, precision nutrition and metabolomics, are then presented. Finally, a framework is outlined to guide interested researchers in integrating ML into their work. By acting as a resource to which researchers can refer, we hope to support the integration of ML in the field of nutrition to facilitate modern research.
Affiliation(s)
- Daniel Kirk
  - Division of Human Nutrition and Health, Wageningen University and Research, Wageningen, The Netherlands
- Esther Kok
  - Division of Human Nutrition and Health, Wageningen University and Research, Wageningen, The Netherlands
- Michele Tufano
  - Division of Human Nutrition and Health, Wageningen University and Research, Wageningen, The Netherlands
- Bedir Tekinerdogan
  - Information Technology Group, Wageningen University and Research, Wageningen, The Netherlands
- Edith J M Feskens
  - Division of Human Nutrition and Health, Wageningen University and Research, Wageningen, The Netherlands
- Guido Camps
  - Division of Human Nutrition and Health, Wageningen University and Research, Wageningen, The Netherlands
  - OnePlanet Research Center, Wageningen, The Netherlands
5
IoMT-Based Mitochondrial and Multifactorial Genetic Inheritance Disorder Prediction Using Machine Learning. Computational Intelligence and Neuroscience 2022; 2022:2650742. [PMID: 35909844 PMCID: PMC9334098 DOI: 10.1155/2022/2650742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2022] [Accepted: 07/04/2022] [Indexed: 11/18/2022]
Abstract
A genetic disorder is a serious disease that affects a large number of individuals around the world. There are various types of genetic illnesses, however, we focus on mitochondrial and multifactorial genetic disorders for prediction. Genetic illness is caused by a number of factors, including a defective maternal or paternal gene, excessive abortions, a lack of blood cells, and low white blood cell count. For premature or teenage life development, early detection of genetic diseases is crucial. Although it is difficult to forecast genetic disorders ahead of time, this prediction is very critical since a person's life progress depends on it. Machine learning algorithms are used to diagnose genetic disorders with high accuracy utilizing datasets collected and constructed from a large number of patient medical reports. A lot of studies have been conducted recently employing genome sequencing for illness detection, but fewer studies have been presented using patient medical history. The accuracy of existing studies that use a patient's history is restricted. The internet of medical things (IoMT) based proposed model for genetic disease prediction in this article uses two separate machine learning algorithms: support vector machine (SVM) and K-Nearest Neighbor (KNN). Experimental results show that SVM has outperformed the KNN and existing prediction methods in terms of accuracy. SVM achieved an accuracy of 94.99% and 86.6% for training and testing, respectively.
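Of the two classifiers named in the abstract, K-Nearest Neighbor is simple enough to sketch in a few lines. The example below is a generic majority-vote KNN on made-up two-dimensional points; the feature values, class labels, and the function name `knn_predict` are hypothetical and unrelated to the study's patient data.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of (feature_tuple, label) pairs.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy data invented for illustration only.
toy_train = [
    ((0.0, 0.0), "class_a"), ((0.1, 0.2), "class_a"), ((0.2, 0.1), "class_a"),
    ((1.0, 1.0), "class_b"), ((0.9, 1.1), "class_b"), ((1.1, 0.9), "class_b"),
]
print(knn_predict(toy_train, (0.05, 0.05)))
```

In practice the choice of `k` and the feature scaling strongly affect KNN accuracy, which is one reason a comparison against SVM on held-out data, as reported in the abstract, is informative.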
6
Liu H, Hou L, Xu S, Li H, Chen X, Gao J, Wang Z, Han B, Liu X, Wan S. Discovering Cerebral Ischemic Stroke Associated Genes Based on Network Representation Learning. Front Genet 2021; 12:728333. [PMID: 34539754 PMCID: PMC8442767 DOI: 10.3389/fgene.2021.728333] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/21/2021] [Accepted: 07/26/2021] [Indexed: 11/13/2022]
Abstract
Cerebral ischemic stroke (IS) is a complex disease caused by multiple factors including vascular risk factors, genetic factors, and environment factors, which accentuates the difficulty in discovering corresponding disease-related genes. Identifying the genes associated with IS is critical for understanding the biological mechanism of IS, which would be significantly beneficial to the diagnosis and clinical treatment of cerebral IS. However, existing methods to predict IS-related genes are mainly based on the hypothesis of guilt-by-association (GBA). These methods cannot capture the global structure information of the whole protein-protein interaction (PPI) network. Inspired by the success of network representation learning (NRL) in the field of network analysis, we apply NRL to the discovery of disease-related genes and launch the framework to identify the disease-related genes of cerebral IS. The utilized framework contains three main parts: capturing the topological information of the PPI network with NRL, denoising the gene feature with the participation of a stacked autoencoder (SAE), and optimizing a support vector machine (SVM) classifier to identify IS-related genes. Superior to the existing methods on IS-related gene prediction, our framework presents more accurate results. The case study also shows that the proposed method can identify IS-related genes.
Affiliation(s)
- Haijie Liu
  - Department of Neurology, Xuanwu Hospital, Capital Medical University, Beijing, China
- Liping Hou
  - Department of Clinical Laboratory, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
- Shanhu Xu
  - Affiliated Zhejiang Hospital, Zhejiang University School of Medicine, Hangzhou, China
- He Li
  - Department of Automation, College of Information Science and Engineering, Tianjin Tianshi College, Tianjin, China
- Xiuju Chen
  - Department of Neurology, Tianjin Nankai Hospital, Tianjin, China
- Juan Gao
  - Department of Neurology, Baoding No. 1 Central Hospital, Baoding, China
- Ziwen Wang
  - Graduate School of Chengde Medical College, Chengde, China
- Bo Han
  - Department of Neurology, Xuanwu Hospital, Capital Medical University, Beijing, China
- Xiaoli Liu
  - Affiliated Zhejiang Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Shu Wan
  - Affiliated Zhejiang Hospital, Zhejiang University School of Medicine, Hangzhou, China
7
Le DH. Machine learning-based approaches for disease gene prediction. Brief Funct Genomics 2020; 19:350-363. [PMID: 32567652 DOI: 10.1093/bfgp/elaa013] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 02/25/2020] [Revised: 04/30/2020] [Accepted: 05/09/2020] [Indexed: 12/20/2022]
Abstract
Disease gene prediction is an essential issue in biomedical research. In the early days, annotation-based approaches were proposed for this problem. With the development of high-throughput technologies, interaction data between genes/proteins have grown quickly and now cover almost the entire genome and proteome; thus, network-based methods for the problem have become prominent. In parallel, machine learning techniques, which formulate the problem as a classification, have also been proposed. Here, we first present a roadmap of the machine learning-based methods for disease gene prediction. In the beginning, the problem was usually approached using a binary classification, where the positive and negative training sample sets comprise disease genes and non-disease genes, respectively. The disease genes are ones known to be associated with diseases; meanwhile, non-disease genes were randomly selected from those not yet known to be associated with diseases. However, the latter may contain unknown disease genes. To overcome this uncertainty in defining the non-disease genes, more realistic approaches have been proposed for the problem, such as unary and semi-supervised classification. Recently, more advanced methods, including ensemble learning, matrix factorization and deep learning, have been proposed for the problem. Second, 12 representative machine learning-based methods for disease gene prediction were examined and compared in terms of prediction performance and running time. Finally, their advantages, disadvantages, interpretability and trust were also analyzed and discussed.
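The binary-classification setup described in the abstract, with known disease genes as positives and randomly drawn not-yet-associated genes as negatives, can be sketched as follows. Gene identifiers, the balanced one-to-one sampling ratio, and the function name `build_binary_training_set` are assumptions made for illustration.

```python
import random


def build_binary_training_set(known_disease_genes, all_genes, seed=0):
    """Label known disease genes 1 and a random sample of the remaining genes 0.

    The sampled "negatives" are merely unlabeled genes, so some of them may
    be undiscovered disease genes; this is the uncertainty the abstract notes.
    """
    positives = sorted(set(known_disease_genes))
    unlabeled = sorted(set(all_genes) - set(positives))
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n_negatives = min(len(positives), len(unlabeled))
    negatives = rng.sample(unlabeled, n_negatives)
    return [(g, 1) for g in positives] + [(g, 0) for g in negatives]


# Hypothetical gene identifiers.
genes = [f"G{i}" for i in range(10)]
training = build_binary_training_set(["G1", "G4", "G7"], genes)
print(training)
```

Unary (one-class) and semi-supervised methods avoid this sampling step by training on the positives alone or by treating the remaining genes as unlabeled rather than negative.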
Affiliation(s)
- Duc-Hau Le
  - Department of Computational Biomedicine, Vingroup Big Data Institute, Hanoi, Vietnam
8
Association extraction from biomedical literature based on representation and transfer learning. J Theor Biol 2020; 488:110112. [DOI: 10.1016/j.jtbi.2019.110112] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/11/2019] [Accepted: 12/08/2019] [Indexed: 12/17/2022]
9
Network modeling of patients' biomolecular profiles for clinical phenotype/outcome prediction. Sci Rep 2020; 10:3612. [PMID: 32107391 PMCID: PMC7046773 DOI: 10.1038/s41598-020-60235-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/23/2019] [Accepted: 11/05/2019] [Indexed: 12/15/2022]
Abstract
Methods for phenotype and outcome prediction are largely based on inductive supervised models that use selected biomarkers to make predictions, without explicitly considering the functional relationships between individuals. We introduce a novel network-based approach named Patient-Net (P-Net) in which biomolecular profiles of patients are modeled in a graph-structured space that represents gene expression relationships between patients. Then a kernel-based semi-supervised transductive algorithm is applied to the graph to explore the overall topology of the graph and to predict the phenotype/clinical outcome of patients. Experimental tests involving several publicly available datasets of patients afflicted with pancreatic, breast, colon and colorectal cancer show that our proposed method is competitive with state-of-the-art supervised and semi-supervised predictive systems. Importantly, P-Net also provides interpretable models that can be easily visualized to gain clues about the relationships between patients, and to formulate hypotheses about their stratification.
10
Mao Y, Fisher DW, Yang S, Keszycki RM, Dong H. Protein-protein interactions underlying the behavioral and psychological symptoms of dementia (BPSD) and Alzheimer's disease. PLoS One 2020; 15:e0226021. [PMID: 31951614 PMCID: PMC6968845 DOI: 10.1371/journal.pone.0226021] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 06/08/2019] [Accepted: 11/19/2019] [Indexed: 12/25/2022]
Abstract
Alzheimer’s Disease (AD) is a devastating neurodegenerative disorder currently affecting 45 million people worldwide, ranking as the 6th highest cause of death. Throughout the development and progression of AD, over 90% of patients display behavioral and psychological symptoms of dementia (BPSD), with some of these symptoms occurring before memory deficits and therefore serving as potential early predictors of AD-related cognitive decline. However, the biochemical links between AD and BPSD are not known. In this study, we explored the molecular interactions between AD and BPSD using protein-protein interaction (PPI) networks built from OMIM (Online Mendelian Inheritance in Man) genes that were related to AD and two distinct BPSD domains, the Affective Domain and the Hyperactivity, Impulsivity, Disinhibition, and Aggression (HIDA) Domain. Our results yielded 8 unique proteins for the Affective Domain (RHOA, GRB2, PIK3R1, HSPA4, HSP90AA1, GSK3beta, PRKCZ, and FYN), 5 unique proteins for the HIDA Domain (LRP1, EGFR, YWHAB, SUMO1, and EGR1), and 6 shared proteins between both BPSD domains (APP, UBC, ELAV1, YWHAZ, YWHAE, and SRC) and AD. These proteins might suggest specific targets and pathways that are involved in the pathogenesis of these BPSD domains in AD.
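The split into domain-specific and shared proteins reported in the abstract amounts to a set comparison. In the sketch below, the two input sets are reassembled from the lists given in the abstract purely for illustration; how the study derived them from the PPI networks is not reproduced, and the variable names are hypothetical.

```python
# Protein lists copied from the abstract.
affective_unique = {"RHOA", "GRB2", "PIK3R1", "HSPA4", "HSP90AA1", "GSK3beta", "PRKCZ", "FYN"}
hida_unique = {"LRP1", "EGFR", "YWHAB", "SUMO1", "EGR1"}
shared_reported = {"APP", "UBC", "ELAV1", "YWHAZ", "YWHAE", "SRC"}

# Assumed per-domain result sets, rebuilt from the lists above.
affective_hits = affective_unique | shared_reported
hida_hits = hida_unique | shared_reported

# Set operations recover the three groups.
shared = affective_hits & hida_hits
affective_only = affective_hits - hida_hits
hida_only = hida_hits - affective_hits

print(len(affective_only), len(hida_only), len(shared))
```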
Affiliation(s)
- Yimin Mao
  - School of Information and Technology, Jiangxi University of Science and Technology, Jiangxi, China
  - Applied Science Institute, Jiangxi University of Science and Technology, Jiangxi, China
- Daniel W. Fisher
  - Department of Psychiatry and Behavioral Sciences, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, United States of America
- Shuxing Yang
  - School of Information and Technology, Jiangxi University of Science and Technology, Jiangxi, China
- Rachel M. Keszycki
  - Department of Psychiatry and Behavioral Sciences, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, United States of America
- Hongxin Dong
  - Department of Psychiatry and Behavioral Sciences, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, United States of America
11
Arabfard M, Ohadi M, Rezaei Tabar V, Delbari A, Kavousi K. Genome-wide prediction and prioritization of human aging genes by data fusion: a machine learning approach. BMC Genomics 2019; 20:832. [PMID: 31706268 PMCID: PMC6842548 DOI: 10.1186/s12864-019-6140-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/22/2019] [Accepted: 09/25/2019] [Indexed: 12/11/2022]
Abstract
Background: Machine learning can effectively nominate novel genes for various research purposes in the laboratory. On a genome-wide scale, we implemented multiple databases and algorithms to predict and prioritize the human aging genes (PPHAGE). Results: We fused data from 11 databases, and used Naïve Bayes classifier and positive unlabeled learning (PUL) methods, NB, Spy, and Rocchio-SVM, to rank human genes with respect to their implication in aging. The PUL methods enabled us to identify a list of negative (non-aging) genes to use alongside the seed (known age-related) genes in the ranking process. Comparison of the PUL algorithms revealed that none of the methods for identifying a negative sample was advantageous over the others, and that their simultaneous use in the form of fusion was critical for obtaining optimal results (PPHAGE is publicly available at https://cbb.ut.ac.ir/pphage). Conclusion: We predict and prioritize over 3,000 candidate age-related genes in humans, based on significant ranking scores. The identified candidate genes are associated with pathways, ontologies, and diseases that are linked to aging, such as cancer and diabetes. Our data offer a platform for future experimental research on the genetic and biological aspects of aging. Additionally, we demonstrate that fusion of PUL methods and data sources can be successfully used for aging and disease candidate gene prioritization.
Affiliation(s)
- Masoud Arabfard
  - Department of Bioinformatics, Kish International Campus University of Tehran, Kish, Iran
  - Laboratory of Complex Biological Systems and Bioinformatics (CBB), Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
- Mina Ohadi
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- Vahid Rezaei Tabar
  - Department of Statistics, Faculty of Mathematical Sciences and Computer, Allameh Tabataba'i University, Tehran, Iran
- Ahmad Delbari
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- Kaveh Kavousi
  - Laboratory of Complex Biological Systems and Bioinformatics (CBB), Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
12
Malhotra AG, Singh S, Jha M, Pandey KM. A Parametric Targetability Evaluation Approach for Vitiligo Proteome Extracted through Integration of Gene Ontologies and Protein Interaction Topologies. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:1830-1842. [PMID: 29994537 DOI: 10.1109/tcbb.2018.2835459] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 06/08/2023]
Abstract
Vitiligo is a well-known skin disorder with complex etiology. Vitiligo pathogenesis is multifaceted with many ramifications. A computational systemic path was designed to first propose candidate disease proteins by merging properties from protein interaction networks and gene ontology terms. All in all, 109 proteins were identified and suggested to be involved in the onset of disease or its progression. Later, a composite approach was employed to prioritize vitiligo disease proteins by comparing and benchmarking the properties against standard target identification criteria. This includes sequence-based, structural, functional, essentiality, protein-protein interaction, vulnerability, secretability, assayability, and druggability information. The existing information was seamlessly integrated into efficient pipelines to propose a novel protocol for assessment of targetability of disease proteins. Using the online data resources and the scripting, an illustrative list of 68 potential drug targets was generated for vitiligo. While this list is broadly consistent with the research community's current interest in certain specific proteins, and suggests novel target candidates that may merit further study, it can still be modified to correspond to a user-specific environment, either by adjusting the weights for chosen criteria (i.e., a quantitative approach) or by changing the considered criteria (i.e., a qualitative approach).
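The closing remark about "adjusting the weights for chosen criteria" suggests a weighted scoring scheme. The sketch below shows one simple way such a ranking could work, using a normalized weighted sum; the scoring rule, the candidate names, and all numeric values are assumptions for illustration and are not taken from the paper. Only the criterion names come from the abstract.

```python
def targetability_score(scores, weights):
    """Weighted average of per-criterion scores (assumed to lie in [0, 1])."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * scores.get(c, 0.0) for c, w in weights.items()) / total_weight


def rank_targets(candidates, weights):
    """Return candidate names ordered from highest to lowest score."""
    return sorted(candidates,
                  key=lambda name: targetability_score(candidates[name], weights),
                  reverse=True)


# Hypothetical candidates and scores.
candidates = {
    "protein_A": {"druggability": 1.0, "essentiality": 0.0, "assayability": 0.5},
    "protein_B": {"druggability": 0.0, "essentiality": 1.0, "assayability": 0.5},
}
print(rank_targets(candidates, {"druggability": 2, "essentiality": 1, "assayability": 1}))
print(rank_targets(candidates, {"druggability": 1, "essentiality": 2, "assayability": 1}))
```

Changing the weights reorders the list (the quantitative adjustment mentioned in the abstract), while adding or dropping keys from `weights` corresponds to the qualitative one.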
13
Vuong QH, Ho MT, Vuong TT, La VP, Ho MT, Nghiem KCP, Tran BX, Giang HH, Giang TV, Latkin C, Nguyen HKT, Ho CSH, Ho RCM. Artificial Intelligence vs. Natural Stupidity: Evaluating AI readiness for the Vietnamese Medical Information System. J Clin Med 2019; 8:E168. [PMID: 30717268 PMCID: PMC6406313 DOI: 10.3390/jcm8020168] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 12/19/2018] [Revised: 01/29/2019] [Accepted: 01/29/2019] [Indexed: 01/02/2023]
Abstract
This review paper presents a framework to evaluate the artificial intelligence (AI) readiness for the healthcare sector in developing countries: a combination of adequate technical or technological expertise, financial sustainability, and socio-political commitment embedded in a healthy psycho-cultural context could bring about the smooth transitioning toward an AI-powered healthcare sector. Taking the Vietnamese healthcare sector as a case study, this paper attempts to clarify the negative and positive influencers. With only about 1500 publications about AI from 1998 to 2017 according to the latest Elsevier AI report, Vietnamese physicians are still capable of applying the state-of-the-art AI techniques in their research. However, a deeper look at the funding sources suggests a lack of socio-political commitment, and hence of financial sustainability, to advance the field. The AI readiness in Vietnam's healthcare also suffers from the unprepared information infrastructure: using text mining on the official annual reports of the Ministry of Health from 2012 to 2016, the paper found that the frequency of the word "database" actually decreases from 2012 to 2016, and that the word has a high probability of accompanying words such as "lacking", "standardizing", "inefficient", and "inaccurate." Finally, manifestations of psycho-cultural elements such as the public's mistaken views on AI or the non-transparent, inflexible, and redundant nature of Vietnamese organizational structures can impede the transition to an AI-powered healthcare sector.
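The text-mining step described in the abstract, counting how often "database" appears per year and which words accompany it, can be sketched with the standard library. The report snippets below are invented placeholders, not excerpts from the Ministry of Health reports, and the function names and window size are assumptions.

```python
import re
from collections import Counter


def term_frequency_by_year(reports, term):
    """Count whole-word, case-insensitive occurrences of `term` per year."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return {year: len(pattern.findall(text)) for year, text in sorted(reports.items())}


def cooccurring_words(text, term, window=2):
    """Count words appearing within `window` tokens of each occurrence of `term`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == term.lower():
            start = max(0, i - window)
            counts.update(tokens[start:i] + tokens[i + 1:i + 1 + window])
    return counts


# Made-up placeholder text standing in for annual reports.
toy_reports = {
    2012: "The database is lacking. The database needs standardizing.",
    2016: "An inaccurate database remains.",
}
print(term_frequency_by_year(toy_reports, "database"))
print(cooccurring_words(toy_reports[2012], "database").most_common(3))
```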
Collapse
Affiliation(s)
- Quan-Hoang Vuong
- Center for Interdisciplinary Social Research, Phenikaa University, Yen Nghia, Ha Dong district, Hanoi 100803, Vietnam.
- Faculty of Economics and Finance, Phenikaa University, Yen Nghia, Ha Dong district, Hanoi 100803, Vietnam.
- Manh-Tung Ho
- Center for Interdisciplinary Social Research, Phenikaa University, Yen Nghia, Ha Dong district, Hanoi 100803, Vietnam.
- Faculty of Economics and Finance, Phenikaa University, Yen Nghia, Ha Dong district, Hanoi 100803, Vietnam.
- Viet-Phuong La
- Center for Interdisciplinary Social Research, Phenikaa University, Yen Nghia, Ha Dong district, Hanoi 100803, Vietnam.
- Faculty of Economics and Finance, Phenikaa University, Yen Nghia, Ha Dong district, Hanoi 100803, Vietnam.
- Manh-Toan Ho
- Center for Interdisciplinary Social Research, Phenikaa University, Yen Nghia, Ha Dong district, Hanoi 100803, Vietnam.
- Faculty of Economics and Finance, Phenikaa University, Yen Nghia, Ha Dong district, Hanoi 100803, Vietnam.
- Bach Xuan Tran
- Institute for Preventive Medicine and Public Health, Hanoi Medical University, Hanoi 100000, Vietnam.
- Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA.
- Hai-Ha Giang
- Institute for Global Health Innovations, Duy Tan University, Da Nang 100000, Vietnam.
- Thu-Vu Giang
- Center of Excellence in Artificial Intelligence in Medicine, Nguyen Tat Thanh University, Ho Chi Minh City 100000, Vietnam.
- Carl Latkin
- Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205, USA.
- Hong-Kong T Nguyen
- A.I. for Social Data Lab (AISDL), Vuong & Associates, Dong Da district, Hanoi 100000, Vietnam.
- Cyrus S H Ho
- Department of Psychological Medicine, National University Health System, Singapore 119228, Singapore.
- Roger C M Ho
- Department of Psychological Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 119228, Singapore.
|
14
|
Amiri Dash Atan N, Koushki M, Rezaei Tavirani M, Ahmadi NA. Protein-Protein Interaction Network Analysis of Salivary Proteomic Data in Oral Cancer Cases. Asian Pac J Cancer Prev 2018; 19:1639-1645. [PMID: 29937423 PMCID: PMC6103602 DOI: 10.22034/apjcp.2018.19.6.1639] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 01/12/2023] Open
Abstract
Background: Oral cancer is a frequently encountered neoplasm of the head and neck region, being the eighth most common type of human malignancy worldwide. Despite improvement in its control, morbidity and mortality rates have improved little in the past decades. Therefore, prevention and/or early detection are a high priority. Proteomics with network analysis has emerged as a powerful tool to identify important proteins associated with cancer development and progression that can be potential targets for early diagnosis. In the present study, network-based protein-protein interactions (PPI) for oral cancer were identified and then analyzed for use as key proteins/potential biomarkers. Material and Methods: Gene expression data in articles which focused on saliva proteomics of oral cancer were collected and 74 candidate genes or proteins were extracted. Related protein networks of differentially expressed proteins were explored and visualized using Cytoscape software. Further PPI analysis was performed by Molecular Complex Detection (MCODE) and BiNGO methods. Results: Network analysis of genes/proteins related to oral cancer identified kininogen-1, angiotensinogen, annexin A1, IL-8, IgG heavy variable and constant chains, CRP, collagen alpha-1 and fibronectin as 9 hub-bottleneck proteins. In addition, based on clustering with the MCODE tool, vitronectin, collagen alpha-2, IL-8 and integrin alpha-v were established as 5 distinct seed proteins. Conclusion: A hub-bottleneck protein panel may offer a potential/candidate biomarker pattern for diagnosis and treatment of oral cancer disease. Further investigation and validation of these proteins are warranted.
Affiliation(s)
- Nasrin Amiri Dash Atan
- Proteomics Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
|
15
|
Zeng ISL, Lumley T. Review of Statistical Learning Methods in Integrated Omics Studies (An Integrated Information Science). Bioinform Biol Insights 2018; 12:1177932218759292. [PMID: 29497285 PMCID: PMC5824897 DOI: 10.1177/1177932218759292] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 10/29/2017] [Accepted: 01/24/2018] [Indexed: 12/14/2022] Open
Abstract
Integrated omics is becoming a new channel for investigating the complex molecular system in modern biological science and sets a foundation for systematic learning for precision medicine. The statistical/machine learning methods that have emerged in the past decade for integrated omics are not only innovative but also multidisciplinary, with integrated knowledge in biology, medicine, statistics, machine learning, and artificial intelligence. Here, we review the nontrivial classes of learning methods from the statistical aspects and streamline these learning methods within the statistical learning framework. The intriguing findings from the review are that the methods used are generalizable to other disciplines with complex systematic structure, and that integrated omics is part of an integrated information science which has collated and integrated different types of information for inferences and decision making. We review the statistical learning methods of exploratory and supervised learning from 42 publications. We also discuss the strengths and limitations of the extended principal component analysis, cluster analysis, network analysis, and regression methods. Statistical techniques such as penalization for sparsity induction when there are fewer observations than features, and the use of a Bayesian approach when there is prior knowledge to be integrated, are also included in the commentary. For the completeness of the review, a table of currently available software and packages for omics from 23 publications is summarized in the appendix.
Affiliation(s)
- Irene Sui Lan Zeng
- Department of Statistics, Faculty of Science, The University of Auckland, Auckland, New Zealand
- Thomas Lumley
- Department of Statistics, Faculty of Science, The University of Auckland, Auckland, New Zealand
|
16
|
Kim M, Tagkopoulos I. Data integration and predictive modeling methods for multi-omics datasets. Mol Omics 2018; 14:8-25. [DOI: 10.1039/c7mo00051k] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Indexed: 12/24/2022]
Abstract
We provide an overview of opportunities and challenges in multi-omics predictive analytics with particular emphasis on data integration and machine learning methods.
Affiliation(s)
- Minseung Kim
- Department of Computer Science, University of California, Davis, USA
- Genome Center
- Ilias Tagkopoulos
- Department of Computer Science, University of California, Davis, USA
- Genome Center
|
17
|
|
18
|
Chai H, Li ZN, Meng DY, Xia LY, Liang Y. A new semi-supervised learning model combined with Cox and SP-AFT models in cancer survival analysis. Sci Rep 2017; 7:13053. [PMID: 29026100 PMCID: PMC5638936 DOI: 10.1038/s41598-017-13133-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 06/29/2017] [Accepted: 09/19/2017] [Indexed: 01/03/2023] Open
Abstract
Gene selection is an attractive and important task in cancer survival analysis. Most existing supervised learning methods can only use the labeled biological data, while the censored data (weakly labeled data), which are far more numerous than the labeled data, are ignored in model building. To utilize the information in the censored data, a semi-supervised learning framework (the Cox-AFT model) that combines the Cox proportional hazard (Cox) and accelerated failure time (AFT) models has been used in cancer research, and it performs better than the single Cox or AFT model. This method, however, is easily affected by noise. To alleviate this problem, in this paper we combine the Cox-AFT model with the self-paced learning (SPL) method to more effectively employ the information in the censored data in a self-learning way. SPL is a reliable and stable learning mechanism, recently proposed for simulating the human learning process, that helps the AFT model automatically identify and include high-confidence samples in training, minimizing interference from high noise. Utilizing the SPL method produces two direct advantages: (1) the utilization of censored data is further promoted; (2) the noise delivered to the model is greatly decreased. The experimental results demonstrate the effectiveness of the proposed model compared to the traditional Cox-AFT model.
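The self-paced selection step mentioned in this abstract can be illustrated with a small sketch. It assumes the common hard-threshold form of SPL (a sample is included only while its current loss is below a pace parameter that grows each round) and applies it to a toy least-squares fit through the origin; it is not the Cox-AFT model itself. The function name `self_paced_fit` and the parameters `lam` and `growth` are hypothetical.

```python
def self_paced_fit(xs, ys, lam=5.0, growth=1.5, rounds=5):
    """Toy self-paced learning for the model y = w * x.

    Assumption: hard-threshold SPL, where a sample is used for training
    only if its squared error under the current model is below `lam`.
    """
    # Start from an ordinary least-squares fit on all samples.
    w = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    selected = []
    for _ in range(rounds):
        losses = [(y - w * x) ** 2 for x, y in zip(xs, ys)]
        # Keep only the "easy" (high-confidence) samples.
        selected = [i for i, loss in enumerate(losses) if loss < lam]
        if selected:
            # Refit using the selected samples only.
            w = (sum(xs[i] * ys[i] for i in selected)
                 / sum(xs[i] ** 2 for i in selected))
        # Raise the pace so harder samples may enter in later rounds.
        lam *= growth
    return w, selected


# Four clean points on y = 2x plus one noisy point.
w, used = self_paced_fit([1, 2, 3, 4, 5], [2, 4, 6, 8, 30])
```

In this toy run the noisy fifth point never falls under the threshold, so the fit is driven by the four clean samples.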
Affiliation(s)
- Hua Chai
- Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau, 999078, China
- Zi-Na Li
- Institute for Information and System Sciences and Ministry of Education Key Lab of Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an Shaan'xi, 710049, China
- De-Yu Meng
- Institute for Information and System Sciences and Ministry of Education Key Lab of Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an Shaan'xi, 710049, China
- Liang-Yong Xia
- Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau, 999078, China
- Yong Liang
- Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau, 999078, China.
|
19
|
Frasca M. Gene2DisCo: Gene to disease using disease commonalities. Artif Intell Med 2017; 82:34-46. [DOI: 10.1016/j.artmed.2017.08.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 05/13/2017] [Revised: 07/24/2017] [Accepted: 08/13/2017] [Indexed: 01/10/2023]
|
20
|
Dongliang X, Jingchang P, Bailing W. Multiple kernels learning-based biological entity relationship extraction method. J Biomed Semantics 2017; 8:38. [PMID: 29297359 PMCID: PMC5763518 DOI: 10.1186/s13326-017-0138-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022] Open
Abstract
Background: Automatically extracting protein entity interaction information from the biomedical literature can help to build protein relation networks and design new drugs. More than 20 million literature abstracts are included in MEDLINE, the most authoritative textual database in the field of biomedicine, and their number grows exponentially over time. This frantic expansion of the biomedical literature can be difficult to absorb or manually analyze. Thus, efficient and automated search engines based on text mining techniques are necessary to explore the biomedical literature. Results: The P, R, and F values of the tag graph method on the AIMed corpus are 50.82, 69.76, and 58.61%, respectively. The P, R, and F values of the tag graph kernel method on the other four evaluation corpora are 2–5% higher than those of the all-paths graph kernel. The P, R, and F values of the feature kernel and tag graph kernel fuse methods are 53.43, 71.62, and 61.30%, respectively. The P, R, and F values of the feature kernel and tag graph kernel fuse methods are 55.47, 70.29, and 60.37%, respectively. This indicates that the performance of the two kinds of kernel fusion methods is better than that of a simple kernel. Conclusion: In comparison with the all-paths graph kernel method, the tag graph kernel method is superior in terms of overall performance. Experiments show that the performance of the multi-kernel method is better than that of the three separate single-kernel methods and the dual-mutually fused kernel method used here on five corpus sets.
Affiliation(s)
- Xu Dongliang
- School of Mechanical, Electrical and Information Engineering, ShanDong University, WenHua West Road, WeiHai, 264209, China
- Pan Jingchang
- School of Mechanical, Electrical and Information Engineering, ShanDong University, WenHua West Road, WeiHai, 264209, China.
- Wang Bailing
- School of Computer Science and Technology, Harbin Institute of Technology, WenHua West Road, WeiHai, 264209, China
|
21
|
Abstract
Background: A biological system is a multi-layered structure of omics with genome, epigenome, transcriptome, metabolome, proteome, etc., and can be further stretched to clinical/medical layers such as diseasome, drugs, and symptoms. One advantage of omics is that we can figure out an unknown component or its trait by inferring from known omics components. The component can be inferred from components in the same level of omics or in different levels. Methods: To implement the inference process, an algorithm that can be applied to the multi-layered complex system is required. In this study, we develop a semi-supervised learning algorithm that can be applied to the multi-layered complex system. To verify the validity of the inference, it was applied to the prediction problem of disease co-occurrence with a two-layered network composed of a symptom layer and a disease layer. Results: The symptom-disease layered network obtained a fairly high AUC of 0.74, a noticeable improvement over the 0.59 AUC of the single-layered disease network. If further stretched to the whole layered structure of omics, the proposed method is expected to produce more promising results. Conclusion: This research is novel in that it presents a new integrative algorithm that incorporates the vertical structure of omics data, in contrast to existing methods that integrate the data in a parallel fashion. The results can provide an enhanced guideline for disease co-occurrence prediction and thereby serve as a valuable tool for the inference process of multi-layered biological systems.
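For readers unfamiliar with the AUC figures quoted in this abstract (0.74 versus 0.59), the following sketch shows one standard way to compute the area under the ROC curve: the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as one half. The function name `auc` and the toy scores are illustrative assumptions and are unrelated to the study's data.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    `labels` holds 1 for positives and 0 for negatives. The result is the
    probability that a random positive outranks a random negative.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Perfect ranking: every positive scores above every negative.
perfect = auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
# One positive (0.2) is ranked below one negative (0.8).
partial = auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

An AUC of 0.5 corresponds to random ranking, which is why a move from 0.59 to 0.74 is read as a substantial gain.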
Affiliation(s)
- Myungjun Kim
- Department of Industrial Engineering, Ajou University, 206 Worldcup-ro, Yeongtong-gu, Suwon, 16499, South Korea
- Yonghyun Nam
- Department of Industrial Engineering, Ajou University, 206 Worldcup-ro, Yeongtong-gu, Suwon, 16499, South Korea
- Hyunjung Shin
- Department of Industrial Engineering, Ajou University, 206 Worldcup-ro, Yeongtong-gu, Suwon, 16499, South Korea.
|
22
|
|
23
|
González A, Ramos J, De Paz JF, Corchado JM. Obtaining Relevant Genes by Analysis of Expression Arrays with a Multi-Agent System. ADCAIJ: ADVANCES IN DISTRIBUTED COMPUTING AND ARTIFICIAL INTELLIGENCE JOURNAL 2015. [DOI: 10.14201/adcaij2014333542] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/11/2022]
Abstract
Triple negative breast cancer (TNBC) is an aggressive form of breast cancer. Despite treatment with chemotherapy, relapses are frequent, and the response to these treatments is not the same in younger women as in older women. Therefore, the identification of genes that provoke this disease is required, as well as the identification of therapeutic targets. There are currently different hybridization techniques, such as expression arrays, which measure the signal expression at both the genomic and transcriptomic levels of thousands of genes in a given sample. Probesets of Gene 1.0 ST GeneChip arrays provide the ultimate genome transcript coverage, providing a measurement of the expression level of the sample. This paper proposes a multi-agent system to manage information from expression arrays, with the goal of providing an intuitive system that is also extensible to analyze and interpret the results. The agent roles integrate different types of techniques, from statistical and data mining techniques that select a set of genes, to search techniques that find pathways in which such genes participate, and information extraction techniques that apply a CBR system to check whether these genes are involved in the disease.
|
24
|
Wu S, Shao F, Ji J, Sun R, Dong R, Zhou Y, Xu S, Sui Y, Hu J. Network propagation with dual flow for gene prioritization. PLoS One 2015; 10:e0116505. [PMID: 25689268 PMCID: PMC4331530 DOI: 10.1371/journal.pone.0116505] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 05/16/2014] [Accepted: 11/24/2014] [Indexed: 12/31/2022] Open
Abstract
Based on the hypothesis that the neighbors of disease genes tend to cause similar diseases, network-based methods for disease prediction have received increasing attention. Taking full advantage of network structure, the performance of global distance measurements is generally superior to that of local distance measurements. However, some problems exist in the global distance measurements. For example, global distance measurements may mistake non-disease hub proteins that have dense interactions with known disease proteins for potential disease proteins. To find a new method that avoids this problem, we analyzed the differences between disease proteins and other proteins by using essential proteins (proteins encoded by essential genes) as references. We found that disease proteins are not well connected with essential proteins in the protein interaction networks. Based on this new finding, we proposed a novel strategy for gene prioritization based on protein interaction networks. We allocated positive flow to disease genes and negative flow to essential genes, and adopted network propagation for gene prioritization. Experimental results on 110 diseases verified the effectiveness and potential of the proposed method.
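The dual-flow idea in this abstract (positive flow from disease genes, negative flow from essential genes, then network propagation) can be sketched as follows. This is only an illustration under stated assumptions: the symmetric degree normalization, the damping factor `alpha`, the fixed iteration count, and the function name `propagate_dual_flow` are choices made here and are not taken from the paper.

```python
from math import sqrt


def propagate_dual_flow(edges, disease, essential, alpha=0.5, iters=200):
    """Score genes by propagating +1 from disease genes and -1 from
    essential genes over an undirected interaction network.

    Assumed update rule (a common propagation form, not the paper's):
        score <- alpha * W_norm * score + (1 - alpha) * prior
    """
    # Build the undirected adjacency structure.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    degree = {n: len(nb) for n, nb in neighbors.items()}

    # Positive prior for disease genes, negative prior for essential genes.
    prior = {n: 0.0 for n in neighbors}
    for g in disease:
        if g in prior:
            prior[g] = 1.0
    for g in essential:
        if g in prior:
            prior[g] = -1.0

    score = dict(prior)
    for _ in range(iters):
        score = {
            n: alpha * sum(score[m] / sqrt(degree[n] * degree[m])
                           for m in neighbors[n])
            + (1 - alpha) * prior[n]
            for n in neighbors
        }
    return score


# Toy chain A-B-C-D-E: A is a known disease gene, E an essential gene.
chain = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
scores = propagate_dual_flow(chain, disease=["A"], essential=["E"])
```

On this toy chain the candidate next to the disease gene ends up with a positive score and the candidate next to the essential gene with a negative one, which is the ranking behavior the abstract describes.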
Affiliation(s)
- Shunyao Wu
- College of Automation Engineering, Qingdao University, Qingdao, China
- College of Information Engineering, Qingdao University, Qingdao, China
- Fengjing Shao
- College of Automation Engineering, Qingdao University, Qingdao, China
- College of Information Engineering, Qingdao University, Qingdao, China
- Jun Ji
- College of Information Engineering, Qingdao University, Qingdao, China
- Rencheng Sun
- College of Information Engineering, Qingdao University, Qingdao, China
- Rizhuang Dong
- School of Computer Engineering, Qingdao Technological University, Qingdao, China
- Yuanke Zhou
- College of Information Engineering, Qingdao University, Qingdao, China
- Shaojie Xu
- College of Information Engineering, Qingdao University, Qingdao, China
- Yi Sui
- College of Information Engineering, Qingdao University, Qingdao, China
- Jianlong Hu
- College of Information Engineering, Qingdao University, Qingdao, China
|
25
|
Keith BP, Robertson DL, Hentges KE. Locus heterogeneity disease genes encode proteins with high interconnectivity in the human protein interaction network. Front Genet 2014; 5:434. [PMID: 25538735 PMCID: PMC4260505 DOI: 10.3389/fgene.2014.00434] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 09/17/2014] [Accepted: 11/24/2014] [Indexed: 01/20/2023] Open
Abstract
Mutations in genes potentially lead to a number of genetic diseases with differing severity. These disease genes have been the focus of research in recent years showing that the disease gene population as a whole is not homogeneous, and can be categorized according to their interactions. Locus heterogeneity describes a single disorder caused by mutations in different genes each acting individually to cause the same disease. Using datasets of experimentally derived human disease genes and protein interactions, we created a protein interaction network to investigate the relationships between the products of genes associated with a disease displaying locus heterogeneity, and use network parameters to suggest properties that distinguish these disease genes from the overall disease gene population. Through the manual curation of known causative genes of 100 diseases displaying locus heterogeneity and 397 single-gene Mendelian disorders, we use network parameters to show that our locus heterogeneity network displays distinct properties from the global disease network and a Mendelian network. Using the global human proteome, through random simulation of the network we show that heterogeneous genes display significant interconnectivity. Further topological analysis of this network revealed clustering of locus heterogeneity genes that cause identical disorders, indicating that these disease genes are involved in similar biological processes. We then use this information to suggest additional genes that may contribute to diseases with locus heterogeneity.
Affiliation(s)
- Benjamin P Keith
- Faculty of Life Sciences, University of Manchester, Manchester, UK
|
26
|
Chen WJ, Shao YH, Deng NY, Feng ZL. Laplacian least squares twin support vector machine for semi-supervised classification. Neurocomputing 2014. [DOI: 10.1016/j.neucom.2014.05.007] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 10/25/2022]
|
27
|
Tan J, Zhen L, Deng N, Zhang Z. Laplacian p-norm proximal support vector machine for semi-supervised classification. Neurocomputing 2014. [DOI: 10.1016/j.neucom.2014.05.052] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/25/2022]
|
28
|
Abstract
The challenging task of studying and modeling the complex dynamics of biological systems in order to describe various human diseases has gathered great interest in recent years. Major biological processes are mediated through protein interactions; hence, there is a need to understand the chaotic network that forms these processes in order to understand human diseases. The application of protein interaction networks to disease datasets allows the identification of genes and proteins associated with diseases, the study of network properties, the identification of subnetworks, and network-based disease gene classification. Although various protein interaction network analysis strategies have been employed, grand challenges still exist. A global understanding of protein interaction networks via the integration of high-throughput functional genomics data from different levels will allow researchers to examine disease pathways and identify strategies to control them. As a result, it seems likely that more personalized, more accurate, and more rapid disease gene diagnostic techniques will be devised in the future, as well as novel strategies that are more personalized. This mini-review summarizes the current practice of protein interaction networks in medical research as well as the challenges to be overcome.
Affiliation(s)
- Tuba Sevimoglu
- Department of Bioengineering, Marmara University, Goztepe, 34722 Istanbul, Turkey
- Kazim Yalcin Arga
- Department of Bioengineering, Marmara University, Goztepe, 34722 Istanbul, Turkey
|
29
|
Caberlotto L, Nguyen TP. A systems biology investigation of neurodegenerative dementia reveals a pivotal role of autophagy. BMC SYSTEMS BIOLOGY 2014; 8:65. [PMID: 24908109 PMCID: PMC4077228 DOI: 10.1186/1752-0509-8-65] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 01/14/2014] [Accepted: 05/20/2014] [Indexed: 11/25/2022]
Abstract
Background: Neurodegenerative dementia comprises chronic and progressive illnesses with major clinical features represented by progressive and permanent loss of cognitive and mental performance, including impairment of memory and brain functions. Many different forms of neurodegenerative dementia exist, but they are all characterized by the death of specific subpopulations of neurons and the accumulation of proteins in the brain. We incorporated data from OMIM and primary molecular targets of drugs in the different phases of the drug discovery process to try to reveal possible hidden mechanisms in neurodegenerative dementia. In the present study, a systems biology approach was used to investigate the molecular connections among seemingly distinct complex diseases with the shared clinical symptoms of dementia that could suggest related disease mechanisms. Results: Network analysis was applied to characterize an interaction network of disease proteins and drug targets, revealing a major role of metabolism and, predominantly, of the autophagy process in dementia and, particularly, in tauopathies. Different phases of the autophagy molecular pathway appear to be implicated in the individual disease pathophysiology, and specific drug targets associated with autophagy modulation could be considered for pharmacological intervention. In particular, in view of their centrality and of their direct association with autophagy proteins in the network, PP2A subunits could be suggested as a suitable molecular target for the development of novel drugs. Conclusion: The present systems biology investigation identifies the autophagy pathway as a central dysregulated process in neurodegenerative dementia, with a prevalent involvement in diseases characterized by tau inclusion, and indicates the disease-specific molecules in the pathway that could be considered for therapy.
Affiliation(s)
- Laura Caberlotto
- The Microsoft Research, University of Trento Centre for Computational Systems Biology (COSBI), Piazza Manifattura 1, 38068 Rovereto, Italy.
|
30
|
Valentini G, Paccanaro A, Caniza H, Romero AE, Re M. An extensive analysis of disease-gene associations using network integration and fast kernel-based gene prioritization methods. Artif Intell Med 2014; 61:63-78. [PMID: 24726035 PMCID: PMC4070077 DOI: 10.1016/j.artmed.2014.03.003] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Received: 09/11/2013] [Revised: 03/05/2014] [Accepted: 03/10/2014] [Indexed: 02/07/2023]
Abstract
OBJECTIVE: In the context of "network medicine", gene prioritization methods represent one of the main tools to discover candidate disease genes by exploiting the large amount of data covering different types of functional relationships between genes. Several works proposed to integrate multiple sources of data to improve disease gene prioritization, but to our knowledge no systematic studies focused on the quantitative evaluation of the impact of network integration on gene prioritization. In this paper, we aim at providing an extensive analysis of gene-disease associations not limited to genetic disorders, and a systematic comparison of different network integration methods for gene prioritization. MATERIALS AND METHODS: We collected nine different functional networks representing different functional relationships between genes, and we combined them through both unweighted and weighted network integration methods. We then prioritized genes with respect to each of the considered 708 medical subject headings (MeSH) diseases by applying classical guilt-by-association, random walk and random walk with restart algorithms, and the recently proposed kernelized score functions. RESULTS: The results obtained with classical random walk algorithms and the best single network achieved an average area under the curve (AUC) across the 708 MeSH diseases of about 0.82, while kernelized score functions and network integration boosted the average AUC to about 0.89. Weighted integration, by exploiting the different "informativeness" embedded in different functional networks, outperforms unweighted integration at 0.01 significance level, according to the Wilcoxon signed rank sum test. For each MeSH disease we provide the top-ranked unannotated candidate genes, available for further bio-medical investigation. CONCLUSIONS: Network integration is necessary to boost the performances of gene prioritization methods. Moreover, the methods based on kernelized score functions can further enhance disease gene ranking results by adopting both local and global learning strategies, able to exploit the overall topology of the network.
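Of the algorithms named in this abstract, random walk with restart (RWR) is the easiest to illustrate in a few lines. The sketch below assumes an unweighted, undirected network and a uniform restart distribution over the seed genes; the function name `random_walk_with_restart`, the restart probability of 0.3, and the toy network are assumptions made for illustration and do not reproduce the study's networks or settings.

```python
def random_walk_with_restart(edges, seeds, restart=0.3, iters=100):
    """Rank genes by their steady-state visit probability under a random
    walk that jumps back to the seed (known disease) genes with
    probability `restart` at every step.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    seeds_in = [s for s in seeds if s in neighbors]
    if not seeds_in:
        raise ValueError("no seed gene is present in the network")

    # Restart distribution: uniform over the seed genes.
    p0 = {n: (1.0 / len(seeds_in) if n in seeds_in else 0.0)
          for n in neighbors}
    p = dict(p0)
    for _ in range(iters):
        # Each node m passes its probability evenly to its neighbors.
        p = {
            n: (1 - restart) * sum(p[m] / len(neighbors[m])
                                   for m in neighbors[n])
            + restart * p0[n]
            for n in neighbors
        }
    return p


# Toy chain A-B-C-D with A as the only known disease gene.
ranks = random_walk_with_restart(
    [("A", "B"), ("B", "C"), ("C", "D")], seeds=["A"]
)
```

Candidate genes are then ranked by decreasing probability, so genes closer to the seeds in the network receive higher priority.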
Affiliation(s)
- Giorgio Valentini
- AnacletoLab - Dipartimento di Informatica, Università degli Studi di Milano, via Comelico 39/41, 20135 Milano, Italy.
- Alberto Paccanaro
- Department of Computer Science and Centre for Systems and Synthetic Biology, Royal Holloway, University of London, Egham TW20 0EX, UK
- Horacio Caniza
- Department of Computer Science and Centre for Systems and Synthetic Biology, Royal Holloway, University of London, Egham TW20 0EX, UK
- Alfonso E Romero
- Department of Computer Science and Centre for Systems and Synthetic Biology, Royal Holloway, University of London, Egham TW20 0EX, UK
- Matteo Re
- AnacletoLab - Dipartimento di Informatica, Università degli Studi di Milano, via Comelico 39/41, 20135 Milano, Italy
|
31
|
Saccà C, Teso S, Diligenti M, Passerini A. Improved multi-level protein-protein interaction prediction with semantic-based regularization. BMC Bioinformatics 2014; 15:103. [PMID: 24725682 PMCID: PMC4004462 DOI: 10.1186/1471-2105-15-103] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 08/08/2013] [Accepted: 03/03/2014] [Indexed: 11/24/2022] Open
Abstract
Background Protein–protein interactions can be seen as a hierarchical process occurring at three related levels: proteins bind by means of specific domains, which in turn form interfaces through patches of residues. Detailed knowledge about which domains and residues are involved in a given interaction has extensive applications to biology, including better understanding of the binding process and more efficient drug/enzyme design. Alas, most current interaction prediction methods do not identify which parts of a protein actually instantiate an interaction. Furthermore, they also fail to leverage the hierarchical nature of the problem, ignoring otherwise useful information available at the lower levels; when they do, they do not generate predictions that are guaranteed to be consistent between levels. Results Inspired by earlier ideas of Yip et al. (BMC Bioinformatics 10:241, 2009), in the present paper we view the problem as a multi-level learning task, with one task per level (proteins, domains and residues), and propose a machine learning method that collectively infers the binding state of all object pairs. Our method is based on Semantic Based Regularization (SBR), a flexible and theoretically sound machine learning framework that uses First Order Logic constraints to tie the learning tasks together. We introduce a set of biologically motivated rules that enforce consistent predictions between the hierarchy levels. Conclusions We study the empirical performance of our method using a standard validation procedure, and compare its performance against the only other existing multi-level prediction technique. We present results showing that our method substantially outperforms the competitor in several experimental settings, indicating that exploiting the hierarchical nature of the problem can lead to better predictions. In addition, our method is also guaranteed to produce interactions that are consistent with respect to the protein–domain–residue hierarchy.
Affiliation(s)
- Andrea Passerini
- Dipartimento di Ingegneria e Scienza dell'Informazione, University of Trento, Trento, Italy.
|
32
|
Park C, Ahn J, Kim H, Park S. Integrative gene network construction to analyze cancer recurrence using semi-supervised learning. PLoS One 2014; 9:e86309. [PMID: 24497942 PMCID: PMC3908883 DOI: 10.1371/journal.pone.0086309] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 07/03/2013] [Accepted: 12/09/2013] [Indexed: 12/17/2022]
Abstract
Background: The prognosis of cancer recurrence is an important research area in bioinformatics and is challenging due to the small sample sizes compared to the vast number of genes. There have been several attempts to predict cancer recurrence. Most studies employed a supervised approach, which uses only a few labeled samples. Semi-supervised learning can be a great alternative to solve this problem. There have been few attempts based on manifold assumptions to reveal the detailed roles of identified cancer genes in recurrence.
Results: In order to predict cancer recurrence, we proposed a novel semi-supervised learning algorithm based on a graph regularization approach. We transformed the gene expression data into a graph structure for semi-supervised learning and integrated protein interaction data with the gene expression data to select functionally-related gene pairs. Then, we predicted the recurrence of cancer by applying a regularization approach to the constructed graph containing both labeled and unlabeled nodes.
Conclusions: The average improvement rate of accuracy for three different cancer datasets was 24.9% compared to existing supervised and semi-supervised methods. We performed functional enrichment on the gene networks used for learning. We identified that those gene networks are significantly associated with cancer-recurrence-related biological functions. Our algorithm was developed with standard C++ and is available in Linux and MS Windows formats in the STL library. The executable program is freely available at: http://embio.yonsei.ac.kr/~Park/ssl.php.
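The general idea of regularizing over a graph with both labeled and unlabeled nodes can be sketched as follows. This is a generic Laplacian-regularized least-squares formulation, assumed here for illustration; it is not the authors' C++ algorithm, and the function name, the toy graph, and the parameter `lam` are hypothetical.

```python
import numpy as np

def graph_regularized_scores(W, y, labeled, lam=1.0):
    """Generic graph-regularization sketch (not the paper's algorithm).

    W       : symmetric, non-negative adjacency matrix (n x n)
    y       : labels, +1 / -1 for labeled nodes, 0 for unlabeled nodes
    labeled : boolean mask marking which nodes carry a label
    lam     : weight of the smoothness term (assumed hyperparameter)
    """
    W = np.asarray(W, dtype=float)
    y = np.asarray(y, dtype=float)
    J = np.diag(np.asarray(labeled, dtype=float))  # selects labeled nodes
    L = np.diag(W.sum(axis=1)) - W                 # unnormalized graph Laplacian
    # Minimise  sum_{i labeled} (f_i - y_i)^2 + lam * f^T L f.
    # Setting the gradient to zero gives the linear system (J + lam * L) f = J y.
    # The system is solvable when every connected component has a labeled node.
    return np.linalg.solve(J + lam * L, J @ y)

# Toy chain graph 0 - 1 - 2 - 3 with only the two end nodes labeled.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
y = np.array([1, 0, 0, -1])
labeled = np.array([True, False, False, True])
scores = graph_regularized_scores(W, y, labeled)
predicted = np.sign(scores)  # unlabeled nodes take the sign of their smoothed score
```

On this toy graph the unlabeled nodes inherit the label of the nearer labeled end, which is the smoothness behaviour a graph regularizer is meant to produce.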
Affiliation(s)
- Chihyun Park
- Department of Computer Science, Yonsei University, Seoul, South Korea
- Jaegyoon Ahn
- Department of Computer Science, Yonsei University, Seoul, South Korea
- Hyunjin Kim
- Department of Computer Science, Yonsei University, Seoul, South Korea
- Sanghyun Park
- Department of Computer Science, Yonsei University, Seoul, South Korea
|
33
|
Nguyen TP, Caberlotto L, Morine MJ, Priami C. Network analysis of neurodegenerative disease highlights a role of Toll-like receptor signaling. BioMed Research International 2014; 2014:686505. [PMID: 24551850 PMCID: PMC3914352 DOI: 10.1155/2014/686505] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 09/05/2013] [Revised: 11/20/2013] [Accepted: 11/30/2013] [Indexed: 01/23/2023]
Abstract
Despite significant advances in the study of the molecular mechanisms altered in the development and progression of neurodegenerative diseases (NDs), the etiology is still enigmatic and the distinctions between diseases are not always entirely clear. We present an efficient computational method based on protein-protein interaction (PPI) networks to model the functional network of NDs. The aim of this work is fourfold: (i) reconstruction of a PPI network relating to the NDs, (ii) construction of an association network between diseases based on proximity in the disease PPI network, (iii) quantification of disease associations, and (iv) inference of potential molecular mechanisms involved in the diseases. The functional links between diseases not only showed overlap with the traditional classification in clinical settings, but also offered new insight into connections between diseases with limited clinical overlap. To gain an expanded view of the molecular mechanisms involved in NDs, both direct and indirect connector proteins were investigated. The method uncovered molecular relationships that are common to apparently distinct diseases and provided important insight into the molecular networks implicated in disease pathogenesis. In particular, the current analysis highlighted the Toll-like receptor signaling pathway as a potential candidate pathway to be targeted by therapy in neurodegeneration.
Affiliation(s)
- Thanh-Phuong Nguyen
- The Microsoft Research, University of Trento Centre for Computational Systems Biology (COSBI), Piazza Manifattura 1, 38068 Rovereto, Italy
- Laura Caberlotto
- The Microsoft Research, University of Trento Centre for Computational Systems Biology (COSBI), Piazza Manifattura 1, 38068 Rovereto, Italy
- Melissa J. Morine
- The Microsoft Research, University of Trento Centre for Computational Systems Biology (COSBI), Piazza Manifattura 1, 38068 Rovereto, Italy
- Department of Mathematics, University of Trento, Via Sommarive, 14-38123 Povo, Italy
- Corrado Priami
- The Microsoft Research, University of Trento Centre for Computational Systems Biology (COSBI), Piazza Manifattura 1, 38068 Rovereto, Italy
- Department of Mathematics, University of Trento, Via Sommarive, 14-38123 Povo, Italy
|
34
|
|
35
|
|