1
Zhao Y, Yin J, Zhang L, Zhang Y, Chen X. Drug-drug interaction prediction: databases, web servers and computational models. Brief Bioinform 2023; 25:bbad445. [PMID: 38113076 PMCID: PMC10782925 DOI: 10.1093/bib/bbad445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 10/26/2023] [Accepted: 11/14/2023] [Indexed: 12/21/2023] Open
Abstract
In clinical treatment, two or more drugs (i.e. drug combination) are simultaneously or successively used for therapy with the purpose of primarily enhancing the therapeutic efficacy or reducing drug side effects. However, inappropriate drug combination may not only fail to improve efficacy, but even lead to adverse reactions. Therefore, according to the basic principle of improving the efficacy and/or reducing adverse reactions, we should study drug-drug interactions (DDIs) comprehensively and thoroughly so as to reasonably use drug combination. In this review, we first introduced the basic conception and classification of DDIs. Further, some important publicly available databases and web servers about experimentally verified or predicted DDIs were briefly described. As an effective auxiliary tool, computational models for predicting DDIs can not only save the cost of biological experiments, but also provide relevant guidance for combination therapy to some extent. Therefore, we summarized three types of prediction models (including traditional machine learning-based models, deep learning-based models and score function-based models) proposed during recent years and discussed the advantages as well as limitations of them. Besides, we pointed out the problems that need to be solved in the future research of DDIs prediction and provided corresponding suggestions.
Affiliation(s)
- Yan Zhao
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Jun Yin
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Li Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Yong Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Xing Chen
  - School of Science, Jiangnan University, Wuxi 214122, China
2
Huang A, Xie X, Yao X, Liu H, Wang X, Peng S. HF-DDI: Predicting Drug-Drug Interaction Events Based on Multimodal Hybrid Fusion. J Comput Biol 2023; 30:961-971. [PMID: 37594774 DOI: 10.1089/cmb.2023.0068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/19/2023] Open
Abstract
Drug-drug interactions (DDIs) can have a significant impact on patient safety and health. Predicting potential DDIs before administering drugs to patients is a critical step in drug development and can help prevent adverse drug events. In this study, we propose a novel method called HF-DDI for predicting DDI events based on various drug features, including molecular structure, target, and enzyme information. Specifically, we design our model with both early fusion and late fusion strategies and utilize a score calculation module to predict the likelihood of interactions between drugs. Our model was trained and tested on a large data set of known DDIs, achieving an overall accuracy of 0.948. The results suggest that incorporating multiple drug features can improve the accuracy of DDI event prediction and may be useful for improving drug safety and patient outcomes.
Affiliation(s)
- An Huang
  - Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin, China
  - College of Information Science and Engineering, Guilin University of Technology, Guilin, China
- Xiaolan Xie
  - Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin, China
  - College of Information Science and Engineering, Guilin University of Technology, Guilin, China
- Xiaojun Yao
  - State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Macau, China
- Huanxiang Liu
  - Faculty of Applied Sciences, Macao Polytechnic University, Macao, China
- Xiaoqi Wang
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Shaoliang Peng
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
3
Chen M, Jiang W, Pan Y, Dai J, Lei Y, Ji C. SGFNNs: Signed Graph Filtering-based Neural Networks for Predicting Drug-Drug Interactions. J Comput Biol 2022; 29:1104-1116. [PMID: 35723646 DOI: 10.1089/cmb.2022.0113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
Capturing comprehensive information about drug-drug interactions (DDIs) is one of the key tasks in public health and drug development. Recently, graph neural networks (GNNs) have received increasing attention in the drug discovery domain due to their capability of integrating drug profiles and the network structure into a low-dimensional feature space for link prediction and classification. Most GNN models for DDI prediction are built on an unsigned graph, which tends to represent associated nodes with similar embeddings. However, semantic correlations between drugs, such as degressive effects or even adverse side reactions, should be disassortative. In this study, we put forward signed GNNs to model assortative and disassortative relationships within drug pairs. Since negative links preclude direct generalization of spectral filters on unsigned graphs, we divide the signed graph into two unsigned subgraphs, each with a dedicated spectral filter, which captures both the commonality and the difference of drug pairs. For drug representations, we derive two signed graph filtering-based neural networks (SGFNNs) which integrate signed graph structures and drug node attributes. Moreover, we use an end-to-end framework for learning DDIs, where an SGFNN together with a discriminator is jointly trained under a problem-specific loss function. The experimental results on two prediction problems show that our framework can obtain significant improvements compared with baselines. The case study further verifies the validity of our method.
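As a rough illustration of the decomposition step described in this abstract, the sketch below splits a signed drug graph into one positive and one negative unsigned subgraph. The edge list, the drug names, and the function name `split_signed_graph` are hypothetical and not taken from the paper; the spectral filters and the neural network itself are not shown.

```python
from collections import defaultdict


def split_signed_graph(signed_edges):
    """Split (u, v, sign) edges into two unsigned adjacency maps.

    Edges with a positive sign go to the first map, edges with a
    negative sign go to the second map, and zero-sign edges are skipped.
    """
    positive = defaultdict(set)
    negative = defaultdict(set)
    for u, v, sign in signed_edges:
        if sign == 0:
            continue
        target = positive if sign > 0 else negative
        # Store both directions because the subgraphs are undirected.
        target[u].add(v)
        target[v].add(u)
    return dict(positive), dict(negative)


# Hypothetical toy graph: one assortative and two disassortative links.
edges = [("drugA", "drugB", +1), ("drugB", "drugC", -1), ("drugA", "drugC", -1)]
pos, neg = split_signed_graph(edges)
```

Each of the two returned maps could then be handed to its own filter, which is the design choice the abstract describes.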
Affiliation(s)
- Ming Chen
  - Department of Artificial Intelligence, College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan, China
- Wei Jiang
  - Department of Artificial Intelligence, College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan, China
- Yi Pan
  - Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
- Jianhua Dai
  - Department of Artificial Intelligence, College of Information Science and Engineering, Hunan Normal University, Changsha, Hunan, China
- Yunwen Lei
  - School of Computer Science, University of Birmingham, Birmingham, United Kingdom
- Chunyan Ji
  - Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
4
Yan C, Duan G, Zhang Y, Wu FX, Pan Y, Wang J. Predicting Drug-Drug Interactions Based on Integrated Similarity and Semi-Supervised Learning. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:168-179. [PMID: 32310779 DOI: 10.1109/tcbb.2020.2988018] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 06/11/2023]
Abstract
A drug-drug interaction (DDI) is defined as an association between two drugs where the pharmacological effects of one drug are influenced by another drug. Positive DDIs can usually improve therapeutic effects for patients, but negative DDIs are a major cause of adverse drug reactions and can even result in drug withdrawal from the market and patient death. Therefore, identifying DDIs has become a key component of drug development and disease treatment. In this study, we propose a novel method to predict DDIs based on integrated similarity and semi-supervised learning (DDI-IS-SL). DDI-IS-SL integrates drug chemical, biological and phenotype data to calculate the feature similarity of drugs with the cosine similarity method. The Gaussian Interaction Profile kernel similarity of drugs is also calculated based on known DDIs. A semi-supervised learning method (the Regularized Least Squares classifier) is used to calculate the interaction possibility scores of drug-drug pairs. In terms of 5-fold cross validation, 10-fold cross validation and de novo drug validation, DDI-IS-SL achieves better prediction performance than other comparative methods. In addition, the average computation time of DDI-IS-SL is shorter than that of other comparative methods. Finally, case studies further demonstrate the performance of DDI-IS-SL in practical applications.
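The two similarity measures named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the bandwidth normalisation (dividing `gamma_prime` by the mean squared norm of the interaction profiles) follows the commonly used Gaussian Interaction Profile formulation and is an assumption here, as are the function names and the toy interaction matrix.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian Interaction Profile kernel over binary interaction profiles.

    profiles[i] is the row of the known-DDI matrix for drug i.
    The bandwidth is normalised by the mean squared profile norm (assumed).
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else 0.0
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


# Hypothetical 3-drug interaction matrix: drug 0 interacts with drugs 1 and 2.
interactions = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
K = gip_kernel(interactions)
```

Drugs 1 and 2 share the same interaction profile, so their kernel value is 1, while drug 0 is less similar to either of them.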
5
He B, Hou F, Ren C, Bing P, Xiao X. A Review of Current In Silico Methods for Repositioning Drugs and Chemical Compounds. Front Oncol 2021; 11:711225. [PMID: 34367996 PMCID: PMC8340770 DOI: 10.3389/fonc.2021.711225] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 05/18/2021] [Accepted: 07/07/2021] [Indexed: 12/23/2022] Open
Abstract
Drug repositioning is a new way of applying the existing therapeutics to new disease indications. Due to the exorbitant cost and high failure rate in developing new drugs, the continued use of existing drugs for treatment, especially anti-tumor drugs, has become a widespread practice. With the assistance of high-throughput sequencing techniques, many efficient methods have been proposed and applied in drug repositioning and individualized tumor treatment. Current computational methods for repositioning drugs and chemical compounds can be divided into four categories: (i) feature-based methods, (ii) matrix decomposition-based methods, (iii) network-based methods, and (iv) reverse transcriptome-based methods. In this article, we comprehensively review the widely used methods in the above four categories. Finally, we summarize the advantages and disadvantages of these methods and indicate future directions for more sensitive computational drug repositioning methods and individualized tumor treatment, which are critical for further experimental validation.
Affiliation(s)
- Binsheng He
  - Academician Workstation, Changsha Medical University, Changsha, China
- Fangxing Hou
  - Queen Mary School, Nanchang University, Jiangxi, China
- Changjing Ren
  - School of Science, Dalian Maritime University, Dalian, China
  - Genies Beijing Co., Ltd., Beijing, China
- Pingping Bing
  - Academician Workstation, Changsha Medical University, Changsha, China
- Xiangzuo Xiao
  - Department of Radiology, The First Affiliated Hospital of Nanchang University, Jiangxi, China
6
Identifying Infliximab- (IFX-) Responsive Blood Signatures for the Treatment of Rheumatoid Arthritis. BIOMED RESEARCH INTERNATIONAL 2021. [DOI: 10.1155/2021/5556784] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Rheumatoid arthritis (RA) is a severe chronic pathogenic inflammatory abnormality that damages small joints. Comprehensive diagnosis and treatment procedures for RA have been established because of its severe symptoms and relatively high morbidity. Medication and surgery are the two major therapeutic approaches. Infliximab (IFX) is a novel biological agent applied for the treatment of RA. IFX improves physical functions and benefits the achievement of clinical remission even under discontinuous medication. However, not all patients react to IFX, and distinguishing IFX-sensitive and IFX-resistant patients is quite difficult. Thus, how to predict the therapeutic effects of IFX on patients with RA is one of the urgent translational medicine problems in the clinical treatment of RA. In this study, we present a novel computational method for the identification of the applicable and substantial blood gene signatures of IFX sensitivity by liquid biopsy, which may assist in the establishment of a clinical drug sensitivity test standard for RA and contribute to the revelation of unique IFX-associated pharmacological mechanisms.
7
Wu Z, Shou L, Wang J, Huang T, Xu X. The Methylation Pattern for Knee and Hip Osteoarthritis. Front Cell Dev Biol 2020; 8:602024. [PMID: 33240895 PMCID: PMC7677303 DOI: 10.3389/fcell.2020.602024] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 09/02/2020] [Accepted: 10/22/2020] [Indexed: 01/08/2023] Open
Abstract
Osteoarthritis (OA) is one of the most prevalent chronic joint diseases in middle-aged and elderly people. In recent years, however, the number of young people suffering from the disease has increased quickly. It is known that osteoarthritis is a common degenerative disease caused by the combination and interaction of many factors, such as natural and environmental factors. DNA methylation reflects the effects of environmental factors. Several studies on DNA methylation at specific genes in OA cartilage have indicated the great potential role of DNA methylation in OA. To systematically investigate the methylation pattern in knee and hip osteoarthritis, we analyzed the methylation profiles in cartilage of 16 OA hip samples, 19 control hip samples and 62 OA knee samples. Twelve discriminative methylation sites were identified using advanced minimal Redundancy Maximal Relevance (mRMR) and Incremental Feature Selection (IFS) methods. The SVM classifier built on these 12 methylation sites, from genes such as MEIS1, GABRG3, RXRA, and EN1, can perfectly classify the OA hip samples, control hip samples and OA knee samples when evaluated with leave-one-out cross-validation (LOOCV). These 12 methylation sites can not only serve as biomarkers, but also shed light on the underlying mechanism of OA.
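The IFS-with-LOOCV procedure mentioned in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the feature ranking is taken as given (the mRMR step is not implemented), a 1-nearest-neighbour classifier stands in for the SVM to keep the example dependency-free, and the toy data and function names are hypothetical.

```python
def loocv_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    restricted to the given feature indices (stand-in for the SVM)."""
    correct = 0
    for i in range(len(X)):
        best_dist, best_label = None, None
        for j in range(len(X)):
            if j == i:
                continue  # the held-out sample never votes for itself
            dist = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, y[j]
        correct += best_label == y[i]
    return correct / len(X)


def incremental_feature_selection(X, y, ranked_features):
    """Evaluate the top-1, top-2, ... ranked features and keep the
    smallest prefix that reaches the highest LOOCV accuracy."""
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranked_features) + 1):
        acc = loocv_accuracy(X, y, ranked_features[:k])
        if acc > best_acc:
            best_k, best_acc = k, acc
    return ranked_features[:best_k], best_acc


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 5.1], [1.1, 1.1], [1.2, 9.1]]
y = [0, 0, 0, 1, 1, 1]
selected, accuracy = incremental_feature_selection(X, y, ranked_features=[0, 1])
```

On this toy data the noisy second feature lowers the leave-one-out accuracy, so only the first ranked feature is kept.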
Affiliation(s)
- Zhen Wu
  - Department of Orthopaedics, Tongde Hospital of Zhejiang Province, Hangzhou, China
- Lu Shou
  - Department of Pneumology, Tongde Hospital of Zhejiang Province, Hangzhou, China
- Jian Wang
  - Department of Orthopaedics, Tongde Hospital of Zhejiang Province, Hangzhou, China
- Tao Huang
  - Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
- Xinwei Xu
  - Department of Orthopaedics, Tongde Hospital of Zhejiang Province, Hangzhou, China
8
Zhu JH, Yan QL, Wang JW, Chen Y, Ye QH, Wang ZJ, Huang T. The Key Genes for Perineural Invasion in Pancreatic Ductal Adenocarcinoma Identified With Monte-Carlo Feature Selection Method. Front Genet 2020; 11:554502. [PMID: 33193628 PMCID: PMC7593847 DOI: 10.3389/fgene.2020.554502] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 05/14/2020] [Accepted: 08/17/2020] [Indexed: 12/20/2022] Open
Abstract
Background Pancreatic ductal adenocarcinoma (PDAC) is the most aggressive form of pancreatic cancer. Its 5-year survival rate is only 3–5%. Perineural invasion (PNI) is a process of cancer cells invading the surrounding nerves and perineural spaces. It is considered to be associated with the poor prognosis of PDAC. About 90% of pancreatic cancer patients have PNI. The high incidence of PNI in pancreatic cancer limits radical resection and promotes local recurrence, which negatively affects life quality and survival time of the patients with pancreatic cancer. Objectives To investigate the mechanism of PNI in pancreatic cancer, we analyzed the gene expression profiles of tumors and adjacent tissues from 50 PDAC patients which included 28 patients with perineural invasion and 22 patients without perineural invasion. Method Using Monte-Carlo feature selection and Incremental Feature Selection (IFS) method, we identified 26 key features within which 15 features were from tumor tissues and 11 features were from adjacent tissues. Results Our results suggested that not only the tumor tissue, but also the adjacent tissue, was informative for perineural invasion prediction. The SVM classifier based on these 26 key features can predict perineural invasion accurately, with a high accuracy of 0.94 evaluated with leave-one-out cross validation (LOOCV). Conclusion The in-depth biological analysis of key feature genes, such as TNFRSF14, XPO1, and ATF3, shed light on the understanding of perineural invasion in pancreatic ductal adenocarcinoma.
Affiliation(s)
- Jin-Hui Zhu
  - Department of General Surgery, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Qiu-Liang Yan
  - Department of General Surgery, Jinhua People's Hospital, Jinhua, China
- Jian-Wei Wang
  - Department of Surgical Oncology, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Yan Chen
  - Department of General Surgery, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Qing-Huang Ye
  - Department of General Surgery, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Zhi-Jiang Wang
  - Department of General Surgery, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Tao Huang
  - Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
9
Zhang J, Zhang M, Zhao H, Xu X. Identification of proliferative diabetic retinopathy-associated genes on the protein–protein interaction network by using heat diffusion algorithm. Biochim Biophys Acta Mol Basis Dis 2020; 1866:165794. [DOI: 10.1016/j.bbadis.2020.165794] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 02/11/2020] [Revised: 03/25/2020] [Accepted: 04/04/2020] [Indexed: 12/11/2022]
10
Zhou JP, Chen L, Guo ZH. iATC-NRAKEL: an efficient multi-label classifier for recognizing anatomical therapeutic chemical classes of drugs. Bioinformatics 2020; 36:1391-1396. [PMID: 31593226 DOI: 10.1093/bioinformatics/btz757] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Received: 06/20/2019] [Revised: 09/10/2019] [Accepted: 10/01/2019] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION The anatomical therapeutic chemical (ATC) classification system plays an increasingly important role in drug repositioning and discovery. The correct identification of classes in each level of such system that a given drug may belong to is an essential problem. Several multi-label classifiers have been proposed in this regard. Although they provided satisfactory performance, the feature extraction procedures were still rough. More refined features may further improve the predicted quality. RESULTS In this article, we provide a novel multi-label classifier, called iATC-NRAKEL, to predict drug ATC classes in the first level. To obtain more informative drug features, we employed the drug association information in STITCH and KEGG, which was organized by seven drug networks. The powerful network embedding algorithm, Mashup, was adopted to extract informative drug features. The obtained features were fed into the RAndom k-labELsets (RAKEL) algorithm with support vector machine as the basic classification algorithm to construct the classifier. The 10-fold cross-validation of the benchmark dataset with 3883 drugs showed that the accuracy and absolute true were 76.56 and 74.51%, respectively. The comparison results indicated that iATC-NRAKEL was much superior to all previous reported classifiers. Finally, the contribution of each network was analyzed. AVAILABILITY AND IMPLEMENTATION The codes of iATC-NRAKEL are available at https://github.com/zhou256/iATC-NRAKEL.
Affiliation(s)
- Jian-Peng Zhou
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China
  - Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai 200241, People's Republic of China
- Zi-Han Guo
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China
11
Xu Y, Zhang YH, Li J, Pan XY, Huang T, Cai YD. New Computational Tool Based on Machine-learning Algorithms for the Identification of Rhinovirus Infection-Related Genes. Comb Chem High Throughput Screen 2020; 22:665-674. [PMID: 31782358 DOI: 10.2174/1386207322666191129114741] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/27/2019] [Revised: 05/22/2019] [Accepted: 07/09/2019] [Indexed: 12/14/2022]
Abstract
BACKGROUND Human rhinovirus has different identified serotypes and is the most common cause of cold in humans. To date, many genes have been discovered to be related to rhinovirus infection. However, the pathogenic mechanism of rhinovirus is difficult to elucidate through experimental approaches because they are costly and time-consuming. METHODS AND RESULTS In this study, we presented a novel approach that relies on machine-learning algorithms and identified two genes, OTOF and SOCS1. The expression levels of these genes in blood samples can be used to accurately distinguish virus-infected and non-infected individuals. CONCLUSION Our findings suggest the crucial roles of these two genes in rhinovirus infection and the robustness of the computational tool in dissecting pathogenic mechanisms.
Affiliation(s)
- Yan Xu
  - School of Life Sciences, Shanghai University, Shanghai 200444, China
- Yu-Hang Zhang
  - Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- JiaRui Li
  - School of Life Sciences, Shanghai University, Shanghai 200444, China
- Xiao Y Pan
  - BASF & IDLab, Ghent University, Ghent, Belgium
- Tao Huang
  - Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai 200444, China
12
Baker A, Syed A, Alyousef AA, Arshad M, Alqasim A, Khalid M, Khan MS. Sericin-functionalized GNPs potentiate the synergistic effect of levofloxacin and balofloxacin against MDR bacteria. Microb Pathog 2020; 148:104467. [PMID: 32877723 DOI: 10.1016/j.micpath.2020.104467] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 02/26/2020] [Revised: 07/27/2020] [Accepted: 08/24/2020] [Indexed: 12/20/2022]
Abstract
A gradual expansion in bacterial strains resistant to commercially available antibacterial agents is the serious concern addressed by this research, as it poses a critical problem for public health. Thus, the demand for new antimicrobial agents has increased interest in newer technologies, and innovative approaches are required to advance the diagnosis and elimination of causative organisms. In this study, the potential role of technologies based on gold nanoparticles (GNPs) has been evaluated. GNPs were synthesized by using a cysteine protease, sericin, whose reducing properties were exploited to bioengineer NPs (SrGNPs), where sericin was encapsulated over the surface of the GNPs with the help of thiol groups. Further, SrGNPs were bioconjugated with levofloxacin (Levo) and balofloxacin (Balo) to increase the efficacy of these drugs. Here, the antibacterial action of SrGNPs and their bioconjugated counterparts comprising Levo (Levo-SrGNPs), Balo (Balo-SrGNPs), and Levo/Balo (Levo-Balo-SrGNPs) was examined against normal and multi-drug resistant (MDR) strains of E. coli and S. aureus. The minimum inhibitory concentrations (MICs) of these bioconjugates against said bacteria were found to be lower than those of their pure counterparts. Further, the synergistic role of SrGNPs in combination with Levo and Balo was also explained using the Chou-Talalay (C-T) method. The synthesis and bioconjugation of SrGNPs were confirmed by UV-visible spectroscopy, dynamic light scattering (DLS), transmission electron microscopy (TEM), and zeta-potential.
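For orientation, the Chou-Talalay method mentioned in this abstract is usually summarised by the combination index (CI) for a two-drug combination. The abstract does not report dose values, so the expression below is given only in its general textbook form, and the symbols are the conventional ones rather than notation taken from the paper.

```latex
% Combination index for two drugs at a fixed effect level x.
% (D)_1, (D)_2     : doses of drug 1 and drug 2 used in combination to reach effect x
% (D_x)_1, (D_x)_2 : doses of each drug alone that reach the same effect x
\[
  \mathrm{CI} = \frac{(D)_1}{(D_x)_1} + \frac{(D)_2}{(D_x)_2}
\]
% CI < 1 indicates synergism, CI = 1 an additive effect, CI > 1 antagonism.
```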
Affiliation(s)
- Abu Baker
  - Nanomedicine & Nanobiotechnology Lab, Department of Biosciences, Integral University, Lucknow, 226026, India
- Asad Syed
  - Department of Botany and Microbiology, College of Science, King Saud University, Riyadh, 11451, Saudi Arabia
- Abdullah A Alyousef
  - Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, King Saud University, P.O. Box 10219, Riyadh, 11433, Saudi Arabia
- Mohammed Arshad
  - Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, King Saud University, P.O. Box 10219, Riyadh, 11433, Saudi Arabia
- Abdulaziz Alqasim
  - Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, King Saud University, P.O. Box 10219, Riyadh, 11433, Saudi Arabia
- Mohammad Khalid
  - Department of Pharmacognosy, College of Pharmacy, Prince Sattam Bin Abdulaziz University, P.O. Box 173, Al-Kharj 11942, Saudi Arabia
- Mohd Sajid Khan
  - Department of Biochemistry, Aligarh Muslim University, Aligarh, 202001, UP, India
13
Zhou B, Zhao X, Lu J, Sun Z, Liu M, Zhou Y, Liu R, Wang Y. Relating Substructures and Side Effects of Drugs with Chemical-chemical Interactions. Comb Chem High Throughput Screen 2020; 23:285-294. [DOI: 10.2174/1386207322666190702102752] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 12/16/2018] [Revised: 03/11/2019] [Accepted: 04/16/2019] [Indexed: 12/17/2022]
Abstract
Background: Drugs are very important for human life because they can provide treatment, cure, prevention, or diagnosis of different diseases. However, they also cause side effects, which can increase the risks for humans and pharmaceutical companies. It is essential to identify drug side effects in drug discovery. To date, many computational methods have been proposed to predict the side effects of drugs, and most of them rely on the assumption that similar drugs have similar side effects. However, previous studies did not analyze which substructures are highly related to which kind of side effect. Method: In this study, we conducted a computational investigation. We extracted a drug set for each side effect, which consisted of the drugs having that side effect. Also, for each substructure, a set was constructed by picking the drugs having such a substructure. The relationship between one side effect and one substructure was evaluated based on linkages between drugs in their corresponding drug sets, resulting in an Es value. Then, the statistical significance of the Es value was measured by a permutation test. Results and Conclusion: A number of highly related pairs of side effects and substructures were obtained, and some were extensively analyzed to confirm the reliability of the results reported in this study.
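The permutation test described in this abstract can be illustrated with a small sketch. The abstract does not define the Es value, so the score used here (a plain count of linked drug pairs between the two sets) is a hypothetical stand-in, as are the function names, the way the substructure set is resampled, and the toy data.

```python
import random


def linkage_score(set_a, set_b, links):
    """Count linked pairs (a, b) with a in set_a and b in set_b.
    Hypothetical stand-in for the Es value, which the abstract does not define."""
    return sum(
        1 for a in set_a for b in set_b
        if a != b and frozenset((a, b)) in links
    )


def permutation_p_value(side_effect_drugs, substructure_drugs, all_drugs,
                        links, n_perm=200, seed=0):
    """Estimate how often a random drug set of the same size as the
    substructure set scores at least as high as the observed one."""
    rng = random.Random(seed)
    observed = linkage_score(side_effect_drugs, substructure_drugs, links)
    pool = sorted(all_drugs)
    hits = 0
    for _ in range(n_perm):
        shuffled = set(rng.sample(pool, len(substructure_drugs)))
        if linkage_score(side_effect_drugs, shuffled, links) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 20 drugs, with every drug in A linked to every drug in B.
all_drugs = set(range(20))
A = set(range(5))
B = set(range(5, 10))
links = {frozenset((a, b)) for a in A for b in B}
observed, p_value = permutation_p_value(A, B, all_drugs, links)
```

A small p-value here means that randomly chosen drug sets rarely reach the observed linkage, which is the sense in which a side effect and a substructure would be called highly related.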
Affiliation(s)
- Bo Zhou
  - Shanghai University of Medicine and Health Sciences, Shanghai 201318, China
- Xian Zhao
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Jing Lu
  - School of Pharmacy, Key Laboratory of Molecular Pharmacology and Drug Evaluation (Yantai University), Ministry of Education, Collaborative Innovation Center of Advanced Drug Delivery System and Biotech Drugs in Universities of Shandong, Yantai University, Yantai 264005, China
- Zuntao Sun
  - Informatization Office, Shanghai Maritime University, Shanghai 201306, China
- Min Liu
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Yilu Zhou
  - Biological Sciences, Faculty of Environmental and Life Sciences, University of Southampton, Southampton SO17 1BJ, United Kingdom
- Rongzhi Liu
  - Center for Medical Device Evaluation, China Drug Administration, State Administration for Market Regulation, Beijing 100081, China
- Yihua Wang
  - Biological Sciences, Faculty of Environmental and Life Sciences, University of Southampton, Southampton SO17 1BJ, United Kingdom
14
Cheng Q, Li J, Fan F, Cao H, Dai ZY, Wang ZY, Feng SS. Identification and Analysis of Glioblastoma Biomarkers Based on Single Cell Sequencing. Front Bioeng Biotechnol 2020; 8:167. [PMID: 32195242 PMCID: PMC7066068 DOI: 10.3389/fbioe.2020.00167] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 12/19/2019] [Accepted: 02/19/2020] [Indexed: 12/16/2022] Open
Abstract
Glioblastoma (GBM) is one of the most common and aggressive primary adult brain tumors. Tumor heterogeneity, which is determined by both heterogeneous GBM cells and a complex tumor microenvironment, poses a great challenge to the treatment of GBM. Single-cell RNA sequencing (scRNA-seq) enables the transcriptomes of a great number of individual cells to be assayed in an unbiased manner and has been applied in head and neck cancer, breast cancer, blood disease, and so on. In this study, based on the scRNA-seq results of infiltrating neoplastic cells in GBM, computational methods were applied to screen core biomarkers that can distinguish the GBM tumor from the pericarcinomatous environment. The gene expression profiles of GBM from 2343 tumor cells and 1246 periphery cells were analyzed by maximum relevance minimum redundancy (mRMR). Upon further analysis of the feature lists yielded by the mRMR method, 31 important genes were extracted that may be essential biomarkers for GBM tumor cells. In addition, an optimal classification model using a support vector machine (SVM) algorithm as the classifier was built. Our results provide insights into GBM mechanisms and may be useful for GBM diagnosis and therapy.
Affiliation(s)
- Quan Cheng
  - Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
  - Department of Clinical Pharmacology, Xiangya Hospital, Central South University, Changsha, China
- Jing Li
  - Department of Rehabilitation, The Second Xiangya Hospital, Central South University, Changsha, China
- Fan Fan
  - Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
- Hui Cao
  - Department of Psychiatry, The Second People's Hospital of Hunan University of Chinese Medicine, Changsha, China
- Zi-Yu Dai
  - Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
- Ze-Yu Wang
  - Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
- Song-Shan Feng
  - Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
15
|
Yan C, Duan G, Pan Y, Wu FX, Wang J. DDIGIP: predicting drug-drug interactions based on Gaussian interaction profile kernels. BMC Bioinformatics 2019; 20:538. [PMID: 31874609 PMCID: PMC6929542 DOI: 10.1186/s12859-019-3093-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 08/31/2019] [Accepted: 09/10/2019] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND A drug-drug interaction (DDI) is defined as a drug effect modified by another drug, which is very common in treating complex diseases such as cancer. Many studies have shown that DDIs can increase or decrease the drug effect. However, adverse DDIs may result in severe morbidity and even mortality of patients, which also causes some drugs to be withdrawn from the market. As multi-drug treatment becomes more and more common, identifying potential DDIs has become a key issue in drug development and disease treatment. However, traditional biological experimental methods, including in vitro and in vivo experiments, are very time-consuming and expensive for validating new DDIs. With the development of high-throughput sequencing technology, many pharmaceutical studies and various bioinformatics data provide unprecedented opportunities to study DDIs. RESULT In this study, we propose a method to predict new DDIs, namely DDIGIP, which is based on the Gaussian Interaction Profile (GIP) kernel on the drug-drug interaction profiles and the Regularized Least Squares (RLS) classifier. In addition, we use k-nearest neighbors (KNN) to calculate the initial relational score in the presence of new drugs via the chemical, biological and phenotypic data of drugs. We compare the prediction performance of DDIGIP with other competing methods via 5-fold cross validation, 10-fold cross validation and de novo drug validation. CONCLUSION In 5-fold cross validation and 10-fold cross validation, the DDIGIP method achieves an area under the ROC curve (AUC) of 0.9600 and 0.9636, respectively, which are better than the 0.9570 and 0.9599 of the state-of-the-art method (L1 Classifier ensemble method). Furthermore, for new drugs, the AUC value of DDIGIP in de novo drug validation reaches 0.9262, which also outperforms the 0.9073 of the other state-of-the-art method (Weighted average ensemble method). Case studies and these results demonstrate that DDIGIP is an effective method to predict DDIs while being beneficial to drug development and disease treatment.
Affiliation(s)
- Cheng Yan
- School of Computer Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083 China
- School of Computer and Information, Qiannan Normal University for Nationalities, Longshan Road, DuYun, 558000 China
- Guihua Duan
- School of Computer Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083 China
- Yi Pan
- Department of Computer Science, Georgia State University, Atlanta, GA 30302 USA
- Fang-Xiang Wu
- Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9 Canada
- Jianxin Wang
- School of Computer Science and Engineering, Central South University, 932 South Lushan Rd, ChangSha, 410083 China
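The DDIGIP abstract above names a Gaussian Interaction Profile (GIP) kernel computed on drug-drug interaction profiles but does not give its formula. The following is a minimal sketch under stated assumptions: it uses the commonly cited GIP form, K(i, j) = exp(-gamma * ||y_i - y_j||^2) with gamma equal to a bandwidth parameter divided by the mean squared norm of the profiles. The function name `gip_kernel`, the parameter `gamma_prime` and the toy interaction matrix are illustrative and are not taken from the paper. The RLS classifier and the KNN step for new drugs are not shown.

```python
import math

def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel over binary interaction profiles.

    profiles[i] is the interaction profile of drug i (1 = known interaction).
    The bandwidth normalisation below is the commonly used form and is an
    assumption here, not a detail confirmed by the abstract.
    """
    n = len(profiles)
    sq_norms = [sum(v * v for v in row) for row in profiles]
    mean_sq = sum(sq_norms) / n
    if mean_sq == 0:
        raise ValueError("at least one known interaction is required")
    gamma = gamma_prime / mean_sq
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = kernel[j][i] = math.exp(-gamma * dist_sq)
    return kernel

# Toy example (hypothetical data): drugs A and B interact, drug C has no known interactions.
toy_profiles = [
    [0, 1, 0],  # A
    [1, 0, 0],  # B
    [0, 0, 0],  # C
]
toy_kernel = gip_kernel(toy_profiles)
```

Drugs with similar interaction profiles receive kernel values close to 1, which is what lets a kernel classifier transfer known interactions between similar drugs.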
|
16
|
Chen L, Li D, Shao Y, Wang H, Liu Y, Zhang Y. Identifying Microbiota Signature and Functional Rules Associated With Bacterial Subtypes in Human Intestine. Front Genet 2019; 10:1146. [PMID: 31803234 PMCID: PMC6872643 DOI: 10.3389/fgene.2019.01146] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/17/2019] [Accepted: 10/21/2019] [Indexed: 12/12/2022] Open
Abstract
Gut microbiomes are integral microflora located in the human intestine with particular symbiosis. Among all microorganisms in the human intestine, bacteria are the most significant subgroup that contains many unique and functional species. The distribution patterns of bacteria in the human intestine not only reflect the different microenvironments in different sections of the intestine but also indicate that bacteria may have unique biological functions corresponding to their proper regions of the intestine. However, describing the functional differences between the bacterial subgroups and their distributions in different individuals is difficult using traditional computational approaches. Here, we first attempted to introduce four effective sets of bacterial features from independent databases. We then presented a novel computational approach to identify potential distinctive features among bacterial subgroups based on a systematic dataset on the gut microbiome from approximately 1,500 human gut bacterial strains. We also established a group of quantitative rules for explaining such distinctions. Results may reveal the microstructural characteristics of the intestinal flora and deepen our understanding on the regulatory role of bacterial subgroups in the human intestine.
Affiliation(s)
- Lijuan Chen
- College of Animal Science and Technology, Anhui Agricultural University, Hefei, China
- Daojie Li
- College of Animal Science and Technology, Anhui Agricultural University, Hefei, China
- Ye Shao
- School of Medicine, Huaqiao University, Quanzhou, China
- Hui Wang
- College of Animal Science and Technology, Anhui Agricultural University, Hefei, China
- Yuqing Liu
- Anhui Province Key Laboratory of Farmland Ecological Conservation and Pollution Prevention, School of Resources and Environment, Anhui Agricultural University, Hefei, China
- Yunhua Zhang
- Anhui Province Key Laboratory of Farmland Ecological Conservation and Pollution Prevention, School of Resources and Environment, Anhui Agricultural University, Hefei, China
|
17
|
Zhang GL, Pan LL, Huang T, Wang JH. The transcriptome difference between colorectal tumor and normal tissues revealed by single-cell sequencing. J Cancer 2019; 10:5883-5890. [PMID: 31737124 PMCID: PMC6843882 DOI: 10.7150/jca.32267] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 12/14/2018] [Accepted: 06/17/2019] [Indexed: 12/29/2022] Open
Abstract
Previous cancer studies were difficult to reproduce because the tumor tissues were analyzed directly, although tumor tissues are actually a mixture of different cancer cells. The transcriptome of a single cell is much more robust than the transcriptome of a mixed tissue and has much smaller variance. In this study, we analyzed the single-cell transcriptome of 272 colorectal cancer (CRC) epithelial cells and 160 normal epithelial cells and identified 342 discriminative transcripts using advanced machine learning methods. The most discriminative transcripts were LGALS4, PHGR1, C15orf48, HEPACAM2, PERP, FABP1, FCGBP, MT1G, TSPAN1 and CKB. We further clustered the 342 transcripts into two categories. The upregulated transcripts in CRC epithelial cells were significantly enriched in Ribosome, Protein processing in endoplasmic reticulum, Antigen processing and presentation and p53 signaling pathway. The downregulated transcripts in CRC epithelial cells were significantly enriched in Mineral absorption, Aldosterone-regulated sodium reabsorption and Oxidative phosphorylation pathways. The biological analysis of the discriminative transcripts revealed the possible mechanism of colorectal cancer.
Affiliation(s)
- Guo-Liang Zhang
- Department of Colorectal Surgery, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou 310003, Zhejiang, China
- Le-Lin Pan
- Department of Colorectal Surgery, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou 310003, Zhejiang, China
- Tao Huang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Jin-Hai Wang
- Department of Colorectal Surgery, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou 310003, Zhejiang, China
|
18
|
Inferring novel genes related to oral cancer with a network embedding method and one-class learning algorithms. Gene Ther 2019; 26:465-478. [PMID: 31455874 DOI: 10.1038/s41434-019-0099-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 03/28/2019] [Revised: 06/18/2019] [Accepted: 07/15/2019] [Indexed: 12/14/2022]
Abstract
Oral cancer (OC) is one of the most common cancers threatening human lives. However, OC pathogenesis has yet to be fully uncovered, and thus designing effective treatments remains difficult. Identifying genes related to OC is an important step toward this goal. In this study, we proposed three computational models for inferring novel OC-related genes. In contrast to previously proposed computational methods, which lacked learning procedures, each proposed model adopted a one-class learning algorithm, which can provide a deep insight into features of validated OC-related genes. A network embedding algorithm (i.e., node2vec) was applied to the protein-protein interaction network to produce the representation of genes. The features of the OC-related genes were used in the training of the one-class algorithm, and the performance of the final inferring model was improved through a feature selection procedure. Then, candidate genes were produced by applying the trained inferring model to other genes. Three tests were performed to screen out the important candidate genes. Accordingly, we obtained three inferred gene sets, any two of which were different. The inferred genes were also different from previously reported genes, and some of them have been included in the public Oral Cancer Gene Database. Finally, we analyzed several inferred genes to confirm whether they are novel OC-related genes.
|
19
|
Analysis of Protein-Protein Functional Associations by Using Gene Ontology and KEGG Pathway. BIOMED RESEARCH INTERNATIONAL 2019; 2019:4963289. [PMID: 31396531 PMCID: PMC6668538 DOI: 10.1155/2019/4963289] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 01/04/2019] [Revised: 06/04/2019] [Accepted: 06/26/2019] [Indexed: 12/19/2022]
Abstract
Protein–protein interaction (PPI) plays an extremely important role in the growth, reproduction, and metabolism of all living organisms. A thorough investigation of PPI can uncover the mechanism of how proteins express their functions. In this study, we used gene ontology (GO) terms and biological pathways to study an extended version of PPI (protein–protein functional associations) and subsequently identify some essential GO terms and pathways that can indicate the difference between two proteins with and without functional associations. The protein–protein functional associations validated by experiments were retrieved from STRING, a well-known database of associations between proteins collected from multiple sources, and they were termed positive samples. The negative samples were constructed by randomly pairing two proteins. Each sample was represented by several features based on GO and KEGG pathway information of two proteins. Then, mutual information was adopted to evaluate the importance of all features, and some important ones were obtained, from which a number of essential GO terms or KEGG pathways were identified. The final analysis of some important GO terms and one KEGG pathway can partly uncover the difference between proteins with and without functional associations.
|
20
|
Li J, Lu L, Zhang YH, Xu Y, Liu M, Feng K, Chen L, Kong X, Huang T, Cai YD. Identification of leukemia stem cell expression signatures through Monte Carlo feature selection strategy and support vector machine. Cancer Gene Ther 2019; 27:56-69. [PMID: 31138902 DOI: 10.1038/s41417-019-0105-y] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 03/19/2019] [Revised: 04/28/2019] [Accepted: 05/04/2019] [Indexed: 01/09/2023]
Abstract
Acute myeloid leukemia (AML) is a type of blood cancer characterized by the rapid growth of immature white blood cells from the bone marrow. Therapy resistance resulting from the persistence of leukemia stem cells (LSCs) is found in numerous patients. Comparative transcriptome studies have been previously conducted to analyze differentially expressed genes between LSC+ and LSC- cells. However, these studies mainly focused on a limited number of genes with the most obvious expression differences between the two cell types. We developed a computational approach incorporating several machine learning algorithms, including Monte Carlo feature selection (MCFS), incremental feature selection (IFS), support vector machine (SVM), and Repeated Incremental Pruning to Produce Error Reduction (RIPPER), to identify gene expression features specific to LSCs. One thousand one hundred fifty-nine features (genes) were first identified, which can be used to build the optimal SVM classifier for distinguishing LSC+ and LSC- cells. Among these 1159 genes, the top 17 genes were identified as LSC-specific biomarkers. In addition, six classification rules were produced by the RIPPER algorithm. The subsequent literature review on these features/genes and the classification rules, together with functional enrichment analyses of the 1159 features/genes, confirmed the relevance of the extracted genes and rules to the characteristics of LSCs.
Affiliation(s)
- JiaRui Li
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, P. R. China; School of Life Sciences, Shanghai University, Shanghai, 200444, P. R. China
- Lin Lu
- Department of Radiology, Columbia University Medical Center, New York, NY, 10032, USA
- Yu-Hang Zhang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, P. R. China
- YaoChen Xu
- Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, P. R. China
- Min Liu
- College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, P. R. China
- KaiYan Feng
- Department of Computer Science, Guangdong AIB Polytechnic, Guangzhou, 510507, P. R. China
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, P. R. China; Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai, 200241, P. R. China
- XiangYin Kong
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, P. R. China
- Tao Huang
- Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, P. R. China
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, 200444, P. R. China
|
21
|
Qian S, Liang S, Yu H. Leveraging genetic interactions for adverse drug-drug interaction prediction. PLoS Comput Biol 2019; 15:e1007068. [PMID: 31125330 PMCID: PMC6553795 DOI: 10.1371/journal.pcbi.1007068] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 01/04/2019] [Revised: 06/06/2019] [Accepted: 05/03/2019] [Indexed: 12/20/2022] Open
Abstract
In light of increased co-prescription of multiple drugs, the ability to discern and predict drug-drug interactions (DDI) has become crucial to guarantee the safety of patients undergoing treatment with multiple drugs. However, information on DDI profiles is incomplete and the experimental determination of DDIs is labor-intensive and time-consuming. Although previous studies have explored various feature spaces for in silico screening of interacting drug pairs, their use of conventional cross-validation prevents them from achieving generalizable performance on drug pairs where neither drug is seen during training. Here we demonstrate for the first time that targets of adversely interacting drug pairs are significantly more likely to have synergistic genetic interactions than those of non-interacting drug pairs. Leveraging genetic interaction features and a novel training scheme, we construct a gradient boosting-based classifier that achieves robust DDI prediction even for drugs whose interaction profiles are completely unseen during training. We demonstrate that in addition to classification power, including the prediction of 432 novel DDIs, our genetic interaction approach offers interpretability by providing plausible mechanistic insights into the mode of action of DDIs.
Adverse drug-drug interactions are adverse side effects caused by taking two or more drugs together. As co-prescription of multiple drugs becomes an increasingly prevalent practice, affecting 42.2% of Americans over 65 years old, adverse drug-drug interactions have become a serious safety concern, accounting for over 74,000 emergency room visits and 195,000 hospitalizations each year in the United States alone. Since experimental determination of adverse drug-drug interactions is labor-intensive and time-consuming, various machine learning-based computational approaches have been developed for predicting drug-drug interactions. Considering that drugs take effect by binding to and modulating the function of their targets, we have explored whether drug-drug interactions can be predicted from the genetic interaction between the gene targets of two drugs, which characterizes the unexpected fitness effect when two genes are simultaneously knocked out. Furthermore, we have built a fast and robust classifier that achieves accurate prediction of adverse drug-drug interactions by incorporating genetic interaction and several other types of widely used features. Our analyses suggest that genetic interaction is an important feature for our prediction model, and that it provides mechanistic insight into the mode of action of drugs leading to drug-drug interactions.
Affiliation(s)
- Sheng Qian
- Department of Computational Biology, Cornell University, Ithaca, New York, United States of America
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, United States of America
- Siqi Liang
- Department of Computational Biology, Cornell University, Ithaca, New York, United States of America
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, United States of America
- Haiyuan Yu
- Department of Computational Biology, Cornell University, Ithaca, New York, United States of America
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, United States of America
|
22
|
Chen L, Pan X, Zhang YH, Kong X, Huang T, Cai YD. Tissue differences revealed by gene expression profiles of various cell lines. J Cell Biochem 2019; 120:7068-7081. [PMID: 30368905 DOI: 10.1002/jcb.27977] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 08/17/2018] [Accepted: 10/04/2018] [Indexed: 01/24/2023]
Abstract
Mechanisms through which tissues are formed and maintained remain unknown but are fundamental aspects in biology. Tissue-specific gene expression is a valuable tool to study such mechanisms. But in many biomedical studies, cell lines, rather than human body tissues, are used to investigate biological mechanisms. Whether or not cell lines maintain their tissue-specific characteristics after they are isolated and cultured outside the human body remains to be explored. In this study, we applied a novel computational method to identify core genes that contribute to the differentiation of cell lines from various tissues. Several advanced computational techniques, such as Monte Carlo feature selection method, incremental feature selection method, and support vector machine (SVM) algorithm, were incorporated in the proposed method, which extensively analyzed the gene expression profiles of cell lines from different tissues. As a result, we extracted a group of functional genes that can indicate the differences of cell lines in different tissues and built an optimal SVM classifier for identifying cell lines in different tissues. In addition, a set of rules for classifying cell lines was also reported, which can give a clearer picture of cell lines in different tissues although its performance was not better than that of the optimal SVM classifier. Finally, we compared such genes with the tissue-specific genes identified by the Genotype-tissue Expression project. Results showed that most expression patterns between tissues remained in the derived cell lines despite some uniqueness that some genes show tissue specificity.
Affiliation(s)
- Lei Chen
- School of Life Sciences, Shanghai University, Shanghai, China; College of Information Engineering, Shanghai Maritime University, Shanghai, China; Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai, China
- Xiaoyong Pan
- Department of Medical Informatics, Erasmus MC, Rotterdam, The Netherlands
- Yu-Hang Zhang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Xiangyin Kong
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
|
23
|
Zhang Y, Dong D, Li D, Lu L, Li J, Zhang Y, Chen L. Computational Method for the Identification of Molecular Metabolites Involved in Cereal Hull Color Variations. Comb Chem High Throughput Screen 2019; 21:760-770. [DOI: 10.2174/1386207322666190129105441] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 07/14/2018] [Revised: 08/02/2018] [Accepted: 08/16/2018] [Indexed: 11/22/2022]
Abstract
Background: Cereal hull color is an important quality specification characteristic. Many studies were conducted to identify genetic changes underlying cereal hull color diversity. However, these studies mainly focused on the gene level. Recent studies have suggested that metabolomics can accurately reflect the integrated and real-time cell processes that contribute to the formation of different cereal colors.
Methods: In this study, we exploited published metabolomics databases and applied several advanced computational methods, such as minimum redundancy maximum relevance (mRMR), incremental forward search (IFS) and random forest (RF), to investigate cereal hull color at the metabolic level. First, the mRMR was applied to analyze cereal hull samples represented by metabolite features, yielding a feature list. Then, the IFS and RF were used to test several feature sets, constructed according to the aforementioned feature list. Finally, the optimal feature sets and RF classifier were obtained based on the testing results.
Results and Conclusion: A total of 158 key metabolites were found to be useful in distinguishing white cereal hulls from colorful cereal hulls. A prediction model constructed with these metabolites and a random forest algorithm generated a high Matthews correlation coefficient value of 0.701. Furthermore, 24 of these metabolites were previously found to be relevant to cereal color. Our study can provide new insights into the molecular basis of cereal hull color formation.
Affiliation(s)
- Yunhua Zhang
- Anhui Province Key Laboratory of Farmland Ecological Conservation and Pollution Prevention, School of Resources and Environment, Anhui Agricultural University, Hefei, Anhui, China
- Dong Dong
- Anhui Province Key Laboratory of Farmland Ecological Conservation and Pollution Prevention, School of Resources and Environment, Anhui Agricultural University, Hefei, Anhui, China
- Dai Li
- Anhui Province Key Laboratory of Farmland Ecological Conservation and Pollution Prevention, School of Resources and Environment, Anhui Agricultural University, Hefei, Anhui, China
- Lin Lu
- Department of Radiology, Columbia University Medical Center, New York, United States
- JiaRui Li
- School of Life Sciences, Shanghai University, Shanghai, China
- YuHang Zhang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Lijuan Chen
- College of Animal Science and Technology, Anhui Agricultural University, Hefei, Anhui, China
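The cereal hull abstract above describes a two-stage procedure: rank the features first, then test feature sets built from the ranked list and keep the best one. A minimal sketch of that second stage follows, assuming the feature sets are nested prefixes of the ranked list. The function name, the `step` parameter and the toy scoring function are hypothetical; in practice `evaluate` would train a classifier (for example a random forest) on the subset and return a cross-validated score such as MCC.

```python
from typing import Callable, List, Sequence, Tuple

def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
    step: int = 1,
) -> Tuple[List[str], float]:
    """Evaluate the top-k prefixes of a ranked feature list and keep the best one."""
    best_subset: List[str] = []
    best_score = float("-inf")
    for k in range(step, len(ranked_features) + 1, step):
        subset = list(ranked_features[:k])
        score = evaluate(subset)
        if score > best_score:  # ties keep the smaller subset
            best_subset, best_score = subset, score
    return best_subset, best_score

# Toy stand-in for classifier performance (hypothetical numbers):
# two informative metabolites plus a small penalty per feature used.
_gain = {"m1": 0.4, "m2": 0.3}

def toy_evaluate(subset: List[str]) -> float:
    return sum(_gain.get(name, 0.0) for name in subset) - 0.05 * len(subset)

selected, selected_score = incremental_feature_selection(
    ["m1", "m2", "m3", "m4"], toy_evaluate
)
```

Because only prefixes of the ranked list are tested, the number of classifier fits grows linearly with the number of features instead of exponentially.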
|
24
|
Wang T, Chen L, Zhao X. Prediction of Drug Combinations with a Network Embedding Method. Comb Chem High Throughput Screen 2019; 21:789-797. [DOI: 10.2174/1386207322666181226170140] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 09/26/2018] [Revised: 11/02/2018] [Accepted: 11/28/2018] [Indexed: 01/10/2023]
Abstract
Aim and Objective: Several diseases have complicated mechanisms. A single drug cannot treat such diseases very well because they always involve several targets, and single-targeted drugs cannot modulate these targets simultaneously. Drug combination is an effective way to treat such diseases. However, determining effective drug combinations via traditional methods is time-consuming and costly. It is urgent to build quick and cheap methods in this regard. Designing effective computational methods incorporating advanced computational techniques to predict drug combinations is an alternative and feasible way.
Method: In this study, we proposed a novel network embedding method, which can extract topological features of each drug combination from a drug network that was constructed using chemical-chemical interaction information retrieved from STITCH. These topological features were combined with individual features of drug combinations reported in one previous study. Several advanced computational methods were employed to construct an effective prediction model, such as the synthetic minority oversampling technique (SMOTE), which was used to tackle the imbalanced dataset, and the minimum redundancy maximum relevance (mRMR) and incremental feature selection (IFS) methods, which were adopted to analyze features and extract optimal features for building an optimal support vector machine (SVM) classifier.
Results and Conclusion: The constructed optimal SVM classifier yielded an MCC of 0.806, which is superior to the classifier only using individual features with or without SMOTE. The performance of the classifier can be improved by combining the topological features and essential features of a drug combination.
Affiliation(s)
- Tianyun Wang
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Xian Zhao
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
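The abstract above reports classifier quality as a Matthews correlation coefficient (MCC) of 0.806. For readers unfamiliar with the measure, here is a minimal sketch of the standard binary MCC computed from confusion-matrix counts. The function name `mcc` and the convention of returning 0.0 for an undefined denominator are choices made for this illustration, not details from the paper.

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (chance level) to 1 (perfect).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Undefined when a whole row or column of the matrix is empty;
        # returning 0.0 is a common convention, assumed here.
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

Unlike accuracy, MCC stays informative on imbalanced datasets, which is presumably why it is paired with SMOTE in studies like this one.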
|
25
|
Zhang C, Yan G. Synergistic drug combinations prediction by integrating pharmacological data. Synth Syst Biotechnol 2019; 4:67-72. [PMID: 30820478 PMCID: PMC6370570 DOI: 10.1016/j.synbio.2018.10.002] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/03/2018] [Revised: 09/30/2018] [Accepted: 10/04/2018] [Indexed: 12/12/2022] Open
Abstract
There is compelling evidence that synergistic drug combinations have become promising strategies for combating complex diseases, and they have a clear advantage over traditional one drug - one disease approaches. In this paper, we develop a computational method, namely SyFFM, that takes pharmacological data into consideration and applies field-aware factorization machines to analyze and predict potential synergistic drug combinations. Firstly, features of drug pairs are constructed based on associations between drugs and targets, enzymes, and indication areas. Then, the synergistic scores of drug combinations are obtained by implementing field-aware factorization machines on the latent vector space of these features. Finally, synergistic combinations can be predicted by introducing a threshold. We applied SyFFM to predict pairwise synergistic combinations and three-drug synergistic combinations, and the performance is good in terms of cross-validation. Besides, more than 90% of the top-ranked predicted combinations are supported by the literature, and the analysis of the model parameters shows that our method can help to investigate and explain synergistic mechanisms underlying combinatorial therapy.
Affiliation(s)
- Chengzhi Zhang
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, PR China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, 100049, PR China
- Guiying Yan
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, PR China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, 100049, PR China
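The SyFFM abstract above scores drug combinations with field-aware factorization machines and then applies a threshold. The abstract gives no details of the feature encoding or the training, so the following is only a sketch of the generic pairwise FFM interaction term: each feature keeps one latent vector per field, and a pair of features interacts through the vectors each holds for the other's field. Bias and linear terms are omitted. The feature names, field names, latent values and threshold are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Feature = Tuple[str, str, float]            # (feature id, field, value)
Latent = Dict[Tuple[str, str], List[float]]  # (feature id, other field) -> vector

def ffm_score(active: List[Feature], latent: Latent) -> float:
    """Pairwise field-aware factorization machine interaction score."""
    score = 0.0
    for (fi, field_i, xi), (fj, field_j, xj) in combinations(active, 2):
        vi = latent[(fi, field_j)]  # vector of feature i for the field of j
        vj = latent[(fj, field_i)]  # vector of feature j for the field of i
        score += sum(a * b for a, b in zip(vi, vj)) * xi * xj
    return score

def is_synergistic(score: float, threshold: float) -> bool:
    """Call a combination synergistic when its score reaches the threshold."""
    return score >= threshold

# Toy drug pair described by one target feature and one enzyme feature.
toy_features = [("T1", "target", 1.0), ("E1", "enzyme", 1.0)]
toy_latent = {
    ("T1", "enzyme"): [1.0, 2.0],
    ("E1", "target"): [0.5, 0.25],
}
toy_score = ffm_score(toy_features, toy_latent)
```

In a trained model the latent vectors would be learned from known synergistic and non-synergistic combinations, and the threshold chosen by validation.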
|
26
|
Wang S, Li J, Sun X, Zhang YH, Huang T, Cai Y. Computational Method for Identifying Malonylation Sites by Using Random Forest Algorithm. Comb Chem High Throughput Screen 2018; 23:304-312. [PMID: 30588879 DOI: 10.2174/1386207322666181227144318] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/28/2018] [Revised: 09/03/2018] [Accepted: 12/04/2018] [Indexed: 12/12/2022]
Abstract
BACKGROUND As a newly uncovered post-translational modification on the ε-amino group of lysine residue, protein malonylation was found to be involved in metabolic pathways and certain diseases. Apart from experimental approaches, several computational methods based on machine learning algorithms were recently proposed to predict malonylation sites. However, previous methods failed to address imbalanced data sizes between positive and negative samples. OBJECTIVE In this study, we identified the significant features of malonylation sites in a novel computational method which applied machine learning algorithms and balanced data sizes by applying synthetic minority over-sampling technique. METHOD Four types of features, namely, amino acid (AA) composition, position-specific scoring matrix (PSSM), AA factor, and disorder were used to encode residues in protein segments. Then, a two-step feature selection procedure including maximum relevance minimum redundancy and incremental feature selection, together with random forest algorithm, was performed on the constructed hybrid feature vector. RESULTS An optimal classifier was built from the optimal feature subset, which featured an F1-measure of 0.356. Feature analysis was performed on several selected important features. CONCLUSION Results showed that certain types of PSSM and disorder features may be closely associated with malonylation of lysine residues. Our study contributes to the development of computational approaches for predicting malonyllysine and provides insights into molecular mechanism of malonylation.
Affiliation(s)
- ShaoPeng Wang
- School of Life Sciences, Shanghai University, Shanghai 200444, China
| | - JiaRui Li
- School of Life Sciences, Shanghai University, Shanghai 200444, China
| | - Xijun Sun
- School of Life Sciences, Shanghai University, Shanghai 200444, China
| | - Yu-Hang Zhang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
| | - Yudong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, China
|
27
|
Chen L, Pan X, Zhang YH, Liu M, Huang T, Cai YD. Classification of Widely and Rarely Expressed Genes with Recurrent Neural Network. Comput Struct Biotechnol J 2018; 17:49-60. [PMID: 30595815 PMCID: PMC6307323 DOI: 10.1016/j.csbj.2018.12.002] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 09/26/2018] [Revised: 12/07/2018] [Accepted: 12/09/2018] [Indexed: 02/06/2023] Open
Abstract
Tissue-specific gene expression shapes the formation of tissues, while gene expression changes reflect the immune response of the human body to environmental stimulation or pressure, particularly in disease conditions such as cancers. A few genes are commonly expressed across tissues or various cancers, while others are not. To investigate the functional differences between widely and rarely expressed genes, we defined the genes that were expressed in 32 normal tissues/cancers (widely expressed genes; FPKM >1 in all samples) and those that were not detected (rarely expressed genes; FPKM <1 in all samples) based on the large gene expression data set provided by Uhlen et al. Each gene was encoded using gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment scores. Minimum redundancy maximum relevance (mRMR) was used to rank these features in an mRMR feature list. Thereafter, we applied the incremental feature selection method with a supervised recurrent neural network (RNN) classifier to select the discriminative features for distinguishing widely expressed genes from rarely expressed genes and to construct an optimum RNN classifier. The Youden's indexes generated by the optimum RNN classifier and evaluated using 10-fold cross-validation were 0.739 for normal tissues and 0.639 for cancers. Furthermore, the underlying mechanisms of the key discriminative GO and KEGG features were analyzed. Results can facilitate the identification of the expression landscape of genes and the elucidation of how gene expression shapes tissues and the microenvironment of cancers. Some genes are widely expressed across tissues or various cancers. A number of genes are rarely expressed across tissues or various cancers. The functional differences between widely and rarely expressed genes were studied. Several GO terms and KEGG pathways were extracted and analyzed.
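The incremental feature selection step described above can be sketched as a loop over prefixes of a ranked feature list, keeping the prefix with the best score. In this sketch the score is Youden's index (sensitivity + specificity − 1), the measure reported in the abstract. The feature names, the `toy_cv_results` table, and the `toy_evaluate` callback are hypothetical placeholders standing in for a real cross-validated classifier; none of these values come from the cited study.

```python
def youden_index(sensitivity, specificity):
    """Youden's J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def incremental_feature_selection(ranked_features, evaluate):
    """Score every prefix of the ranked list and return the best one."""
    best_score, best_subset = float("-inf"), []
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        score = evaluate(subset)
        if score > best_score:
            best_score, best_subset = score, subset
    return best_subset, best_score


# Hypothetical cross-validated (sensitivity, specificity) per prefix length.
toy_cv_results = {1: (0.70, 0.65), 2: (0.80, 0.78), 3: (0.85, 0.84), 4: (0.84, 0.80)}


def toy_evaluate(subset):
    """Stand-in for training and cross-validating a classifier on `subset`."""
    return youden_index(*toy_cv_results[len(subset)])


ranked = ["GO_term_a", "KEGG_pathway_b", "GO_term_c", "GO_term_d"]  # placeholder names
best_subset, best_j = incremental_feature_selection(ranked, toy_evaluate)
print(best_subset, round(best_j, 3))
```

In practice `evaluate` would retrain the classifier for each prefix, so the cost grows with the length of the ranked list; evaluating prefixes in steps larger than one feature is a common way to keep this affordable.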
Affiliation(s)
- Lei Chen
- School of Life Sciences, Shanghai University, Shanghai 200444, People's Republic of China.,College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China.,Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai 200241, People's Republic of China
| | - XiaoYong Pan
- Department of Medical Informatics, Erasmus MC, Rotterdam, the Netherlands
| | - Yu-Hang Zhang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
| | - Min Liu
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China
| | - Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, People's Republic of China
|
28
|
Sheng M, Dong Z, Xie Y. Identification of tumor-educated platelet biomarkers of non-small-cell lung cancer. Onco Targets Ther 2018; 11:8143-8151. [PMID: 30532555 PMCID: PMC6241732 DOI: 10.2147/ott.s177384] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND Lung cancer is a severe cancer with a high death rate. The 5-year survival rate for stage III lung cancer is much lower than that for stage I. Early detection and intervention can significantly increase the survival time of lung cancer patients. However, conventional lung cancer-screening methods, such as chest X-rays, sputum cytology, positron-emission tomography (PET), low-dose computed tomography (CT), magnetic resonance imaging, and gene-mutation, -methylation, and -expression biomarkers of lung tissue, are invasive, involve radiation, or are expensive. Liquid biopsy is non-invasive and does little harm to the body. It can reflect early-stage dysfunctions of tumorigenesis and enable early detection and intervention. METHODS In this study, we analyzed RNA-sequencing data of tumor-educated platelets (TEPs) in 402 non-small-cell lung cancer (NSCLC) patients and 231 healthy controls. A total of 48 biomarker genes were selected with advanced minimal-redundancy maximal-relevance and incremental feature-selection (IFS) methods. RESULTS A support vector machine (SVM) classifier based on the 48 biomarker genes accurately predicted NSCLC with leave-one-out cross-validation (LOOCV) sensitivity, specificity, accuracy, and Matthews correlation coefficient of 0.925, 0.827, 0.889, and 0.760, respectively. Network analysis of the 48 genes revealed that the WASF1 actin cytoskeleton module, PRKAB2 kinase module, RSRC1 ribosomal protein module, PDHB carbohydrate-metabolism module, and three intermodule hubs (TPM2, MYL9, and PPP1R12C) may play important roles in NSCLC tumorigenesis and progression. CONCLUSION The 48-gene TEP liquid-biopsy biomarkers will facilitate early screening of NSCLC and prolong the survival of cancer patients.
Affiliation(s)
- Meiling Sheng
- Department of Respiration, Jinhua People's Hospital, Jinhua, Zhejiang 321000, China
| | - Zhaohui Dong
- Department of Intensive Care Unit, First Hospital of Huzhou, First Affiliated Hospital of Huzhou University, Huzhou, Zhejiang 313000, China
| | - Yanping Xie
- Department of Respiratory Medicine, First Hospital of Huzhou, First Affiliated Hospital of Huzhou University, Huzhou, Zhejiang 313000, China,
|
29
|
The early detection of asthma based on blood gene expression. Mol Biol Rep 2018; 46:217-223. [PMID: 30421126 DOI: 10.1007/s11033-018-4463-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 08/29/2018] [Accepted: 11/01/2018] [Indexed: 01/10/2023]
Abstract
Asthma is a complex heterogeneous disorder with a hereditary tendency. The most widely used therapy is inhalation of anti-inflammatory corticosteroids, but it has systemic side effects. If the chronic inflammation can be detected at an early stage, the dosage of corticosteroids can be kept low and the side effects can be avoided. Therefore, to discover early-stage blood biomarkers for asthma, we analyzed the gene expression profiles in the blood of 77 moderate asthma patients and 87 healthy controls. With the advanced feature selection methods minimal Redundancy Maximal Relevance and Incremental Feature Selection, we identified 31 genes, such as MYD88, ZFP36, CCR3 and CYP3A5, as the optimal asthma biomarker. The sensitivity, specificity and accuracy of the 31-gene Support Vector Machine predictor evaluated with Leave-One-Out Cross Validation were 0.870, 0.816 and 0.841, respectively. A literature survey showed that many of the biomarker genes have asthma-associated functions. Our results provided not only easy-to-apply blood gene expression biomarkers for early detection of asthma, but also an explainable qualitative model with biological significance.
|
30
|
Chen L, Zhang YH, Pan X, Liu M, Wang S, Huang T, Cai YD. Tissue Expression Difference between mRNAs and lncRNAs. Int J Mol Sci 2018; 19:ijms19113416. [PMID: 30384456 PMCID: PMC6274976 DOI: 10.3390/ijms19113416] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 09/12/2018] [Revised: 10/26/2018] [Accepted: 10/28/2018] [Indexed: 12/15/2022] Open
Abstract
Messenger RNA (mRNA) and long noncoding RNA (lncRNA) are two main subgroups of RNAs participating in transcription regulation. With the development of next-generation sequencing, an increasing number of lncRNAs have been identified, and many hidden functions of lncRNAs have been revealed. However, the differences between lncRNAs and mRNAs are still unclear. For example, we need to determine whether lncRNAs have stronger tissue specificity than mRNAs and which tissues have more lncRNAs expressed. To investigate such tissue expression differences between mRNAs and lncRNAs, we encoded 9339 lncRNAs and 14,294 mRNAs with 71 expression features, including 69 maximum expression features for 69 types of cells, one feature for the maximum expression in all cells, and one expression specificity feature that was measured as Chao-Shen-corrected Shannon's entropy. With advanced feature selection methods, namely maximum relevance minimum redundancy and incremental feature selection, together with the random forest algorithm, 13 features were found to capture the dissimilarity of lncRNAs and mRNAs. The 11 cell subtype features indicated in which cell types lncRNAs and mRNAs had the largest expression difference. Such cell subtypes may be potential cell models for lncRNA identification and function investigation. The expression specificity feature suggested that the cell types expressing mRNAs and lncRNAs were different. The maximum expression feature suggested that the maximum expression levels of mRNAs and lncRNAs were different. In addition, a rule learning algorithm, repeated incremental pruning to produce error reduction, was employed to produce effective classification rules for classifying lncRNAs and mRNAs, which gave competitive results compared with random forest and could give a clearer picture of the different expression patterns of lncRNAs and mRNAs. Results not only revealed the heterogeneous expression pattern of lncRNA and mRNA, but also gave rise to the development of a new tool to identify the potential biological functions of such RNA subgroups.
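The expression specificity feature mentioned above is an entropy over a transcript's expression profile across cell types. The sketch below computes the plain (uncorrected) Shannon entropy only; the Chao-Shen correction used in the cited study is deliberately omitted, and the function name and toy profiles are assumptions for illustration. A low entropy indicates expression concentrated in few cell types, and a high entropy indicates broad expression.

```python
import math


def shannon_entropy(expression):
    """Plain Shannon entropy (bits) of an expression profile across cell types."""
    total = sum(expression)
    if total <= 0:
        raise ValueError("expression profile must have a positive sum")
    probs = [x / total for x in expression if x > 0]
    # p * log2(1/p) is the contribution of each expressed cell type
    return sum(p * math.log2(1.0 / p) for p in probs)


uniform = [5.0, 5.0, 5.0, 5.0]      # toy profile: expressed equally in 4 cell types
specific = [20.0, 0.0, 0.0, 0.0]    # toy profile: expressed in a single cell type
print(shannon_entropy(uniform), shannon_entropy(specific))
```

For a profile over n cell types the value ranges from 0 (a single expressing cell type) to log2(n) (perfectly uniform expression), so the 69-cell-type setting in the abstract would give an upper bound of log2(69) for the uncorrected measure.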
Affiliation(s)
- Lei Chen
- School of Life Sciences, Shanghai University, Shanghai 200444, China.
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China.
- Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai 200241, China.
| | - Yu-Hang Zhang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
| | - Xiaoyong Pan
- Department of Medical Informatics, Erasmus MC, 3000 CA Rotterdam, The Netherlands.
| | - Min Liu
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China.
| | - Shaopeng Wang
- School of Life Sciences, Shanghai University, Shanghai 200444, China.
| | - Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, China.
|
31
|
Lu S, Zhao K, Wang X, Liu H, Ainiwaer X, Xu Y, Ye M. Use of Laplacian Heat Diffusion Algorithm to Infer Novel Genes With Functions Related to Uveitis. Front Genet 2018; 9:425. [PMID: 30349554 PMCID: PMC6186792 DOI: 10.3389/fgene.2018.00425] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 08/03/2018] [Accepted: 09/10/2018] [Indexed: 12/17/2022] Open
Abstract
Uveitis is the inflammation of the uvea and is a serious eye disease that can cause blindness in middle-aged and young people. However, the pathogenesis of this disease has not been fully uncovered, which makes it difficult to design effective treatments. Completely identifying the genes related to this disease can help improve and accelerate the comprehension of uveitis. In this study, a new computational method was developed to infer potential related genes based on validated ones. We employed a large protein–protein interaction network reported in STRING, in which the Laplacian heat diffusion algorithm was applied using validated genes as seed nodes. Except for the validated ones, all genes in the network were filtered by three tests, namely, permutation, association, and function tests, which evaluated the genes based on their specialties and associations to uveitis. Results indicated that 59 inferred genes were obtained, several of which were confirmed to be highly related to uveitis by literature review. In addition, the inferred genes were compared with those reported in a previous study, indicating that our reported genes are necessary supplements.
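The diffusion step named in this abstract can be sketched on a toy network. Heat is placed on the seed (validated) genes and allowed to spread along edges according to dh/dt = −L·h, where L = D − A is the unnormalized graph Laplacian. The sketch below approximates the solution with small explicit Euler steps rather than a matrix exponential. The choice of Laplacian, the diffusion time `t`, the step count, and the four-node network are all assumptions for illustration; the cited study's exact formulation on the STRING network may differ.

```python
def heat_diffusion(adjacency, seeds, t=1.0, steps=1000):
    """Approximate exp(-t * L) applied to the seed heat vector by explicit Euler steps."""
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    heat = [1.0 if i in seeds else 0.0 for i in range(n)]
    dt = t / steps
    for _ in range(steps):
        # (L h)_i = degree_i * h_i - sum_j A_ij * h_j, with L = D - A
        flow = [
            degree[i] * heat[i] - sum(adjacency[i][j] * heat[j] for j in range(n))
            for i in range(n)
        ]
        heat = [h - dt * f for h, f in zip(heat, flow)]
    return heat


# Toy path network 0 - 1 - 2 - 3 with node 0 as the only seed gene (hypothetical).
toy_network = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = heat_diffusion(toy_network, seeds={0}, t=0.5)
ranking = sorted(range(1, 4), key=lambda i: scores[i], reverse=True)
print(ranking)
```

Non-seed nodes that end up with more heat are closer, in a network sense, to the seeds and are the natural candidates to pass on to downstream filters such as the permutation, association, and function tests described in the abstract.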
Affiliation(s)
- Shiheng Lu
- Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
| | - Ke Zhao
- Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
| | - Xuefei Wang
- Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
| | - Hui Liu
- Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
| | - Xiamuxiya Ainiwaer
- Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
| | - Yan Xu
- School of Life Sciences, Shanghai University, Shanghai, China
| | - Min Ye
- Department of Ophthalmology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Pudong, China
|
32
|
Zhao X, Chen L, Lu J. A similarity-based method for prediction of drug side effects with heterogeneous information. Math Biosci 2018; 306:136-144. [PMID: 30296417 DOI: 10.1016/j.mbs.2018.09.010] [Citation(s) in RCA: 111] [Impact Index Per Article: 15.9] [Received: 06/14/2018] [Revised: 09/22/2018] [Accepted: 09/25/2018] [Indexed: 12/25/2022]
Abstract
Drugs can produce intended therapeutic effects to treat different diseases. However, they may also cause side effects at the same time. For an approved drug, it is best to detect all the side effects it can produce. Otherwise, it may bring great risks for pharmaceutical companies as well as be harmful to the human body. It is urgent to design quick and reliable identification methods to detect the side effects of a given drug. In this study, a binary classification model was proposed to predict drug side effects. Different from most previous methods, our model treated each pair of a drug and a side effect as a sample and converted the original problem into a binary classification problem. Based on the similarity idea, each pair was represented by five features, each of which was derived from a type of drug property. The strong machine learning algorithm, random forest, was adopted as the prediction engine. Ten-fold cross-validation on five datasets with different negative samples indicated that the proposed model yielded a good performance, with a Matthews correlation coefficient around 0.550 and an AUC around 0.8492. In addition, we analyzed the contribution of each drug property to the construction of the model. The results indicated that drug similarity in fingerprint was most related to the prediction of drug side effects and that all drug properties contributed to a greater or lesser extent.
Affiliation(s)
- Xian Zhao
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China; Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai 200241, People's Republic of China.
| | - Jing Lu
- School of Pharmacy, Key Laboratory of Molecular Pharmacology and Drug Evaluation (Yantai University), Ministry of Education, Collaborative Innovation Center of Advanced Drug Delivery System and Biotech Drugs in Universities of Shandong, Yantai University, Yantai 264005, People's Republic of China
|
33
|
Lin H, Qiu X, Zhang B, Zhang J. Identification of the predictive genes for the response of colorectal cancer patients to FOLFOX therapy. Onco Targets Ther 2018; 11:5943-5955. [PMID: 30271178 PMCID: PMC6149834 DOI: 10.2147/ott.s167656] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 12/11/2022] Open
Abstract
Background Colorectal cancer is a malignant tumor with a high death rate. Chemotherapy, radiotherapy and surgery are the three common treatments for colorectal cancer. For early colorectal cancer patients, postoperative adjuvant chemotherapy can reduce the risk of recurrence. For advanced colorectal cancer patients, palliative chemotherapy can significantly improve quality of life and prolong survival. FOLFOX is one of the mainstream chemotherapies in colorectal cancer; however, its response rate is only about 50%. Methods To systematically investigate why some colorectal cancer patients respond to FOLFOX therapy while others do not, we searched all publicly available databases and combined three gene expression datasets of colorectal cancer patients treated with FOLFOX therapy. With the advanced minimal redundancy maximal relevance and incremental feature selection methods, we identified the biomarker genes. Results A Support Vector Machine-based classifier was constructed to predict the response of colorectal cancer patients to FOLFOX therapy. Its accuracy, sensitivity and specificity were 0.854, 0.845 and 0.863, respectively. Conclusion The biological analysis of representative biomarker genes suggested that apoptosis and inflammation signaling pathways were essential for the response of colorectal cancer patients to FOLFOX chemotherapy.
Affiliation(s)
- Hengjun Lin
- Department of Tumor, Anus and Intestine, Jinhua People's Hospital, Jinhua, Zhejiang 321000, China,
| | - Xueke Qiu
- Department of Tumor, Anus and Intestine, Jinhua People's Hospital, Jinhua, Zhejiang 321000, China,
| | - Bo Zhang
- Department of Tumor, Anus and Intestine, Jinhua People's Hospital, Jinhua, Zhejiang 321000, China,
| | - Jichao Zhang
- Department of Tumor, Anus and Intestine, Jinhua People's Hospital, Jinhua, Zhejiang 321000, China,
|
34
|
Pan X, Hu X, Zhang YH, Chen L, Zhu L, Wan S, Huang T, Cai YD. Identification of the copy number variant biomarkers for breast cancer subtypes. Mol Genet Genomics 2018; 294:95-110. [PMID: 30203254 DOI: 10.1007/s00438-018-1488-4] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 03/21/2018] [Accepted: 09/03/2018] [Indexed: 01/07/2023]
Abstract
Breast cancer is a common and threatening malignant disease with multiple biological and clinical subtypes. It can be categorized into the subtypes luminal A, luminal B, Her2 positive, and basal-like. Copy number variants (CNVs) have been reported to be a potential and even better biomarker for cancer diagnosis than mRNA biomarkers, because they are considerably more stable and robust than gene expression. Thus, it is meaningful to detect the CNVs of different cancers. To identify CNV biomarkers for breast cancer subtypes, we integrated the CNV data of more than 2000 samples from two large breast cancer databases, METABRIC and The Cancer Genome Atlas (TCGA). A computational method based on Monte Carlo feature selection and incremental feature selection was proposed and tested to identify the distinctive core CNVs in different breast cancer subtypes. We identified the CNV genes that may contribute to breast cancer tumorigenesis and built a set of quantitative distinctive rules for recognition of the breast cancer subtypes. The Matthews correlation coefficient (MCC) was 0.515 in tenfold cross-validation on the METABRIC training set and 0.492 in the independent test on the TCGA dataset. The CNVs of PGAP3, GRB7, MIR4728, PNMT, STARD3, TCAP and ERBB2 were important for the accurate diagnosis of breast cancer subtypes. The findings reported in this study may further uncover the differences between breast cancer subtypes and improve diagnostic accuracy.
Affiliation(s)
- Xiaoyong Pan
- College of Life Science, Shanghai University, Shanghai, 200444, People's Republic of China.,Department of Medical Informatics, Erasmus MC, Rotterdam, The Netherlands
| | - XiaoHua Hu
- Department of Biostatistics and Computational Biology, School of Life Sciences, Fudan University, Shanghai, 200438, People's Republic of China
| | - Yu-Hang Zhang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, People's Republic of China
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, People's Republic of China.,Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai, 200241, People's Republic of China
| | - LiuCun Zhu
- College of Life Science, Shanghai University, Shanghai, 200444, People's Republic of China
| | - ShiBao Wan
- College of Life Science, Shanghai University, Shanghai, 200444, People's Republic of China
| | - Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, People's Republic of China.
| | - Yu-Dong Cai
- College of Life Science, Shanghai University, Shanghai, 200444, People's Republic of China.
|
35
|
A Computational Method for Classifying Different Human Tissues with Quantitatively Tissue-Specific Expressed Genes. Genes (Basel) 2018; 9:genes9090449. [PMID: 30205473 PMCID: PMC6162521 DOI: 10.3390/genes9090449] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 08/03/2018] [Revised: 09/01/2018] [Accepted: 09/04/2018] [Indexed: 02/06/2023] Open
Abstract
Tissue-specific gene expression has long been recognized as a key to understanding tissue development and function. Efforts have been made in the past decade to identify tissue-specific expression profiles, such as the Human Proteome Atlas and FANTOM5. However, these studies mainly focused on "qualitatively tissue-specific expressed genes", which are highly enriched in one or a group of tissues, but paid less attention to "quantitatively tissue-specific expressed genes", which are expressed in all or most tissues but with differential expression levels. In this study, we applied machine learning algorithms to build a computational method for identifying "quantitatively tissue-specific expressed genes" capable of distinguishing 25 human tissues from their expression patterns. Our results uncovered the expression of 432 genes as optimal features for tissue classification, with which a support vector machine (SVM) yielded a Matthews Correlation Coefficient (MCC) of more than 0.99. This constructed model was superior to the SVM model using tissue-enriched genes and yielded an MCC of 0.985 on an independent test dataset, indicating its good generalization ability. These 432 genes were shown to be widely expressed in multiple tissues, and a literature review of the top 23 genes found support for the discriminating power of most of them. As a complement to previous studies, our discovery of these quantitatively tissue-specific genes provides insights into the detailed understanding of tissue development and function.
|
36
|
Li J, Lan CN, Kong Y, Feng SS, Huang T. Identification and Analysis of Blood Gene Expression Signature for Osteoarthritis With Advanced Feature Selection Methods. Front Genet 2018; 9:246. [PMID: 30214455 PMCID: PMC6125376 DOI: 10.3389/fgene.2018.00246] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 05/03/2018] [Accepted: 06/22/2018] [Indexed: 12/15/2022] Open
Abstract
Osteoarthritis (OA) is a complex disease that affects articular joints and may cause disability. The incidence of OA is extremely high. Most elderly people have symptoms of osteoarthritis. The physiotherapy of OA is time consuming, and the chances of full recovery from OA are minimal. The most effective way of fighting OA is early diagnosis and early intervention. Liquid biopsy has become a popular noninvasive test. To find a blood gene expression signature for OA, we reanalyzed the publicly available blood gene expression profiles of 106 patients with OA and 33 control samples using an automatic computational pipeline based on advanced feature selection methods. Finally, a compact 23-gene set was identified. On the basis of these 23 genes, we constructed a Support Vector Machine (SVM) classifier and evaluated it with leave-one-out cross-validation. Its sensitivity (Sn), specificity (Sp), accuracy (ACC), and Matthews correlation coefficient (MCC) were 0.991, 0.909, 0.971, and 0.920, respectively. The performance still needs to be validated in an independent large dataset, but the in-depth biological analysis of the 23 biomarkers showed great promise and suggested that the mRNA surveillance pathway and multicellular organism growth played important roles in OA. Our results shed light on OA diagnosis through liquid biopsy.
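The four reported measures all follow from a single confusion matrix, which the short sketch below makes explicit. The abstract gives only the sample sizes (106 patients, 33 controls) and the rounded metrics; the individual counts used here (105 patients and 30 controls classified correctly) are an assumption inferred from those figures, not numbers stated in the source. The function name `binary_metrics` is likewise illustrative.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, acc, mcc


# Assumed counts: 105 of 106 OA patients and 30 of 33 controls classified correctly.
sn, sp, acc, mcc = binary_metrics(tp=105, fn=1, tn=30, fp=3)
print(f"Sn={sn:.3f} Sp={sp:.3f} ACC={acc:.3f} MCC={mcc:.3f}")
# prints: Sn=0.991 Sp=0.909 ACC=0.971 MCC=0.920
```

With classes this unbalanced (106 vs. 33), accuracy alone is dominated by the larger class, which is why MCC, which uses all four cells of the confusion matrix, is reported alongside it.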
Affiliation(s)
- Jing Li
- Department of Rehabilitation, The Second Xiangya Hospital, Central South University, Changsha, China
| | - Chun-Na Lan
- Department of Rehabilitation, The Second Xiangya Hospital, Central South University, Changsha, China
| | - Ying Kong
- Department of Rehabilitation, The Second Xiangya Hospital, Central South University, Changsha, China
| | - Song-Shan Feng
- Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
| | - Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
|
37
|
Li J, Lu L, Zhang YH, Liu M, Chen L, Huang T, Cai YD. Identification of synthetic lethality based on a functional network by using machine learning algorithms. J Cell Biochem 2018; 120:405-416. [PMID: 30125975 DOI: 10.1002/jcb.27395] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Received: 12/19/2017] [Accepted: 07/09/2018] [Indexed: 12/27/2022]
Abstract
Synthetic lethality arises when a combination of mutations leads to cell death. Tumor-specific synthetic lethality has been targeted in research to improve cancer therapy. With the advances of techniques in molecular biology, such as RNAi and CRISPR/Cas9 gene editing, efforts have been made to systematically identify synthetic lethal interactions, especially for frequently mutated genes in cancers. However, elucidating the mechanism of synthetic lethality remains a challenge because of the complexity of its influencing conditions. In this study, we proposed a new computational method to identify critical functional features that can accurately predict synthetic lethal interactions. This method incorporates several machine learning algorithms and encodes protein-coding genes by an enrichment system derived from gene ontology terms and Kyoto Encyclopedia of Genes and Genomes pathways to represent their functional features. We built a random forest-based prediction engine by using 2120 selected features and obtained a Matthews correlation coefficient of 0.532. We examined the top 15 features and found that most of them have potential roles in synthetic lethality according to previous studies. These results demonstrate the ability of our proposed method to predict synthetic lethal interactions and provide a basis for further characterization of these particular genetic combinations.
Affiliation(s)
- JiaRui Li
- School of Life Sciences, Shanghai University, Shanghai, China
| | - Lin Lu
- Department of Radiology, Columbia University Medical Center, New York
| | - Yu-Hang Zhang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Min Liu
- College of Information Engineering, Shanghai Maritime University, Shanghai, China
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, China
| | - Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
|
38
|
Yuan F, Lu L, Zhang Y, Wang S, Cai YD. Data mining of the cancer-related lncRNAs GO terms and KEGG pathways by using mRMR method. Math Biosci 2018; 304:1-8. [PMID: 30086268 DOI: 10.1016/j.mbs.2018.08.001] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 05/08/2018] [Revised: 06/15/2018] [Accepted: 08/01/2018] [Indexed: 02/07/2023]
Abstract
LncRNAs play an important role in the regulation of gene expression. Identification of the GO terms and KEGG pathways of cancer-related lncRNAs is very helpful for revealing cancer-related functional biological processes. Therefore, in this study, we proposed a computational method to identify novel GO terms and KEGG pathways of cancer-related lncRNAs. By using an existing lncRNA database and the Max-relevance Min-redundancy (mRMR) method, GO terms and KEGG pathways were evaluated based on their importance in distinguishing cancer-related from non-cancer-related lncRNAs. Finally, GO terms and KEGG pathways with high importance were presented and analyzed. Our literature review showed that the top 10 ranked GO terms and pathways were indeed related to tumorigenesis according to recent publications.
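The mRMR ranking used here can be sketched as a greedy procedure: at each step, pick the remaining feature whose mutual information with the class label (relevance) most exceeds its mean mutual information with the features already picked (redundancy). The version below handles discrete features only and uses the difference form of the criterion; the feature names and toy values are hypothetical, and the cited study's implementation may differ in both the criterion and the discretization.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_rank(features, labels):
    """Greedy mRMR ordering: relevance to labels minus mean redundancy with picks."""
    remaining, ranked = dict(features), []
    while remaining:
        def score(name):
            relevance = mutual_information(remaining[name], labels)
            if not ranked:
                return relevance
            redundancy = sum(
                mutual_information(remaining[name], features[s]) for s in ranked
            ) / len(ranked)
            return relevance - redundancy
        best = max(remaining, key=score)
        ranked.append(best)
        del remaining[best]
    return ranked


# Toy data (hypothetical): 1 = cancer-related lncRNA, 0 = not cancer-related.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "term_a": [1, 1, 1, 0, 0, 0, 0, 0],       # informative
    "term_a_copy": [1, 1, 1, 0, 0, 0, 0, 0],  # exact duplicate of term_a
    "term_b": [1, 0, 1, 1, 1, 0, 0, 0],       # weaker but complementary
}
print(mrmr_rank(features, labels))
```

The duplicate feature is as relevant as `term_a` on its own, yet it is ranked last because it adds no information once `term_a` has been selected; this is the redundancy penalty that separates mRMR from ranking by relevance alone.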
Affiliation(s)
- Fei Yuan
- Department of Science & Technology, Binzhou Medical University Hospital, Binzhou 256603, Shandong, China.
| | - Lin Lu
- Department of Radiology, Columbia University Medical Center, New York 10032, USA.
| | - YuHang Zhang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
| | - ShaoPeng Wang
- School of Life Sciences, Shanghai University, Shanghai 200444, China.
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, China.
|
39
|
Zhang TM, Huang T, Wang RF. Cross talk of chromosome instability, CpG island methylator phenotype and mismatch repair in colorectal cancer. Oncol Lett 2018; 16:1736-1746. [PMID: 30008861 PMCID: PMC6036478 DOI: 10.3892/ol.2018.8860] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 12/12/2017] [Accepted: 05/22/2018] [Indexed: 12/20/2022] Open
Abstract
Colorectal cancer is a severe cancer associated with a high prevalence and fatality rate. There are three major mechanisms for colorectal cancer: (1) chromosome instability (CIN), (2) CpG island methylator phenotype (CIMP) and (3) mismatch repair (MMR), of which CIN is the most common type. However, these subtypes are not mutually exclusive and can overlap. To investigate their biological mechanisms and cross talk, the gene expression profiles of 585 colorectal cancer patients with CIN, CIMP and MMR status records were collected. By comparing the CIN+ and CIN- samples, the CIMP+ and CIMP- samples, and the MMR+ and MMR- samples with the minimal redundancy maximal relevance (mRMR) and incremental feature selection (IFS) methods, the CIN-, CIMP- and MMR-associated genes were selected. There was little direct overlap among them. To investigate their indirect interactions, downstream genes of CIN, CIMP and MMR were identified using the random walk with restart (RWR) method, and a greater overlap of downstream genes was found. The common downstream genes were involved in biosynthetic and metabolic pathways. These findings were consistent with the clinical observation of wide-ranging metabolite aberrations in colorectal cancer. To conclude, the present study not only gave a gene-level explanation of CIN, CIMP and MMR, but also showed the network-level cross talk of CIN, CIMP and MMR. The common genes of CIN, CIMP and MMR may be useful for cross-subtype general colorectal cancer drug development.
Affiliation(s)
- Tian-Ming Zhang
- Department of Colorectal and Anal Surgery, Jinhua Hospital of Zhejiang University, Jinhua, Zhejiang 321000, P.R. China
| | - Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, P.R. China
| | - Rong-Fei Wang
- Department of Colorectal and Anal Surgery, Jinhua People's Hospital, Jinhua, Zhejiang 321000, P.R. China
| |
|
40
|
Wang S, Wang D, Li J, Huang T, Cai YD. Identification and analysis of the cleavage site in a signal peptide using SMOTE, dagging, and feature selection methods. Mol Omics 2018; 14:64-73. [DOI: 10.1039/c7mo00030h] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Indexed: 02/06/2023]
Abstract
Several machine learning algorithms were adopted to investigate cleavage sites in a signal peptide. An optimal dagging-based classifier was constructed, and 870 features were deemed important for this classifier.
Affiliation(s)
- ShaoPeng Wang
- School of Life Sciences, Shanghai University, Shanghai 200444, People's Republic of China
| | - Deling Wang
- Department of Medical Imaging, Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou
| | - JiaRui Li
- School of Life Sciences, Shanghai University, Shanghai 200444, People's Republic of China
| | - Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, People's Republic of China
| |
|
41
|
Deng L, Xu X, Liu H. PredCSO: an ensemble method for the prediction of S-sulfenylation sites in proteins. Mol Omics 2018; 14:257-265. [DOI: 10.1039/c8mo00089a] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 12/13/2022]
Abstract
S-sulfenylation sites in proteins are predicted from sequence and structural features using an ensemble model built with gradient tree boosting.
Affiliation(s)
- Lei Deng
- School of Software, Central South University, Changsha, China
| | - Xiaojie Xu
- School of Software, Central South University, Changsha, China
| | - Hui Liu
- School of Software, Central South University, Changsha, China
- Lab of Information Management, Changzhou University, Jiangsu
| |
|
42
|
Identification of the functional alteration signatures across different cancer types with support vector machine and feature analysis. Biochim Biophys Acta Mol Basis Dis 2017; 1864:2218-2227. [PMID: 29277326 DOI: 10.1016/j.bbadis.2017.12.026] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 10/17/2017] [Revised: 12/04/2017] [Accepted: 12/15/2017] [Indexed: 12/13/2022]
Abstract
Cancers are regarded as malignant proliferations of tumor cells present in many tissues and organs, which can severely curtail the quality of human life. The potential of using plasma DNA for cancer detection has been widely recognized, leading to the need to map the tissue-of-origin through the identification of somatic mutations. With cutting-edge technologies, such as next-generation sequencing, numerous somatic mutations have been identified, and the mutation signatures have been uncovered across different cancer types. However, somatic mutations are not independent events in carcinogenesis but exert functional effects. In this study, we applied a pan-cancer analysis to five types of cancers: (I) breast cancer (BRCA), (II) colorectal adenocarcinoma (COADREAD), (III) head and neck squamous cell carcinoma (HNSC), (IV) kidney renal clear cell carcinoma (KIRC), and (V) ovarian cancer (OV). Based on the mutated genes of patients suffering from one of the aforementioned cancer types, the patients were encoded into a large number of numerical values based upon the enrichment theory of gene ontology (GO) terms and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. We analyzed these features with the Monte-Carlo Feature Selection (MCFS) method, followed by the incremental feature selection (IFS) method, to identify functional alteration features that could be used to build a support vector machine (SVM)-based classifier for distinguishing the five types of cancers. Our results showed that the optimal classifier with the selected 344 features had the highest Matthews correlation coefficient value of 0.523. Sixteen decision rules produced by the MCFS method yielded an overall accuracy of 0.498 for the classification of the five cancer types. Further analysis indicated that some of these features and rules were supported by previous experiments.
This study not only presents a new approach to mapping the tissue-of-origin for cancer detection but also unveils the specific functional alterations of each cancer type, providing insight into cancer-specific functional aberrations as potential therapeutic targets. This article is part of a Special Issue entitled: Accelerating Precision Medicine through Genetic and Genomic Big Data Analysis edited by Yudong Cai & Tao Huang.
|
43
|
Chen L, Liu T, Zhao X. Inferring anatomical therapeutic chemical (ATC) class of drugs using shortest path and random walk with restart algorithms. Biochim Biophys Acta Mol Basis Dis 2017; 1864:2228-2240. [PMID: 29247833 DOI: 10.1016/j.bbadis.2017.12.019] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 10/07/2017] [Revised: 12/01/2017] [Accepted: 12/12/2017] [Indexed: 01/02/2023]
Abstract
The anatomical therapeutic chemical (ATC) classification system is a widely accepted drug classification scheme. This system comprises five levels and includes several classes in each level. Drugs are classified into classes according to their therapeutic effects and characteristics. The first level includes 14 main classes. In this study, we proposed two network-based models to infer novel potential chemicals deemed to belong in the first level of ATC classification. To build these models, two large chemical networks were constructed using the chemical-chemical interaction information retrieved from the Search Tool for Interactions of Chemicals (STITCH). Two classic network algorithms, shortest path (SP) and random walk with restart (RWR) algorithms, were executed on the corresponding network to mine novel chemicals for each ATC class using the validated drugs in a class as seed nodes. Then, the obtained chemicals yielded by these two algorithms were further evaluated by a permutation test and an association test. The former can exclude chemicals produced by the structure of the network, i.e., false positive discoveries. By contrast, the latter identifies the most important chemicals that have strong associations with the ATC class. Comparisons indicated that the two models can provide quite dissimilar results, suggesting that the results yielded by one model can be essential supplements for those obtained by the other model. In addition, several representative inferred chemicals were analyzed to confirm the reliability of the results generated by the two models. This article is part of a Special Issue entitled: Accelerating Precision Medicine through Genetic and Genomic Big Data Analysis edited by Yudong Cai & Tao Huang.
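The idea behind the permutation test in this abstract, discarding candidates that score highly for any seed set simply because of their position in the network, can be sketched as an empirical p-value. The exact statistic used in the paper is not given here, so the form below is an assumption; `score_fn` is a hypothetical callback standing in for the SP or RWR scoring of a candidate.

```python
import random

def permutation_p_value(score_fn, all_nodes, seeds, candidate,
                        n_permutations=1000, rng=None):
    """Empirical p-value of a candidate's score against random seed sets.

    score_fn(seed_list, candidate) -> float is a hypothetical scoring
    callback (for example, an RWR probability of the candidate node).
    """
    rng = rng or random.Random(0)
    observed = score_fn(seeds, candidate)
    exceed = 0
    for _ in range(n_permutations):
        # Draw a random seed set of the same size as the real one.
        random_seeds = rng.sample(list(all_nodes), len(seeds))
        if score_fn(random_seeds, candidate) >= observed:
            exceed += 1
    # The +1 correction keeps the estimate strictly above zero.
    return (exceed + 1) / (n_permutations + 1)
```

A candidate whose score is matched by most random seed sets gets a p-value near 1 and would be treated as a likely false positive under this sketch.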
Affiliation(s)
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China.
| | - Tao Liu
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China.
| | - Xian Zhao
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China
| |
|
44
|
Zhang YH, Hu Y, Zhang Y, Hu LD, Kong X. Distinguishing three subtypes of hematopoietic cells based on gene expression profiles using a support vector machine. Biochim Biophys Acta Mol Basis Dis 2017; 1864:2255-2265. [PMID: 29241664 DOI: 10.1016/j.bbadis.2017.12.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 10/11/2017] [Revised: 11/20/2017] [Accepted: 12/01/2017] [Indexed: 02/08/2023]
Abstract
Hematopoiesis is a complicated process involving a series of biological sub-processes that lead to the formation of various blood components. A widely accepted model of early hematopoiesis proceeds from long-term hematopoietic stem cells (LT-HSCs) to multipotent progenitors (MPPs) and then to lineage-committed progenitors. However, the molecular mechanisms of early hematopoiesis have not been fully characterized. In this study, we applied a computational strategy to identify the gene expression signatures distinguishing three types of closely related hematopoietic cells collected in recent studies: (1) hematopoietic stem cell/multipotent progenitor cells; (2) LT-HSCs; and (3) hematopoietic progenitor cells. Each cell in these cell types was represented by its gene expression profile across a total of 20,475 genes. The expression features were analyzed by a Monte-Carlo Feature Selection (MCFS) method, resulting in a feature list. Then, incremental feature selection (IFS) and a support vector machine (SVM) optimized with the sequential minimal optimization (SMO) algorithm were employed to obtain the optimal classifier, which had the highest Matthews correlation coefficient (MCC) value of 0.889 and used 6698 features to represent cells. In addition, an updated program of the MCFS method produced seventeen decision rules, which classified the three cell types with an overall accuracy of 0.812. Through a literature review, both the rules and the top features used for building the optimal classifier were confirmed to be commonly used or potential biological markers for distinguishing the three cell types of HSPCs. This article is part of a Special Issue entitled: Accelerating Precision Medicine through Genetic and Genomic Big Data Analysis edited by Yudong Cai & Tao Huang.
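The incremental feature selection (IFS) step described above can be sketched as a loop over growing prefixes of a ranked feature list. In the abstract the ranking comes from MCFS and the score is the MCC of an SVM; here `evaluate` is a hypothetical callback that stands in for any such cross-validated score, and the `step` parameter is an assumption added for illustration.

```python
from typing import Callable, List, Sequence, Tuple

def incremental_feature_selection(
    ranked_features: Sequence[int],
    evaluate: Callable[[List[int]], float],
    step: int = 1,
) -> Tuple[List[int], float]:
    """Return the top-k prefix of ranked_features with the best score.

    evaluate(subset) is assumed to train and validate a classifier on the
    given feature subset and return a score where higher is better.
    """
    best_subset: List[int] = []
    best_score = float("-inf")
    for k in range(step, len(ranked_features) + 1, step):
        subset = list(ranked_features[:k])
        score = evaluate(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score
```

With thousands of ranked features, a larger `step` reduces the number of classifiers that have to be trained, at the cost of a coarser search for the optimal subset size.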
Affiliation(s)
- Yu-Hang Zhang
- Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
| | - Yu Hu
- Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
| | - Yuchao Zhang
- Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, People's Republic of China.
| | - Lan-Dian Hu
- Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, People's Republic of China.
| | - Xiangyin Kong
- Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, People's Republic of China.
| |
|
45
|
Zhang J, Suo Y, Liu M, Xu X. Identification of genes related to proliferative diabetic retinopathy through RWR algorithm based on protein-protein interaction network. Biochim Biophys Acta Mol Basis Dis 2017; 1864:2369-2375. [PMID: 29237571 DOI: 10.1016/j.bbadis.2017.11.017] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 10/03/2017] [Revised: 11/15/2017] [Accepted: 11/25/2017] [Indexed: 12/14/2022]
Abstract
Proliferative diabetic retinopathy (PDR) is one of the most common complications of diabetes and can lead to blindness. Proteomic studies have provided insight into the pathogenesis of PDR, and a series of PDR-related genes have been identified, but they are far from fully characterized because the experimental methods are expensive and time consuming. In our previous study, we successfully identified 35 candidate PDR-related genes through the shortest-path algorithm. In the current study, we developed a computational method using the random walk with restart (RWR) algorithm and the protein-protein interaction (PPI) network to identify potential PDR-related genes. After some possible genes were obtained by the RWR algorithm, a three-stage filtration strategy, which includes the permutation test, interaction test and enrichment test, was applied to exclude potential false positives caused by the structure of the PPI network, poor interaction strength, and limited similarity in gene ontology (GO) terms and biological pathways. As a result, the method discovered 36 candidate genes, which differed from the 35 genes reported in our previous study. A literature review showed that 21 of these 36 genes are supported by previous experiments. These findings suggest the robustness and complementary effects of our two efforts using different computational methods, thus providing an alternative method to study PDR pathogenesis.
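The RWR step described in this abstract can be sketched on a small network as follows. This is a minimal illustration, assuming an unweighted, undirected adjacency matrix and a restart probability of 0.8; the function name and all parameter values are assumptions and are not taken from the paper.

```python
import numpy as np

def random_walk_with_restart(adjacency, seeds, restart=0.8,
                             tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change is below tol."""
    A = np.asarray(adjacency, dtype=float)
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # avoid division by zero for isolated nodes
    W = A / col_sums               # column-normalized transition matrix
    # The walker starts (and restarts) uniformly on the seed nodes.
    p0 = np.zeros(A.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p
```

Nodes with a high steady-state probability are close to the seed genes in the network and would be the candidates passed on to later filtering steps such as those named in the abstract.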
Affiliation(s)
- Jian Zhang
- Department of Ophthalmology, Shanghai General Hospital, School of Medicine, Shanghai JiaoTong University, Shanghai, China; Shanghai Key Laboratory of Fundus Disease, Shanghai, China; Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai, China
| | - Yan Suo
- Department of Ophthalmology, Shanghai General Hospital, School of Medicine, Shanghai JiaoTong University, Shanghai, China; Shanghai Key Laboratory of Fundus Disease, Shanghai, China; Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai, China
| | - Min Liu
- College of Information Engineering, Shanghai Maritime University, Shanghai, China
| | - Xun Xu
- Department of Ophthalmology, Shanghai General Hospital, School of Medicine, Shanghai JiaoTong University, Shanghai, China; Shanghai Key Laboratory of Fundus Disease, Shanghai, China; Shanghai Engineering Center for Visual Science and Photomedicine, Shanghai, China.
| |
|
46
|
Li J, Huang T. Predicting and analyzing early wake-up associated gene expressions by integrating GWAS and eQTL studies. Biochim Biophys Acta Mol Basis Dis 2017; 1864:2241-2246. [PMID: 29109033 DOI: 10.1016/j.bbadis.2017.10.036] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 09/04/2017] [Revised: 10/19/2017] [Accepted: 10/30/2017] [Indexed: 12/31/2022]
Abstract
Circadian rhythms are endogenous 24-hour rhythmic oscillations affecting human behaviors, such as sleep, blood pressure and other biological processes, the disturbance of which leads to circadian rhythm sleep disorders (CRSDs). In this study, based on data from genome-wide association studies (GWASs) and expression quantitative trait loci (eQTLs), we tried to identify novel gene expression patterns in brain tissues that were associated with early wake-up. First, the maximum-relevance-minimum-redundancy (mRMR) method was adopted to analyze the involved gene expression patterns, yielding a feature list. Second, the incremental feature selection (IFS) method and the Dagging algorithm were applied to extract the important gene expression patterns that yield the best performance for Dagging. As a result, 4374 gene expression patterns were obtained, and they were further used to build an optimal classifier with a good performance, reaching a Matthews correlation coefficient of 0.933. Furthermore, the 49 most important gene expression patterns were extensively analyzed. Four genes were found to be related to circadian rhythm, as reported in previous studies. As a first attempt at identifying the target genes whose expression levels are associated with sleep-wake rhythms through integrating GWAS and eQTL results, this study can motivate more investigations in this regard. This article is part of a Special Issue entitled: Accelerating Precision Medicine through Genetic and Genomic Big Data Analysis edited by Yudong Cai & Tao Huang.
Affiliation(s)
- JiaRui Li
- College of Life Science, Shanghai University, Shanghai 200444, People's Republic of China
| | - Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, People's Republic of China.
| |
|
47
|
A computational method using the random walk with restart algorithm for identifying novel epigenetic factors. Mol Genet Genomics 2017; 293:293-301. [PMID: 28932904 DOI: 10.1007/s00438-017-1374-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 07/10/2017] [Accepted: 09/11/2017] [Indexed: 12/31/2022]
Abstract
Epigenetic regulation has long been recognized as a significant factor in various biological processes, such as development, transcriptional regulation, spermatogenesis, and chromosome stabilization. Epigenetic alterations lead to many human diseases, including cancer, depression, autism, and immune system defects. Although efforts have been made to identify epigenetic regulators, it remains a challenge to systematically uncover all the components of epigenetic regulation at the genome level using experimental approaches. Advances in constructing protein-protein interaction (PPI) networks provide an excellent opportunity to identify novel epigenetic factors computationally at the genome level. In this study, we identified potential epigenetic factors by using a computational method that applied the random walk with restart (RWR) algorithm on a PPI network using reported epigenetic factors as seed nodes. False positives were identified by their specific roles in the PPI network or by a low-confidence interaction and a weak functional relationship with epigenetic regulators. After filtering out the false positives, 26 candidate epigenetic factors were finally obtained. According to previous studies, 22 of these are thought to be involved in epigenetic regulation, suggesting the robustness of our method. Our study provides a novel computational approach that successfully identified 26 potential epigenetic factors, paving the way for a deeper understanding of the epigenetic mechanism.
|
48
|
Zhang YH, Huang T, Chen L, Xu Y, Hu Y, Hu LD, Cai Y, Kong X. Identifying and analyzing different cancer subtypes using RNA-seq data of blood platelets. Oncotarget 2017; 8:87494-87511. [PMID: 29152097 PMCID: PMC5675649 DOI: 10.18632/oncotarget.20903] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 06/15/2017] [Accepted: 08/16/2017] [Indexed: 12/11/2022] Open
Abstract
Detection and diagnosis of cancer are especially important for early prevention and effective treatments. Traditional methods of cancer detection are usually time-consuming and expensive. Liquid biopsy, a newly proposed noninvasive detection approach, can improve the accuracy and decrease the cost of detection according to a personalized expression profile. However, few studies have been performed to analyze this type of data, although such analysis could lead to more effective methods for detecting different cancer subtypes. In this study, we applied some reliable machine learning algorithms to analyze data retrieved from patients who had one of six cancer subtypes (breast cancer, colorectal cancer, glioblastoma, hepatobiliary cancer, lung cancer and pancreatic cancer) as well as healthy persons. Quantitative gene expression profiles were used to encode each sample. Then, they were analyzed by the maximum relevance minimum redundancy (mRMR) method. Two feature lists were obtained in which genes were ranked rigorously. The incremental feature selection method was applied to the mRMR feature list to extract the optimal feature subset, which can be used in the support vector machine algorithm to achieve the best performance for the detection of cancer subtypes and healthy controls. Ten-fold cross-validation of the constructed optimal classification model yielded an overall accuracy of 0.751. In addition, we extracted the top eighteen features (genes), including TTN, RHOH, RPS20 and TRBC2, from the other feature list, the MaxRel feature list, and performed a detailed analysis of them. The results indicated that these genes could be important biomarkers for discriminating different cancer subtypes and healthy controls.
Affiliation(s)
- Yu-Hang Zhang
- Department of General Surgery, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai 200233, People's Republic of China; Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
| | - Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China
| | - YaoChen Xu
- Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
| | - Yu Hu
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
| | - Lan-Dian Hu
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
| | - Yudong Cai
- School of Life Sciences, Shanghai University, Shanghai 200444, People's Republic of China
| | - Xiangyin Kong
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai 200031, People's Republic of China
| |
|
49
|
Chen L, Zhang YH, Huang G, Pan X, Wang S, Huang T, Cai YD. Discriminating cirRNAs from other lncRNAs using a hierarchical extreme learning machine (H-ELM) algorithm with feature selection. Mol Genet Genomics 2017; 293:137-149. [DOI: 10.1007/s00438-017-1372-7] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 04/09/2017] [Accepted: 09/07/2017] [Indexed: 12/15/2022]
|
50
|
Chen L, Zhang YH, Wang S, Zhang Y, Huang T, Cai YD. Prediction and analysis of essential genes using the enrichments of gene ontology and KEGG pathways. PLoS One 2017; 12:e0184129. [PMID: 28873455 PMCID: PMC5584762 DOI: 10.1371/journal.pone.0184129] [Citation(s) in RCA: 204] [Impact Index Per Article: 25.5] [Received: 06/16/2017] [Accepted: 08/18/2017] [Indexed: 12/20/2022] Open
Abstract
Identifying essential genes in a given organism is important for research on their fundamental roles in organism survival. Furthermore, uncovering the links between core functions or pathways and these essential genes will help us obtain deeper insight into the key roles of these genes. In this study, we investigated the essential and non-essential genes reported in a previous study and extracted gene ontology (GO) terms and biological pathways that are important for the determination of essential genes. Through the enrichment theory of GO and KEGG pathways, we encoded each essential/non-essential gene into a vector in which each component represented the relationship between the gene and one GO term or KEGG pathway. To analyze these relationships, the maximum relevance minimum redundancy (mRMR) method was adopted. Then, incremental feature selection (IFS) and a support vector machine (SVM) were employed to extract important GO terms and KEGG pathways. A prediction model was built simultaneously using the extracted GO terms and KEGG pathways, which yielded nearly perfect performance, with a Matthews correlation coefficient of 0.951, for distinguishing essential and non-essential genes. To fully investigate the key factors influencing the fundamental roles of essential genes, the 21 most important GO terms and three KEGG pathways were analyzed in detail. In addition, several genes predicted to be essential by our prediction model were provided in this study. We suggest that this study provides more functional and pathway information on the essential genes and provides a new way to investigate related problems.
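For a binary task such as essential versus non-essential genes, the Matthews correlation coefficient reported above can be computed from the four confusion-matrix counts. The sketch below assumes labels coded as 1 (positive) and 0 (negative); the function name is illustrative, and returning 0 when the denominator vanishes is a common convention rather than something stated in the abstract.

```python
import math

def matthews_corrcoef(y_true, y_pred):
    """Binary MCC for labels coded as 1 (positive) and 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

Unlike plain accuracy, this score stays near 0 for a classifier that always predicts the majority class, which makes it a reasonable summary when the two classes are imbalanced.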
Affiliation(s)
- Lei Chen
- School of Life Sciences, Shanghai University, Shanghai, People’s Republic of China
- College of Information Engineering, Shanghai Maritime University, Shanghai, People’s Republic of China
| | - Yu-Hang Zhang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People’s Republic of China
| | - ShaoPeng Wang
- School of Life Sciences, Shanghai University, Shanghai, People’s Republic of China
| | - YunHua Zhang
- Anhui Province Key Lab of Farmland Ecological Conservation and Pollution Prevention, School of Resources and Environment, Anhui Agricultural University, Hefei, People’s Republic of China
| | - Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People’s Republic of China
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, People’s Republic of China
| |
|