1. Huckvale ED, Moseley HNB. A cautionary tale about properly vetting datasets used in supervised learning predicting metabolic pathway involvement. PLoS One 2024; 19:e0299583. [PMID: 38696410; PMCID: PMC11065254; DOI: 10.1371/journal.pone.0299583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Accepted: 02/13/2024] [Indexed: 05/04/2024]
Abstract
The mapping of metabolite-specific data to pathways within cellular metabolism is a major data analysis step needed for biochemical interpretation. A variety of machine learning approaches, particularly deep learning approaches, have been used to predict these metabolite-to-pathway mappings, utilizing a training dataset of known metabolite-to-pathway mappings. A few such training datasets have been derived from the Kyoto Encyclopedia of Genes and Genomes (KEGG). However, several prior published machine learning approaches utilized an erroneous KEGG-derived training dataset that used SMILES molecular representation strings (KEGG-SMILES dataset) and contained a sizable proportion (~26%) of duplicate entries. The presence of so many duplicates taints the training and testing sets generated from k-fold cross-validation of the KEGG-SMILES dataset. Therefore, the k-fold cross-validation performance of the resulting machine learning models was grossly inflated by the erroneous presence of these duplicate entries. Here we describe and evaluate the KEGG-SMILES dataset so that others may avoid using it. We also identify the prior publications that utilized this erroneous KEGG-SMILES dataset so their machine learning results can be properly and critically evaluated. In addition, we demonstrate the reduction of model k-fold cross-validation (CV) performance after de-duplicating the KEGG-SMILES dataset. This is a cautionary tale about properly vetting prior published benchmark datasets before using them in machine learning approaches. We hope others will avoid similar mistakes.
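The leakage described in this abstract can be illustrated with a minimal sketch. It assumes that a duplicate is an entry whose SMILES string is identical to another entry's (the paper's exact criterion is not given here) and that merged entries take the union of their pathway labels. The helper names `deduplicate`, `kfold_indices`, and `leaked_test_entries` are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def deduplicate(records):
    """Merge records sharing a SMILES string (assumed duplicate criterion),
    taking the union of their pathway labels."""
    merged = defaultdict(set)
    for smiles, labels in records:
        merged[smiles] |= set(labels)
    return [(smiles, sorted(labels)) for smiles, labels in merged.items()]


def kfold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def leaked_test_entries(records, k, seed=0):
    """Count test entries whose SMILES string also occurs in the
    training portion of the same fold."""
    leaked = 0
    for test in kfold_indices(len(records), k, seed):
        test_set = set(test)
        train_smiles = {
            records[i][0] for i in range(len(records)) if i not in test_set
        }
        leaked += sum(records[i][0] in train_smiles for i in test)
    return leaked
```

Running `leaked_test_entries` before and after `deduplicate` on the same records shows how many test entries a model could have memorized during training. Identical strings are only a lower bound on duplication: two different SMILES strings can encode the same molecule, so a real check would canonicalize the strings with a cheminformatics toolkit first.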
Affiliation(s)
- Erik D. Huckvale
  - Markey Cancer Center, University of Kentucky, Lexington, Kentucky, United States of America
- Hunter N. B. Moseley
  - Markey Cancer Center, University of Kentucky, Lexington, Kentucky, United States of America
  - Superfund Research Center, University of Kentucky, Lexington, Kentucky, United States of America
  - Department of Molecular and Cellular Biochemistry, University of Kentucky, Lexington, Kentucky, United States of America
  - Institute for Biomedical Informatics, University of Kentucky, Lexington, Kentucky, United States of America
2. Liu Y, Jiang Y, Zhang F, Yang Y. A Novel Multi-Scale Graph Neural Network for Metabolic Pathway Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:178-187. [PMID: 38127612; DOI: 10.1109/tcbb.2023.3345647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Predicting the metabolic pathway classes of compounds in the human body is an important problem in drug research and development. For this purpose, we propose a Multi-Scale Graph Neural Network framework, named MSGNN. The framework includes a subgraph encoder, a feature encoder and a global feature processor, and adopts a graph augmentation strategy. The subgraph encoder extracts the local structural features of the compound, the feature encoder learns the characteristics of the atoms, and the global feature processor processes the information from the pre-training model and the two molecular fingerprints, while the graph augmentation strategy expands the training set through a scientific and reasonable method. The experimental results show that the accuracy, precision, recall and F1 metrics of MSGNN reach 98.17%, 94.18%, 94.43% and 94.30%, respectively, which is superior to the comparable models known to us. In addition, the ablation experiment demonstrates the indispensability of the MSGNN modules.
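As a reading aid for the metrics quoted in this abstract, the sketch below shows how an F1 score relates to precision and recall under the standard definition (their harmonic mean). The function name `f1_score` is illustrative. Papers often average metrics over classes or folds, so the harmonic mean of the aggregate precision and recall is only expected to land near a reported F1, not necessarily to match it exactly.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall quoted in the abstract, converted to fractions.
print(round(f1_score(0.9418, 0.9443), 4))
```

For the figures quoted here, the result agrees with the reported 94.30% to within rounding.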
3. Ibor OR, Khan EA, Arkuwe A. A bioanalytical approach for assessing the effects of soil extracts from solid waste dumpsite in Calabar (Nigeria) on lipid and estrogenic signaling of fish Poeciliopsis lucida hepatocellular carcinoma-1 cells in vitro and in vivo African catfish (Clarias gariepinus). Journal of Toxicology and Environmental Health. Part A 2023; 86:774-789. [PMID: 37504673; DOI: 10.1080/15287394.2023.2240839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
In applying bioanalytical approaches, the aim of this study was to determine the toxicity of contaminants derived from a solid waste dumpsite in Calabar (Nigeria) by investigating the alterations of lipid and estrogen signaling pathways in Poeciliopsis lucida hepatocellular carcinoma-1 (PLHC-1) cells and comparing them with those in African catfish (Clarias gariepinus) in vivo, using polar, nonpolar and elutriate extraction methods. Cells were exposed for a 48 hr period to different concentrations of the contaminant extracts. The PLHC-1 cells were evaluated for lipid responses as follows: AdipoRed assay, retinoid x receptor (rxr), peroxisome proliferator-activated receptor isoforms (ppar-α and γ), estrogen receptor (er-α) and vitellogenin (vtg) transcripts. The lipid signaling activation was also assessed in vivo using C. gariepinus, where hepatic levels of ppar-α were determined at both transcript and functional protein levels. Data showed variable, extract type- and concentration-specific elevations in mRNA and protein levels for lipidomic and estrogenic effects. These effects were either biphasic at low and high concentrations, depending upon extract type, or concentration-dependent elevations. In general, these toxicological responses may be attributed to the soil organic and inorganic contaminant burden previously derived from the dumpsite. Thus, our data demonstrate unique lipid and endocrine-disruptive chemical (EDC) effects of each soil extract, suggesting multiple and complex contaminant interactions in the environment and biota. Analysis of numerous soil- or sediment-bound contaminants has numerous limitations and cost implications for developing countries. Our approach provides a bioanalytical protocol and endpoints for measuring the metabolic and EDC effects of complex environmental matrices for ecotoxicological assessment and monitoring.
Affiliation(s)
- Oju Richard Ibor
  - Department of Biology, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
  - Department of Zoology and Environmental Biology, University of Calabar, Calabar, Nigeria
- Essa Ahsan Khan
  - Department of Biology, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- Augustine Arkuwe
  - Department of Biology, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
4. Liu X, Yang H, Ai C, Ding Y, Guo F, Tang J. MVML-MPI: Multi-View Multi-Label Learning for Metabolic Pathway Inference. Brief Bioinform 2023; 24:bbad393. [PMID: 37930024; DOI: 10.1093/bib/bbad393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 09/20/2023] [Accepted: 10/11/2023] [Indexed: 11/07/2023]
Abstract
Development of robust and effective strategies for synthesizing new compounds, drug targeting and constructing GEnome-scale Metabolic models (GEMs) requires a deep understanding of the underlying biological processes. A critical step in achieving this goal is accurately identifying the categories of pathways in which a compound participates. However, current machine learning-based methods often overlook the multifaceted nature of compounds, resulting in inaccurate pathway predictions. Therefore, we present a novel framework for Multi-View Multi-Label Learning for Metabolic Pathway Inference, named MVML-MPI. First, MVML-MPI learns the distinct compound representations in parallel with corresponding compound encoders to fully extract features. Subsequently, we propose an attention-based mechanism that offers a fusion module to complement these multi-view representations. As a result, MVML-MPI accurately represents and effectively captures the complex relationship between compounds and metabolic pathways and distinguishes itself from current machine learning-based methods. In experiments conducted on the Kyoto Encyclopedia of Genes and Genomes pathways dataset, MVML-MPI outperformed state-of-the-art methods, demonstrating its superiority and its potential utility in the field of metabolic pathway design, which can aid in optimizing drug-like compounds and facilitating the development of GEMs. The code and data underlying this article are freely available at https://github.com/guofei-tju/MVML-MPI. Contact: jtang@cse.sc.edu, guofei@csu.edu.com or wuxi_dyj@csj.uestc.edu.cn.
Affiliation(s)
- Xiaoyi Liu
  - Computer Science and Engineering, University of South Carolina, Columbia 29208, USA
- Hongpeng Yang
  - Computer Science and Engineering, University of South Carolina, Columbia 29208, USA
- Chengwei Ai
  - Computer Science and Engineering, Central South University, Changsha 410083, China
- Yijie Ding
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Fei Guo
  - Computer Science and Engineering, Central South University, Changsha 410083, China
- Jijun Tang
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Nanshan 518055, China
5. Bao H, Zhao J, Zhao X, Zhao C, Lu X, Xu G. Prediction of plant secondary metabolic pathways using deep transfer learning. BMC Bioinformatics 2023; 24:348. [PMID: 37726702; PMCID: PMC10507959; DOI: 10.1186/s12859-023-05485-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Accepted: 09/14/2023] [Indexed: 09/21/2023]
Abstract
BACKGROUND: Plant secondary metabolites are highly valued for their applications in pharmaceuticals, nutrition, flavors, and aesthetics. It is of great importance to elucidate plant secondary metabolic pathways due to their crucial roles in biological processes during plant growth and development. However, understanding plant biosynthesis and degradation pathways remains a challenge due to the lack of sufficient information in current databases. To address this issue, we proposed a transfer learning approach using a pre-trained hybrid deep learning architecture that combines Graph Transformer and convolutional neural network (GTC) to predict plant metabolic pathways. RESULTS: GTC provides comprehensive molecular representation by extracting both structural features from the molecular graph and textual information from the SMILES string. GTC is pre-trained on the KEGG datasets to acquire general features, followed by fine-tuning on plant-derived datasets. Four metrics were chosen for model performance evaluation. The results show that GTC outperforms six other models, including three previously reported machine learning models, on the KEGG dataset. GTC yields an accuracy of 96.75%, precision of 85.14%, recall of 83.03%, and F1_score of 84.06%. Furthermore, an ablation study confirms the indispensability of all the components of the hybrid GTC model. Transfer learning is then employed to leverage the shared knowledge acquired from the KEGG metabolic pathways. As a result, the transferred GTC exhibits outstanding accuracy in predicting plant secondary metabolic pathways, with an average accuracy of 98.30% in fivefold cross-validation and 97.82% on the final test. In addition, GTC is employed to classify natural products. It achieves a perfect accuracy score of 100.00% for alkaloids, while its lowest accuracy score, 98.42%, is for shikimates and phenylpropanoids.
CONCLUSIONS: The proposed GTC effectively captures molecular features and achieves high performance in classifying KEGG metabolic pathways and predicting plant secondary metabolic pathways via transfer learning. Furthermore, GTC demonstrates its generalization ability by accurately classifying natural products. A user-friendly executable program has been developed, which only requires the input of the SMILES string of the query compound in a graphical interface.
Affiliation(s)
- Han Bao
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, People's Republic of China
  - University of Chinese Academy of Sciences, Beijing, 100049, People's Republic of China
  - Liaoning Province Key Laboratory of Metabolomics, Dalian, 116023, People's Republic of China
- Jinhui Zhao
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, People's Republic of China
  - University of Chinese Academy of Sciences, Beijing, 100049, People's Republic of China
  - Liaoning Province Key Laboratory of Metabolomics, Dalian, 116023, People's Republic of China
- Xinjie Zhao
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, People's Republic of China
  - University of Chinese Academy of Sciences, Beijing, 100049, People's Republic of China
  - Liaoning Province Key Laboratory of Metabolomics, Dalian, 116023, People's Republic of China
- Chunxia Zhao
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, People's Republic of China
  - University of Chinese Academy of Sciences, Beijing, 100049, People's Republic of China
  - Liaoning Province Key Laboratory of Metabolomics, Dalian, 116023, People's Republic of China
- Xin Lu
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, People's Republic of China
  - University of Chinese Academy of Sciences, Beijing, 100049, People's Republic of China
  - Liaoning Province Key Laboratory of Metabolomics, Dalian, 116023, People's Republic of China
- Guowang Xu
  - CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, People's Republic of China
  - University of Chinese Academy of Sciences, Beijing, 100049, People's Republic of China
  - Liaoning Province Key Laboratory of Metabolomics, Dalian, 116023, People's Republic of China
6. Belakhov VV. Polyfunctional Drugs: Search, Development, Use in Medical Practice, and Environmental Aspects of Preparation and Application (A Review). RUSS J GEN CHEM+ 2022. [DOI: 10.1134/s1070363222130047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/24/2023]
7. Yang J, Li Z, Wu WKK, Yu S, Xu Z, Chu Q, Zhang Q. Deep learning identifies explainable reasoning paths of mechanism of action for drug repurposing from multilayer biological network. Brief Bioinform 2022; 23:6809964. [PMID: 36347526; DOI: 10.1093/bib/bbac469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2022] [Revised: 09/07/2022] [Accepted: 09/29/2022] [Indexed: 11/11/2022]
Abstract
The discovery and repurposing of drugs require a deep understanding of the mechanism of drug action (MODA). Existing computational methods mainly model MODA with the protein-protein interaction (PPI) network. However, the molecular interactions of drugs in the human body are far beyond PPIs. Additionally, the lack of interpretability of these models hinders their practicability. We propose an interpretable deep learning-based path-reasoning framework (iDPath) for drug discovery and repurposing by capturing MODA on by far the most comprehensive multilayer biological network consisting of the complex high-dimensional molecular interactions between genes, proteins and chemicals. Experiments show that iDPath outperforms state-of-the-art machine learning methods on a general drug repurposing task. Further investigations demonstrate that iDPath can identify explicit critical paths that are consistent with clinical evidence. To demonstrate the practical value of iDPath, we apply it to the identification of potential drugs for treating prostate cancer and hypertension. Results show that iDPath can discover new FDA-approved drugs. This research provides a novel interpretable artificial intelligence perspective on drug discovery.
Affiliation(s)
- Jiannan Yang
  - School of Data Science, City University of Hong Kong, Hong Kong SAR, China
- Zhen Li
  - Department of Radiology, Tongji Hospital, Huazhong University of Science and Technology, Wuhan, China
- William Ka Kei Wu
  - Department of Anaesthesia and Intensive Care, Chinese University of Hong Kong, Hong Kong SAR, China
- Shi Yu
  - The USC Norris Center for Cancer Drug Development, University of Southern California, Los Angeles, CA, USA
  - Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Zhongzhi Xu
  - School of Data Science, City University of Hong Kong, Hong Kong SAR, China
- Qian Chu
  - Department of Thoracic Oncology, Tongji Hospital, Huazhong University of Science and Technology, Wuhan, China
- Qingpeng Zhang
  - School of Data Science, City University of Hong Kong, Hong Kong SAR, China
8. Yang Z, Liu J, Shah HA, Feng J. A novel hybrid framework for metabolic pathways prediction based on the graph attention network. BMC Bioinformatics 2022; 23:329. [PMID: 36171550; PMCID: PMC9520805; DOI: 10.1186/s12859-022-04856-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/24/2022] [Accepted: 07/25/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Clarifying which metabolic pathways a drug compound is involved in can help researchers understand how the drug is absorbed, distributed, metabolized, and excreted. The characteristics of a compound, such as its structure and composition, directly determine the metabolic pathways it participates in. METHODS: We developed a novel hybrid framework based on the graph attention network (GAT), named HFGAT, to predict the metabolic pathway classes that a compound is involved in by making use of its global and local characteristics. The framework mainly consists of a two-branch feature extracting layer and a fully connected (FC) layer. In the two-branch feature extracting layer, one branch is responsible for extracting global features of the compound, and the other branch introduces a GAT consisting of two graph attention layers to extract local structural features of the compound. Both the global and the local features of the compound are then integrated into the FC layer, which outputs the predicted metabolic pathway categories that the compound belongs to. RESULTS: We compared the multi-class classification performance of HFGAT with six other representative methods, including five classic machine learning methods and one graph convolutional network (GCN) based deep learning method, on the benchmark dataset containing 6999 compounds belonging to 11 pathway categories. The results showed that the deep learning-based methods (HFGAT, GCN-based method) outperformed the traditional machine learning methods in the prediction of metabolic pathways and that our proposed HFGAT method performed better than the GCN-based method. Moreover, HFGAT achieved higher [Formula: see text] scores on 8 of 11 classes than the GCN-based method. CONCLUSIONS: Our proposed HFGAT makes use of both the global and local information of the compounds to predict their metabolic pathway categories and has achieved notable performance.
Compared with the GCN model, the introduction of the GAT can help our model pay more attention to substructures of the compound that are useful for the prediction task. The study provides a potential method for drug discovery covering all types of metabolic reactions that may be involved in the decomposition and synthesis of pharmaceutical compounds in the organism.
Affiliation(s)
- Zhihui Yang
  - School of Computer Science, Wuhan University, Luojia Hill Street, Wuhan, 430072, China
- Juan Liu
  - School of Computer Science, Wuhan University, Luojia Hill Street, Wuhan, 430072, China
  - Institute of Artificial Intelligence, Wuhan University, Luojia Hill Street, Wuhan, 430072, China
  - National Engineering Research Center for Multimedia Software, Luojia Hill Street, Wuhan, 430072, China
- Hayat Ali Shah
  - School of Computer Science, Wuhan University, Luojia Hill Street, Wuhan, 430072, China
- Jing Feng
  - School of Computer Science, Wuhan University, Luojia Hill Street, Wuhan, 430072, China
  - Institute of Artificial Intelligence, Wuhan University, Luojia Hill Street, Wuhan, 430072, China
  - National Engineering Research Center for Multimedia Software, Luojia Hill Street, Wuhan, 430072, China
9. Sreenivasan AP, Harrison PJ, Schaal W, Matuszewski DJ, Kultima K, Spjuth O. Predicting protein network topology clusters from chemical structure using deep learning. J Cheminform 2022; 14:47. [PMID: 35841114; PMCID: PMC9284831; DOI: 10.1186/s13321-022-00622-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2021] [Accepted: 06/06/2022] [Indexed: 11/10/2022]
Abstract
Comparing chemical structures to infer protein targets and functions is a common approach, but basing comparisons on chemical similarity alone can be misleading. Here we present a methodology for predicting target protein clusters using deep neural networks. The model is trained on clusters of compounds based on similarities calculated from combined compound-protein and protein-protein interaction data using a network topology approach. We compare several deep learning architectures including both convolutional and recurrent neural networks. The best performing method, the recurrent neural network architecture MolPMoFiT, achieved an F1 score approaching 0.9 on a held-out test set of 8907 compounds. In addition, in-depth analysis on a set of eleven well-studied chemical compounds with known functions showed that predictions were justifiable for all but one of the chemicals. Four of the compounds, similar in their molecular structure but with dissimilarities in their function, revealed advantages of our method compared to using chemical similarity.
Affiliation(s)
- Akshai P Sreenivasan
  - Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 75124, Uppsala, Sweden
  - Department of Medical Sciences, Uppsala University, Uppsala, Sweden
- Philip J Harrison
  - Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 75124, Uppsala, Sweden
- Wesley Schaal
  - Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 75124, Uppsala, Sweden
- Damian J Matuszewski
  - Centre for Image Analysis, Department of Information Technology, Uppsala University, Uppsala, Sweden
- Kim Kultima
  - Department of Medical Sciences, Uppsala University, Uppsala, Sweden
- Ola Spjuth
  - Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 75124, Uppsala, Sweden
10. Du BX, Zhao PC, Zhu B, Yiu SM, Nyamabo AK, Yu H, Shi JY. MLGL-MP: a Multi-Label Graph Learning framework enhanced by pathway interdependence for Metabolic Pathway prediction. Bioinformatics 2022; 38:i325-i332. [PMID: 35758801; PMCID: PMC9235472; DOI: 10.1093/bioinformatics/btac222] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 11/20/2022]
Abstract
Motivation: During lead compound optimization, it is crucial to identify pathways where a drug-like compound is metabolized. Recently, machine learning-based methods have achieved inspiring progress in predicting potential metabolic pathways for drug-like compounds. However, they neglect the knowledge that metabolic pathways are dependent on each other. Moreover, they are inadequate to elucidate why compounds participate in specific pathways. Results: To address these issues, we propose a novel Multi-Label Graph Learning framework of Metabolic Pathway prediction boosted by pathway interdependence, called MLGL-MP, which contains a compound encoder, a pathway encoder and a multi-label predictor. The compound encoder learns compound embedding representations by graph neural networks. After constructing a pathway dependence graph by re-trained word embeddings and pathway co-occurrences, the pathway encoder learns pathway embeddings by graph convolutional networks. Moreover, after adapting the compound embedding space into the pathway embedding space, the multi-label predictor measures the proximity of the two spaces to discriminate which pathways a compound participates in. The comparison with state-of-the-art methods on KEGG pathways demonstrates the superiority of our MLGL-MP. Also, the ablation studies reveal how its three components contribute to the model, including the pathway dependence, the adapter between compound embeddings and pathway embeddings, as well as the pre-training strategy. Furthermore, a case study illustrates the interpretability of MLGL-MP by indicating crucial substructures in a compound, which are significantly associated with the attending metabolic pathways. It is anticipated that this work can boost metabolic pathway predictions in drug discovery. Availability and implementation: The code and data underlying this article are freely available at https://github.com/dubingxue/MLGL-MP.
Affiliation(s)
- Bing-Xue Du
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, China
- Peng-Cheng Zhao
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, China
- Bei Zhu
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, China
- Siu-Ming Yiu
  - Department of Computer Science, The University of Hong Kong, Hong Kong 999077, China
- Arnold K Nyamabo
  - School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China
- Hui Yu
  - School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China
- Jian-Yu Shi
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, China
11. Similarity-Based Method with Multiple-Feature Sampling for Predicting Drug Side Effects. Computational and Mathematical Methods in Medicine 2022; 2022:9547317. [PMID: 35401786; PMCID: PMC8993545; DOI: 10.1155/2022/9547317] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 07/20/2021] [Revised: 09/18/2021] [Accepted: 03/15/2022] [Indexed: 12/23/2022]
Abstract
Drugs can treat different diseases but also bring side effects. Undetected and unaccepted side effects of approved drugs can greatly harm the human body and bring huge risks for pharmaceutical companies. Traditional experimental methods used to determine side effects have several drawbacks, such as low efficiency and high cost. One alternative is to design computational methods. Previous studies modeled a binary classification problem by pairing drugs and side effects; however, their classifiers can only extract one feature from each type of drug association. The present work proposed a novel multiple-feature sampling scheme that can extract several features from one type of drug association. Thirteen classification algorithms were employed to construct classifiers with features yielded by this scheme. Their performance was greatly improved compared with that of the classifiers that use the features yielded by the original scheme. The best performance was observed for the classifier based on random forest, with MCC of 0.8661, AUROC of 0.969, and AUPR of 0.977. Finally, one key parameter in the multiple-feature sampling scheme was analyzed.
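For readers unfamiliar with the MCC figure quoted in this abstract, the sketch below computes the Matthews correlation coefficient for a binary classifier from its confusion-matrix counts, using the standard definition. The function name `mcc` and the convention of returning 0.0 when the denominator vanishes are illustrative choices, not details taken from the paper.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level) to
    +1 (perfect prediction).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention assumed here: an undefined MCC is reported as 0.
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it informative when positive and negative drug-side effect pairs are imbalanced.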
12. Zhao H, Zheng K, Li Y, Wang J. A novel graph attention model for predicting frequencies of drug-side effects from multi-view data. Brief Bioinform 2021; 22:6312959. [PMID: 34213525; DOI: 10.1093/bib/bbab239] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 04/13/2021] [Revised: 05/30/2021] [Accepted: 06/04/2021] [Indexed: 12/15/2022]
Abstract
Identifying the frequencies of drug-side effects is a very important issue in pharmacological studies and drug risk-benefit assessment. However, designing clinical trials to determine the frequencies is usually time consuming and expensive, and most existing methods can only predict the existence of drug-side effects or their associations, not their frequencies. Inspired by the recent progress of graph neural networks in recommender systems, we develop a novel prediction model for drug-side effect frequencies, using a graph attention network to integrate three different types of features, including the similarity information, known drug-side effect frequency information and word embeddings. In comparison, the few available studies focusing on frequency prediction use only the known drug-side effect frequency scores. One novel approach used in this work first decomposes the feature types in the drug-side effect graph to extract different view representation vectors based on the three different feature types, and then recombines these latent view vectors automatically to obtain unified embeddings for prediction. The proposed method demonstrates high effectiveness in 10-fold cross-validation. The computational results show that the proposed method achieves the best performance on the benchmark dataset, outperforming the state-of-the-art matrix decomposition model. In addition, some ablation experiments and visual analyses are also supplied to illustrate the usefulness of our method for the prediction of drug-side effect frequencies. The codes of MGPred are available at https://github.com/zhc940702/MGPred and https://zenodo.org/record/4449613.
Affiliation(s)
- Haochen Zhao
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
- Kai Zheng
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
- Yaohang Li
  - Department of Computer Science, Old Dominion University, Norfolk, VA 23529-0001, United States
- Jianxin Wang
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, China
Collapse
|
13
|
Lopez-Ibañez J, Pazos F, Chagoyen M. Predicting biological pathways of chemical compounds with a profile-inspired approach. BMC Bioinformatics 2021; 22:320. [PMID: 34118870 PMCID: PMC8199418 DOI: 10.1186/s12859-021-04252-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/15/2021] [Accepted: 06/09/2021] [Indexed: 01/18/2023] Open
Abstract
Background: Assignment of chemical compounds to biological pathways is a crucial step in understanding the relationship between the chemical repertoire of an organism and its biology. Protein sequence profiles are very successful in capturing the main structural and functional features of a protein family, and can be used to assign new members to it by matching their sequences against these profiles. In this work, we extend this idea to chemical compounds, constructing a profile-inspired model for a set of related metabolites (those in the same biological pathway), based on a fragment-based vectorial representation of their chemical structures.
Results: We use this representation to predict the biological pathway of a chemical compound with good overall accuracy (AUC 0.74–0.90 depending on the database tested), and analyze some factors that affect performance. The approach, which is compared with equivalent methods, can also detect the molecular fragments characteristic of a pathway.
Conclusions: The method is available as an interactive graphical web server at http://csbg.cnb.csic.es/iFragMent.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04252-y.
Affiliation(s)
- Javier Lopez-Ibañez
  - Computational Systems Biology Group, National Center for Biotechnology (CNB-CSIC), Darwin 3, 28049, Madrid, Spain
- Florencio Pazos
  - Computational Systems Biology Group, National Center for Biotechnology (CNB-CSIC), Darwin 3, 28049, Madrid, Spain
- Monica Chagoyen
  - Computational Systems Biology Group, National Center for Biotechnology (CNB-CSIC), Darwin 3, 28049, Madrid, Spain
|
14
|
Damiens A, Alebrahim MT, Léonard E, Fayeulle A, Furman C, Hilbert JL, Siah A, Billamboz M. Sesamol-based terpenoids as promising bio-sourced crop protection compounds against the wheat pathogen Zymoseptoria tritici. Pest Management Science 2021; 77:2403-2414. [PMID: 33415837 DOI: 10.1002/ps.6269] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/17/2020] [Revised: 01/04/2021] [Accepted: 01/08/2021] [Indexed: 06/12/2023]
Abstract
BACKGROUND: Research into environmentally friendly alternatives to conventional plant protection products, to promote sustainable agriculture and healthy food, is strongly encouraged.
RESULTS: In this context, 20 naturally occurring terpenoids and phenolic compounds were selected and evaluated in vitro as crop protection compounds against Zymoseptoria tritici, the causal agent of Septoria tritici blotch of wheat. After selection of the most active compounds, hemisynthetic modifications were carried out to alter their lipophilicity. These modifications led to the discovery of sesamol esters as promising antifungal agents, with IC50 values around 10 μg/mL and no cytotoxicity against human cells.
CONCLUSION: These sesamol-based derivatives should be selected for further in planta evaluation to validate their use as wheat crop protection agents. Moreover, the importance of a balanced hydrophilicity/lipophilicity ratio should be studied further. © 2021 Society of Chemical Industry.
Affiliation(s)
- Audrey Damiens
  - Laboratoire de Chimie Durable et Santé, Health & Environment Department, Team Sustainable Chemistry, Ecole des Hautes Etudes d'Ingénieur (HEI), Yncréa Hauts-de-France, Lille, France
  - Université de Technologie de Compiègne, ESCOM, Integrated Transformations of Renewable Matter, Centre de Recherche Royallieu, Compiègne, France
- Mohammad Taghi Alebrahim
  - Department of Agronomy and Plant Breeding, Faculty of Agriculture and Natural Resources, University of Mohaghegh Ardabili, Ardabil, Iran
- Estelle Léonard
  - Université de Technologie de Compiègne, ESCOM, Integrated Transformations of Renewable Matter, Centre de Recherche Royallieu, Compiègne, France
- Antoine Fayeulle
  - Université de Technologie de Compiègne, ESCOM, Integrated Transformations of Renewable Matter, Centre de Recherche Royallieu, Compiègne, France
- Christophe Furman
  - Université de Lille, Inserm, CHU Lille, Institut Pasteur de Lille, U1167 - RID-AGE - Facteurs de risque et déterminants moléculaires des maladies liées au vieillissement, Lille, France
  - Institut de Chimie Pharmaceutique Albert Lespagnol, Lille, France
- Jean-Louis Hilbert
  - Joint Research Unit BioEcoAgro N° 1158, Université de Lille, Université Liège, UPJV, INRAE, YNCREA, Université d'Artois, Université Littoral Côte d'Opale, ICV Institut Charles Viollette, Lille, France
- Ali Siah
  - Agriculture and Landscape Department, Team Plant Pathology and Biocontrol, UMR-Transfrontalière N° 1158 BioEcoAgro, Yncrea Hauts-de-France, ISA, Lille, France
- Muriel Billamboz
  - Laboratoire de Chimie Durable et Santé, Health & Environment Department, Team Sustainable Chemistry, Ecole des Hautes Etudes d'Ingénieur (HEI), Yncréa Hauts-de-France, Lille, France
  - Université de Lille, Inserm, CHU Lille, Institut Pasteur de Lille, U1167 - RID-AGE - Facteurs de risque et déterminants moléculaires des maladies liées au vieillissement, Lille, France
|
15
|
Peng X, Chen L, Zhou JP. Identification of Carcinogenic Chemicals with Network Embedding and Deep Learning Methods. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200414084317] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/21/2023]
Abstract
Background:
Cancer is the second leading cause of human death in the world. To date, many factors have been confirmed to cause cancer. Among them, carcinogenic chemicals are widely accepted as important ones. Traditional methods for detecting carcinogenic chemicals are inefficient and costly.
Objective:
The aim of this study was to design an efficient computational method for the identification of carcinogenic chemicals.
Methods:
A new computational model was proposed for detecting carcinogenic chemicals. For this data-driven model, carcinogenic and non-carcinogenic chemicals were obtained from the Carcinogenic Potency Database (CPDB). These chemicals were represented by features extracted from five chemical networks, representing five types of chemical associations, via the network embedding method Mashup. The obtained features were fed into a deep learning method, a recurrent neural network, to build the model.
Results:
The jackknife test on this model yielded an F-measure of 0.971 and an AUROC of 0.971.
Conclusion:
The proposed model was quite effective and was superior to models built with traditional machine learning algorithms, classic chemical encoding schemes, or direct use of chemical associations.
Affiliation(s)
- Xuefei Peng
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Jian-Peng Zhou
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
|
16
|
iMPTCE-Hnetwork: A Multilabel Classifier for Identifying Metabolic Pathway Types of Chemicals and Enzymes with a Heterogeneous Network. Computational and Mathematical Methods in Medicine 2021; 2021:6683051. [PMID: 33488764 PMCID: PMC7803417 DOI: 10.1155/2021/6683051] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 11/19/2020] [Revised: 12/16/2020] [Accepted: 12/19/2020] [Indexed: 12/16/2022]
Abstract
Metabolic pathways are an important type of biological pathway. They produce the essential molecules and energy that maintain the life of living organisms. Each metabolic pathway consists of a chain of chemical reactions, which require enzymes to proceed. Thus, chemicals and enzymes are the two major components of each metabolic pathway. Although several metabolic pathways have been uncovered, the metabolic pathway system is still far from complete: some chemicals or enzymes participating in a given metabolic pathway remain undiscovered. Besides traditional experiments to detect such hidden chemicals or enzymes, an alternative is to design efficient computational methods. In this study, we proposed a multilabel classifier, called iMPTCE-Hnetwork, to uniformly assign chemicals and enzymes to the metabolic pathway types reported in KEGG. The classifier adopted embedding features derived, through the network embedding algorithm Mashup, from a heterogeneous network that defined chemicals and enzymes as nodes and the interactions between them as edges. The RAndom k-labELsets (RAKEL) algorithm was employed to construct the classifier, with a support vector machine (polynomial kernel) as the base classifier. The ten-fold cross-validation results indicated that the classifier had good performance, with accuracy higher than 0.800 and exact match higher than 0.750. Several comparisons were done to indicate the superiority of iMPTCE-Hnetwork.
|
17
|
Yang L, Jiao X. Distinguishing Enzymes and Non-enzymes Based on Structural Information with an Alignment Free Approach. Curr Bioinform 2021. [DOI: 10.2174/1574893615666200324134037] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/22/2022]
Abstract
Background:
Knowledge of protein functions is crucial for understanding biological processes. Experimental methods for determining protein function cannot keep pace with the growing amount of protein sequence and structure data.
Objective:
To develop computational techniques for protein function prediction.
Method:
Based on residue interaction network features and motion mode information, an SVM model was constructed and used as the predictor. The role of these features was analyzed and some interesting results were obtained.
Results:
An alignment-free method for classifying enzymes and non-enzymes is developed in this work. No single feature dominates the prediction process. The topological and information-theoretic residue interaction network features show better performance. Combining the fast mode and the slow mode gives a better explanation of the classification result.
Conclusion:
The method proposed in this paper can act as a classifier for enzymes and non-enzymes.
Affiliation(s)
- Lifeng Yang
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030600, China
- Xiong Jiao
  - College of Biomedical Engineering, Taiyuan University of Technology, Taiyuan, 030600, China
|
18
|
Liang H, Hu B, Chen L, Wang S, Aorigele. Recognizing novel chemicals/drugs for anatomical therapeutic chemical classes with a heat diffusion algorithm. Biochim Biophys Acta Mol Basis Dis 2020; 1866:165910. [DOI: 10.1016/j.bbadis.2020.165910] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/28/2020] [Revised: 07/20/2020] [Accepted: 08/03/2020] [Indexed: 12/14/2022]
|
19
|
Baranwal M, Magner A, Elvati P, Saldinger J, Violi A, Hero AO. A deep learning architecture for metabolic pathway prediction. Bioinformatics 2020; 36:2547-2553. [PMID: 31879763 DOI: 10.1093/bioinformatics/btz954] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 10/09/2019] [Revised: 12/02/2019] [Accepted: 12/22/2019] [Indexed: 01/14/2023] Open
Abstract
MOTIVATION: Understanding the mechanisms and structural mappings between molecules and pathway classes is critical for the design of reaction predictors for synthesizing new molecules. This article studies the problem of predicting the classes of metabolic pathways (series of chemical reactions occurring within a cell) in which a given biochemical compound participates. We apply a hybrid machine learning approach in which graph convolutional networks extract molecular shape features that serve as input to a random forest classifier. In contrast to previously applied machine learning methods for this problem, our framework automatically extracts relevant shape features directly from input SMILES representations, which are atom-bond specifications of the chemical structures composing the molecules.
RESULTS: Our method correctly predicts the respective metabolic pathway class of 95.16% of tested compounds, whereas competing methods achieve an accuracy of 84.92% or less. Furthermore, our framework extends to the task of classifying compounds having mixed membership in multiple pathway classes. Our prediction accuracy for this multi-label task is 97.61%. We analyze the relative importance of various global physicochemical features to the pathway class prediction problem and show that simple linear/logistic regression models can predict the values of these global features from the shape features extracted using our framework.
AVAILABILITY AND IMPLEMENTATION: https://github.com/baranwa2/MetabolicPathwayPrediction.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mayank Baranwal
  - Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA
- Abram Magner
  - Department of Computer Science, University at Albany, SUNY, Albany, NY 12222, USA
- Angela Violi
  - Department of Mechanical Engineering
  - Department of Chemical Engineering and Biophysics, University of Michigan, Ann Arbor, MI 48109, USA
- Alfred O Hero
  - Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA
|
20
|
Zhou B, Zhao X, Lu J, Sun Z, Liu M, Zhou Y, Liu R, Wang Y. Relating Substructures and Side Effects of Drugs with Chemical-chemical Interactions. Comb Chem High Throughput Screen 2020; 23:285-294. [DOI: 10.2174/1386207322666190702102752] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/16/2018] [Revised: 03/11/2019] [Accepted: 04/16/2019] [Indexed: 12/17/2022]
Abstract
Background: Drugs are very important for human life because they can provide treatment, cure, prevention, or diagnosis of different diseases. However, they also cause side effects, which can increase the risks for humans and pharmaceutical companies. It is essential to identify drug side effects in drug discovery. To date, many computational methods have been proposed to predict the side effects of drugs, and most of them rely on the premise that similar drugs have similar side effects. However, previous studies did not analyze which substructures are highly related to which kinds of side effects.
Method: In this study, we conducted a computational investigation. We extracted a drug set for each side effect, consisting of the drugs having that side effect. Likewise, for each substructure, a set was constructed from the drugs containing that substructure. The relationship between a side effect and a substructure was evaluated based on linkages between drugs in their corresponding drug sets, resulting in an Es value. The statistical significance of the Es value was then measured by a permutation test.
Results and Conclusion: A number of highly related pairs of side effects and substructures were obtained, and some were extensively analyzed to confirm the reliability of the results reported in this study.
Affiliation(s)
- Bo Zhou
  - Shanghai University of Medicine and Health Sciences, Shanghai 201318, China
- Xian Zhao
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Jing Lu
  - School of Pharmacy, Key Laboratory of Molecular Pharmacology and Drug Evaluation (Yantai University), Ministry of Education, Collaborative Innovation Center of Advanced Drug Delivery System and Biotech Drugs in Universities of Shandong, Yantai University, Yantai 264005, China
- Zuntao Sun
  - Informatization Office, Shanghai Maritime University, Shanghai 201306, China
- Min Liu
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Yilu Zhou
  - Biological Sciences, Faculty of Environmental and Life Sciences, University of Southampton, Southampton SO17 1BJ, United Kingdom
- Rongzhi Liu
  - Center for Medical Device Evaluation, China Drug Administration, State Administration for Market Regulation, Beijing 100081, China
- Yihua Wang
  - Biological Sciences, Faculty of Environmental and Life Sciences, University of Southampton, Southampton SO17 1BJ, United Kingdom
|
21
|
Che J, Chen L, Guo ZH, Wang S, Aorigele. Drug Target Group Prediction with Multiple Drug Networks. Comb Chem High Throughput Screen 2020; 23:274-284. [DOI: 10.2174/1386207322666190702103927] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 12/29/2018] [Revised: 03/11/2019] [Accepted: 04/15/2019] [Indexed: 02/07/2023]
Abstract
Background:
Identification of drug-target interactions is essential in drug discovery. It helps to predict unexpected therapeutic or adverse side effects of drugs. To date, several computational methods have been proposed to predict drug-target interactions because they are fast and low-cost compared with traditional wet experiments.
Methods:
In this study, we investigated this problem in a different way. According to KEGG, drugs were classified into several groups based on their target proteins. A multi-label classification model was presented to assign drugs to the correct target groups. To make full use of the known drug properties, five networks were constructed, each representing drug associations for one property. The network embedding method Mashup was adopted to extract drug features from these networks, based on which several machine learning algorithms, including the RAndom k-labELsets (RAKEL) algorithm, the Label Powerset (LP) algorithm and the Support Vector Machine (SVM), were used to build the classification model.
Results and Conclusion:
Tenfold cross-validation yielded an accuracy of 0.839, an exact match of 0.816 and a Hamming loss of 0.037, indicating good performance of the model. The contribution of each network was also analyzed. Furthermore, the model with multiple networks was found to be superior to the model with a single network and to the classic model.
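The three multi-label measures named in this abstract can be illustrated with a minimal sketch. The definitions below are assumptions, since the abstract does not state them: accuracy is taken as the mean per-sample Jaccard overlap between true and predicted label sets (one common convention for multi-label work), exact match as the fraction of samples whose label sets are predicted perfectly, and Hamming loss as the fraction of individual label slots that are wrong. The function name and the toy label matrices are hypothetical.

```python
import numpy as np

def multilabel_scores(y_true, y_pred):
    """Return (accuracy, exact_match, hamming_loss) for 0/1 label matrices.

    Rows are samples and columns are labels. "Accuracy" here is the mean
    Jaccard overlap per sample, which is an assumed definition.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    inter = (y_true & y_pred).sum(axis=1)
    union = (y_true | y_pred).sum(axis=1)
    # A sample with no true and no predicted labels counts as a perfect match.
    jaccard = np.where(union == 0, 1.0, inter / np.maximum(union, 1))
    exact = (y_true == y_pred).all(axis=1)
    hamming = (y_true != y_pred).mean()
    return float(jaccard.mean()), float(exact.mean()), float(hamming)

# Toy data: 4 samples, 3 candidate labels (purely illustrative).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]]
accuracy, exact_match, hamming_loss = multilabel_scores(y_true, y_pred)
print(accuracy, exact_match, hamming_loss)
```

Exact match is the strictest of the three, so it is expected to sit at or below the Jaccard-style accuracy, as it does in the reported 0.816 versus 0.839.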
Affiliation(s)
- Jingang Che
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Zi-Han Guo
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Shuaiqun Wang
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Aorigele
  - Faculty of Engineering, University of Toyama, Toyama, Japan
|
22
|
Prediction of Drug Side Effects with a Refined Negative Sample Selection Strategy. Computational and Mathematical Methods in Medicine 2020; 2020:1573543. [PMID: 32454877 PMCID: PMC7232712 DOI: 10.1155/2020/1573543] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 01/28/2020] [Revised: 04/14/2020] [Accepted: 04/23/2020] [Indexed: 01/07/2023]
Abstract
Drugs are an important way to treat various diseases. However, they inevitably produce side effects, bringing great risks to the human body and to pharmaceutical companies. Predicting the side effects of drugs has become one of the essential problems in drug research, and designing efficient computational methods is one way to address it. Some studies paired a drug and a side effect as a sample, thereby modeling the problem as binary classification. However, the selection of negative samples is a key problem in this case. In this study, a novel negative sample selection strategy was designed to obtain high-quality negative samples. The strategy applied the random walk with restart (RWR) algorithm on a chemical-chemical interaction network to select, as negative samples, pairs of drugs and side effects in which the drug was less likely to have the side effect. In several tests with a fixed feature extraction scheme and different machine learning algorithms, models trained with the selected negative samples achieved high performance. The best model even yielded nearly perfect performance. These models performed much better than those built without this strategy or with another selection strategy. Furthermore, it is not necessary to balance positive and negative samples under this strategy.
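As a rough illustration of the random walk with restart idea mentioned in this abstract, the sketch below iterates the standard update p_next = (1 - r) * W p + r * p0 on a small undirected graph, where W is the column-normalized adjacency matrix and p0 places equal mass on the seed nodes. Everything specific here is an assumption rather than the paper's setup: the four-node path graph, the restart probability of 0.8, the convergence tolerance, and the function name are all hypothetical. In a negative-sampling setting, nodes with the lowest steady-state score relative to the seeds would be the ones treated as least related.

```python
import numpy as np

def random_walk_with_restart(adj, seeds, restart=0.8, tol=1e-10, max_iter=1000):
    """Steady-state RWR scores for every node, starting from `seeds`.

    `adj` is a symmetric adjacency matrix; `restart` is the probability of
    jumping back to the seed set at each step (an assumed value).
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # leave isolated nodes as zero columns
    w = adj / col_sums                     # column-normalized transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)     # uniform mass on the seed nodes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (w @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:  # L1 change below tolerance
            return p_next
        p = p_next
    return p

# Toy path graph 0 - 1 - 2 - 3 with node 0 as the only seed.
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
scores = random_walk_with_restart(adj, seeds=[0])
print(scores)
```

On this toy graph the score falls off with distance from the seed, so node 3 would be the first candidate for a low-association pair.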
|
23
|
Kwon S, Yoon S. End-to-End Representation Learning for Chemical-Chemical Interaction Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:1436-1447. [PMID: 30106687 DOI: 10.1109/tcbb.2018.2864149] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 06/08/2023]
Abstract
Chemical-chemical interaction (CCI) plays a major role in predicting candidate drugs, toxicities, therapeutic effects, and biological functions. CCI is typically inferred from a variety of information; however, CCI has not yet been predicted using a learning-based approach. In other drug analyses, deep learning has been actively used in recent years. However, in most cases, deep learning has been used only for classification even though it has feature extraction capabilities. Thus, in this paper, we propose an end-to-end representation learning method for CCI, named DeepCCI, which includes feature extraction and a learning-based approach. The proposed architecture is based on the Siamese network. Using weight-shared convolutional neural networks, hidden representations are extracted from simplified molecular input line entry system (SMILES) strings, a notation that represents chemical structure. Subsequently, L1 element-wise distances between the two extracted hidden representations are measured. The performance of DeepCCI is compared with that of 12 fingerprint-method combinations. DeepCCI shows the best performance on most of the evaluation metrics used. In addition, DeepCCI was experimentally validated to guarantee the commutative property. The automatically extracted features can reduce the effort required for manual feature engineering and improve prediction performance.
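The commutative property mentioned in this abstract follows from two design choices it describes: both inputs pass through the same weight-shared encoder, and the pair is summarized by an element-wise L1 distance, which is symmetric in its arguments. The sketch below shows only that mechanism. The encoder is a stand-in (a single random linear map with a tanh), not the convolutional network over SMILES used by DeepCCI, and the input vectors, sizes, and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in shared encoder weights (hypothetical; DeepCCI uses CNNs over SMILES).
shared_weights = rng.normal(size=(8, 4))

def encode(x):
    """Map an input vector to a hidden representation with the shared weights."""
    return np.tanh(x @ shared_weights)

def pair_features(a, b):
    """Element-wise L1 distance between the two hidden representations."""
    return np.abs(encode(a) - encode(b))

# Two toy input vectors standing in for two encoded chemicals.
chem_a = rng.normal(size=8)
chem_b = rng.normal(size=8)

forward = pair_features(chem_a, chem_b)
backward = pair_features(chem_b, chem_a)
print(np.allclose(forward, backward))
```

Because |u - v| equals |v - u| element by element, any classifier applied to these pair features gives the same output for (A, B) and (B, A). Concatenating the two representations instead would not have this property.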
|
24
|
Jiménez J, Sabbadin D, Cuzzolin A, Martínez-Rosell G, Gora J, Manchester J, Duca J, De Fabritiis G. PathwayMap: Molecular Pathway Association with Self-Normalizing Neural Networks. J Chem Inf Model 2019; 59:1172-1181. [PMID: 30586501 DOI: 10.1021/acs.jcim.8b00711] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 12/24/2022]
Abstract
Drug discovery suffers from high attrition because compounds initially deemed promising can later show ineffectiveness or toxicity resulting from a poor understanding of their activity profile. In this work, we describe a deep self-normalizing neural network model for the prediction of molecular pathway association and evaluate its performance, showing an AUC ranging from 0.69 to 0.91 on a set of compounds extracted from ChEMBL and from 0.81 to 0.83 on an external data set provided by Novartis. We finally discuss the applicability of the proposed model in the domain of lead discovery. A usable application is available via PlayMolecule.org.
Affiliation(s)
- José Jiménez
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer del Dr. Aiguader 88, 08003, Barcelona, Spain
- Davide Sabbadin
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer del Dr. Aiguader 88, 08003, Barcelona, Spain
- Alberto Cuzzolin
  - Acellera, Barcelona Biomedical Research Park (PRBB), Carrer del Dr. Aiguader 88, 08003, Barcelona, Spain
- Gerard Martínez-Rosell
  - Acellera, Barcelona Biomedical Research Park (PRBB), Carrer del Dr. Aiguader 88, 08003, Barcelona, Spain
- Jacob Gora
  - Global Discovery Chemistry, Novartis Institutes for Biomedical Research, 250 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Takustr. 9, 14195 Berlin, Germany
- John Manchester
  - Global Discovery Chemistry, Novartis Institutes for Biomedical Research, 250 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- José Duca
  - Global Discovery Chemistry, Novartis Institutes for Biomedical Research, 250 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Gianni De Fabritiis
  - Computational Science Laboratory, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Carrer del Dr. Aiguader 88, 08003, Barcelona, Spain
  - Acellera, Barcelona Biomedical Research Park (PRBB), Carrer del Dr. Aiguader 88, 08003, Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, 08010 Barcelona, Spain
|
25
|
Chen L, Liu T, Zhao X. Inferring anatomical therapeutic chemical (ATC) class of drugs using shortest path and random walk with restart algorithms. Biochim Biophys Acta Mol Basis Dis 2017; 1864:2228-2240. [PMID: 29247833 DOI: 10.1016/j.bbadis.2017.12.019] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 10/07/2017] [Revised: 12/01/2017] [Accepted: 12/12/2017] [Indexed: 01/02/2023]
Abstract
The anatomical therapeutic chemical (ATC) classification system is a widely accepted drug classification scheme. This system comprises five levels and includes several classes in each level. Drugs are classified into classes according to their therapeutic effects and characteristics. The first level includes 14 main classes. In this study, we proposed two network-based models to infer novel potential chemicals deemed to belong in the first level of ATC classification. To build these models, two large chemical networks were constructed using the chemical-chemical interaction information retrieved from the Search Tool for Interactions of Chemicals (STITCH). Two classic network algorithms, shortest path (SP) and random walk with restart (RWR) algorithms, were executed on the corresponding network to mine novel chemicals for each ATC class using the validated drugs in a class as seed nodes. Then, the obtained chemicals yielded by these two algorithms were further evaluated by a permutation test and an association test. The former can exclude chemicals produced by the structure of the network, i.e., false positive discoveries. By contrast, the latter identifies the most important chemicals that have strong associations with the ATC class. Comparisons indicated that the two models can provide quite dissimilar results, suggesting that the results yielded by one model can be essential supplements for those obtained by the other model. In addition, several representative inferred chemicals were analyzed to confirm the reliability of the results generated by the two models. This article is part of a Special Issue entitled: Accelerating Precision Medicine through Genetic and Genomic Big Data Analysis edited by Yudong Cai & Tao Huang.
Affiliation(s)
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China
- Tao Liu
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China
- Xian Zhao
  - College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China
|
26
|
Augustin AU, Katzsch F, Prior SH, Gruber T. Supramolecular layers and versatile packing modes: The solid state behavior of ortho, ortho-linked bisphenols. J Mol Struct 2017. [DOI: 10.1016/j.molstruc.2017.01.055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
27
|
The Use of Gene Ontology Term and KEGG Pathway Enrichment for Analysis of Drug Half-Life. PLoS One 2016; 11:e0165496. [PMID: 27780226 PMCID: PMC5079577 DOI: 10.1371/journal.pone.0165496] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 08/20/2016] [Accepted: 10/12/2016] [Indexed: 02/07/2023] Open
Abstract
A drug's biological half-life is defined as the time required for the human body to metabolize or eliminate 50% of the initial drug dosage. Correctly measuring the half-life of a given drug is helpful for the safe and accurate usage of the drug. In this study, we investigated which gene ontology (GO) terms and biological pathways were highly related to the determination of drug half-life. The investigated drugs, with known half-lives, were analyzed based on their enrichment scores for associated GO terms and KEGG pathways. These scores indicate which GO terms or KEGG pathways the drug targets. The feature selection method, minimum redundancy maximum relevance, was used to analyze these GO terms and KEGG pathways and to identify important GO terms and pathways, such as sodium-independent organic anion transmembrane transporter activity (GO:0015347), monoamine transmembrane transporter activity (GO:0008504), negative regulation of synaptic transmission (GO:0050805), neuroactive ligand-receptor interaction (hsa04080), serotonergic synapse (hsa04726), and linoleic acid metabolism (hsa00591), among others. This analysis confirmed our results and may show evidence for a new method in studying drug half-lives and building effective computational methods for the prediction of drug half-lives.
|
28
|
Lu J, Chen L, Yin J, Huang T, Bi Y, Kong X, Zheng M, Cai YD. Identification of new candidate drugs for lung cancer using chemical-chemical interactions, chemical-protein interactions and a K-means clustering algorithm. J Biomol Struct Dyn 2016; 34:906-17. [PMID: 26849843 DOI: 10.1080/07391102.2015.1060161] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 01/17/2023]
Abstract
Lung cancer, characterized by uncontrolled cell growth in the lung tissue, is the leading cause of global cancer deaths. Until now, effective treatment of this disease is limited. Many synthetic compounds have emerged with the advancement of combinatorial chemistry. Identification of effective lung cancer candidate drug compounds among them is a great challenge. Thus, it is necessary to build effective computational methods that can assist us in selecting for potential lung cancer drug compounds. In this study, a computational method was proposed to tackle this problem. The chemical-chemical interactions and chemical-protein interactions were utilized to select candidate drug compounds that have close associations with approved lung cancer drugs and lung cancer-related genes. A permutation test and K-means clustering algorithm were employed to exclude candidate drugs with low possibilities to treat lung cancer. The final analysis suggests that the remaining drug compounds have potential anti-lung cancer activities and most of them have structural dissimilarity with approved drugs for lung cancer.
Affiliation(s)
- Jing Lu
- School of Pharmacy, Key Laboratory of Molecular Pharmacology and Drug Evaluation (Yantai University), Ministry of Education, Collaborative Innovation Center of Advanced Drug Delivery System and Biotech Drugs in Universities of Shandong, Yantai University, Yantai, 264005, P.R. China
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, P.R. China
- Jun Yin
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, P.R. China
- Tao Huang
- The Key Laboratory of Stem Cell Biology, Institute of Health Sciences, Shanghai Jiao Tong University School of Medicine (SJTUSM) and Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai 200025, P.R. China
- Yi Bi
- School of Pharmacy, Key Laboratory of Molecular Pharmacology and Drug Evaluation (Yantai University), Ministry of Education, Collaborative Innovation Center of Advanced Drug Delivery System and Biotech Drugs in Universities of Shandong, Yantai University, Yantai, 264005, P.R. China
- Xiangyin Kong
- The Key Laboratory of Stem Cell Biology, Institute of Health Sciences, Shanghai Jiao Tong University School of Medicine (SJTUSM) and Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai 200025, P.R. China
- Mingyue Zheng
- Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Shanghai 201203, P.R. China
- Yu-Dong Cai
- College of Life Science, Shanghai University, Shanghai 200444, P.R. China
|
29
|
Identification of Chemical Toxicity Using Ontology Information of Chemicals. Computational and Mathematical Methods in Medicine 2015; 2015:246374. [PMID: 26508991 PMCID: PMC4609800 DOI: 10.1155/2015/246374] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/26/2015] [Revised: 03/20/2015] [Accepted: 03/22/2015] [Indexed: 12/26/2022]
Abstract
With the advance of the combinatorial chemistry, a large number of synthetic compounds have surged. However, we have limited knowledge about them. On the other hand, the speed of designing new drugs is very slow. One of the key causes is the unacceptable toxicities of chemicals. If one can correctly identify the toxicity of chemicals, the unsuitable chemicals can be discarded in early stage, thereby accelerating the study of new drugs and reducing the R&D costs. In this study, a new prediction method was built for identification of chemical toxicities, which was based on ontology information of chemicals. By comparing to a previous method, our method is quite effective. We hope that the proposed method may give new insights to study chemical toxicity and other attributes of chemicals.
|
30
|
Identifying New Candidate Genes and Chemicals Related to Prostate Cancer Using a Hybrid Network and Shortest Path Approach. Computational and Mathematical Methods in Medicine 2015; 2015:462363. [PMID: 26504486 PMCID: PMC4609422 DOI: 10.1155/2015/462363] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 01/24/2015] [Accepted: 02/24/2015] [Indexed: 12/26/2022]
Abstract
Prostate cancer is a type of cancer that occurs in the male prostate, a gland in the male reproductive system. Because prostate cancer cells may spread to other parts of the body and can influence human reproduction, understanding the mechanisms underlying this disease is critical for designing effective treatments. The identification of as many genes and chemicals related to prostate cancer as possible will enhance our understanding of this disease. In this study, we proposed a computational method to identify new candidate genes and chemicals based on currently known genes and chemicals related to prostate cancer by applying a shortest path approach in a hybrid network. The hybrid network was constructed according to information concerning chemical-chemical interactions, chemical-protein interactions, and protein-protein interactions. Many of the obtained genes and chemicals are associated with prostate cancer.
|
31
|
Chen L, Chu C, Lu J, Kong X, Huang T, Cai YD. A computational method for the identification of new candidate carcinogenic and non-carcinogenic chemicals. Molecular BioSystems 2015; 11:2541-50. [PMID: 26194467 DOI: 10.1039/c5mb00276a] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Indexed: 12/12/2022]
Abstract
Cancer is one of the leading causes of human death. Based on current knowledge, one of the causes of cancer is exposure to toxic chemical compounds, including radioactive compounds, dioxin, and arsenic. The identification of new carcinogenic chemicals may warn us of potential danger and help to identify new ways to prevent cancer. In this study, a computational method was proposed to identify potential carcinogenic chemicals, as well as non-carcinogenic chemicals. According to the current validated carcinogenic and non-carcinogenic chemicals from the CPDB (Carcinogenic Potency Database), the candidate chemicals were searched in a weighted chemical network constructed according to chemical-chemical interactions. Then, the obtained candidate chemicals were further selected by a randomization test and information on chemical interactions and structures. The analyses identified several candidate carcinogenic chemicals, while those candidates identified as non-carcinogenic were supported by a literature search. In addition, several candidate carcinogenic/non-carcinogenic chemicals exhibit structural dissimilarity with validated carcinogenic/non-carcinogenic chemicals.
Affiliation(s)
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China.
|
32
|
Chen L, Yang J, Zheng M, Kong X, Huang T, Cai YD. The Use of Chemical-Chemical Interaction and Chemical Structure to Identify New Candidate Chemicals Related to Lung Cancer. PLoS One 2015; 10:e0128696. [PMID: 26047514 PMCID: PMC4457841 DOI: 10.1371/journal.pone.0128696] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 12/29/2014] [Accepted: 04/29/2015] [Indexed: 11/19/2022] Open
Abstract
Lung cancer causes over one million deaths every year worldwide. However, prevention and treatment methods for this serious disease are limited. The identification of new chemicals related to lung cancer may aid in disease prevention and the design of more effective treatments. This study employed a weighted network, constructed using chemical-chemical interaction information, to identify new chemicals related to two types of lung cancer: non-small-cell lung cancer and small-cell lung cancer. Then, a randomization test as well as chemical-chemical interaction and chemical structure information were utilized to make further selections. A final analysis of these new chemicals in the context of the current literature indicates that several chemicals are strongly linked to lung cancer.
Affiliation(s)
- Lei Chen
- College of Life Science, Shanghai University, Shanghai, 200444, People’s Republic of China
- College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, People’s Republic of China
- Jing Yang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, People’s Republic of China
- Mingyue Zheng
- Drug Discovery and Design Center, Shanghai Institute of Materia Medica, Shanghai, 201203, People’s Republic of China
- Xiangyin Kong
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, People’s Republic of China
- Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, People’s Republic of China
- * E-mail: (TH); (YDC)
- Yu-Dong Cai
- College of Life Science, Shanghai University, Shanghai, 200444, People’s Republic of China
- * E-mail: (TH); (YDC)
|
33
|
Chen L, Chu C, Lu J, Kong X, Huang T, Cai YD. Gene Ontology and KEGG Pathway Enrichment Analysis of a Drug Target-Based Classification System. PLoS One 2015; 10:e0126492. [PMID: 25951454 PMCID: PMC4423955 DOI: 10.1371/journal.pone.0126492] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 10/27/2014] [Accepted: 04/02/2015] [Indexed: 12/22/2022] Open
Abstract
Drug-target interaction (DTI) is a key aspect in pharmaceutical research. With the ever-increasing new drug data resources, computational approaches have emerged as powerful and labor-saving tools in predicting new DTIs. However, so far, most of these predictions have been based on structural similarities rather than biological relevance. In this study, we proposed for the first time a "GO and KEGG enrichment score" method to represent a certain category of drug molecules by further classification and interpretation of the DTI database. A benchmark dataset consisting of 2,015 drugs that are assigned to nine categories ((1) G protein-coupled receptors, (2) cytokine receptors, (3) nuclear receptors, (4) ion channels, (5) transporters, (6) enzymes, (7) protein kinases, (8) cellular antigens and (9) pathogens) was constructed by collecting data from KEGG. We analyzed each category and each drug for its contribution in GO terms and KEGG pathways using the popular feature selection "minimum redundancy maximum relevance (mRMR)" method, and key GO terms and KEGG pathways were extracted. Our analysis revealed the top enriched GO terms and KEGG pathways of each drug category, which were highly enriched in the literature and clinical trials. Our results provide for the first time the biological relevance among drugs, targets and biological functions, which serves as a new basis for future DTI predictions.
Affiliation(s)
- Lei Chen
- College of Life Science, Shanghai University, Shanghai, People’s Republic of China
- College of Information Engineering, Shanghai Maritime University, Shanghai, People’s Republic of China
- Chen Chu
- Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People’s Republic of China
- Jing Lu
- Department of Medicinal Chemistry, School of Pharmacy, Yantai University, Shandong, Yantai, People’s Republic of China
- Xiangyin Kong
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People’s Republic of China
- Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People’s Republic of China
- Yu-Dong Cai
- College of Life Science, Shanghai University, Shanghai, People’s Republic of China
|
34
|
Prediction of drug indications based on chemical interactions and chemical similarities. BioMed Research International 2015; 2015:584546. [PMID: 25821813 PMCID: PMC4363546 DOI: 10.1155/2015/584546] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 08/09/2014] [Accepted: 09/11/2014] [Indexed: 12/13/2022]
Abstract
Discovering potential indications of novel or approved drugs is a key step in drug development. Previous computational approaches could be categorized into disease-centric and drug-centric based on the starting point of the issues or small-scaled application and large-scale application according to the diversity of the datasets. Here, a classifier has been constructed to predict the indications of a drug based on the assumption that interactive/associated drugs or drugs with similar structures are more likely to target the same diseases using a large drug indication dataset. To examine the classifier, it was conducted on a dataset with 1,573 drugs retrieved from Comprehensive Medicinal Chemistry database for five times, evaluated by 5-fold cross-validation, yielding five 1st order prediction accuracies that were all approximately 51.48%. Meanwhile, the model yielded an accuracy rate of 50.00% for the 1st order prediction by independent test on a dataset with 32 other drugs in which drug repositioning has been confirmed. Interestingly, some clinically repurposed drug indications that were not included in the datasets are successfully identified by our method. These results suggest that our method may become a useful tool to associate novel molecules with new indications or alternative indications with existing drugs.
|
35
|
Hamdalla MA, Rajasekaran S, Grant DF, Măndoiu II. Metabolic pathway predictions for metabolomics: a molecular structure matching approach. J Chem Inf Model 2015; 55:709-18. [PMID: 25668446 DOI: 10.1021/ci500517v] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 01/19/2023]
Abstract
Metabolic pathways are composed of a series of chemical reactions occurring within a cell. In each pathway, enzymes catalyze the conversion of substrates into structurally similar products. Thus, structural similarity provides a potential means for mapping newly identified biochemical compounds to known metabolic pathways. In this paper, we present TrackSM, a cheminformatics tool designed to associate a chemical compound to a known metabolic pathway based on molecular structure matching techniques. Validation experiments show that TrackSM is capable of associating 93% of tested structures to their correct KEGG pathway class and 88% to their correct individual KEGG pathway. This suggests that TrackSM may be a valuable tool to aid in associating previously unknown small molecules to known biochemical pathways and improve our ability to link metabolomics, proteomic, and genomic data sets. TrackSM is freely available at http://metabolomics.pharm.uconn.edu/?q=Software.html .
Affiliation(s)
- Mai A Hamdalla
- ‡Computer Science Department, Helwan University, Cairo, Egypt
|
36
|
Bag S, Ramaiah S, Anbarasu A. fabp4 is central to eight obesity associated genes: A functional gene network-based polymorphic study. J Theor Biol 2015; 364:344-54. [DOI: 10.1016/j.jtbi.2014.09.034] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 06/17/2014] [Revised: 08/27/2014] [Accepted: 09/23/2014] [Indexed: 01/04/2023]
|
37
|
Chen L, Lu J, Huang T, Yin J, Wei L, Cai YD. Finding candidate drugs for hepatitis C based on chemical-chemical and chemical-protein interactions. PLoS One 2014; 9:e107767. [PMID: 25225900 PMCID: PMC4166673 DOI: 10.1371/journal.pone.0107767] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 05/04/2014] [Accepted: 08/14/2014] [Indexed: 11/18/2022] Open
Abstract
Hepatitis C virus (HCV) is an infectious virus that can cause serious illnesses. Only a few drugs have been reported to effectively treat hepatitis C. To have greater diversity in drug choice and better treatment options, it is necessary to develop more drugs to treat the infection. However, it is time-consuming and expensive to discover candidate drugs using experimental methods, and computational methods may complement experimental approaches as a preliminary filtering process. This type of approach was proposed by using known chemical-chemical interactions to extract interactive compounds with three known drug compounds of HCV, and the probabilities of these drug compounds being able to treat hepatitis C were calculated using chemical-protein interactions between the interactive compounds and HCV target genes. Moreover, the randomization test and expectation-maximization (EM) algorithm were both employed to exclude false discoveries. Analysis of the selected compounds, including acyclovir and ganciclovir, indicated that some of these compounds had potential to treat the HCV. Hopefully, this proposed method could provide new insights into the discovery of candidate drugs for the treatment of HCV and other diseases.
Affiliation(s)
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, People's Republic of China
- Jing Lu
- Department of Medicinal Chemistry, School of Pharmacy, Yantai University, Shandong, Yantai, People's Republic of China
- Tao Huang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Jun Yin
- College of Information Engineering, Shanghai Maritime University, Shanghai, People's Republic of China
- Lai Wei
- College of Information Engineering, Shanghai Maritime University, Shanghai, People's Republic of China
- Yu-Dong Cai
- Institute of Systems Biology, Shanghai University, Shanghai, People's Republic of China
- * E-mail:
|
38
|
Chen L, Lu J, Zhang N, Huang T, Cai YD. A hybrid method for prediction and repositioning of drug Anatomical Therapeutic Chemical classes. Molecular BioSystems 2014; 10:868-77. [PMID: 24492783 DOI: 10.1039/c3mb70490d] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.8] [Indexed: 02/04/2023]
Abstract
In the Anatomical Therapeutic Chemical (ATC) classification system, therapeutic drugs are divided into 14 main classes according to the organ or system on which they act and their chemical, pharmacological and therapeutic properties. This system, recommended by the World Health Organization (WHO), provides a global standard for classifying medical substances and serves as a tool for international drug utilization research to improve quality of drug use. In view of this, it is necessary to develop effective computational prediction methods to identify the ATC-class of a given drug, which thereby could facilitate further analysis of this system. In this study, we initiated an attempt to develop a prediction method and to gain insights from it by utilizing ontology information of drug compounds. Since only about one-fourth of drugs in the ATC classification system have ontology information, a hybrid prediction method combining the ontology information, chemical interaction information and chemical structure information of drug compounds was proposed for the prediction of drug ATC-classes. As a result, by using the Jackknife test, the 1st prediction accuracies for identifying the 14 main ATC-classes in the training dataset, the internal validation dataset and the external validation dataset were 75.90%, 75.70% and 66.36%, respectively. Analysis of some samples with false-positive predictions in the internal and external validation datasets indicated that some of them may even have a relationship with the false-positive predicted ATC-class, suggesting novel uses of these drugs. It was conceivable that the proposed method could be used as an efficient tool to identify ATC-classes of novel drugs or to discover novel uses of known drugs.
Affiliation(s)
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People's Republic of China.
|
39
|
Lu J, Huang G, Li HP, Feng KY, Chen L, Zheng MY, Cai YD. Prediction of cancer drugs by chemical-chemical interactions. PLoS One 2014; 9:e87791. [PMID: 24498372 PMCID: PMC3912061 DOI: 10.1371/journal.pone.0087791] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 10/04/2013] [Accepted: 12/31/2013] [Indexed: 11/19/2022] Open
Abstract
Cancer, which is a leading cause of death worldwide, places a big burden on the health-care system. In this study, an order-prediction model was built to predict a series of cancer drug indications based on chemical-chemical interactions. According to the confidence scores of their interactions, the order from the most likely cancer to the least likely one was obtained for each query drug. The 1st order prediction accuracy of the training dataset was 55.93%, evaluated by Jackknife test, while it was 55.56% and 59.09% on a validation test dataset and an independent test dataset, respectively. The proposed method outperformed a popular method based on molecular descriptors. Moreover, it was verified that some drugs were effective for the 'wrong' predicted indications, indicating that some 'wrong' drug indications were actually correct indications. Encouraged by the promising results, the method may become a useful tool for the prediction of drug indications.
Affiliation(s)
- Jing Lu
- Department of Medicinal Chemistry, School of Pharmacy, Yantai University, Yantai, Shandong, People’s Republic of China
- Guohua Huang
- Institute of Systems Biology, Shanghai University, Shanghai, People’s Republic of China
- Department of Mathematics, Shaoyang University, Shaoyang, Hunan, People’s Republic of China
- Hai-Peng Li
- CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People’s Republic of China
- Kai-Yan Feng
- Beijing Genomics Institute, Shenzhen Beishan Industrial zone, Shenzhen, People’s Republic of China
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, People’s Republic of China
- * E-mail: (LC); (MYZ); (YDC)
- Ming-Yue Zheng
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Shanghai, People’s Republic of China
- * E-mail: (LC); (MYZ); (YDC)
- Yu-Dong Cai
- Institute of Systems Biology, Shanghai University, Shanghai, People’s Republic of China
- * E-mail: (LC); (MYZ); (YDC)
|
40
|
Chen L, Lu J, Luo X, Feng KY. Prediction of drug target groups based on chemical–chemical similarities and chemical–chemical/protein connections. Biochimica et Biophysica Acta - Proteins and Proteomics 2014; 1844:207-13. [DOI: 10.1016/j.bbapap.2013.05.021] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 11/29/2012] [Revised: 05/20/2013] [Accepted: 05/22/2013] [Indexed: 10/26/2022]
|
41
|
Prediction of drugs target groups based on ChEBI ontology. BioMed Research International 2013; 2013:132724. [PMID: 24350241 PMCID: PMC3853244 DOI: 10.1155/2013/132724] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/15/2013] [Accepted: 10/28/2013] [Indexed: 11/17/2022]
Abstract
Most drugs have beneficial as well as adverse effects and exert their biological functions by adjusting and altering the functions of their target proteins. Thus, knowledge of drugs' target proteins is essential for the improvement of therapeutic effects and mitigation of undesirable side effects. In this study, we proposed a novel prediction method based on drug/compound ontology information extracted from ChEBI to identify drugs' target groups, from which the kinds of functions of a drug may be deduced. By collecting data in KEGG, a benchmark dataset consisting of 876 drugs, categorized into four target groups, was constructed. To evaluate the method more thoroughly, the benchmark dataset was divided into a training dataset and an independent test dataset. By jackknife test, the overall prediction accuracy on the training dataset was 83.12%, while it was 87.50% on the test dataset; the predictor exhibited excellent generalization. The good performance of the method indicates that the ontology information of the drugs contains rich information about their target groups, and the study may inspire solutions to problems of this sort and bridge the gap between ChEBI ontology and drugs' target groups.
|
42
|
Chen L, Li BQ, Zheng MY, Zhang J, Feng KY, Cai YD. Prediction of effective drug combinations by chemical interaction, protein interaction and target enrichment of KEGG pathways. BioMed Research International 2013; 2013:723780. [PMID: 24083237 PMCID: PMC3780555 DOI: 10.1155/2013/723780] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 06/08/2013] [Accepted: 07/24/2013] [Indexed: 12/11/2022]
Abstract
Drug combinatorial therapy could be more effective in treating some complex diseases than single agents due to better efficacy and reduced side effects. Although some drug combinations are being used, their underlying molecular mechanisms are still poorly understood. Therefore, it is of great interest to deduce a novel drug combination by their molecular mechanisms in a robust and rigorous way. This paper attempts to predict effective drug combinations by a combined consideration of: (1) chemical interaction between drugs, (2) protein interactions between drugs' targets, and (3) target enrichment of KEGG pathways. A benchmark dataset was constructed, consisting of 121 confirmed effective combinations and 605 random combinations. Each drug combination was represented by 465 features derived from the aforementioned three properties. Some feature selection techniques, including Minimum Redundancy Maximum Relevance and Incremental Feature Selection, were adopted to extract the key features. Random forest model was built with its performance evaluated by 5-fold cross-validation. As a result, 55 key features providing the best prediction result were selected. These important features may help to gain insights into the mechanisms of drug combinations, and the proposed prediction model could become a useful tool for screening possible drug combinations.
Affiliation(s)
- Lei Chen
- Institute of Systems Biology, Shanghai University, Shanghai 200444, China
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
- Bi-Qing Li
- Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Ming-Yue Zheng
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Shanghai 201203, China
- Jian Zhang
- Department of Ophthalmology, Shanghai First People's Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200080, China
- Kai-Yan Feng
- Beijing Genomics Institute, Shenzhen Beishan Industrial Zone, Shenzhen 518083, China
- Yu-Dong Cai
- Institute of Systems Biology, Shanghai University, Shanghai 200444, China
|
43
|
Predicting drugs side effects based on chemical-chemical interactions and protein-chemical interactions. BioMed Research International 2013; 2013:485034. [PMID: 24078917 PMCID: PMC3776367 DOI: 10.1155/2013/485034] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 06/18/2013] [Accepted: 07/30/2013] [Indexed: 11/18/2022]
Abstract
A drug side effect is an undesirable effect which occurs in addition to the intended therapeutic effect of the drug. The unexpected side effects that many patients suffer from are the major causes of large-scale drug withdrawal. To address the problem, it is highly demanded by pharmaceutical industries to develop computational methods for predicting the side effects of drugs. In this study, a novel computational method was developed to predict the side effects of drug compounds by hybridizing the chemical-chemical and protein-chemical interactions. Compared to most of the previous works, our method can rank the potential side effects for any query drug according to their predicted level of risk. A training dataset and test datasets were constructed from the benchmark dataset that contains 835 drug compounds to evaluate the method. By a jackknife test on the training dataset, the 1st order prediction accuracy was 86.30%, while it was 89.16% on the test dataset. It is expected that the new method may become a useful tool for drug design, and that the findings obtained by hybridizing various interactions in a network system may provide useful insights for conducting in-depth pharmacological research as well, particularly at the level of systems biomedicine.
|
44
|
Identifying chemicals with potential therapy of HIV based on protein-protein and protein-chemical interaction network. PLoS One 2013; 8:e65207. [PMID: 23762317 PMCID: PMC3675210 DOI: 10.1371/journal.pone.0065207] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 02/21/2013] [Accepted: 04/23/2013] [Indexed: 12/27/2022] Open
Abstract
Acquired immune deficiency syndrome (AIDS) is a severe infectious disease that causes a large number of deaths every year. Traditional anti-AIDS drugs directly targeting the HIV-1 encoded enzymes including reverse transcriptase (RT), protease (PR) and integrase (IN) usually suffer from drug resistance after a period of treatment and serious side effects. In recent years, the emergence of numerous useful information of protein-protein interactions (PPI) in the HIV life cycle and related inhibitors makes PPI a new way for antiviral drug intervention. In this study, we identified 26 core human proteins involved in PPI between HIV-1 and host, that have great potential for HIV therapy. In addition, 280 chemicals that interact with three HIV drugs targeting human proteins can also interact with these 26 core proteins. All these indicate that our method as presented in this paper is quite promising. The method may become a useful tool, or at least plays a complementary role to the existing method, for identifying novel anti-HIV drugs.
|
45
|
Chen L, Lu J, Zhang J, Feng KR, Zheng MY, Cai YD. Predicting chemical toxicity effects based on chemical-chemical interactions. PLoS One 2013; 8:e56517. [PMID: 23457578 PMCID: PMC3574107 DOI: 10.1371/journal.pone.0056517] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 10/16/2012] [Accepted: 01/10/2013] [Indexed: 12/02/2022] Open
Abstract
Toxicity is a major contributor to high attrition rates of new chemical entities in drug discoveries. In this study, an order-classifier was built to predict a series of toxic effects based on data concerning chemical-chemical interactions under the assumption that interactive compounds are more likely to share similar toxicity profiles. According to their interaction confidence scores, the order from the most likely toxicity to the least was obtained for each compound. Ten test groups, each of them containing one training dataset and one test dataset, were constructed from a benchmark dataset consisting of 17,233 compounds. By a Jackknife test on each of these test groups, the 1st order prediction accuracies of the training dataset and the test dataset were all approximately 79.50%, substantially higher than the rate of 25.43% achieved by random guesses. Encouraged by the promising results, we expect that our method will become a useful tool in screening out drugs with high toxicity.
Affiliation(s)
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai, China
- Jing Lu
  - Drug Discovery and Design Center (DDDC), Shanghai Institute of Materia Medica, Shanghai, China
- Jian Zhang
  - Department of Ophthalmology, Shanghai First People’s Hospital Affiliated to Shanghai Jiaotong University, Shanghai, China
- Kai-Rui Feng
  - Simcyp Limited, Blades Enterprise Centre, Sheffield, United Kingdom
- Ming-Yue Zheng
  - Drug Discovery and Design Center (DDDC), Shanghai Institute of Materia Medica, Shanghai, China
  - * E-mail: (MYZ); (YDC)
- Yu-Dong Cai
  - Institute of Systems Biology, Shanghai University, Shanghai, China
  - * E-mail: (MYZ); (YDC)
|
46
|
Arukwe A, Eggen T, Möder M. Solid waste deposits as a significant source of contaminants of emerging concern to the aquatic and terrestrial environments - a developing country case study from Owerri, Nigeria. The Science of the Total Environment 2012; 438:94-102. [PMID: 22975307 DOI: 10.1016/j.scitotenv.2012.08.039] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 05/13/2012] [Revised: 08/06/2012] [Accepted: 08/07/2012] [Indexed: 05/02/2023]
Abstract
In developing countries, there are needs for scientific basis to sensitize communities on the problems arising from improper solid waste deposition and the acute and long-term consequences for areas receiving immobilized pollutants. In Nigeria, as in many other African countries, solid waste disposal by way of open dumping has been the only management option for such wastes. Herein, we have highlighted the challenges of solid waste deposit and management in developing countries, focusing on contaminants of emerging concern and leaching into the environment. We have analyzed sediments and run-off water samples from a solid waste dumping site in Owerri, Nigeria for organic load and compared these with data from representative world cities. Learning from previous incidents, we intend to introduce some perspective for awareness of contaminants of emerging concerns such as those with potential endocrine disrupting activities in wildlife and humans. Qualitative and quantitative data obtained by gas chromatography and mass spectrometric analysis (GC-MS) provide an overview on lipophilic and semi-polar substances released from solid waste, accumulated in sediments and transported via leachates. The chromatograms of the full scan analyses of the sediment extracts clearly point to contamination related to heavy oil. The homologous series of n-alkanes with chain lengths ranging between C16 and C30, as well as detected polyaromatic hydrocarbon (PAH) compounds such as anthracene, phenanthrene, fluoranthene and pyrene support the assumption that diesel fuel or high boiling fractions of oil are deposited on the site. Targeted quantitative analysis for selected compounds showed high concentration of substances typically released from man-made products such as plastics, textiles, household and consumer products. Phthalate, an integral component of plastic products, was the dominant compound group in all sediment samples and run-off water samples. 
Technical nonylphenols (mixture of isomers), metabolites of non-ionic surfactants (nonylphenol-polyethoxylates), UV-filter compound ethyl methoxy cinnamate (EHMC) and bisphenol A (BPA) were particularly determined in the sediment samples at high μg/kg dry weight concentration. Measuring contaminants in such areas will help in increasing governmental, societal and industrial awareness on the extent and seriousness of the contamination both at waste disposal sites and surrounding terrestrial and aquatic environments.
Affiliation(s)
- Augustine Arukwe
  - Department of Biology, Norwegian University of Science and Technology (NTNU), Høgskoleringen 5, 7491 Trondheim, Norway.
|
47
|
Gao YF, Chen L, Cai YD, Feng KY, Huang T, Jiang Y. Predicting metabolic pathways of small molecules and enzymes based on interaction information of chemicals and proteins. PLoS One 2012; 7:e45944. [PMID: 23029334 PMCID: PMC3448724 DOI: 10.1371/journal.pone.0045944] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Received: 06/25/2012] [Accepted: 08/23/2012] [Indexed: 12/25/2022] Open
Abstract
Metabolic pathway analysis, one of the most important fields in biochemistry, is pivotal to understanding the maintenance and modulation of the functions of an organism. Good comprehension of metabolic pathways is critical to understanding the mechanisms of some fundamental biological processes. Given a small molecule or an enzyme, how may one identify the metabolic pathways in which it may participate? Answering such a question is a first important step in understanding a metabolic pathway system. By utilizing the information provided by chemical-chemical interactions, chemical-protein interactions, and protein-protein interactions, a novel method was proposed by which to allocate small molecules and enzymes to 11 major classes of metabolic pathways. A benchmark dataset consisting of 3,348 small molecules and 654 enzymes of yeast was constructed to test the method. It was observed that the first order prediction accuracy evaluated by the jackknife test was 79.56% in identifying the small molecules and enzymes in a benchmark dataset. Our method may become a useful vehicle in predicting the metabolic pathways of small molecules and enzymes, providing a basis for some further analysis of the pathway systems.
Affiliation(s)
- Yu-Fei Gao
  - Department of Surgery, China-Japan Union Hospital of Jilin University, Changchun, China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai, China
- Yu-Dong Cai
  - Institute of Systems Biology, Shanghai University, Shanghai, China
- Kai-Yan Feng
  - Beijing Genomics Institute, Shenzhen Beishan Industrial zone, Shenzhen, China
- Tao Huang
  - Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
  - Shanghai Center for Bioinformation Technology, Shanghai, China
- Yang Jiang
  - Department of Surgery, China-Japan Union Hospital of Jilin University, Changchun, China
|
48
|
Huang T, Wang J, Cai YD, Yu H, Chou KC. Hepatitis C virus network based classification of hepatocellular cirrhosis and carcinoma. PLoS One 2012; 7:e34460. [PMID: 22493692 PMCID: PMC3321022 DOI: 10.1371/journal.pone.0034460] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Received: 09/27/2011] [Accepted: 03/01/2012] [Indexed: 12/15/2022] Open
Abstract
Hepatitis C virus (HCV) is a main risk factor for liver cirrhosis and hepatocellular carcinoma, particularly to those patients with chronic liver disease or injury. The similar etiology leads to a high correlation of the patients suffering from the disease of liver cirrhosis with those suffering from the disease of hepatocellular carcinoma. However, the biological mechanism for the relationship between these two kinds of diseases is not clear. The present study was initiated in an attempt to investigate into the HCV infection protein network, in hopes to find good biomarkers for diagnosing the two diseases as well as gain insights into their progression mechanisms. To realize this, two potential biomarker pools were defined: (i) the target genes of HCV, and (ii) the between genes on the shortest paths among the target genes of HCV. Meanwhile, a predictor was developed for identifying the liver tissue samples among the following three categories: (i) normal, (ii) cirrhosis, and (iii) hepatocellular carcinoma. Interestingly, it was observed that the identification accuracy was higher with the tissue samples defined by extracting the features from the second biomarker pool than that with the samples defined based on the first biomarker pool. The identification accuracy by the jackknife validation for the between-genes approach was 0.960, indicating that the novel approach holds a quite promising potential in helping find effective biomarkers for diagnosing the liver cirrhosis disease and the hepatocellular carcinoma disease. It may also provide useful insights for in-depth study of the biological mechanisms of HCV-induced cirrhosis and hepatocellular carcinoma.
Affiliation(s)
- Tao Huang
  - Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People's Republic of China
|
49
|
Yan S, Wu G. Small Variations Between Species/Subtypes Attributed to Reassortment Evidenced from Polymerase Basic Protein 1 with Other Seven Proteins from Influenza A Virus. Transbound Emerg Dis 2012; 60:110-9. [DOI: 10.1111/j.1865-1682.2012.01323.x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 12/31/2022]
|
50
|
Ye H, Tang K, Yang L, Cao Z, Li Y. Study of drug function based on similarity of pathway fingerprint. Protein Cell 2012; 3:132-9. [PMID: 22426982 DOI: 10.1007/s13238-012-2011-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 12/29/2011] [Accepted: 01/04/2012] [Indexed: 02/06/2023] Open
Abstract
Drugs sharing similar therapeutic function may not bind to the same group of targets. However, their targets may be involved in similar pathway profiles which are associated with certain pathological process. In this study, pathway fingerprint was introduced to indicate the profile of significant pathways being influenced by the targets of drugs. Then drug-drug network was further constructed based on significant similarity of pathway fingerprints. In this way, the functions of a drug may be hinted by the enriched therapeutic functions of its neighboring drugs. In the test of 911 FDA approved drugs with more than one known target, 471 drugs could be connected into networks. 760 significant associations of drug-therapeutic function were generated, among which around 60% of them were supported by scientific literatures or ATC codes of drug functional classification. Therefore, pathway fingerprints may be useful to further study on the potential function of known drugs, or the unknown function of new drugs.
Affiliation(s)
- Hao Ye
  - State Key Laboratory of Bioreactor Engineering, East China University of Science & Technology, Shanghai, 200237, China
|