1
|
You Y, Lai X, Pan Y, Zheng H, Vera J, Liu S, Deng S, Zhang L. Artificial intelligence in cancer target identification and drug discovery. Signal Transduct Target Ther 2022; 7:156. [PMID: 35538061 PMCID: PMC9090746 DOI: 10.1038/s41392-022-00994-0] [Citation(s) in RCA: 80] [Impact Index Per Article: 40.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2021] [Revised: 03/14/2022] [Accepted: 04/05/2022] [Indexed: 02/08/2023] Open
Abstract
Artificial intelligence is an advanced method to identify novel anticancer targets and discover novel drugs from biology networks because the networks can effectively preserve and quantify the interaction between components of cell systems underlying human diseases such as cancer. Here, we review and discuss how to employ artificial intelligence approaches to identify novel anticancer targets and discover drugs. First, we describe the scope of artificial intelligence biology analysis for novel anticancer target investigations. Second, we review and discuss the basic principles and theory of commonly used network-based and machine learning-based artificial intelligence algorithms. Finally, we showcase the applications of artificial intelligence approaches in cancer target identification and drug discovery. Taken together, the artificial intelligence models have provided us with a quantitative framework to study the relationship between network characteristics and cancer, thereby leading to the identification of potential anticancer targets and the discovery of novel drug candidates.
Collapse
Affiliation(s)
- Yujie You
- College of Computer Science, Sichuan University, Chengdu, 610065, China
| | - Xin Lai
- Laboratory of Systems Tumor Immunology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Universitätsklinikum Erlangen, Erlangen, 91052, Germany
| | - Yi Pan
- Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Room D513, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen, 518055, China
| | - Huiru Zheng
- School of Computing, Ulster University, Belfast, BT15 1ED, UK
| | - Julio Vera
- Laboratory of Systems Tumor Immunology, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and Universitätsklinikum Erlangen, Erlangen, 91052, Germany
| | - Suran Liu
- College of Computer Science, Sichuan University, Chengdu, 610065, China
| | - Senyi Deng
- Institute of Thoracic Oncology, Department of Thoracic Surgery, West China Hospital, Sichuan University, Chengdu, 610065, China.
| | - Le Zhang
- College of Computer Science, Sichuan University, Chengdu, 610065, China.
- Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou, 310024, China.
- Key Laboratory of Systems Health Science of Zhejiang Province, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, 310024, China.
| |
Collapse
|
2
|
Nies HW, Mohamad MS, Zakaria Z, Chan WH, Remli MA, Nies YH. Enhanced Directed Random Walk for the Identification of Breast Cancer Prognostic Markers from Multiclass Expression Data. ENTROPY (BASEL, SWITZERLAND) 2021; 23:1232. [PMID: 34573857 PMCID: PMC8472068 DOI: 10.3390/e23091232] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/15/2021] [Revised: 09/14/2021] [Accepted: 09/16/2021] [Indexed: 12/12/2022]
Abstract
Artificial intelligence in healthcare can potentially identify the probability of contracting a particular disease more accurately. There are five common molecular subtypes of breast cancer: luminal A, luminal B, basal, ERBB2, and normal-like. Previous investigations showed that pathway-based microarray analysis could help in the identification of prognostic markers from gene expressions. For example, directed random walk (DRW) can infer a greater reproducibility power of the pathway activity between two classes of samples with a higher classification accuracy. However, most of the existing methods (including DRW) ignored the characteristics of different cancer subtypes and considered all of the pathways to contribute equally to the analysis. Therefore, an enhanced DRW (eDRW+) is proposed to identify breast cancer prognostic markers from multiclass expression data. An improved weight strategy using one-way ANOVA (F-test) and pathway selection based on the greatest reproducibility power is proposed in eDRW+. The experimental results show that the eDRW+ exceeds other methods in terms of AUC. Besides this, the eDRW+ identifies 294 gene markers and 45 pathway markers from the breast cancer datasets with better AUC. Therefore, the prognostic markers (pathway markers and gene markers) can identify drug targets and look for cancer subtypes with clinically distinct outcomes.
Collapse
Affiliation(s)
- Hui Wen Nies
- School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia, Skudai 81310, Malaysia; (Z.Z.); (W.H.C.)
| | - Mohd Saberi Mohamad
- Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Al Ain 17666, United Arab Emirates;
| | - Zalmiyah Zakaria
- School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia, Skudai 81310, Malaysia; (Z.Z.); (W.H.C.)
| | - Weng Howe Chan
- School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia, Skudai 81310, Malaysia; (Z.Z.); (W.H.C.)
| | - Muhammad Akmal Remli
- Institute for Artificial Intelligence and Big Data, Universiti Malaysia Kelantan, Kota Bharu 16100, Malaysia;
| | - Yong Hui Nies
- Department of Anatomy, Faculty of Medicine, Universiti Kebangsaan Malaysia Medical Centre, Cheras, Kuala Lumpur 56000, Malaysia;
| |
Collapse
|
3
|
García del Valle EP, Lagunes García G, Prieto Santamaría L, Zanin M, Menasalvas Ruiz E, Rodríguez-González A. Disease networks and their contribution to disease understanding: A review of their evolution, techniques and data sources. J Biomed Inform 2019; 94:103206. [DOI: 10.1016/j.jbi.2019.103206] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2019] [Revised: 04/14/2019] [Accepted: 05/06/2019] [Indexed: 12/14/2022]
|
4
|
Yao J, Hurle MR, Nelson MR, Agarwal P. Predicting clinically promising therapeutic hypotheses using tensor factorization. BMC Bioinformatics 2019; 20:69. [PMID: 30736745 PMCID: PMC6368709 DOI: 10.1186/s12859-019-2664-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2018] [Accepted: 01/30/2019] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND Determining which target to pursue is a challenging and error-prone first step in developing a therapeutic treatment for a disease, where missteps are potentially very costly given the long-time frames and high expenses of drug development. With current informatics technology and machine learning algorithms, it is now possible to computationally discover therapeutic hypotheses by predicting clinically promising drug targets based on the evidence associating drug targets with disease indications. We have collected this evidence from Open Targets and additional databases that covers 17 sources of evidence for target-indication association and represented the data as a tensor of 21,437 × 2211 × 17. RESULTS As a proof-of-concept, we identified examples of successes and failures of target-indication pairs in clinical trials across 875 targets and 574 disease indications to build a gold-standard data set of 6140 known clinical outcomes. We designed and executed three benchmarking strategies to examine the performance of multiple machine learning models: Logistic Regression, LASSO, Random Forest, Tensor Factorization and Gradient Boosting Machine. With 10-fold cross-validation, tensor factorization achieved AUROC = 0.82 ± 0.02 and AUPRC = 0.71 ± 0.03. Across multiple validation schemes, this was comparable or better than other methods. CONCLUSION In this work, we benchmarked a machine learning technique called tensor factorization for the problem of predicting clinical outcomes of therapeutic hypotheses. Results have shown that this method can achieve equal or better prediction performance compared with a variety of baseline models. We demonstrate one application of the method to predict outcomes of trials on novel indications of approved drug targets. This work can be expanded to targets and indications that have never been clinically tested and proposing novel target-indication hypotheses. Our proposed biologically-motivated cross-validation schemes provide insight into the robustness of the prediction performance. This has significant implications for all future methods that try to address this seminal problem in drug discovery.
Collapse
Affiliation(s)
- Jin Yao
- Computational Biology, GSK R&D, 1250 S. Collegeville Road, UP12-200, Collegeville, PA USA
| | - Mark R. Hurle
- Computational Biology, GSK R&D, 1250 S. Collegeville Road, UP12-200, Collegeville, PA USA
| | - Matthew R. Nelson
- Genetics, GSK R&D, 1250 S. Collegeville Road, UP12-200, Collegeville, PA USA
| | - Pankaj Agarwal
- Computational Biology, GSK R&D, 1250 S. Collegeville Road, UP12-200, Collegeville, PA USA
| |
Collapse
|
5
|
Finke MT, Filice RW, Kahn CE. Integrating ontologies of human diseases, phenotypes, and radiological diagnosis. J Am Med Inform Assoc 2019; 26:149-154. [PMID: 30624645 DOI: 10.1093/jamia/ocy161] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2018] [Accepted: 11/13/2018] [Indexed: 11/12/2022] Open
Abstract
Mappings between ontologies enable reuse and interoperability of biomedical knowledge. The Radiology Gamuts Ontology (RGO)-an ontology of 16 918 diseases, interventions, and imaging observations-provides a resource for differential diagnosis and automated textual report understanding in radiology. An automated process with subsequent manual review was used to identify exact and partial matches of RGO entities to the Disease Ontology (DO) and the Human Phenotype Ontology (HPO). Exact mappings identified equivalent concepts; partial mappings identified subclass and superclass relationships. A total of 7913 distinct RGO entities (46.8%) were mapped to one or both of the two target ontologies. Integration of RGO's causal knowledge resulted in 9605 axioms that expressed direct causal relationships between DO diseases and HPO phenotypic abnormalities, and allowed one to formulate queries about causal relations using the abstraction properties in those two ontologies. The mappings can be used to support automated diagnostic reasoning, data mining, and knowledge discovery.
Collapse
Affiliation(s)
- Michael T Finke
- Pacific Northwest University of Health Sciences, Yakima, WA, USA
| | - Ross W Filice
- Department of Radiology, MedStar Georgetown University Hospital, Washington, DC, USA
| | - Charles E Kahn
- Department of Radiology and Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
| |
Collapse
|
6
|
Mishra B, Kumar N, Mukhtar MS. Systems Biology and Machine Learning in Plant-Pathogen Interactions. MOLECULAR PLANT-MICROBE INTERACTIONS : MPMI 2019; 32:45-55. [PMID: 30418085 DOI: 10.1094/mpmi-08-18-0221-fi] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/02/2023]
Abstract
Systems biology is an inclusive approach to study the static and dynamic emergent properties on a global scale by integrating multiomics datasets to establish qualitative and quantitative associations among multiple biological components. With an abundance of improved high throughput -omics datasets, network-based analyses and machine learning technologies are playing a pivotal role in comprehensive understanding of biological systems. Network topological features reveal most important nodes within a network as well as prioritize significant molecular components for diverse biological networks, including coexpression, protein-protein interaction, and gene regulatory networks. Machine learning techniques provide enormous predictive power through specific feature extraction from biological data. Deep learning, a subtype of machine learning, has plausible future applications because a domain expert for feature extraction is not needed in this algorithm. Inspired by diverse domains of biology, we here review classic systems biology techniques applied in plant immunity thus far. We also discuss additional advanced approaches in both graph theory and machine learning, which may provide new insights for understanding plant-microbe interactions. Finally, we propose a hybrid approach in plant immune systems that harnesses the power of both network biology and machine learning, with a potential to be applicable to both model systems and agronomically important crop plants.
Collapse
Affiliation(s)
| | | | - M Shahid Mukhtar
- 1 Department of Biology, and
- 2 Nutrition Obesity Research Center, University of Alabama at Birmingham, 1300 University Blvd., Birmingham 35294, U.S.A
| |
Collapse
|
7
|
Li XX, Yin J, Tang J, Li Y, Yang Q, Xiao Z, Zhang R, Wang Y, Hong J, Tao L, Xue W, Zhu F. Determining the Balance Between Drug Efficacy and Safety by the Network and Biological System Profile of Its Therapeutic Target. Front Pharmacol 2018; 9:1245. [PMID: 30429792 PMCID: PMC6220079 DOI: 10.3389/fphar.2018.01245] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2018] [Accepted: 10/12/2018] [Indexed: 12/14/2022] Open
Abstract
One of the most challenging puzzles in drug discovery is the identification and characterization of candidate drug of well-balanced profile between efficacy and safety. So far, extensive efforts have been made to evaluate this balance by estimating the quantitative structure–therapeutic relationship and exploring target profile of adverse drug reaction. Particularly, the therapeutic index (TI) has emerged as a key indicator illustrating this delicate balance, and a clinically successful agent requires a sufficient TI suitable for it corresponding indication. However, the TI information are largely unknown for most drugs, and the mechanism underlying the drugs with narrow TI (NTI drugs) is still elusive. In this study, the collective effects of human protein–protein interaction (PPI) network and biological system profile on the drugs' efficacy–safety balance were systematically evaluated. First, a comprehensive literature review of the FDA approved drugs confirmed their NTI status. Second, a popular feature selection algorithm based on artificial intelligence (AI) was adopted to identify key factors differencing the target mechanism between NTI and non-NTI drugs. Finally, this work revealed that the targets of NTI drugs were highly centralized and connected in human PPI network, and the number of similarity proteins and affiliated signaling pathways of the corresponding targets was much higher than those of non-NTI drugs. These findings together with the newly discovered features or feature groups clarified the key factors indicating drug's narrow TI, and could thus provide a novel direction for determining the delicate drug efficacy-safety balance.
Collapse
Affiliation(s)
- Xiao Xu Li
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China.,School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing, China
| | - Jiayi Yin
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Jing Tang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China.,School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing, China
| | - Yinghong Li
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China.,School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing, China
| | - Qingxia Yang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China.,School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing, China
| | - Ziyu Xiao
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Runyuan Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Yunxia Wang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Jiajun Hong
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Lin Tao
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China
| | - Weiwei Xue
- School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing, China
| | - Feng Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China.,School of Pharmaceutical Sciences and Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing, China
| |
Collapse
|
8
|
Hao T, Wang Q, Zhao L, Wu D, Wang E, Sun J. Analyzing of Molecular Networks for Human Diseases and Drug Discovery. Curr Top Med Chem 2018; 18:1007-1014. [PMID: 30101711 PMCID: PMC6174636 DOI: 10.2174/1568026618666180813143408] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2017] [Revised: 06/22/2018] [Accepted: 07/03/2018] [Indexed: 01/11/2023]
Abstract
Molecular networks represent the interactions and relations of genes/proteins, and also encode molecular mechanisms of biological processes, development and diseases. Among the molecular networks, protein-protein Interaction Networks (PINs) have become effective platforms for uncovering the molecular mechanisms of diseases and drug discovery. PINs have been constructed for various organisms and utilized to solve many biological problems. In human, most proteins present their complex functions by interactions with other proteins, and the sum of these interactions represents the human protein interactome. Especially in the research on human disease and drugs, as an emerging tool, the PIN provides a platform to systematically explore the molecular complexities of specific diseases and the references for drug design. In this review, we summarized the commonly used approaches to aid disease research and drug discovery with PINs, including the network topological analysis, identification of novel pathways, drug targets and sub-network biomarkers for diseases. With the development of bioinformatic techniques and biological networks, PINs will play an increasingly important role in human disease research and drug discovery.
Collapse
Affiliation(s)
- Tong Hao
- Tianjin Key Laboratory of Animal and Plant Resistance/College of Life Sciences, Tianjin Normal University, Tianjin 300387, China
| | - Qian Wang
- Tianjin Key Laboratory of Animal and Plant Resistance/College of Life Sciences, Tianjin Normal University, Tianjin 300387, China
| | - Lingxuan Zhao
- Tianjin Key Laboratory of Animal and Plant Resistance/College of Life Sciences, Tianjin Normal University, Tianjin 300387, China
| | - Dan Wu
- Tianjin Key Laboratory of Animal and Plant Resistance/College of Life Sciences, Tianjin Normal University, Tianjin 300387, China
| | - Edwin Wang
- Tianjin Key Laboratory of Animal and Plant Resistance/College of Life Sciences, Tianjin Normal University, Tianjin 300387, China.,University of Calgary Cumming School of Medicine, Calgary, Alberta T2N 4Z6, Canada
| | - Jinsheng Sun
- Tianjin Key Laboratory of Animal and Plant Resistance/College of Life Sciences, Tianjin Normal University, Tianjin 300387, China.,Tianjin Bohai Fisheries Research Institute, Tianjin 300221, China
| |
Collapse
|
9
|
Suratanee A, Plaimas K. Network-based association analysis to infer new disease-gene relationships using large-scale protein interactions. PLoS One 2018; 13:e0199435. [PMID: 29949603 PMCID: PMC6021074 DOI: 10.1371/journal.pone.0199435] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2018] [Accepted: 06/07/2018] [Indexed: 01/02/2023] Open
Abstract
Protein-protein interactions integrated with disease-gene associations represent important information for revealing protein functions under disease conditions to improve the prevention, diagnosis, and treatment of complex diseases. Although several studies have attempted to identify disease-gene associations, the number of possible disease-gene associations is very small. High-throughput technologies have been established experimentally to identify the association between genes and diseases. However, these techniques are still quite expensive, time consuming, and even difficult to perform. Thus, based on currently available data and knowledge, computational methods have served as alternatives to provide more possible associations to increase our understanding of disease mechanisms. Here, a new network-based algorithm, namely, Disease-Gene Association (DGA), was developed to calculate the association score of a query gene to a new possible set of diseases. First, a large-scale protein interaction network was constructed, and the relationship between two interacting proteins was calculated with regard to the disease relationship. Novel plausible disease-gene pairs were identified and statistically scored by our algorithm using neighboring protein information. The results yielded high performance for disease-gene prediction, with an F-measure of 0.78 and an AUC of 0.86. To identify promising candidates of disease-gene associations, the association coverage of genes and diseases were calculated and used with the association score to perform gene and disease selection. Based on gene selection, we identified promising pairs that exhibited evidence related to several important diseases, e.g., inflammation, lipid metabolism, inborn errors, xanthomatosis, cerebellar ataxia, cognitive deterioration, malignant neoplasms of the skin and malignant tumors of the cervix. Focusing on disease selection, we identified target genes that were important to blistering skin diseases and muscular dystrophy. In summary, our developed algorithm is simple, efficiently identifies disease–gene associations in the protein-protein interaction network and provides additional knowledge regarding disease-gene associations. This method can be generalized to other association studies to further advance biomedical science.
Collapse
Affiliation(s)
- Apichat Suratanee
- Department of Mathematics, Faculty of Applied Science, King Mongkut’s University of Technology North Bangkok, Bangkok, Thailand
- * E-mail: (AS); (KP)
| | - Kitiporn Plaimas
- Advanced Virtual and Intelligent Computing (AVIC) Center, Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
- * E-mail: (AS); (KP)
| |
Collapse
|
10
|
Rouillard AD, Hurle MR, Agarwal P. Systematic interrogation of diverse Omic data reveals interpretable, robust, and generalizable transcriptomic features of clinically successful therapeutic targets. PLoS Comput Biol 2018; 14:e1006142. [PMID: 29782487 PMCID: PMC5983857 DOI: 10.1371/journal.pcbi.1006142] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2017] [Revised: 06/01/2018] [Accepted: 04/13/2018] [Indexed: 11/19/2022] Open
Abstract
Target selection is the first and pivotal step in drug discovery. An incorrect choice may not manifest itself for many years after hundreds of millions of research dollars have been spent. We collected a set of 332 targets that succeeded or failed in phase III clinical trials, and explored whether Omic features describing the target genes could predict clinical success. We obtained features from the recently published comprehensive resource: Harmonizome. Nineteen features appeared to be significantly correlated with phase III clinical trial outcomes, but only 4 passed validation schemes that used bootstrapping or modified permutation tests to assess feature robustness and generalizability while accounting for target class selection bias. We also used classifiers to perform multivariate feature selection and found that classifiers with a single feature performed as well in cross-validation as classifiers with more features (AUROC = 0.57 and AUPR = 0.81). The two predominantly selected features were mean mRNA expression across tissues and standard deviation of expression across tissues, where successful targets tended to have lower mean expression and higher expression variance than failed targets. This finding supports the conventional wisdom that it is favorable for a target to be present in the tissue(s) affected by a disease and absent from other tissues. Overall, our results suggest that it is feasible to construct a model integrating interpretable target features to inform target selection. We anticipate deeper insights and better models in the future, as researchers can reuse the data we have provided to improve methods for handling sample biases and learn more informative features. Code, documentation, and data for this study have been deposited on GitHub at https://github.com/arouillard/omic-features-successful-targets.
Collapse
Affiliation(s)
| | - Mark R. Hurle
- Computational Biology, GSK, Collegeville, PA, United States of America
| | - Pankaj Agarwal
- Computational Biology, GSK, Collegeville, PA, United States of America
| |
Collapse
|
11
|
Bai B, Xie B, Pan Z, Shan L, Zhao J, Zhu H. Identification of candidate genes and long non-coding RNAs associated with the effect of ATP5J in colorectal cancer. Int J Oncol 2018; 52:1129-1138. [PMID: 29484395 PMCID: PMC5843394 DOI: 10.3892/ijo.2018.4281] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2017] [Accepted: 02/15/2018] [Indexed: 12/27/2022] Open
Abstract
The incidence and development of colorectal cancer (CRC) is a process with multiple gene interactions. We have previously demonstrated that ATP synthase-coupling factor 6, mitochondrial (ATP5J) is associated with CRC migration and 5-fluorouracil resistance; nevertheless, the exact molecular mechanism remains unclear. The following study uses microarray and bioinformatics methods to identify candidate genes and long non-coding RNAs (lncRNAs) in CRC cells (two pairs) with upregulated and downregulated ATP5J. Briefly, a total of 2,190 differentially expressed mRNAs (DEmRNAs) were sorted. Reverse transcription-quantitative polymerase chain reaction (RT-qPCR) was performed for 4 DEmRNAs to validate the results of microarray analysis. Functional annotation and pathway enrichment were analyzed for DEmRNAs using the Database for Annotation, Visualization and Integrated Discovery. Significantly enriched pathways included the regulation of gene expression and cell growth. The protein-protein interaction network was constructed, and AKT serine/threonine kinase 2 (AKT2) was considered as one of the hub genes. For further analysis, 51 DEmRNAs and 30 DElncRNAs were selected that were positively or negatively associated with the expression of ATP5J in the two cell pairs. X-inactive specific transcript (XIST), premature ovarian failure 1B (POF1B) and calmin (CLMN) were identified in the DEmRNA-DElncRNA co-expression network. The expression of AKT2 and XIST in CRC cells was confirmed by RT-qPCR. To sum up, the candidate genes and lncRNAs, as well as potential signaling pathways, which were identified using integrated bioinformatics analysis, could improve the understanding of molecular events involved in the function of ATP5J in CRC.
Collapse
Affiliation(s)
- Bingjun Bai
- Department of Colorectal Surgery, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University, Zhejiang 310016, P.R. China
| | - Binbin Xie
- Department of Medical Oncology, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University, Zhejiang 310016, P.R. China
| | - Zongyou Pan
- Department of Sports Medicine, School of Medicine, Zhejiang University, Hangzhou, Zhejiang 310016, P.R. China
| | - Lina Shan
- Department of Colorectal Surgery, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University, Zhejiang 310016, P.R. China
| | - Jianpei Zhao
- Department of Colorectal Surgery, No. 2 Hospital of Ningbo, Ningbo, Zhejiang 315010, P.R. China
| | - Hongbo Zhu
- Department of Colorectal Surgery, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University, Zhejiang 310016, P.R. China
| |
Collapse
|
12
|
Chang NW, Dai HJ, Shih YY, Wu CY, Dela Rosa MAC, Obena RP, Chen YJ, Hsu WL, Oyang YJ. Biomarker identification of hepatocellular carcinoma using a methodical literature mining strategy. Database (Oxford) 2017; 2017:bax082. [PMID: 31725857 PMCID: PMC7243925 DOI: 10.1093/database/bax082] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2016] [Revised: 10/11/2017] [Accepted: 10/11/2017] [Indexed: 12/31/2022]
Abstract
Hepatocellular carcinoma (HCC), one of the most common causes of cancer-related deaths, carries a 5-year survival rate of 18%, underscoring the need for robust biomarkers. In spite of the increased availability of HCC related literatures, many of the promising biomarkers reported have not been validated for clinical use. To narrow down the wide range of possible biomarkers for further clinical validation, bioinformaticians need to sort them out using information provided in published works. Biomedical text mining is an automated way to obtain information of interest within the massive collection of biomedical knowledge, thus enabling extraction of data for biomarkers associated with certain diseases. This method can significantly reduce both the time and effort spent on studying important maladies such as liver diseases. Herein, we report a text mining-aided curation pipeline to identify potential biomarkers for liver cancer. The curation pipeline integrates PubMed E-Utilities to collect abstracts from PubMed and recognize several types of named entities by machine learning-based and pattern-based methods. Genes/proteins from evidential sentences were classified as candidate biomarkers using a convolutional neural network. Lastly, extracted biomarkers were ranked depending on several criteria, such as the frequency of keywords and articles and the journal impact factor, and then integrated into a meaningful list for bioinformaticians. Based on the developed pipeline, we constructed MarkerHub, which contains 2128 candidate biomarkers extracted from PubMed publications from 2008 to 2017. Database URL: http://markerhub.iis.sinica.edu.tw.
Collapse
Affiliation(s)
- Nai-Wen Chang
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan
- Institute of Information Science, Academia Sinica, Taipei, Taiwan
| | - Hong-Jie Dai
- Department of Computer Science and Information Engineering, National Taitung University, Taitung, Taiwan
- Interdisciplinary Program of Green and Information Technology, National Taitung University, Taitung, Taiwan
| | - Yung-Yu Shih
- Institute of Information Science, Academia Sinica, Taipei, Taiwan
| | - Chi-Yang Wu
- Institute of Information Science, Academia Sinica, Taipei, Taiwan
| | | | | | - Yu-Ju Chen
- Institute of Chemistry, Academia Sinica, Taipei, Taiwan
| | - Wen-Lian Hsu
- Institute of Information Science, Academia Sinica, Taipei, Taiwan
| | - Yen-Jen Oyang
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan
| |
Collapse
|
13
|
Fluck J, Madan S, Ansari S, Kodamullil AT, Karki R, Rastegar-Mojarad M, Catlett NL, Hayes W, Szostak J, Hoeng J, Peitsch M. Training and evaluation corpora for the extraction of causal relationships encoded in biological expression language (BEL). DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2016; 2016:baw113. [PMID: 27554092 PMCID: PMC4995071 DOI: 10.1093/database/baw113] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/23/2015] [Accepted: 07/07/2016] [Indexed: 01/21/2023]
Abstract
Success in extracting biological relationships is mainly dependent on the complexity of the task as well as the availability of high-quality training data. Here, we describe the new corpora in the systems biology modeling language BEL for training and testing biological relationship extraction systems that we prepared for the BioCreative V BEL track. BEL was designed to capture relationships not only between proteins or chemicals, but also complex events such as biological processes or disease states. A BEL nanopub is the smallest unit of information and represents a biological relationship with its provenance. In BEL relationships (called BEL statements), the entities are normalized to defined namespaces mainly derived from public repositories, such as sequence databases, MeSH or publicly available ontologies. In the BEL nanopubs, the BEL statements are associated with citation information and supportive evidence such as a text excerpt. To enable the training of extraction tools, we prepared BEL resources and made them available to the community. We selected a subset of these resources focusing on a reduced set of namespaces, namely, human and mouse genes, ChEBI chemicals, MeSH diseases and GO biological processes, as well as relationship types ‘increases’ and ‘decreases’. The published training corpus contains 11 000 BEL statements from over 6000 supportive text excerpts. For method evaluation, we selected and re-annotated two smaller subcorpora containing 100 text excerpts. For this re-annotation, the inter-annotator agreement was measured by the BEL track evaluation environment and resulted in a maximal F-score of 91.18% for full statement agreement. In addition, for a set of 100 BEL statements, we do not only provide the gold standard expert annotations, but also text excerpts pre-selected by two automated systems. Those text excerpts were evaluated and manually annotated as true or false supportive in the course of the BioCreative V BEL track task. Database URL:http://wiki.openbel.org/display/BIOC/Datasets
Collapse
Affiliation(s)
- Juliane Fluck
- Fraunhofer Institute for Algorithms and Scientific Computing, Schloss Birlinghoven, Sankt Augustin, Germany
| | - Sumit Madan
- Fraunhofer Institute for Algorithms and Scientific Computing, Schloss Birlinghoven, Sankt Augustin, Germany
| | - Sam Ansari
- Philip Morris International R&D, Philip Morris Products S.A, Quai Jeanrenaud 5, Neuchâtel, 2000, Switzerland
| | - Alpha T Kodamullil
- Fraunhofer Institute for Algorithms and Scientific Computing, Schloss Birlinghoven, Sankt Augustin, Germany
| | - Reagon Karki
- Fraunhofer Institute for Algorithms and Scientific Computing, Schloss Birlinghoven, Sankt Augustin, Germany
| | | | | | - William Hayes
- Selventa, One Alewife Center, Cambridge, MA 02140, USA
| | - Justyna Szostak
- Philip Morris International R&D, Philip Morris Products S.A, Quai Jeanrenaud 5, Neuchâtel, 2000, Switzerland
| | - Julia Hoeng
- Philip Morris International R&D, Philip Morris Products S.A, Quai Jeanrenaud 5, Neuchâtel, 2000, Switzerland
| | - Manuel Peitsch
- Philip Morris International R&D, Philip Morris Products S.A, Quai Jeanrenaud 5, Neuchâtel, 2000, Switzerland
| |
Collapse
|
14
|
Huang CH, Chang PMH, Hsu CW, Huang CYF, Ng KL. Drug repositioning for non-small cell lung cancer by using machine learning algorithms and topological graph theory. BMC Bioinformatics 2016; 17 Suppl 1:2. [PMID: 26817825 PMCID: PMC4895785 DOI: 10.1186/s12859-015-0845-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022] Open
Abstract
Background Non-small cell lung cancer (NSCLC) is one of the leading causes of death globally, and research into NSCLC has been accumulating steadily over several years. Drug repositioning is the current trend in the pharmaceutical industry for identifying potential new uses for existing drugs and accelerating the development process of drugs, as well as reducing side effects. Results This work integrates two approaches - machine learning algorithms and topological parameter-based classification - to develop a novel pipeline of drug repositioning to analyze four lung cancer microarray datasets, enriched biological processes, potential therapeutic drugs and targeted genes for NSCLC treatments. A total of 7 (8) and 11 (12) promising drugs (targeted genes) were discovered for treating early- and late-stage NSCLC, respectively. The effectiveness of these drugs is supported by the literature, experimentally determined in-vitro IC50 and clinical trials. This work provides better drug prediction accuracy than competitive research according to IC50 measurements. Conclusions With the novel pipeline of drug repositioning, the discovery of enriched pathways and potential drugs related to NSCLC can provide insight into the key regulators of tumorigenesis and the treatment of NSCLC. Based on the verified effectiveness of the targeted drugs predicted by this pipeline, we suggest that our drug-finding pipeline is effective for repositioning drugs.
Collapse
Affiliation(s)
- Chien-Hung Huang
- Department of Computer Science and Information Engineering, National Formosa University, Hu-Wei, 63205, Taiwan.
| | - Peter Mu-Hsin Chang
- Division of Hematology and Oncology, Department of Medicine, Taipei Veterans General Hospital; Faculty of Medicine, National Yang Ming University, Taipei, 112, Taiwan.
| | - Chia-Wei Hsu
- Department of Computer Science and Information Engineering, National Formosa University, Hu-Wei, 63205, Taiwan.
| | - Chi-Ying F Huang
- Institute of Biopharmaceutical Sciences, National Yang-Ming University, Taipei, 112, Taiwan.
| | - Ka-Lok Ng
- Department of Bioinformatics and Medical Engineering, Asia University, Taichung, 41354, Taiwan. .,Department of Medical Research, China Medical University Hospital, China Medical University, Taichung, 40402, Taiwan.
| |
Collapse
|
15
|
Affiliation(s)
- Ju Han Kim
- Division of Biomedical Informatics, Systems Biomedical Informatics Research Center Seoul National University College of Medicine, 103 Daehak-ro, Jongno-gu, Seoul 110-799, Republic of Korea
| |
Collapse
|