1
Nguyen T, Campbell A, Kumar A, Amponsah E, Fiterau M, Shahriyari L. Optimal fusion of genotype and drug embeddings in predicting cancer drug response. Brief Bioinform 2024; 25:bbae227. [PMID: 38754407 PMCID: PMC11097979 DOI: 10.1093/bib/bbae227] [Received: 10/29/2023] [Revised: 04/14/2024] [Accepted: 04/25/2024] [Indexed: 05/18/2024]
Abstract
Predicting cancer drug response using both genomics and drug features has shown some success compared with using genomics features alone. However, there has been limited research on how best to combine, or fuse, the two types of features. Using a visible neural network with two deep learning branches for gene and drug features as the base architecture, we experimented with different fusion functions and fusion points. Our experiments show that injecting multiplicative relationships between gene and drug latent features into the original concatenation-based architecture, DrugCell, significantly improved overall predictive performance and outperformed other baseline models. We also show that different fusion methods respond differently to different fusion points, indicating that the relationship between drug features and the different hierarchical biological levels of gene features is best captured by different methods. Considering both predictive performance and runtime speed, tensor product partial is the best-performing fusion function for combining late-stage representations of drug and gene features to predict cancer drug response.
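To make the difference between fusion functions concrete, the sketch below contrasts concatenation (the fusion used by the original DrugCell architecture) with multiplicative fusion of a gene embedding and a drug embedding. It is a generic NumPy illustration under stated assumptions, not the authors' code: the abstract does not define "tensor product partial", so only an element-wise (Hadamard) product and a full tensor (outer) product are shown. Appending a constant 1 before the outer product is a trick borrowed from multimodal tensor-fusion models so that the original single-branch features survive alongside the pairwise products. All function names are hypothetical.

```python
import numpy as np


def concat_fusion(gene_z: np.ndarray, drug_z: np.ndarray) -> np.ndarray:
    """Concatenation: len(gene_z) + len(drug_z) features, no gene x drug interaction terms."""
    return np.concatenate([gene_z, drug_z])


def hadamard_fusion(gene_z: np.ndarray, drug_z: np.ndarray) -> np.ndarray:
    """Element-wise product: needs equal sizes; one multiplicative term per dimension."""
    if gene_z.shape != drug_z.shape:
        raise ValueError("Hadamard fusion needs embeddings of equal size")
    return gene_z * drug_z


def tensor_fusion(gene_z: np.ndarray, drug_z: np.ndarray) -> np.ndarray:
    """Full tensor (outer) product, flattened.

    A constant 1 is appended to each vector (assumption, borrowed from
    multimodal tensor-fusion models) so the output contains every
    gene x drug product plus the original gene and drug features.
    """
    g = np.append(gene_z, 1.0)
    d = np.append(drug_z, 1.0)
    return np.outer(g, d).ravel()


# Hypothetical latent vectors from the gene and drug branches.
gene = np.array([1.0, 2.0, 3.0])
drug = np.array([0.5, -1.0])
print(concat_fusion(gene, drug).shape)  # (5,)
print(tensor_fusion(gene, drug).shape)  # (12,)
```

With a 3-dimensional gene embedding and a 2-dimensional drug embedding, concatenation yields 5 features while the tensor product yields (3 + 1)(2 + 1) = 12. Because the tensor-product output grows multiplicatively with the embedding sizes, runtime is a natural second criterion when choosing among fusion functions.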
Affiliation(s)
- Trang Nguyen
- Department of Computer Science, University of Massachusetts Amherst, Amherst 01002, MA, United States
- Anthony Campbell
- Department of Computer Science, University of Massachusetts Amherst, Amherst 01002, MA, United States
- Ankit Kumar
- Department of Mathematics and Statistics, University of Massachusetts Amherst, Amherst 01002, MA, United States
- Edwin Amponsah
- Department of Mathematics and Statistics, University of Massachusetts Amherst, Amherst 01002, MA, United States
- Madalina Fiterau
- Department of Computer Science, University of Massachusetts Amherst, Amherst 01002, MA, United States
- Leili Shahriyari
- Department of Mathematics and Statistics, University of Massachusetts Amherst, Amherst 01002, MA, United States
2
Ogunleye A, Piyawajanusorn C, Ghislat G, Ballester PJ. Large-Scale Machine Learning Analysis Reveals DNA Methylation and Gene Expression Response Signatures for Gemcitabine-Treated Pancreatic Cancer. HEALTH DATA SCIENCE 2024; 4:0108. [PMID: 38486621 PMCID: PMC10904073 DOI: 10.34133/hds.0108] [Received: 04/15/2023] [Accepted: 12/08/2023] [Indexed: 03/17/2024]
Abstract
Background: Gemcitabine is a first-line chemotherapy for pancreatic adenocarcinoma (PAAD), but many PAAD patients do not respond to gemcitabine-containing treatments. Being able to predict such nonresponders would hence permit the undelayed administration of more promising treatments while sparing those patients the life-threatening side effects of gemcitabine. Unfortunately, the few predictors of PAAD patient response to this drug are weak, and none of them yet exploits the power of machine learning (ML). Methods: Here, we applied ML to predict the response of PAAD patients to gemcitabine from the molecular profiles of their tumors. More concretely, we collected diverse molecular profiles of PAAD patient tumors along with the corresponding clinical data (gemcitabine responses and clinical features) from the Genomic Data Commons resource. Systematically combining 8 tumor profiles with 16 classification algorithms yielded 128 ML models, each of which was evaluated by multiple 10-fold cross-validations. Results: Only 7 of these 128 models were predictive, which underlines the importance of carrying out such a large-scale analysis to avoid missing the most predictive models. The most predictive were a random forest using 4 selected mRNAs [0.44 Matthews correlation coefficient (MCC), 0.785 receiver operating characteristic-area under the curve (ROC-AUC)] and an XGBoost model combining 12 DNA methylation probes (0.32 MCC, 0.697 ROC-AUC). By contrast, the hENT1 marker obtained much worse, random-level performance (practically 0 MCC, 0.5 ROC-AUC). Despite not being trained to predict prognosis (overall and progression-free survival), these ML models were also able to anticipate this patient outcome. Conclusions: We release these promising ML models so that they can be evaluated prospectively on other gemcitabine-treated PAAD patients.
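The contrast between the random forest (0.44 MCC) and the hENT1 marker (practically 0 MCC) is easier to read with the MCC formula in hand. The sketch below computes MCC from confusion-matrix counts; the counts in the usage lines are hypothetical and are not taken from the study. Returning 0 when a marginal sum is zero is a common convention (an assumption here), and it covers exactly the case of a classifier that gives every patient the same label.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when any of the four marginal sums is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of correct predictions, for comparison with MCC."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical cohort of 100 patients (90 responders) scored by a classifier
# that predicts "responder" for everyone.
print(accuracy(tp=90, tn=0, fp=10, fn=0))           # 0.9
print(matthews_corrcoef(tp=90, tn=0, fp=10, fn=0))  # 0.0
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction). The imbalanced example shows why it is informative on skewed cohorts: a constant classifier can reach 0.9 accuracy yet scores an MCC of 0.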
Affiliation(s)
- Adeolu Ogunleye
- Department of Organismal Biology, Uppsala University, Uppsala, Sweden
- Ghita Ghislat
- Department of Life Sciences, Imperial College London, London, UK
3
Lin Z, Shen H, Liu X, Ma W, Wang M, Ruan J, Yu H, Ma S, Sun X. Recent advances of artificial intelligence in melanoma clinical practice. Melanoma Res 2023; 33:454-461. [PMID: 37696256 DOI: 10.1097/cmr.0000000000000922] [Indexed: 09/13/2023]
Abstract
Skin melanoma is a lethal cancer. The incidence of melanoma is increasing rapidly in all regions of the world. Despite significant breakthroughs in melanoma treatment in recent years, precise diagnosis of melanoma is still a challenge in some cases. Even specialized physicians may need time and effort to make accurate judgments. As artificial intelligence (AI) technology advances into medical practice, it may bring new solutions to this problem owing to its efficiency, accuracy, and speed. This paper summarizes the recent progress of AI in melanoma-related applications, including melanoma diagnosis and classification, new drug discovery, treatment guidance, and prognostic assessment. The paper also compares the effectiveness of various algorithms in melanoma applications and suggests future research directions for AI in melanoma clinical practice.
Affiliation(s)
- Zijun Lin
- Guangdong Provincial Key Laboratory of Medical Molecular Diagnostics, The First Dongguan Affiliated Hospital, Guangdong Medical University
- Institute of Aging Research, School of Medical Technology, Guangdong Medical University
- Haoyan Shen
- School of Biomedical Engineering, Guangdong Medical University
- Xinguang Liu
- Guangdong Provincial Key Laboratory of Medical Molecular Diagnostics, The First Dongguan Affiliated Hospital, Guangdong Medical University
- Institute of Aging Research, School of Medical Technology, Guangdong Medical University
- Wanrui Ma
- Department of General Medicine, The First Dongguan Affiliated Hospital, Guangdong Medical University, Dongguan
- Mingfa Wang
- Department of Pathology, The Second Affiliated Hospital of Hainan Medical University, Haikou
- Jie Ruan
- Institute of Aging Research, School of Medical Technology, Guangdong Medical University
- Hongbin Yu
- Guangdong Provincial Key Laboratory of Medical Molecular Diagnostics, Chinese American Tumor Institute, Guangdong Medical University, Dongguan, China
- Sha Ma
- School of Biomedical Engineering, Guangdong Medical University
- Xuerong Sun
- Guangdong Provincial Key Laboratory of Medical Molecular Diagnostics, The First Dongguan Affiliated Hospital, Guangdong Medical University
- Institute of Aging Research, School of Medical Technology, Guangdong Medical University
4
Zhan Y, Guo J, Philip Chen CL, Meng XB. iBT-Net: an incremental broad transformer network for cancer drug response prediction. Brief Bioinform 2023:bbad256. [PMID: 37429577 DOI: 10.1093/bib/bbad256] [Received: 03/16/2023] [Revised: 05/30/2023] [Accepted: 06/15/2023] [Indexed: 07/12/2023]
Abstract
Predicting cancer drug response is an important research topic in modern precision medicine. However, because of incomplete chemical structures and complex gene features, designing efficient data-driven methods for predicting drug response remains an open problem. Moreover, since clinical data cannot easily be obtained all at once, data-driven methods may require retraining when new data become available, which increases time and cost. To address these issues, an incremental broad Transformer network (iBT-Net) is proposed for cancer drug response prediction. In addition to gene expression features learned from cancer cell lines, structural features are extracted from drugs by a Transformer. A broad learning system then integrates the learned gene features and drug structural features to predict the response. Thanks to its incremental learning capability, the proposed method can use new data to improve its prediction performance without full retraining. Experiments and comparison studies demonstrate the effectiveness and superiority of iBT-Net under different experimental configurations and continuous data learning.
Affiliation(s)
- Yongkang Zhan
- School of Computer Science & Engineering, South China University of Technology, 510006, China
- Jifeng Guo
- School of Computer Science & Engineering, South China University of Technology, 510006, China
- C L Philip Chen
- School of Computer Science & Engineering, South China University of Technology, 510006, China
- Brain and Affective Cognitive Research Center, Pazhou Lab, 510335, China
- Xian-Bing Meng
- School of Electromechanical Engineering, Guangdong University of Technology, 510006, China
5
Singh DP, Kaushik B. CTDN (Convolutional Temporal Based Deep- Neural Network): An Improvised Stacked Hybrid Computational Approach for Anticancer Drug Response Prediction. Comput Biol Chem 2023; 105:107868. [PMID: 37257399 DOI: 10.1016/j.compbiolchem.2023.107868] [Received: 01/03/2023] [Revised: 03/31/2023] [Accepted: 04/04/2023] [Indexed: 06/02/2023]
Abstract
The characterization of drug-metabolizing enzymes is a significant problem for customized therapy. It is important to choose the right drugs for cancer patients, and the ability to forecast the response to those drugs is usually based on the available information, genetic sequence, and structural properties. To the best of our knowledge, this is the first study to evaluate optimization algorithms for feature selection and pharmacogenetic categorization using classification methods based on a successful evolutionary algorithm, with datasets from the Cancer Cell Line Encyclopedia (CCLE) and Genomics of Drug Sensitivity in Cancer (GDSC). The study proposes Firefly and Grey Wolf Optimization techniques for feature extraction, and compares traditional Machine Learning (ML), ensemble ML, and a Stacking Algorithm with the proposed Convolutional Temporal Deep Neural Network (CTDN). With the potential to improve the efficiency of the proposed intelligible classifier for chemotherapeutic drug response prediction, our study is particularly relevant for selecting an appropriate feature selection method. The comparison analysis demonstrates that the proposed model not only surpasses prior state-of-the-art methods but also uses Grey Wolf and Firefly Optimization to reduce multicollinearity and overfitting.
Affiliation(s)
- Davinder Paul Singh
- School of Computer Science and Engineering, Shri Mata Vaishno Devi University, Katra 182320, Jammu and Kashmir, India
- Baijnath Kaushik
- School of Computer Science and Engineering, Shri Mata Vaishno Devi University, Katra 182320, Jammu and Kashmir, India
6
Su W, Xie XQ, Liu XW, Gao D, Ma CY, Zulfiqar H, Yang H, Lin H, Yu XL, Li YW. iRNA-ac4C: A novel computational method for effectively detecting N4-acetylcytidine sites in human mRNA. Int J Biol Macromol 2023; 227:1174-1181. [PMID: 36470433 DOI: 10.1016/j.ijbiomac.2022.11.299] [Received: 09/22/2022] [Revised: 11/10/2022] [Accepted: 11/25/2022] [Indexed: 12/07/2022]
Abstract
RNA N4-acetylcytidine (ac4C) is the acetylation of cytidine at the nitrogen-4 position; it is a highly conserved RNA modification involved in a variety of biological processes. Hence, accurate identification of genome-wide ac4C sites is vital for understanding the regulatory mechanisms of gene expression. In this work, a novel predictor named iRNA-ac4C was established to identify ac4C sites in human mRNA based on three feature extraction methods: nucleotide composition, nucleotide chemical property, and accumulated nucleotide frequency. Subsequently, minimum-Redundancy-Maximum-Relevance (mRMR) combined with an incremental feature selection strategy was used to select the optimal feature subset. Using this subset, the best ac4C classification model was trained by a gradient boosting decision tree with 10-fold cross-validation. Results on an independent test set indicated that the proposed method has encouraging generalization capability. For the convenience of other researchers, we established a user-friendly web server, freely available at http://lin-group.cn/server/iRNA-ac4C/. We hope that the tool can provide guidance for experimental researchers.
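Two of the three encodings named above can be sketched in a few lines. The code assumes the definitions commonly used in RNA modification-site predictors: nucleotide chemical property (NCP) encodes each base by ring structure, functional group, and hydrogen-bond strength as A = (1, 1, 1), C = (0, 1, 0), G = (1, 0, 0), U = (0, 0, 1), and accumulated nucleotide frequency (ANF) gives, at position i, the fraction of the first i bases that equal the base at position i. Whether iRNA-ac4C uses exactly these definitions is an assumption; nucleotide composition, mRMR, and the gradient boosting step are not shown, and the function name is hypothetical.

```python
# Assumed NCP encoding: (ring structure, functional group, hydrogen bonding).
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "U": (0, 0, 1)}


def ncp_anf_encode(seq: str) -> list[float]:
    """Per position: 3 NCP values followed by the ANF value; flattened to 4 * len(seq)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    counts: dict[str, int] = {}
    features: list[float] = []
    for i, base in enumerate(seq, start=1):
        if base not in NCP:
            raise ValueError(f"unexpected nucleotide {base!r}")
        counts[base] = counts.get(base, 0) + 1
        features.extend(float(v) for v in NCP[base])
        # ANF: share of the first i bases that equal the current base.
        features.append(counts[base] / i)
    return features


print(ncp_anf_encode("ACGA"))
```

For "ACGA" the four ANF values are 1, 1/2, 1/3, and 2/4, so a window of length L yields 4L features, which a selection step such as mRMR would then prune.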
Affiliation(s)
- Wei Su
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Xue-Qin Xie
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Xiao-Wei Liu
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Dong Gao
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Cai-Yi Ma
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Hasan Zulfiqar
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Hui Yang
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Hao Lin
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 611731, China
- Xiao-Long Yu
- School of Materials Science and Engineering, Hainan University, Haikou 570228, China
- Yan-Wen Li
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China; Key Laboratory of Intelligent Information Processing of Jilin Province, Northeast Normal University, Changchun 130117, China; Institute of Computational Biology, Northeast Normal University, Changchun 130117, China
7
Singh DP, Kaushik B. A systematic literature review for the prediction of anticancer drug response using various machine-learning and deep-learning techniques. Chem Biol Drug Des 2023; 101:175-194. [PMID: 36303299 DOI: 10.1111/cbdd.14164] [Received: 09/02/2022] [Revised: 10/13/2022] [Accepted: 10/24/2022] [Indexed: 12/24/2022]
Abstract
Computational methods have gained prominence in healthcare research. The accessibility of healthcare data has greatly motivated academics and researchers to develop methods that help in the prognosis of cancer drug response. Among various computational methods, machine-learning (ML) and deep-learning (DL) methods provide the most consistent and effective approaches to handling the serious consequences of this deadly disease and of the drugs administered to patients. Hence, this systematic literature review covers studies that have investigated drug discovery and the prognosis of anticancer drug response using ML and DL algorithms. For this purpose, PRISMA guidelines were followed to choose research papers from Google Scholar, PubMed, and ScienceDirect. A total of 105 papers aligned with the context of this review were chosen. The review also presents the accuracy of existing ML and DL methods in predicting anticancer drug response. The review finds that, despite the availability of various studies, each method has certain associated challenges. Future researchers can consider these limitations and challenges to develop a better anticancer drug response prediction method, which would greatly benefit medical professionals in administering non-invasive treatment to patients.
Affiliation(s)
- Davinder Paul Singh
- School of Computer Science and Engineering, Shri Mata Vaishno Devi University, Katra, Jammu and Kashmir, India
- Baijnath Kaushik
- School of Computer Science and Engineering, Shri Mata Vaishno Devi University, Katra, Jammu and Kashmir, India
8
Zhao D, Wang L, Chen Z, Zhang L, Xu L. KRAS is a prognostic biomarker associated with diagnosis and treatment in multiple cancers. Front Genet 2022; 13:1024920. [PMID: 36330448 PMCID: PMC9624065 DOI: 10.3389/fgene.2022.1024920] [Received: 08/22/2022] [Accepted: 09/20/2022] [Indexed: 11/21/2022]
Abstract
KRAS encodes K-Ras proteins, which take part in the MAPK pathway. KRAS expression is high in tumor patients. Our study compared KRAS expression levels across 33 tumor types. Additionally, we studied the association of KRAS expression levels with diagnostic and prognostic value, clinicopathological features, and tumor immunity. We established 22 immune-infiltrating cell expression datasets to calculate immune and stromal scores and thereby evaluate the tumor microenvironment. KRAS, immune checkpoint genes, and interacting genes were selected to construct the protein-protein interaction (PPI) network. We selected 79 immune checkpoint genes and interacting related genes to calculate correlations. Based on the 33 tumor expression datasets, we conducted GSEA (gene set enrichment analysis) to identify KRAS and other co-expressed genes associated with cancers. KRAS may be a reliable prognostic biomarker in the diagnosis of cancer patients and has potential as a target of cancer-targeted drugs.
Affiliation(s)
- Da Zhao
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- School of Food and Drug, Shenzhen Polytechnic, Shenzhen, China
- Lizhuang Wang
- Beidahuang Industry Group General Hospital, Harbin, China
- Zheng Chen
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- School of Food and Drug, Shenzhen Polytechnic, Shenzhen, China
- Lijun Zhang
- School of Food and Drug, Shenzhen Polytechnic, Shenzhen, China
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
- *Correspondence: Lei Xu
9
From single-omics to interactomics: How can ligand-induced perturbations modulate single-cell phenotypes? ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2022; 131:45-83. [PMID: 35871896 DOI: 10.1016/bs.apcsb.2022.05.006] [Indexed: 01/26/2023]
Abstract
Cells are subject to perturbations by different stimuli, which consequently give rise to individual alterations in their profile and function that may end up affecting the tissue as a whole. The same applies to the effect of a therapeutic agent on a biological system. As cells are exposed to external ligands, their profile can change at different single-omics levels. Detecting how these changes take place through different sequencing technologies is key to a better understanding of the effects of therapeutic agents. Single-cell RNA sequencing stands out as one of the most common approaches for cell profiling and perturbation analysis. Single-cell transcriptomics data can in turn be integrated with other omics data sources, such as proteomics and epigenomics data, to clarify perturbation effects and mechanisms at the cell level. Appropriate computational tools are key to processing and integrating the available information. This chapter focuses on recent advances in ligand-induced perturbation and single-cell omics computational tools and algorithms, their current limitations, and how the deluge of data can be used to improve the current process of drug research and development.
10
Wang XS, Lee S, Zhang H, Tang G, Wang Y. An integral genomic signature approach for tailored cancer therapy using genome-wide sequencing data. Nat Commun 2022; 13:2936. [PMID: 35618721 PMCID: PMC9135729 DOI: 10.1038/s41467-022-30449-7] [Received: 02/03/2021] [Accepted: 04/29/2022] [Indexed: 11/19/2022]
Abstract
Low-cost multi-omics sequencing is expected to become clinical routine and transform precision oncology. Viable computational methods that can facilitate tailored intervention while tolerating sequencing biases are in high demand. Here we propose a class of transparent and interpretable computational methods called integral genomic signature (iGenSig) analyses, which address the challenges of cross-dataset modeling by leveraging information redundancies within high-dimensional genomic features, averaging feature weights to prevent overweighting, and extracting unbiased genomic information from large tumor cohorts. Using genomic datasets of chemical perturbations, we develop a battery of iGenSig models for predicting cancer drug responses and validate the models using independent cell-line and clinical datasets. The iGenSig models for five drugs demonstrate predictive value in six clinical studies, among which the Erlotinib and 5-FU models significantly predict therapeutic responses in three studies, offering clinically relevant insights into their inverse predictive signature pathways. Together, iGenSig provides a computational framework to facilitate tailored cancer therapy based on multi-omics data.
Affiliation(s)
- Xiao-Song Wang
- UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, 15213, USA
- Department of Pathology, University of Pittsburgh, Pittsburgh, PA, 15213, USA
- Sanghoon Lee
- UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, 15213, USA
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, 15206, USA
- Han Zhang
- UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, 15213, USA
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, 15206, USA
- Gong Tang
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, 15261, USA
- Yue Wang
- UPMC Hillman Cancer Center, University of Pittsburgh, Pittsburgh, PA, 15213, USA
- Department of Pathology, University of Pittsburgh, Pittsburgh, PA, 15213, USA
11
Identifying and Classifying Enhancers by Dinucleotide-Based Auto-Cross Covariance and Attention-Based Bi-LSTM. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2022; 2022:7518779. [PMID: 35422876 PMCID: PMC9005296 DOI: 10.1155/2022/7518779] [Received: 12/20/2021] [Accepted: 03/12/2022] [Indexed: 11/17/2022]
Abstract
Enhancers are a class of noncoding DNA elements located near structural genes. In recent years, their identification and classification have been a research focus in bioinformatics. However, because enhancers are freely scattered and highly variable in position, there is still considerable room for progress, even though the performance of prediction models has improved continuously. In this paper, density-based spatial clustering of applications with noise (DBSCAN) was used to screen the physicochemical properties of dinucleotides for extracting dinucleotide-based auto-cross covariance (DACC) features; the features were then reduced with the Python feature selection toolkit MRMD 2.0. The reduced features are input into a random forest to identify enhancers. The enhancer classification model was built with word2vec and an attention-based Bi-LSTM. The accuracies of our enhancer identification and classification models were 77.25% and 73.50%, respectively, and the Matthews correlation coefficients (MCCs) were 0.5470 and 0.4881, respectively, better than the performance of most predictors.
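DACC features can be sketched as follows, assuming the dinucleotide auto-covariance (DAC) and cross-covariance (DCC) definitions common in sequence-feature toolkits: for a sequence of length L and lag l, each term averages the product of mean-centred property values of dinucleotides l positions apart over the L - l - 1 available pairs, and DACC concatenates DAC for each property with DCC for each ordered pair of distinct properties. The two property tables are toy values (GC count and purine count of each dinucleotide), not the DBSCAN-screened physicochemical properties used in the paper, and the MRMD 2.0 reduction is not reproduced. Function and table names are hypothetical.

```python
from itertools import product

# Toy property tables (NOT real physicochemical indices): each maps one of
# the 16 DNA dinucleotides to a number.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]
TOY_GC = {d: float(d.count("G") + d.count("C")) for d in DINUCLEOTIDES}
TOY_PURINE = {d: float(sum(b in "AG" for b in d)) for d in DINUCLEOTIDES}


def _profile(seq: str, table: dict[str, float]) -> list[float]:
    """Property value of every overlapping dinucleotide R_i R_(i+1)."""
    seq = seq.upper()
    return [table[seq[i:i + 2]] for i in range(len(seq) - 1)]


def dcc(seq: str, table1: dict[str, float], table2: dict[str, float], lag: int) -> float:
    """Covariance of property 1 at position i with property 2 at position i + lag.

    Passing the same table twice gives the auto-covariance (DAC).
    """
    p1, p2 = _profile(seq, table1), _profile(seq, table2)
    n = len(p1) - lag  # equals L - lag - 1 for a sequence of length L
    if n <= 0:
        raise ValueError("sequence too short for this lag")
    m1, m2 = sum(p1) / len(p1), sum(p2) / len(p2)
    return sum((p1[i] - m1) * (p2[i + lag] - m2) for i in range(n)) / n


def dacc_features(seq: str, tables: dict[str, dict[str, float]], max_lag: int) -> list[float]:
    """Per lag: DAC for every property, then DCC for every ordered pair of distinct properties."""
    names = sorted(tables)
    feats = []
    for lag in range(1, max_lag + 1):
        for u in names:
            feats.append(dcc(seq, tables[u], tables[u], lag))
        for u1 in names:
            for u2 in names:
                if u1 != u2:
                    feats.append(dcc(seq, tables[u1], tables[u2], lag))
    return feats


features = dacc_features("GCGATTAC", {"gc": TOY_GC, "purine": TOY_PURINE}, max_lag=2)
print(len(features))  # 8 = 2 lags x (2 DAC + 2 DCC)
```

With k properties and a maximum lag of m, the vector has m * k^2 entries, which is why a reduction step such as MRMD 2.0 is useful once many properties are retained.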
12
Shi H, Li S, Su X. Plant6mA: a predictor for predicting N6-methyladenine sites with lightweight structure in plant genomes. Methods 2022; 204:126-131. [DOI: 10.1016/j.ymeth.2022.02.009] [Received: 01/28/2022] [Revised: 02/20/2022] [Accepted: 02/24/2022] [Indexed: 10/19/2022]
13
Zhao Z, Yang W, Zhai Y, Liang Y, Zhao Y. Identify DNA-Binding Proteins Through the Extreme Gradient Boosting Algorithm. Front Genet 2022; 12:821996. [PMID: 35154264 PMCID: PMC8837382 DOI: 10.3389/fgene.2021.821996] [Received: 11/25/2021] [Accepted: 12/07/2021] [Indexed: 12/13/2022]
Abstract
The exploration of DNA-binding proteins (DBPs) is an important aspect of studying biological life activities. Research on life activities requires the support of scientific research results on DBPs, and the decline in many life activities is closely related to DBPs. DBPs are generally identified through biochemical experiments, which are inefficient and require considerable manpower, material resources and time. Several computational approaches have since been developed to detect DBPs, among which machine learning (ML)-based techniques have shown excellent performance. In our experiments, our method uses fewer features and simpler recognition methods than other methods while still obtaining satisfactory results. First, we use six feature extraction methods to extract sequence features from the same group of DBPs. Then, this feature information is spliced together, and the data are standardized. Finally, the extreme gradient boosting (XGBoost) model is used to construct an effective predictive model. Compared with other excellent methods, our proposed method has achieved better results. The accuracy achieved by our method is 78.26% for PDB2272 and 85.48% for PDB186. The accuracy of the experimental results achieved by our strategy is similar to that of previous detection methods.
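The "splice, then standardize" step can be sketched with NumPy. The blocks below are random placeholders standing in for the outputs of the six sequence feature extractors, which this abstract does not name; fitting the mean and standard deviation on training data only, and mapping constant columns to 0, are design assumptions rather than details from the paper. The resulting matrix is what a gradient-boosting model such as `xgboost.XGBClassifier` would then be trained on. Function names are hypothetical.

```python
import numpy as np


def splice_features(blocks: list[np.ndarray]) -> np.ndarray:
    """Concatenate per-method feature matrices (n_samples x d_k) column-wise."""
    if len({b.shape[0] for b in blocks}) != 1:
        raise ValueError("every block must describe the same samples")
    return np.hstack(blocks)


def fit_standardizer(x_train: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Per-column mean and standard deviation, estimated on training data only."""
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)
    std[std == 0] = 1.0  # constant columns: avoid division by zero
    return mean, std


def standardize(x: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Z-score each column with the training statistics."""
    return (x - mean) / std


rng = np.random.default_rng(0)
# Hypothetical stand-ins for two of the feature-extraction outputs (10 proteins).
block_a = rng.normal(size=(10, 3))
block_b = rng.normal(loc=5.0, scale=2.0, size=(10, 5))
x = splice_features([block_a, block_b])
mean, std = fit_standardizer(x)
z = standardize(x, mean, std)
print(z.shape)  # (10, 8)
```

Reusing the training statistics for validation and test folds keeps information from held-out proteins out of the preprocessing, which matters when reporting benchmark accuracies.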
Affiliation(s)
- Ziye Zhao
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Wen Yang
- International Medical Center, Shenzhen University General Hospital, Shenzhen, China
- Yixiao Zhai
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Yingjian Liang
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- *Correspondence: Yingjian Liang; Yuming Zhao
- Yuming Zhao
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
14
Mahajan RA, Shaikh NK, Tikhe TB, Vyas R, Chavan SM. Hybrid Sea Lion Crow Search Algorithm-Based Stacked Autoencoder for Drug Sensitivity Prediction From Cancer Cell Lines. INTERNATIONAL JOURNAL OF SWARM INTELLIGENCE RESEARCH 2022. [DOI: 10.4018/ijsir.304723] [Indexed: 11/09/2022]
Abstract
Cancer is one of the most dreadful diseases worldwide, and providing better therapy to cancer patients remains a major challenge because of the drug resistance of tumor cells. This paper proposes a Sea Lion Crow Search Algorithm (SLCSA) for drug sensitivity prediction. Drug sensitivity in cultured cell lines is predicted using a stacked autoencoder, and the proposed SLCSA is derived by combining Sea Lion Optimization (SLnO) and the Crow Search Algorithm (CSA). The implemented approach achieved superior results, with maximum testing accuracies of 0.920 for normal, 0.920 for leukemia, 0.912 for NSCLC, and 0.914 for urogenital.
15
Chen Y, Juan L, Lv X, Shi L. Bioinformatics Research on Drug Sensitivity Prediction. Front Pharmacol 2021; 12:799712. [PMID: 34955863 PMCID: PMC8696280 DOI: 10.3389/fphar.2021.799712] [Received: 10/22/2021] [Accepted: 11/18/2021] [Indexed: 11/28/2022]
Abstract
Modeling-based anticancer drug sensitivity prediction has been studied extensively in recent years. Most drug sensitivity prediction models use only gene expression data, neglecting the remarkable impact of gene mutation, methylation, and copy number variation on drug sensitivity. Drug sensitivity prediction can both help protect patients from adverse drug reactions and improve the efficacy of treatment. Genomics data are extremely useful for the drug sensitivity prediction task. This article reviews the role of drug sensitivity prediction and describes a variety of methods for predicting drug sensitivity. Moreover, the research significance of drug sensitivity prediction and its existing problems are discussed.
Affiliation(s)
- Yaojia Chen
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Liran Juan
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Xiao Lv
- Beidahuang Industry Group General Hospital, Harbin, China
- Lei Shi
- Department of Spine Surgery, Changzheng Hospital, Naval Medical University, Shanghai, China
| |
Collapse
|
16
Guo Y, Ju Y, Chen D, Wang L. Research on the Computational Prediction of Essential Genes. Front Cell Dev Biol 2021; 9:803608. [PMID: 34938741 PMCID: PMC8685449 DOI: 10.3389/fcell.2021.803608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Accepted: 11/22/2021] [Indexed: 11/19/2022]
Abstract
Genes, the nucleotide sequences that encode a polypeptide chain or a functional RNA, are the basic genetic units controlling biological traits. They underpin the basic structures and functions of organisms, and they store information related to biological factors and processes such as blood type, gestation, growth, and apoptosis. Environment and genetics jointly affect important physiological processes such as reproduction, cell division, and protein synthesis, and genes are related to a wide range of phenomena, including growth, decline, illness, aging, and death. Over the course of evolution, a class of genes has been conserved across multiple species. These genes are often located on the leading strand of DNA and tend to have higher expression levels, and the proteins they encode usually either perform very important functions or maintain and repair these essential functions. Such genes are called persistent genes. Among them, those that are indispensable to an organism's life activities are called essential genes. For example, when starch is the only source of energy, the genes related to starch digestion are essential: without them, the organism dies because it cannot obtain enough energy to maintain basic functions. The functions of the proteins encoded by essential genes are therefore thought to be fundamental to life. Today, DNA can be extracted from blood, saliva, or tissue cells for genetic testing, and detailed genetic information can be obtained with advanced scientific instruments and technologies. Information from genetic testing is useful for assessing potential disease risks and for helping determine the prognosis and progression of diseases. It is also useful for developing personalized medication and providing targeted health guidance to improve quality of life. Identifying important and essential genes is therefore of great theoretical and practical significance.
This paper reviews the research status of essential genes and the bacterial essential gene database, explains computational methods for predicting essential genes based on communication coding theory, and discusses the significance and practical value of essential genes.
Affiliation(s)
- Yuxin Guo
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Ying Ju
- School of Informatics, Xiamen University, Xiamen, China
- Dong Chen
- College of Electrical and Information Engineering, Quzhou University, Quzhou, China
- Lihong Wang
- Beidahuang Industry Group General Hospital, Harbin, China
17
Liu X, Song C, Huang F, Fu H, Xiao W, Zhang W. GraphCDR: a graph neural network method with contrastive learning for cancer drug response prediction. Brief Bioinform 2021; 23:6415314. [PMID: 34727569 DOI: 10.1093/bib/bbab457] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 07/10/2021] [Revised: 09/25/2021] [Accepted: 10/07/2021] [Indexed: 12/29/2022]
Abstract
Predicting the response of a cancer cell line to a therapeutic drug is an important topic in modern oncology that can help personalize cancer treatment. Although numerous machine learning methods have been developed for cancer drug response (CDR) prediction, integrating diverse information about cancer cell lines, drugs, and their known responses remains a great challenge. In this paper, we propose GraphCDR, a graph neural network method with contrastive learning for CDR prediction. GraphCDR constructs a graph neural network from the multi-omics profiles of cancer cell lines, the chemical structures of drugs, and known cancer cell line-drug responses, while a contrastive learning task serves as a regularizer within a multi-task learning paradigm to enhance generalization. In computational experiments, GraphCDR outperforms state-of-the-art methods under different experimental configurations, and an ablation study shows that its key components (biological features, known cancer cell line-drug responses, and contrastive learning) are all important for accurate CDR prediction. These analyses indicate the predictive power of GraphCDR and its potential value in guiding anti-cancer drug selection.
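The abstract describes contrastive learning used as a regularizer inside a multi-task objective but does not give the loss. The sketch below shows only the general pattern, total loss = prediction loss + λ · contrastive loss, assuming binarized (sensitive/resistant) responses scored with binary cross-entropy and an InfoNCE-style contrastive term over paired embeddings. The choice of InfoNCE, the cosine similarity, the temperature, and λ (`lam`) are all assumptions and not necessarily GraphCDR's exact formulation.

```python
import math


def bce(probs, labels, eps=1e-12):
    """Mean binary cross-entropy for predicted response probabilities."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)   # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchors, positives, temperature=0.5):
    """InfoNCE: anchors[i] should be closer to positives[i] than to positives[k != i]."""
    loss = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)                   # log-sum-exp trick for numerical stability
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        loss += log_denom - logits[i]     # negative log-softmax of the matching pair
    return loss / len(anchors)


def multitask_loss(probs, labels, anchors, positives, lam=0.1):
    """Total objective = prediction loss + lam * contrastive regularizer."""
    return bce(probs, labels) + lam * info_nce(anchors, positives)


aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)  # matched pairs give the lower contrastive loss
```

Setting `lam` to 0 recovers the plain prediction loss, which makes the regularizer's contribution easy to ablate.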
Affiliation(s)
- Xuan Liu
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Congzhi Song
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Feng Huang
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Haitao Fu
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Wenjie Xiao
- Information School, University of Washington, Seattle, WA 98105, USA
- Wen Zhang
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
18
Jiao S, Zou Q, Guo H, Shi L. iTTCA-RF: a random forest predictor for tumor T cell antigens. J Transl Med 2021; 19:449. [PMID: 34706730 PMCID: PMC8554859 DOI: 10.1186/s12967-021-03084-x] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 06/23/2021] [Accepted: 09/16/2021] [Indexed: 12/21/2022]
Abstract
BACKGROUND Cancer is one of the most serious diseases threatening human health. Cancer immunotherapy is the most promising treatment strategy owing to its high efficacy and selectivity and its milder side effects compared with traditional treatment. Identifying tumor T cell antigens is one of the most important tasks in developing antitumor vaccines and investigating molecular function. Although several machine learning predictors have been developed to identify tumor T cell antigens, accurate identification with existing methods remains challenging. METHODS In this study, we used a non-redundant dataset of 592 tumor T cell antigens (positive samples) and 393 non-tumor T cell antigens (negative samples). Four types of feature encoding were studied to build an efficient predictor, including amino acid composition, global protein sequence descriptors, and grouped amino acid and peptide composition. To improve the representation ability of the hybrid features, we further employed a two-step feature selection technique to search for the optimal feature subset. The final prediction model was constructed using the random forest algorithm. RESULTS The top 263 informative features were selected to train the random forest classifier for detecting tumor T cell antigen peptides. iTTCA-RF achieves satisfactory performance, with balanced accuracy, specificity, and sensitivity of 83.71%, 78.73%, and 88.69% in tenfold cross-validation and 73.14%, 62.67%, and 83.61% in independent tests, respectively. The online prediction server is freely accessible at http://lab.malab.cn/~acy/iTTCA . CONCLUSIONS The proposed predictor iTTCA-RF outperforms other recent models and will hopefully become an effective and useful tool for identifying tumor T cell antigens presented in the context of major histocompatibility complex class I.
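Two pieces of this abstract can be made concrete. Amino acid composition (AAC), one of the listed encodings, is the fraction of each of the 20 standard residues in a peptide. Balanced accuracy is the mean of sensitivity and specificity, and it reproduces the reported figures: (78.73 + 88.69) / 2 = 83.71 and (62.67 + 83.61) / 2 = 73.14. The sketch below assumes peptides contain only the 20 standard one-letter codes; the example peptide is arbitrary and the fixed residue order is an illustrative choice.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, fixed feature order


def aac(peptide):
    """Amino acid composition: 20-dim vector of residue frequencies."""
    peptide = peptide.upper()
    n = len(peptide)
    return [peptide.count(a) / n for a in AMINO_ACIDS]


def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    return (sensitivity + specificity) / 2


features = aac("SLLMWITQC")  # arbitrary example peptide
print(round(balanced_accuracy(78.73, 88.69), 2))  # tenfold cross-validation
print(round(balanced_accuracy(62.67, 83.61), 2))  # independent test
```

Balanced accuracy is the natural summary here because the dataset is imbalanced (592 positives versus 393 negatives), so plain accuracy would favor the majority class.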
Affiliation(s)
- Shihu Jiao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Huannan Guo
- Department of Oncology, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
- Lei Shi
- Department of Spine Surgery, Changzheng Hospital, Naval Medical University, Shanghai, China
19
Jung HD, Sung YJ, Kim HU. Omics and Computational Modeling Approaches for the Effective Treatment of Drug-Resistant Cancer Cells. Front Genet 2021; 12:742902. [PMID: 34691155 PMCID: PMC8527086 DOI: 10.3389/fgene.2021.742902] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 07/16/2021] [Accepted: 09/20/2021] [Indexed: 02/05/2023]
Abstract
Chemotherapy is a mainstream cancer treatment, but it faces the persistent challenge of drug resistance, which leads to poor prognosis. To better understand and effectively treat drug-resistant cancer cells, omics approaches have been widely applied in various forms. A notable use of omics data beyond routine data mining is computational modeling that generates useful predictions, such as drug responses and prognostic biomarkers. In particular, the increasing volume of omics data has facilitated the development of machine learning models. In this mini review, we highlight recent studies that use multi-omics data to study drug-resistant cancer cells, with a particular focus on studies that use computational models to characterize drug-resistant cancer cells and to predict biomarkers and/or drug responses. The computational models covered include network-based models, machine learning models, and genome-scale metabolic models. We also provide perspectives on future research opportunities for combating drug-resistant cancer cells.
Affiliation(s)
- Hae Deok Jung
- Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea
- Yoo Jin Sung
- Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea
- Hyun Uk Kim
- Department of Chemical and Biomolecular Engineering (BK21 four), Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea
- KAIST Institute for Artificial Intelligence, KAIST, Daejeon, South Korea
- BioProcess Engineering Research Center and BioInformatics Research Center, KAIST, Daejeon, South Korea
20
An X, Chen X, Yi D, Li H, Guan Y. Representation of molecules for drug response prediction. Brief Bioinform 2021; 23:6375515. [PMID: 34571534 DOI: 10.1093/bib/bbab393] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 04/08/2021] [Revised: 08/28/2021] [Accepted: 08/30/2021] [Indexed: 12/18/2022]
Abstract
The rapid development of machine learning and deep learning algorithms over the past decade has led to a surge in their applications across many research fields. In chemistry, machine learning has been widely used to aid drug screening, drug toxicity prediction, quantitative structure-activity relationship prediction, anti-cancer synergy score prediction, and more. This review is dedicated to the application of machine learning in drug response prediction. Specifically, we focus on molecular representations, which are crucial to the success of drug response prediction and other chemistry-related prediction tasks. We introduce three types of commonly used molecular representation methods, together with their implementation and application examples. This review serves as a brief introduction to the broad field of molecular representations.
Affiliation(s)
- Xin An
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Xi Chen
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Daiyao Yi
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Hongyang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
21
Zulfiqar H, Sun ZJ, Huang QL, Yuan SS, Lv H, Dao FY, Lin H, Li YW. Deep-4mCW2V: A sequence-based predictor to identify N4-methylcytosine sites in Escherichia coli. Methods 2021; 203:558-563. [PMID: 34352373 DOI: 10.1016/j.ymeth.2021.07.011] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 07/01/2021] [Revised: 07/22/2021] [Accepted: 07/29/2021] [Indexed: 10/20/2022]
Abstract
N4-methylcytosine (4mC) is a DNA modification that can regulate several biological processes, such as transcriptional regulation, replication, and gene expression. Precisely recognizing 4mC sites in genomic sequences can provide specific knowledge about their genetic roles. This study aimed to develop a deep learning-based model to predict 4mC sites in Escherichia coli. In the model, DNA sequences were encoded with the word embedding technique 'word2vec', and the resulting features were fed into a one-dimensional convolutional neural network (CNN) to discriminate 4mC sites from non-4mC sites in the Escherichia coli genome. On an independent dataset, the model achieved an overall accuracy of 0.861, about 4.3% higher than the existing model. The data and source code of the model are freely available at https://github.com/linDing-groups/Deep-4mCW2V.
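The core of the 1-D CNN stage described above is a filter sliding along the embedded sequence. The plain-Python sketch below assumes each position has already been mapped to a word2vec-style vector; the toy 3-mer embedding values, the single filter, "valid" padding with stride 1, ReLU, and global max pooling are illustrative assumptions, not the published Deep-4mCW2V configuration.

```python
def conv1d_filter(embedded, kernel, bias=0.0):
    """Slide one filter (width x dim) along the sequence; 'valid' padding, stride 1."""
    width = len(kernel)
    out = []
    for start in range(len(embedded) - width + 1):
        s = bias
        for offset in range(width):
            # dot product of one kernel row with one embedded position
            s += sum(w * x for w, x in zip(kernel[offset], embedded[start + offset]))
        out.append(max(0.0, s))  # ReLU activation
    return out


def global_max_pool(feature_map):
    """Keep the strongest response of the filter anywhere in the sequence."""
    return max(feature_map)


# Hypothetical 2-dim embeddings for the overlapping 3-mers of "ACGTAC".
embedding = {"ACG": [1.0, 0.0], "CGT": [0.0, 1.0], "GTA": [1.0, 1.0], "TAC": [0.5, -1.0]}
embedded = [embedding[w] for w in ["ACG", "CGT", "GTA", "TAC"]]
kernel = [[1.0, 0.0], [0.0, 1.0]]  # width-2 filter over 2-dim embeddings

fmap = conv1d_filter(embedded, kernel)
print(fmap, global_max_pool(fmap))
```

A real model would learn many such filters and feed the pooled responses to a classifier; pooling makes the output independent of where in the window the motif occurs.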
Affiliation(s)
- Hasan Zulfiqar
- Center for Informational Biology and School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zi-Jie Sun
- Center for Informational Biology and School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Qin-Lai Huang
- Center for Informational Biology and School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Shi-Shi Yuan
- Center for Informational Biology and School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lv
- Center for Informational Biology and School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Fu-Ying Dao
- Center for Informational Biology and School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lin
- Center for Informational Biology and School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yan-Wen Li
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Key Laboratory of Intelligent Information Processing of Jilin Province, Northeast Normal University, Changchun 130117, China
- Institute of Computational Biology, Northeast Normal University, Changchun 130117, China
22
Zulfiqar H, Yuan SS, Huang QL, Sun ZJ, Dao FY, Yu XL, Lin H. Identification of cyclin protein using gradient boost decision tree algorithm. Comput Struct Biotechnol J 2021; 19:4123-4131. [PMID: 34527186 PMCID: PMC8346528 DOI: 10.1016/j.csbj.2021.07.013] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 03/31/2021] [Revised: 07/15/2021] [Accepted: 07/15/2021] [Indexed: 12/12/2022]
Abstract
Cyclin proteins regulate the cell cycle by forming complexes with cyclin-dependent kinases, thereby activating cell cycle progression. Correct recognition of cyclin proteins could provide key clues for studying their functions. However, cyclin sequences share low similarity, which leads to poor predictions from sequence similarity-based methods, so a machine learning model for identifying cyclin proteins is urgently needed. This study aimed to develop a computational model to discriminate cyclin proteins from non-cyclin proteins. In the model, protein sequences were encoded by seven kinds of features: amino acid composition, composition of k-spaced amino acid pairs, tripeptide composition, pseudo amino acid composition, Geary correlation, normalized Moreau-Broto autocorrelation, and composition/transition/distribution. These features were then optimized using analysis of variance (ANOVA) and minimum redundancy maximum relevance (mRMR) with the incremental feature selection (IFS) technique, and a gradient boost decision tree (GBDT) classifier was trained on the optimal features. Five-fold cross-validation showed that the model identifies cyclins with an accuracy of 93.06% and an AUC of 0.971, higher than two recent studies on the same data.
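Of the seven encodings listed, the composition of k-spaced amino acid pairs (CKSAAP) is perhaps the least self-explanatory. For a gap k, it counts every residue pair separated by exactly k intervening positions and normalizes by the number of such pairs, giving a 400-dimensional vector per k (k = 0 reduces to dipeptide composition). A minimal sketch, assuming only the 20 standard residues; the gap range `max_k=3` is an illustrative assumption, since the abstract does not state the gaps used.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(seq, k):
    """Frequencies of residue pairs (seq[i], seq[i + k + 1]) for one gap k."""
    seq = seq.upper()
    n_pairs = len(seq) - k - 1
    if n_pairs <= 0:
        return [0.0] * len(PAIRS)
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(n_pairs):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    return [counts[p] / n_pairs for p in PAIRS]


def cksaap_features(seq, max_k=3):
    """Concatenate CKSAAP vectors for gaps 0..max_k (400 * (max_k + 1) values)."""
    features = []
    for k in range(max_k + 1):
        features.extend(cksaap(seq, k))
    return features


v = cksaap("ACDA", 1)  # gap-1 pairs: "AD" and "CA"
print(v[PAIRS.index("AD")], v[PAIRS.index("CA")])
```

The resulting high-dimensional vectors are exactly the kind of input that motivates the ANOVA/mRMR feature selection step described in the abstract.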
Affiliation(s)
- Hasan Zulfiqar
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Shi-Shi Yuan
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Qin-Lai Huang
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zi-Jie Sun
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Fu-Ying Dao
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Xiao-Long Yu
- School of Materials Science and Engineering, Hainan University, Haikou 570228, China
- Hao Lin
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
23
Zhu W, Guo Y, Zou Q. Prediction of presynaptic and postsynaptic neurotoxins based on feature extraction. Math Biosci Eng 2021; 18:5943-5958. [PMID: 34517517 DOI: 10.3934/mbe.2021297] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/13/2023]
Abstract
A neurotoxin is a protein that acts mainly on the nervous system; it has a selective toxic effect on the central nervous system and neuromuscular junctions, can cause muscle paralysis and respiratory paralysis, and is highly lethal. According to their mechanism of action, neurotoxins are divided into presynaptic and postsynaptic neurotoxins. Correctly identifying presynaptic and postsynaptic neurotoxins provides important clues for future drug development and the discovery of drug targets. This paper therefore constructs a predictive model, Neu_LR. The monoMonokGap method was used to extract frequency features from presynaptic and postsynaptic neurotoxin sequences, followed by feature selection. Based on the important features retained after dimensionality reduction, Neu_LR was built with a logistic regression algorithm and validated by ten-fold cross-validation and an independent test set. The resulting accuracies were 99.6078% and 94.1176%, respectively, indicating that Neu_LR has good predictive performance and robustness and can meet the requirements for predicting presynaptic and postsynaptic neurotoxins. The data and source code of the model are freely available at https://github.com/gyx123681/.
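Neu_LR's final classifier is logistic regression over selected sequence features. As a self-contained illustration, the sketch below fits a logistic regression by batch gradient descent on the mean log-loss and thresholds the predicted probability at 0.5. The toy two-feature vectors, learning rate, and epoch count are assumptions, and the monoMonokGap features themselves are not reproduced.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # gradient of log-loss w.r.t. the logit is (prediction - label)
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Class 1 if P(y = 1 | x) >= 0.5, else class 0."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


# Hypothetical feature vectors: class 0 vs class 1 (e.g. postsynaptic vs presynaptic).
X = [[0.1, 0.9], [0.2, 0.8], [0.8, 0.1], [0.9, 0.3]]
y = [0, 0, 1, 1]
w, b = fit_logistic(X, y)
print([predict(w, b, x) for x in X])
```

A linear model like this is easy to inspect: after feature selection, the sign and size of each weight show which retained frequency features push a sequence toward each class.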
Affiliation(s)
- Wen Zhu
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Yuxin Guo
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
24
Ru X, Ye X, Sakurai T, Zou Q, Xu L, Lin C. Current status and future prospects of drug-target interaction prediction. Brief Funct Genomics 2021; 20:312-322. [PMID: 34189559 DOI: 10.1093/bfgp/elab031] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/08/2021] [Revised: 06/01/2021] [Accepted: 06/04/2021] [Indexed: 01/09/2023]
Abstract
Drug-target interaction prediction is important for drug development and drug repurposing. Many computational methods have been proposed for drug-target interaction prediction because of their potential to reduce time and cost. In this review, we introduce molecular docking and machine learning-based methods, which have been widely applied to drug-target interaction prediction. In particular, machine learning-based methods are divided into types according to their data processing form and task type. For each type of method, we provide a specific description and propose some solutions to improve its capability. Heterogeneous networks and learning to rank are also summarized; to our knowledge, this is the first comprehensive review that covers both topics in the context of drug-target interaction prediction. Finally, we propose three directions that can be explored in depth in future research.
Affiliation(s)
- Xiucai Ye
- Department of Computer Science and Center for Artificial Intelligence Research (C-AIR), University of Tsukuba
- Tetsuya Sakurai
- Department of Computer Science and Center for Artificial Intelligence Research (C-AIR; director), University of Tsukuba
- Quan Zou
- University of Electronic Science and Technology of China
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic
25
CWLy-RF: A novel approach for identifying cell wall lyases based on random forest classifier. Genomics 2021; 113:2919-2924. [PMID: 34186189 DOI: 10.1016/j.ygeno.2021.06.038] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/15/2021] [Revised: 06/20/2021] [Accepted: 06/25/2021] [Indexed: 02/05/2023]
Abstract
Drug resistance of pathogenic bacteria has become increasingly serious because of the misuse of antibiotics in recent years. Researchers have found that cell wall lyases are effective antibacterial agents that can specifically recognize target bacteria and degrade bacterial peptidoglycan. Traditional wet-lab experiments for identifying lyases are usually expensive, time-consuming, and laborious, so there is an urgent need for computational tools that identify lyases quickly and accurately. In this paper, a new predictor, CWLy-RF, is proposed based on the random forest (RF) algorithm to identify cell wall lyases. The method combines three feature types (400D, 188D, and the composition of k-spaced amino acid group pairs) into a mixed-feature representation. The top 100 features were then selected with the information gain method to improve representation ability, and a predictive model was trained using RF. Evaluated by 10-fold cross-validation, the model achieved an accuracy of 96.09%, an AUC of 0.993, an MCC of 0.922, a sensitivity of 94.92%, and a specificity of 97.32%. These results indicate that CWLy-RF is superior to other recent models, and it will hopefully become an effective and useful tool for identifying lyases.
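CWLy-RF ranks features by information gain and keeps the top 100. For a discrete feature X and class label Y, information gain is IG(Y; X) = H(Y) - H(Y | X), the reduction in label entropy once the feature value is known. The sketch below assumes features have already been discretized (continuous composition values would need binning first, a step the abstract does not describe) and uses toy data; only the ranking logic is illustrated.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v), in bits."""
    n = len(labels)
    conditional = 0.0
    for v, count in Counter(feature_values).items():
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def top_features(columns, labels, top_n):
    """Return indices of the top_n feature columns ranked by information gain."""
    scores = [information_gain(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)[:top_n]


labels = [1, 1, 0, 0]     # toy lyase / non-lyase labels
perfect = [1, 1, 0, 0]    # feature that determines the label exactly
useless = [0, 1, 0, 1]    # feature independent of the label
print(information_gain(perfect, labels), information_gain(useless, labels))
print(top_features([useless, perfect], labels, top_n=1))
```

Information gain scores each feature independently, so it is fast on several hundred dimensions but does not account for redundancy between selected features.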
26
Hunt C, Montgomery S, Berkenpas JW, Sigafoos N, Oakley JC, Espinosa J, Justice N, Kishaba K, Hippe K, Si D, Hou J, Ding H, Cao R. Recent Progress of Machine Learning in Gene Therapy. Curr Gene Ther 2021; 22:132-143. [PMID: 34161210 DOI: 10.2174/1566523221666210622164133] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/02/2021] [Revised: 03/15/2021] [Accepted: 04/02/2021] [Indexed: 11/22/2022]
Abstract
With new developments in biomedical technology, altering genes with techniques such as CRISPR is now a viable therapeutic option. At the same time, whole-genome sequencing is becoming increasingly inexpensive, driving rapid advances in gene therapy and gene editing for precision medicine. Understanding current industrial and academic applications of gene therapy therefore provides an important backdrop for future scientific developments. In addition, machine learning and artificial intelligence techniques can reduce the time and money spent developing new gene therapy products and techniques. In this paper, we survey the current progress of gene therapy treatments for several diseases and explore machine learning applications in gene therapy. We also discuss the ethical implications of gene therapy and of using machine learning in precision medicine. Machine learning and gene therapy are both increasingly popular topics in the literature, and we conclude that there is still room for further research on, and application of, machine learning techniques in gene therapy.
Affiliation(s)
- Cassandra Hunt
- Department of Computer Science, Pacific Lutheran University, Tacoma, WA, United States
- Sandra Montgomery
- Department of Physics, Pacific Lutheran University, Tacoma, WA, United States
- Noel Sigafoos
- Department of Computer Science, Pacific Lutheran University, Tacoma, WA, United States
- John Christian Oakley
- Department of Computer Science, Pacific Lutheran University, Tacoma, WA, United States
- Jacob Espinosa
- Department of Mathematics, Pacific Lutheran University, Tacoma, WA, United States
- Nicola Justice
- Department of Mathematics, Pacific Lutheran University, Tacoma, WA, United States
- Kiyomi Kishaba
- Department of Humanities, Pacific Lutheran University, Tacoma, WA, United States
- Kyle Hippe
- Department of Computer Science, Pacific Lutheran University, Tacoma, WA, United States
- Dong Si
- Division of Computing Software Systems, University of Washington-Bothell, Bothell, WA, United States
- Jie Hou
- Department of Computer Science, Saint Louis University, St. Louis, MO, United States
- Hui Ding
- School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Renzhi Cao
- Department of Computer Science, Pacific Lutheran University, Tacoma, WA, United States
27
28
Zhao XM, Wu FX. Deep networks and network representation in bioinformatics. Methods 2021:S1046-2023(21)00102-X. [PMID: 33894378 DOI: 10.1016/j.ymeth.2021.04.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Affiliation(s)
- Xing-Ming Zhao
- Institute of Science and Technology for Brain-Inspired Intelligence and Research Institute of Intelligent Complex Systems, Fudan University, Shanghai 200433, China; MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence, and MOE Frontiers Center for Brain Science, China.
- Fang-Xiang Wu
- Division of Biomedical Engineering, Department of Mechanical Engineering and Department of Computer Science, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
29
Niu K, Luo X, Zhang S, Teng Z, Zhang T, Zhao Y. iEnhancer-EBLSTM: Identifying Enhancers and Strengths by Ensembles of Bidirectional Long Short-Term Memory. Front Genet 2021; 12:665498. [PMID: 33833783 PMCID: PMC8021722 DOI: 10.3389/fgene.2021.665498] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/08/2021] [Accepted: 03/01/2021] [Indexed: 12/26/2022]
Abstract
Enhancers are regulatory DNA sequences that can be bound by specific proteins called transcription factors (TFs). Interactions between enhancers and TFs regulate specific genes by increasing target gene expression. Enhancer identification and classification are therefore critical problems in enhancer research, yet suitable methods for identifying enhancers have so far been lacking. Previous research has mainly focused on features of enhancer function and interactions, ignoring sequence information. Recurrent neural networks (RNNs) and long short-term memory (LSTM) networks are among the most common methods for processing time series data, and LSTM is better suited than a standard RNN to handling DNA sequences. In this paper, we take advantage of LSTM to build iEnhancer-EBLSTM (ensembles of bidirectional LSTM) to identify enhancers. iEnhancer-EBLSTM consists of two steps. First, subsequences extracted by sliding a 3-mer window along the DNA sequence are used as features. Second, the EBLSTM model identifies enhancers from the candidate input sequences. We use the datasets from the study of Quang H et al. as benchmarks, and the experimental results on these datasets demonstrate the effectiveness of the proposed model.
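The first step of iEnhancer-EBLSTM slides a 3-mer window along each DNA sequence. A minimal sketch, assuming stride 1 and mapping each 3-mer to an integer index over the 4^3 = 64 possible 3-mers, the kind of token sequence an embedding or LSTM layer would consume. The lexicographic index scheme and the choice to skip windows containing ambiguous bases such as N are assumptions, not the published encoding.

```python
from itertools import product

# All 64 possible 3-mers in lexicographic order, each assigned a fixed index.
KMER_INDEX = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=3))}


def sliding_kmers(seq, k=3):
    """Overlapping k-mers with stride 1: len(seq) - k + 1 windows."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def encode(seq):
    """Map each 3-mer to its index; windows containing N or other symbols are skipped."""
    return [KMER_INDEX[kmer] for kmer in sliding_kmers(seq) if kmer in KMER_INDEX]


print(sliding_kmers("ACGTA"))  # three overlapping windows
print(encode("AAAC"))          # indices of "AAA" and "AAC"
```

Overlapping windows keep local context: each base contributes to up to three consecutive tokens, which a bidirectional LSTM can then read in both directions.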
Affiliation(s)
- Kun Niu
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Ximei Luo
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Shumei Zhang
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Zhixia Teng
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Tianjiao Zhang
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Yuming Zhao
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China