1
Cavalcante BRR, Freitas RD, Siquara da Rocha LO, Santos RSB, Souza BSDF, Ramos PIP, Rocha GV, Gurgel Rocha CA. In silico approaches for drug repurposing in oncology: a scoping review. Front Pharmacol 2024; 15:1400029. [PMID: 38919258 PMCID: PMC11196849 DOI: 10.3389/fphar.2024.1400029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Accepted: 05/14/2024] [Indexed: 06/27/2024] [Open Access]
Abstract
Introduction: Cancer refers to a group of diseases characterized by the uncontrolled growth and spread of abnormal cells in the body. Because of this complexity, it has been hard to find a single medicine that treats all cancer types, although the need is urgent. Developing a new drug, however, is costly and time-consuming. Drug repurposing (DR) can hasten drug discovery by giving existing drugs new disease indications. Many computational methods have been applied to DR, but only a few have succeeded. This review therefore aims to show in silico DR approaches and the gap between these strategies and their ultimate application in oncology. Methods: The scoping review was conducted according to the Arksey and O'Malley framework and the Joanna Briggs Institute recommendations. Relevant studies were identified through electronic searching of the PubMed/MEDLINE, Embase, Scopus, and Web of Science databases, as well as the grey literature. We included peer-reviewed research articles involving in silico strategies applied to drug repurposing in oncology, published between 1 January 2003 and 31 December 2021. Results: We identified 238 studies for inclusion in the review. The United States, India, China, South Korea, and Italy were the top publishing countries. The most investigated cancer types were breast cancer, lymphomas and leukemias, and lung, colorectal, and prostate cancer. Additionally, most studies used computational methods alone, and only a few assessed more complex scientific models. Lastly, molecular modeling, which includes molecular docking and molecular dynamics simulations, was the most frequently used method, followed by signature-, machine learning-, and network-based strategies. Discussion: DR is a trending opportunity but still demands extensive testing to ensure its safety and efficacy for the new indications. Finally, implementing DR can be challenging due to various factors, including lack of quality data, patient populations, cost, intellectual property issues, market considerations, and regulatory requirements. Despite these hurdles, DR remains an exciting strategy for identifying new treatments for numerous diseases, including cancer types, and giving patients faster access to new medications.
Affiliation(s)
- Bruno Raphael Ribeiro Cavalcante
  - Gonçalo Moniz Institute, Oswaldo Cruz Foundation (IGM-FIOCRUZ/BA), Salvador, Brazil
  - Department of Pathology and Forensic Medicine of the School of Medicine, Federal University of Bahia, Salvador, Brazil
- Raíza Dias Freitas
  - Gonçalo Moniz Institute, Oswaldo Cruz Foundation (IGM-FIOCRUZ/BA), Salvador, Brazil
  - Department of Social and Pediatric Dentistry of the School of Dentistry, Federal University of Bahia, Salvador, Brazil
- Leonardo de Oliveira Siquara da Rocha
  - Gonçalo Moniz Institute, Oswaldo Cruz Foundation (IGM-FIOCRUZ/BA), Salvador, Brazil
  - Department of Pathology and Forensic Medicine of the School of Medicine, Federal University of Bahia, Salvador, Brazil
- Bruno Solano de Freitas Souza
  - Gonçalo Moniz Institute, Oswaldo Cruz Foundation (IGM-FIOCRUZ/BA), Salvador, Brazil
  - D’Or Institute for Research and Education (IDOR), Salvador, Brazil
- Pablo Ivan Pereira Ramos
  - Gonçalo Moniz Institute, Oswaldo Cruz Foundation (IGM-FIOCRUZ/BA), Salvador, Brazil
  - Center of Data and Knowledge Integration for Health (CIDACS), Salvador, Brazil
- Gisele Vieira Rocha
  - Gonçalo Moniz Institute, Oswaldo Cruz Foundation (IGM-FIOCRUZ/BA), Salvador, Brazil
  - D’Or Institute for Research and Education (IDOR), Salvador, Brazil
- Clarissa Araújo Gurgel Rocha
  - Gonçalo Moniz Institute, Oswaldo Cruz Foundation (IGM-FIOCRUZ/BA), Salvador, Brazil
  - Department of Pathology and Forensic Medicine of the School of Medicine, Federal University of Bahia, Salvador, Brazil
  - D’Or Institute for Research and Education (IDOR), Salvador, Brazil
  - Department of Propaedeutics, School of Dentistry of the Federal University of Bahia, Salvador, Brazil
2
Li X, Liao M, Wang B, Zan X, Huo Y, Liu Y, Bao Z, Xu P, Liu W. A drug repurposing method based on inhibition effect on gene regulatory network. Comput Struct Biotechnol J 2023; 21:4446-4455. [PMID: 37731599 PMCID: PMC10507583 DOI: 10.1016/j.csbj.2023.09.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 09/05/2023] [Accepted: 09/07/2023] [Indexed: 09/22/2023] [Open Access]
Abstract
Numerous computational drug repurposing methods have emerged as efficient alternatives to costly and time-consuming traditional drug discovery approaches. Some of these methods assume that a candidate drug should reverse the expression of disease-associated genes. However, such methods are not applicable when there is limited overlap between disease-related genes and drug-perturbed genes. In this study, we proposed a novel Drug Repurposing method based on the Inhibition Effect on gene regulatory network (DRIE) to identify potential drugs for cancer treatment. DRIE integrated gene expression profiles and a gene regulatory network to calculate an inhibition score using the shortest path in the disease-specific network. Results on eleven datasets indicated the superior performance of DRIE compared with other state-of-the-art methods. Case studies showed that our method effectively discovered novel drug-disease associations. Our findings demonstrated that the top-ranked drug candidates had already been validated in the CTD database. Additionally, it clearly identified potential agents for three cancers (colorectal, breast, and lung cancer), which was beneficial when annotating drug-disease relationships in the CTD. This study proposed a novel framework for drug repurposing, which would be helpful for drug discovery and development.
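The abstract does not state how DRIE computes its inhibition score, so the following is only a rough sketch of the general idea of scoring a drug by shortest-path proximity between drug-perturbed genes and disease genes in a regulatory network. The toy network, the gene names, the function names, and the `1 / (1 + d)` weighting are all hypothetical assumptions and should not be read as the authors' method.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop counts from source to every reachable gene."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def proximity_score(graph, drug_genes, disease_genes):
    """Hypothetical score: sum of 1 / (1 + d) over disease genes, where d is
    the shortest distance from any drug-perturbed gene. Disease genes that
    cannot be reached contribute nothing."""
    best = {}
    for gene in drug_genes:
        for target, d in shortest_path_lengths(graph, gene).items():
            if target not in best or d < best[target]:
                best[target] = d
    return sum(1.0 / (1 + best[t]) for t in disease_genes if t in best)


# Toy directed regulatory network (assumed, for illustration only).
toy_network = {"A": ["B"], "B": ["C"], "C": [], "D": ["C"]}
score = proximity_score(toy_network, drug_genes=["A"], disease_genes=["B", "C", "D"])
```

In this toy case gene `D` is unreachable from the drug-perturbed gene `A`, so only `B` (one hop) and `C` (two hops) contribute to the score.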
Affiliation(s)
- Xianbin Li
  - Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
  - School of Computer Science of Information Technology, Qiannan Normal University for Nationalities, Duyun, China
- Minzhen Liao
  - Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
- Bing Wang
  - School of Medicine, Southeast University, Nanjing, China
- Xiangzhen Zan
  - Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
- Yanhao Huo
  - Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
- Yue Liu
  - Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
- Zhenshen Bao
  - Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
  - School of Computer Science of Information Technology, Qiannan Normal University for Nationalities, Duyun, China
- Peng Xu
  - Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
  - School of Computer Science of Information Technology, Qiannan Normal University for Nationalities, Duyun, China
- Wenbin Liu
  - Institute of Computational Science and Technology, Guangzhou University, Guangzhou, China
3
Casotti MC, Meira DD, Alves LNR, Bessa BGDO, Campanharo CV, Vicente CR, Aguiar CC, Duque DDA, Barbosa DG, dos Santos EDVW, Garcia FM, de Paula F, Santana GM, Pavan IP, Louro LS, Braga RFR, Trabach RSDR, Louro TS, de Carvalho EF, Louro ID. Translational Bioinformatics Applied to the Study of Complex Diseases. Genes (Basel) 2023; 14:419. [PMID: 36833346 PMCID: PMC9956936 DOI: 10.3390/genes14020419] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/30/2022] [Revised: 01/29/2023] [Accepted: 01/31/2023] [Indexed: 02/10/2023] [Open Access]
Abstract
Translational Bioinformatics (TBI) is defined as the union of translational medicine and bioinformatics. It has emerged as a major advance in science and technology, covering everything from the most basic database discoveries to the development of algorithms for molecular and cellular analysis, as well as their clinical applications. This technology makes it possible to access scientific evidence and apply it to clinical practice. This manuscript aims to highlight the role of TBI in the study of complex diseases, as well as its application to the understanding and treatment of cancer. An integrative literature review was carried out using articles obtained from several sources, among them PubMed, Science Direct, NCBI-PMC, Scientific Electronic Library Online (SciELO), and Google Scholar, published in English, Spanish, and Portuguese, indexed in these databases, and answering the following guiding question: "How does TBI provide a scientific understanding of complex diseases?" An additional effort is aimed at the dissemination, inclusion, and perpetuation of TBI knowledge from the academic environment to society, supporting the study, understanding, and elucidation of complex disease mechanisms and their treatment.
Affiliation(s)
- Matheus Correia Casotti
  - Departamento de Ciências Biológicas, Universidade Federal do Espírito Santo, Vitória 29075-010, Espírito Santo, Brazil
- Débora Dummer Meira
  - Departamento de Ciências Biológicas, Universidade Federal do Espírito Santo, Vitória 29075-010, Espírito Santo, Brazil
- Lyvia Neves Rebello Alves
  - Departamento de Ciências Biológicas, Universidade Federal do Espírito Santo, Vitória 29075-010, Espírito Santo, Brazil
- Camilly Victória Campanharo
  - Departamento de Ciências Biológicas, Universidade Federal do Espírito Santo, Vitória 29075-010, Espírito Santo, Brazil
- Creuza Rachel Vicente
  - Departamento de Medicina Social, Universidade Federal do Espírito Santo, Vitória 29040-090, Espírito Santo, Brazil
- Carla Carvalho Aguiar
  - Departamento de Ciências Biológicas, Universidade Federal do Espírito Santo, Vitória 29075-010, Espírito Santo, Brazil
- Daniel de Almeida Duque
  - Departamento de Ciências Biológicas, Universidade Federal do Espírito Santo, Vitória 29075-010, Espírito Santo, Brazil
- Débora Gonçalves Barbosa
  - Departamento de Ciências Biológicas, Universidade Federal do Espírito Santo, Vitória 29075-010, Espírito Santo, Brazil
- Fernanda Mariano Garcia
  - Departamento de Ciências Biológicas, Universidade Federal do Espírito Santo, Vitória 29075-010, Espírito Santo, Brazil
- Flávia de Paula
  - Departamento de Ciências Biológicas, Universidade Federal do Espírito Santo, Vitória 29075-010, Espírito Santo, Brazil
- Gabriel Mendonça Santana
  - Departamento de Ciências Biológicas, Universidade Federal do Espírito Santo, Vitória 29075-010, Espírito Santo, Brazil
- Isabele Pagani Pavan
  - Departamento de Ciências Biológicas, Universidade Federal do Espírito Santo, Vitória 29075-010, Espírito Santo, Brazil
- Luana Santos Louro
  - Departamento de Ciências Biológicas, Universidade Federal do Espírito Santo, Vitória 29075-010, Espírito Santo, Brazil
- Raquel Furlani Rocon Braga
  - Departamento de Ciências Biológicas, Universidade Federal do Espírito Santo, Vitória 29075-010, Espírito Santo, Brazil
- Raquel Silva dos Reis Trabach
  - Departamento de Ciências Biológicas, Universidade Federal do Espírito Santo, Vitória 29075-010, Espírito Santo, Brazil
- Thomas Santos Louro
  - Escola Superior de Ciências da Santa Casa de Misericórdia de Vitória (EMESCAM), Vitória 29027-502, Espírito Santo, Brazil
- Elizeu Fagundes de Carvalho
  - Instituto de Biologia Roberto Alcantara Gomes (IBRAG), Universidade do Estado do Rio de Janeiro (UERJ), Rio de Janeiro 20551-030, Rio de Janeiro, Brazil
- Iúri Drumond Louro
  - Departamento de Ciências Biológicas, Universidade Federal do Espírito Santo, Vitória 29075-010, Espírito Santo, Brazil
4
Zhao D, Wang L, Chen Z, Zhang L, Xu L. KRAS is a prognostic biomarker associated with diagnosis and treatment in multiple cancers. Front Genet 2022; 13:1024920. [PMID: 36330448 PMCID: PMC9624065 DOI: 10.3389/fgene.2022.1024920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2022] [Accepted: 09/20/2022] [Indexed: 11/21/2022] [Open Access]
Abstract
KRAS encodes K-Ras proteins, which take part in the MAPK pathway. The expression level of KRAS is high in tumor patients. Our study compared KRAS expression levels across 33 types of tumor tissue. Additionally, we studied the association of KRAS expression levels with diagnostic and prognostic values, clinicopathological features, and tumor immunity. We established 22 immune-infiltrating cell expression datasets to calculate immune and stromal scores to evaluate the tumor microenvironment. KRAS, immune checkpoint genes, and interacting genes were selected to construct the PPI network. We selected 79 immune checkpoint genes and interaction-related genes to calculate the correlation. Based on the 33 tumor expression datasets, we conducted GSEA (gene set enrichment analysis) to show the KRAS and other co-expressed genes associated with cancers. KRAS may be a reliable prognostic biomarker in the diagnosis of cancer patients and has the potential to be included in cancer-targeted drugs.
Affiliation(s)
- Da Zhao
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - School of Food and Drug, Shenzhen Polytechnic, Shenzhen, China
- Lizhuang Wang
  - Beidahuang Industry Group General Hospital, Harbin, China
- Zheng Chen
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - School of Food and Drug, Shenzhen Polytechnic, Shenzhen, China
- Lijun Zhang
  - School of Food and Drug, Shenzhen Polytechnic, Shenzhen, China
- Lei Xu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
  - *Correspondence: Lei Xu
5
Chen M, Zhang X, Ju Y, Liu Q, Ding Y. iPseU-TWSVM: Identification of RNA pseudouridine sites based on TWSVM. Math Biosci Eng 2022; 19:13829-13850. [PMID: 36654069 DOI: 10.3934/mbe.2022644] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Biological sequence analysis is an important area of basic research in bioinformatics. With the explosive growth of data, machine learning methods play an increasingly important role in biological sequence analysis. A classifier is constructed to predict and evaluate the input sequence feature vector, and knowledge of gene structure, function, and evolution is obtained from a large amount of sequence information, which lays a foundation for in-depth research. At present, many machine learning methods have been applied to biological sequence analysis, such as RNA gene recognition and protein secondary structure prediction. As a biological sequence, RNA plays an important role in the encoding, decoding, regulation, and expression of genes. The analysis of RNA data is currently carried out from the aspects of structure and function, including secondary structure prediction, non-coding RNA identification, and functional site prediction. Pseudouridine (Ψ) is the most widespread and abundant RNA modification and has been discovered in a variety of RNAs. Accurately identifying Ψ sites in RNA sequences is essential for the study of related functional mechanisms and for disease diagnosis. At present, several computational approaches have been suggested as an alternative to experimental methods to detect Ψ sites, but there is still room to improve their performance. In this study, we present a model based on the twin support vector machine (TWSVM) for Ψ site identification. The model combines a variety of feature representation techniques and uses the max-relevance and min-redundancy method to obtain the optimum feature subset for training. The independent testing accuracy is improved by 3.4% in comparison to current advanced Ψ site predictors. The outcomes demonstrate that our model has better generalization performance and improves the accuracy of Ψ site identification. iPseU-TWSVM can be a helpful tool to identify Ψ sites.
Affiliation(s)
- Mingshuai Chen
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Xin Zhang
  - Beidahuang Industry Group General Hospital, Harbin, China
- Ying Ju
  - School of Informatics, Xiamen University, Xiamen, China
- Qing Liu
  - Department of Anesthesiology, Hospital (T.C.M) Affiliated to Southwest Medical University, Luzhou, China
- Yijie Ding
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
6
An integrated pan-cancer analysis of identifying biomarkers about the EGR family genes in human carcinomas. Comput Biol Med 2022; 148:105889. [DOI: 10.1016/j.compbiomed.2022.105889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2022] [Revised: 06/25/2022] [Accepted: 07/16/2022] [Indexed: 12/24/2022]
7
Zhang Y, Lei X, Pan Y, Wu FX. Drug Repositioning with GraphSAGE and Clustering Constraints Based on Drug and Disease Networks. Front Pharmacol 2022; 13:872785. [PMID: 35620297 PMCID: PMC9127467 DOI: 10.3389/fphar.2022.872785] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/10/2022] [Accepted: 04/11/2022] [Indexed: 11/29/2022] [Open Access]
Abstract
Understanding therapeutic properties is important in drug repositioning and drug discovery. However, chemical experiments and clinical trials are expensive and inefficient ways to characterize the therapeutic properties of drugs. Recently, artificial intelligence (AI)-assisted algorithms have received extensive attention for discovering the potential therapeutic properties of drugs and speeding up drug development. In this study, we propose a new method based on GraphSAGE and clustering constraints (DRGCC) to investigate the potential therapeutic properties of drugs for drug repositioning. First, the drug structure features and disease symptom features are extracted. Second, the drug–drug interaction network and disease similarity network are constructed according to the drug–gene and disease–gene relationships. Matrix factorization is adopted to extract the clustering features of the networks. Then, all the features are fed to GraphSAGE to predict new associations between existing drugs and diseases. Benchmark comparisons on two different datasets show that our method has reliable predictive performance and outperforms six other competing methods. We have also conducted case studies on existing drugs and diseases and aimed to predict drugs that may be effective for the novel coronavirus disease 2019 (COVID-19). Among the predicted anti-COVID-19 drug candidates, some drugs are being clinically studied by pharmacologists, and their binding sites to COVID-19-related protein receptors have been found via molecular docking.
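As a rough illustration of the neighborhood aggregation that GraphSAGE-style models rely on, the sketch below concatenates a node's own feature vector with the mean of its neighbors' features. It is a simplified assumption for illustration: the learned weight matrix, the nonlinearity, and neighbor sampling are omitted, the node names and feature values are hypothetical, and nothing here reproduces the DRGCC architecture.

```python
def mean_aggregate(features, neighbors, node):
    """Concatenate a node's own features with the mean of its neighbors' features.

    A full GraphSAGE layer would then apply a learned linear map and a
    nonlinearity to this concatenated vector; both are left out here.
    """
    own = features[node]
    nbrs = neighbors.get(node, [])
    if nbrs:
        mean = [
            sum(features[n][i] for n in nbrs) / len(nbrs)
            for i in range(len(own))
        ]
    else:
        # Isolated node: use a zero vector as the neighborhood summary.
        mean = [0.0] * len(own)
    return own + mean


# Hypothetical two-dimensional features on a tiny drug-disease graph.
features = {
    "drug_1": [1.0, 0.0],
    "disease_1": [0.0, 2.0],
    "disease_2": [2.0, 4.0],
}
neighbors = {"drug_1": ["disease_1", "disease_2"]}
embedding = mean_aggregate(features, neighbors, "drug_1")
```

Stacking several such steps lets each node's representation absorb information from progressively larger neighborhoods, which is what makes the approach usable for predicting unseen drug-disease links.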
Affiliation(s)
- Yuchen Zhang
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
- Xiujuan Lei
  - School of Computer Science, Shaanxi Normal University, Xi'an, China
- Yi Pan
  - Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Fang-Xiang Wu
  - Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
8
Bajbouj K, Qaisar R, Alshura MA, Ibrahim Z, Alebaji MB, Al Ani AW, Janajrah HM, Bilalaga MM, Omara AI, Abou Assaleh RS, Saber-Ayad MM, Elmoselhi AB. Synergistic Anti-Angiogenic Effect of Combined VEGFR Kinase Inhibitors, Lenvatinib, and Regorafenib: A Therapeutic Potential for Breast Cancer. Int J Mol Sci 2022; 23:4408. [PMID: 35457226 PMCID: PMC9028329 DOI: 10.3390/ijms23084408] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/22/2022] [Revised: 04/08/2022] [Accepted: 04/14/2022] [Indexed: 12/20/2022] [Open Access]
Abstract
Background: Breast cancer currently affects more than two million women worldwide, and its incidence is steadily increasing. Angiogenesis and non-angiogenic vascularization are among the most essential factors in the invasion and metastasis of breast cancer cells. Lenvatinib and Regorafenib share the same anti-angiogenic effect by inhibiting vascular endothelial growth factor receptors (VEGFR subtypes 1 to 3) and have been approved for treating different types of cancer. Methods: We investigated the effects of Lenvatinib and Regorafenib on a well-established in vitro model of breast cancer using MCF-7 (estrogen and progesterone receptor-positive, HER2-negative) and MDA-MB-231 (triple negative) cells, as well as human umbilical vein endothelial cell (HUVEC) lines. We performed the cell viability assay on four groups of cells: a control group, a group treated with Lenvatinib only, a group treated with Regorafenib only, and a group treated with a combination of both drugs, at 24, 48, and 72 h. Data were summarized as means ± standard deviation, and drug–drug interactions were analyzed with CompuSyn software. Cellular migration assays, tube formation assays, and Western blots were conducted to assess functional effects and the protein expression of downstream signals such as Caspase-9, anti-apoptotic Survivin, P-ERK, and total ERK in the control and treatment groups. Results: MCF-7 cells showed a reduction in cell survival rates with higher dosing and longer incubation periods with each drug and with the combination of drugs. A synergistic interaction (CI < 1) was identified for the two drugs in MCF-7 cells at different dose combinations and at a higher dose in MDA-MB-231 cells. Furthermore, both drugs markedly reduced tube formation in the assay using MDA-MB-231 cells, reflecting their anti-angiogenic effect, and reduced survivin protein expression in MCF-7 cells; these antitumor markers showed a better outcome with the drug combination than with either drug alone. Conclusion: This study is the first to report the synergistic anti-angiogenic potential of combination therapy with Lenvatinib and Regorafenib. It therefore shows their therapeutic potential in breast cancer, including the aggressive types. Further studies are warranted to confirm and explore this therapeutic approach.
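The CI < 1 criterion refers to the combination index used in Chou-Talalay style analyses of the kind CompuSyn performs. For two drugs, CI = d1/Dx1 + d2/Dx2, where d1 and d2 are the doses used together to reach a given effect and Dx1 and Dx2 are the doses of each drug alone that reach the same effect. The sketch below shows only this arithmetic; the doses are made up for illustration and are not values from the study, and the function names are hypothetical.

```python
def combination_index(d1, d2, dx1, dx2):
    """Two-drug combination index: CI = d1/Dx1 + d2/Dx2.

    d1, d2   -- doses of each drug used in combination to reach effect x
    dx1, dx2 -- doses of each drug alone that reach the same effect x
    """
    return d1 / dx1 + d2 / dx2


def interpret(ci, tol=1e-9):
    """Conventional reading: CI < 1 synergy, CI = 1 additive, CI > 1 antagonism."""
    if ci < 1 - tol:
        return "synergy"
    if ci > 1 + tol:
        return "antagonism"
    return "additive"


# Made-up doses in arbitrary units (not data from the study).
ci = combination_index(d1=2.0, d2=1.0, dx1=8.0, dx2=4.0)
label = interpret(ci)
```

With these made-up numbers each drug is used at a quarter of its single-agent dose, giving CI = 0.5, which the conventional reading classifies as synergy.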
9
Rao N, Kini R, Maniyar D, Amin R. Journey from Serendipity to Biologics. Pharm Chem J 2022. [DOI: 10.1007/s11094-022-02579-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
10
Zhang S, Jiang H, Gao B, Yang W, Wang G. Identification of Diagnostic Markers for Breast Cancer Based on Differential Gene Expression and Pathway Network. Front Cell Dev Biol 2022; 9:811585. [PMID: 35096840 PMCID: PMC8790293 DOI: 10.3389/fcell.2021.811585] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/09/2021] [Accepted: 12/13/2021] [Indexed: 11/13/2022] [Open Access]
Abstract
Background: Breast cancer is the second most common cancer in the world; its incidence continues to rise worldwide, and it seriously threatens women's health. It is therefore very important to explore the characteristic changes of breast cancer at the gene level, including the screening of differentially expressed genes and the identification of diagnostic markers. Methods: The gene expression profiles of breast cancer were obtained from the TCGA database. The edgeR R software package was used to screen the differentially expressed genes between breast cancer patients and normal samples. Function and pathway enrichment analysis of these genes identified significantly enriched functions and pathways. Next, these pathways were downloaded from the KEGG website, and the gene interaction relations were extracted to construct the KEGG pathway gene interaction network. The potential diagnostic markers of breast cancer were obtained by combining the differentially expressed genes with the key genes in the network. Finally, these markers were used to construct a diagnostic prediction model of breast cancer, and the predictive ability of the model and the diagnostic ability of the markers were verified with internal and external data. Results: A total of 1060 differentially expressed genes were identified between breast cancer patients and normal controls. Enrichment analysis revealed 28 significantly enriched pathways (p < 0.05). They were downloaded from the KEGG website, and the gene interaction relations were extracted to construct the gene interaction network of KEGG pathways, which contained 1277 nodes and 7345 edges. The key nodes with a degree greater than 30 were extracted from the network, yielding 154 genes. These 154 key genes shared 23 genes with the differentially expressed genes, and these 23 genes serve as potential diagnostic markers for breast cancer. The 23 genes were used as features to construct an SVM classification model, and the model had good predictive ability in both the training dataset and the validation dataset (AUC = 0.960 and 0.907, respectively). Conclusion: This study showed that differences in gene expression level are important for the diagnosis of breast cancer, and identified 23 breast cancer diagnostic markers, which provide valuable information for clinical diagnosis and basic treatment experiments.
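The marker-selection step described in this abstract (keep network nodes whose degree exceeds a threshold, then intersect them with the differentially expressed genes) can be sketched as follows. The edge list, gene names, and the threshold of 2 are toy assumptions chosen so the example stays small; the study itself reports a threshold of degree greater than 30 on a network of 1277 nodes and 7345 edges.

```python
from collections import defaultdict


def hub_genes(edges, min_degree):
    """Return genes whose degree in an undirected network exceeds min_degree."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return {gene for gene, d in degree.items() if d > min_degree}


# Toy interaction network and toy differentially expressed gene set (assumed).
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3")]
deg_genes = {"G1", "G4", "G9"}

# Candidate markers: hub genes that are also differentially expressed.
candidates = hub_genes(edges, min_degree=2) & deg_genes
```

Here only `G1` has degree above the toy threshold and is also differentially expressed, so it is the single candidate; in the study the analogous intersection yielded 23 genes.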
Affiliation(s)
- Shumei Zhang
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Haoran Jiang
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Bo Gao
  - Department of Radiology, The Second Affiliated Hospital, Harbin Medical University, Harbin, China
- Wen Yang
  - International Medical Center, Shenzhen University General Hospital, Shenzhen, China
- Guohua Wang
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
11
Han S, Wang N, Guo Y, Tang F, Xu L, Ju Y, Shi L. Application of Sparse Representation in Bioinformatics. Front Genet 2021; 12:810875. [PMID: 34976030 PMCID: PMC8715914 DOI: 10.3389/fgene.2021.810875] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/08/2021] [Accepted: 12/01/2021] [Indexed: 11/15/2022] [Open Access]
Abstract
Inspired by L1-norm minimization methods such as basis pursuit, compressed sensing, and Lasso feature selection, sparse representation has emerged in recent years as a novel and powerful data processing method. Researchers have not only extended the sparse representation of signals to image representation, but also extended the sparsity of vectors to that of matrices. Moreover, sparse representation has been applied to pattern recognition with good results. Because of its multiple advantages, such as insensitivity to noise, strong robustness, less sensitivity to selected features, and no “overfitting” phenomenon, the application of sparse representation in bioinformatics should be studied further. This article reviews the development of sparse representation and explains its applications in bioinformatics, namely the use of low-rank representation matrices to identify and study cancer molecules and of low-rank sparse representations to analyze and process gene expression profiles, and it introduces related cancer and gene expression profile databases.
Affiliation(s)
- Shuguang Han
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Ning Wang
  - Beidahuang Industry Group General Hospital, Harbin, China
- Yuxin Guo
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Furong Tang
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
- Lei Xu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
- Ying Ju
  - School of Informatics, Xiamen University, Xiamen, China
- Lei Shi
  - Department of Spine Surgery, Changzheng Hospital, Naval Medical University, Shanghai, China
- *Correspondence: Ying Ju; Lei Shi
12
Drug Repositioning and Subgroup Discovery for Precision Medicine Implementation in Triple Negative Breast Cancer. Cancers (Basel) 2021; 13:6278. [PMID: 34944904 PMCID: PMC8699385 DOI: 10.3390/cancers13246278] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/11/2021] [Revised: 11/30/2021] [Accepted: 12/02/2021] [Indexed: 12/29/2022] [Open Access]
Abstract
Simple Summary: The heterogeneity of complicated diseases like cancer negatively affects patients’ responses to treatment. Finding homogeneous subgroups of patients within the cancer population and finding the appropriate treatment for each subgroup will improve patients’ survival. In this study, we focus on triple-negative breast cancer (TNBC), where approximately 80% of patients do not entirely respond to chemotherapy. Our aim is to find subgroups of TNBC patients and, through drug repositioning, identify drugs that could be used to tailor treatment for each group. After applying our method to TNBC, we found that different targeted mechanisms were suggested for different groups of patients. Our findings could help the research community gain a better understanding of different subgroups within the TNBC population and could help drugs be repurposed with explainable results regarding the targeted mechanism.
Abstract: Breast cancer (BC) is the leading cause of death among female patients with cancer. Patients with triple-negative breast cancer (TNBC) have the lowest survival rate. TNBC has substantial heterogeneity within the BC population. This study utilized our novel patient stratification and drug repositioning method to find subgroups of BC patients that share common genetic profiles and that may respond similarly to the recommended drugs. After further examination of the discovered patient subgroups, we identified five homogeneous druggable TNBC subgroups. A drug repositioning algorithm was then applied to find the drugs with a high potential for each subgroup. Most of the top drugs for these subgroups were chemotherapy agents used for various types of cancer, including BC. After analyzing the biological mechanisms targeted by these drugs, we found that ferroptosis was the common cell death mechanism induced by the top drugs in the subgroups with neoplasm subdivision and race as clinical variables. In contrast, the antioxidative effect on cancer cells was the common targeted mechanism in the subgroup of patients younger than 50. Literature reviews were used to validate our findings, which could provide invaluable insights to streamline the drug repositioning process and could be further studied in a wet lab setting and in clinical trials.
|
13
|
Ahmed KA, Hasib TA, Paul SK, Saddam M, Mimi A, Saikat ASM, Faruque HA, Rahman MA, Uddin MJ, Kim B. Potential Role of CCN Proteins in Breast Cancer: Therapeutic Advances and Perspectives. Curr Oncol 2021; 28:4972-4985. [PMID: 34940056 PMCID: PMC8700172 DOI: 10.3390/curroncol28060417] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/15/2021] [Revised: 11/20/2021] [Accepted: 11/24/2021] [Indexed: 12/24/2022] Open
Abstract
CCNs are a specific type of matricellular protein, which are essential signaling molecules, and play multiple roles in multicellular eukaryotes. This family of proteins consists of six separate members, which exist only in vertebrates. The architecture of CCN proteins is multi-modular, comprising four distinct modules. CCN proteins achieve their primary functional activities by binding with several integrin7 receptors. The CCN family has been linked to cell adhesion, chemotaxis and migration, mitogenesis, cell survival, angiogenesis, differentiation, tumorigenesis, chondrogenesis, and wound healing, among other biological interactions. Breast cancer is the most commonly diagnosed cancer worldwide and CCN-regulated breast cancer stands at the top. A favorable or unfavorable association between various CCNs has been reported in patients with breast carcinomas. The pro-tumorigenic CCN1, CCN2, CCN3, and CCN4 may lead to human breast cancer, although the anti-tumorigenic actions of CCN5 and CCN6 are also present. Several studies have been conducted on CCN proteins and cancer in recent years. CCN1 and CCN3 have recently been shown to exhibit, to some extent, a dual nature of tumor inhibition and tumor suppression. Pharmacological advances in treating breast cancer by targeting CCN proteins are also reported. In our study, we intend to provide an overview of these research works while keeping breast cancer in focus. This information may facilitate early diagnosis, early prognosis and the development of new therapeutic strategies.
Affiliation(s)
- Kazi Ahsan Ahmed
- ABEx Bio-Research Center, East Azampur, Dhaka 1230, Bangladesh; (K.A.A.); (T.A.H.); (S.K.P.); (H.A.F.)
- Department of Biochemistry and Molecular Biology, Life Science Faculty, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj 8100, Bangladesh; (M.S.); (A.M.); (A.S.M.S.)
- Bio-Science Research Initiative, Gopalganj 8100, Bangladesh
| | - Tasnin Al Hasib
- ABEx Bio-Research Center, East Azampur, Dhaka 1230, Bangladesh; (K.A.A.); (T.A.H.); (S.K.P.); (H.A.F.)
- Department of Biochemistry and Molecular Biology, Life Science Faculty, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj 8100, Bangladesh; (M.S.); (A.M.); (A.S.M.S.)
- Bio-Science Research Initiative, Gopalganj 8100, Bangladesh
| | - Shamrat Kumar Paul
- ABEx Bio-Research Center, East Azampur, Dhaka 1230, Bangladesh; (K.A.A.); (T.A.H.); (S.K.P.); (H.A.F.)
- Department of Biochemistry and Molecular Biology, Life Science Faculty, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj 8100, Bangladesh; (M.S.); (A.M.); (A.S.M.S.)
- Bio-Science Research Initiative, Gopalganj 8100, Bangladesh
| | - Md. Saddam
- Department of Biochemistry and Molecular Biology, Life Science Faculty, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj 8100, Bangladesh; (M.S.); (A.M.); (A.S.M.S.)
- Bio-Science Research Initiative, Gopalganj 8100, Bangladesh
| | - Afsana Mimi
- Department of Biochemistry and Molecular Biology, Life Science Faculty, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj 8100, Bangladesh; (M.S.); (A.M.); (A.S.M.S.)
| | - Abu Saim Mohammad Saikat
- Department of Biochemistry and Molecular Biology, Life Science Faculty, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj 8100, Bangladesh; (M.S.); (A.M.); (A.S.M.S.)
| | - Hasan Al Faruque
- ABEx Bio-Research Center, East Azampur, Dhaka 1230, Bangladesh; (K.A.A.); (T.A.H.); (S.K.P.); (H.A.F.)
- Companion Diagnostics and Medical Technology Research Group, Daegu Gyeongbuk Institute of Science and Technology (DGIST), Daegu 42988, Korea
| | - Md. Ataur Rahman
- ABEx Bio-Research Center, East Azampur, Dhaka 1230, Bangladesh; (K.A.A.); (T.A.H.); (S.K.P.); (H.A.F.)
- Department of Pathology, College of Korean Medicine, Kyung Hee University, Seoul 02447, Korea
- Correspondence: (M.A.R.); (M.J.U.); (B.K.)
| | - Md. Jamal Uddin
- ABEx Bio-Research Center, East Azampur, Dhaka 1230, Bangladesh; (K.A.A.); (T.A.H.); (S.K.P.); (H.A.F.)
- Graduate School of Pharmaceutical Sciences, College of Pharmacy, Ewha Women’s University, Seoul 03760, Korea
- Correspondence: (M.A.R.); (M.J.U.); (B.K.)
| | - Bonglee Kim
- Department of Pathology, College of Korean Medicine, Kyung Hee University, Seoul 02447, Korea
- Correspondence: (M.A.R.); (M.J.U.); (B.K.)
| |
|
14
|
Liu T, Chen J, Zhang Q, Hippe K, Hunt C, Le T, Cao R, Tang H. The Development of Machine Learning Methods in discriminating Secretory Proteins of Malaria Parasite. Curr Med Chem 2021; 29:807-821. [PMID: 34636289 DOI: 10.2174/0929867328666211005140625] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/24/2021] [Revised: 07/28/2021] [Accepted: 08/15/2021] [Indexed: 11/22/2022]
Abstract
Malaria caused by Plasmodium falciparum is one of the major infectious diseases in the world. It is essential to exploit an effective method to predict secretory proteins of malaria parasites to develop effective cures and treatment. Biochemical assays can provide details for accurate identification of the secretory proteins, but these methods are expensive and time-consuming. In this paper, we summarized the machine learning-based identification algorithms and compared the construction strategies between different computational methods. Also, we discussed the use of machine learning to improve the ability of algorithms to identify proteins secreted by malaria parasites.
Affiliation(s)
- Ting Liu
- School of Basic Medical Sciences, Southwest Medical University, Luzhou. China
| | - Jiamao Chen
- School of Basic Medical Sciences, Southwest Medical University, Luzhou. China
| | - Qian Zhang
- School of Basic Medical Sciences, Southwest Medical University, Luzhou. China
| | - Kyle Hippe
- Department of Computer Science, Pacific Lutheran University. United States
| | - Cassandra Hunt
- Department of Computer Science, Pacific Lutheran University. United States
| | - Thu Le
- Department of Computer Science, Pacific Lutheran University. United States
| | - Renzhi Cao
- Department of Computer Science, Pacific Lutheran University. United States
| | - Hua Tang
- School of Basic Medical Sciences, Southwest Medical University, Luzhou. China
| |
|
15
|
Magura J, Hassan D, Moodley R, Mackraj I. Hesperidin-loaded nanoemulsions improve cytotoxicity, induce apoptosis, and downregulate miR-21 and miR-155 expression in MCF-7. J Microencapsul 2021; 38:486-495. [PMID: 34510994 DOI: 10.1080/02652048.2021.1979673] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/20/2022]
Abstract
Hesperidin, a ubiquitous plant-based flavanone, was encapsulated into nanoemulsions (HP-NEM) using a spontaneous emulsification method to improve its solubility and enhance bioavailability and efficacy in breast cancer treatment using MCF-7 cell lines. The cytotoxic and apoptotic effects of HP-NEM against MCF-7 and its impact on oncomiRs, microRNA-21, and microRNA-155 expression were also assessed. The optimised HP-NEM displayed a spherical shape with 305 ± 40.8 nm, 0.308 ± 0.04, and -11.6 ± 3.30 mV and 93 ± 0.45% for particle size, polydispersity index (PDI), zeta-potential (ζ), and encapsulation efficiency, respectively. Cytotoxicity studies using MTT assay showed selective toxicity of the HP-NEM against MCF-7 without affecting normal cells (HEK 293). Treatment with the HP-NEM induced cell death through apoptosis, cell cycle arrest in the G2/M phase, and downregulated miR-21 and miR-155 expression in MCF-7. This study supports the use of HP-NEM as a potential therapeutic agent in breast cancer treatment.
Affiliation(s)
- Judie Magura
- School of Laboratory Medicine and Medical Sciences, University of KwaZulu-Natal, Durban, South Africa
| | - Daniel Hassan
- Discipline of Pharmaceutical Sciences, University of KwaZulu-Natal, Durban, South Africa
| | - Roshila Moodley
- School of Chemistry and Physics, University of KwaZulu-Natal, Durban, South Africa
| | - Irene Mackraj
- Nelson R. Mandela School of Medicine, University of KwaZulu-Natal, Durban, South Africa
| |
|
16
|
Ahmed F, Adnan M, Malik A, Tariq S, Kamal F, Ijaz B. Perception of breast cancer risk factors: Dysregulation of TGF-β/miRNA axis in Pakistani females. PLoS One 2021; 16:e0255243. [PMID: 34297787 PMCID: PMC8301651 DOI: 10.1371/journal.pone.0255243] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/29/2021] [Accepted: 07/12/2021] [Indexed: 01/10/2023] Open
Abstract
Breast cancer poses a serious health risk for women throughout the world. Among the Asian population, Pakistani women have the highest risk of developing breast cancer. One out of nine women is diagnosed with breast cancer in Pakistan. The etiology and the risk factors leading to breast cancer are largely unknown. In the current study, the risk factors that are most pertinent to the Pakistani population, the etiology, molecular mechanisms of tumor progression, and therapeutic targets of breast cancer are studied. A correlative, cross-sectional, descriptive, and questionnaire-based study was designed to predict the risk factors in breast cancer patients. Invasive Ductal Carcinoma (90%) and grade-II tumor (73.2%) formation are more common in our patients’ data set. Clinical parameters such as mean age of 47.5 years (SD ± 11.17), disturbed menstrual cycle (> 2), cousin marriages (repeated), and lactation period (< 0.5 Y), along with stress, dietary and environmental factors, have an essential role in the development of breast cancer. In addition, in silico analysis was performed to screen the miRNAs regulating the TGF-beta pathway using TargetScanHuman, and the correlation was depicted through Mindjet Manager. The information thus obtained was observed in breast cancer clinical samples, both in peripheral blood mononuclear cells and in biopsies, through quantitative real-time PCR. There was a significant dysregulation (**P>0.001) of the TGF-β1 signaling pathway and the miRNAs (miR-29a, miR-140, and miR-148a) in patients’ biopsies, specific to grade and stage, which correlated with expression in blood samples. miRNAs (miR-29a and miR-140, miR-148a) can be effective diagnostic and prognostic markers as they regulate SMAD4 and SMAD2 expression respectively in breast cancer blood and biopsy samples. Therefore, proactive therapeutic strategies can be devised considering negatively regulated cascade genes and amalgamated miRNAs to control breast cancer better.
Affiliation(s)
- Fayyaz Ahmed
- Laboratory of Applied and Functional Genomics, National Center of Excellence in Molecular Biology, University of the Punjab Lahore, Lahore, Pakistan
| | - Muhammad Adnan
- Laboratory of Applied and Functional Genomics, National Center of Excellence in Molecular Biology, University of the Punjab Lahore, Lahore, Pakistan
| | - Ayesha Malik
- Laboratory of Applied and Functional Genomics, National Center of Excellence in Molecular Biology, University of the Punjab Lahore, Lahore, Pakistan
| | - Somayya Tariq
- Laboratory of Applied and Functional Genomics, National Center of Excellence in Molecular Biology, University of the Punjab Lahore, Lahore, Pakistan
| | - Farukh Kamal
- Department of Pathology, Fatima Jinnah Medical University, Lahore, Pakistan
| | - Bushra Ijaz
- Laboratory of Applied and Functional Genomics, National Center of Excellence in Molecular Biology, University of the Punjab Lahore, Lahore, Pakistan
- * E-mail:
| |
|
17
|
Advani D, Kumar P. Therapeutic Targeting of Repurposed Anticancer Drugs in Alzheimer's Disease: Using the Multiomics Approach. ACS OMEGA 2021; 6:13870-13887. [PMID: 34095679 PMCID: PMC8173619 DOI: 10.1021/acsomega.1c01526] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/22/2021] [Accepted: 05/10/2021] [Indexed: 05/08/2023]
Abstract
AIM/HYPOTHESIS The complexity and heterogeneity of multiple pathological features make Alzheimer's disease (AD) a major culprit to global health. Drug repurposing is an inexpensive and reliable approach to redirect the existing drugs for new indications. The current study aims to study the possibility of repurposing approved anticancer drugs for AD treatment. We proposed an in silico pipeline based on "omics" data mining that combines genomics, transcriptomics, and metabolomics studies. We aimed to validate the neuroprotective properties of repurposed drugs and to identify the possible mechanism of action of the proposed drugs in AD. RESULTS We generated a list of AD-related genes and then searched DrugBank database and Therapeutic Target Database to find anticancer drugs related to potential AD targets. Specifically, we researched the available approved anticancer drugs and excluded the information of investigational and experimental drugs. We developed a computational pipeline to prioritize the anticancer drugs having a close association with AD targets. From data mining, we generated a list of 2914 AD-related genes and obtained 49 potential druggable targets by functional enrichment analysis. The protein-protein interaction (PPI) studies for these genes revealed 641 interactions. We found that 15 AD risk/direct PPI genes were associated with 30 approved oncology drugs. The computational validation of candidate drug-target interactions, structural and functional analysis, investigation of related molecular mechanisms, and literature-based analysis resulted in four repurposing candidates, of which three drugs were epidermal growth factor receptor (EGFR) inhibitors. CONCLUSION Our computational drug repurposing approach proposed EGFR inhibitors as potential repurposing drugs for AD. Consequently, our proposed framework could be used for drug repurposing for different indications in an economical and efficient way.
Affiliation(s)
- Dia Advani
- Molecular Neuroscience and Functional Genomics Laboratory, Delhi Technological University, Shahabad Daulatpur, Bawana Road, Delhi 110042, India
| | - Pravir Kumar
- Molecular Neuroscience and Functional Genomics Laboratory, Delhi Technological University, Shahabad Daulatpur, Bawana Road, Delhi 110042, India
| |
|
18
|
Abstract
Background: Bioluminescence is a unique and significant phenomenon in nature. Bioluminescence is important for the lifecycle of some organisms and is valuable in biomedical research, including for gene expression analysis and bioluminescence imaging technology. In recent years, researchers have identified a number of methods for predicting bioluminescent proteins (BLPs), which have increased in accuracy, but could be further improved.
Method: In this study, a new bioluminescent proteins prediction method, based on a voting algorithm, is proposed. Four methods of feature extraction based on the amino acid sequence were used. 314 dimensional features in total were extracted from amino acid composition, physicochemical properties and k-spacer amino acid pair composition. In order to obtain the highest MCC value to establish the optimal prediction model, a voting algorithm was then used to build the model. To create the best performing model, the selection of base classifiers and vote counting rules are discussed.
Results: The proposed model achieved 93.4% accuracy, 93.4% sensitivity and 91.7% specificity in the test set, which was better than any other method. A previous prediction of bioluminescent proteins in three lineages was also improved using the model building method, resulting in greatly improved accuracy.
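As a rough illustration of the voting step and the MCC criterion mentioned in this abstract, the sketch below combines the label predictions of several base classifiers by simple majority vote and scores the result with the Matthews correlation coefficient. The function names, the toy predictions, and the majority rule itself are assumptions made for illustration; the abstract does not state which base classifiers or vote-counting rule were finally chosen.

```python
from collections import Counter
from math import sqrt


def majority_vote(predictions):
    """Hard voting: predictions is a list of per-classifier label lists."""
    combined = []
    for sample_votes in zip(*predictions):
        # Pick the label that received the most votes for this sample.
        combined.append(Counter(sample_votes).most_common(1)[0][0])
    return combined


def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels coded as 1 (positive) and 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy data (hypothetical): three base classifiers, four test proteins.
base_predictions = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
true_labels = [1, 0, 0, 1]

voted = majority_vote(base_predictions)
score = matthews_corrcoef(true_labels, voted)
```

An odd number of base classifiers avoids ties for binary labels; comparing `score` across candidate classifier subsets is one plausible way to pick the ensemble with the highest MCC.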
Affiliation(s)
- Shulin Zhao
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Ying Ju
- School of Informatics, Xiamen University, Xiamen, China
| | - Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba Science City, Japan
| | - Jun Zhang
- Rehabilitation Department, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
| | - Shuguang Han
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| |
|
19
|
Yu L, Wang M, Yang Y, Xu F, Zhang X, Xie F, Gao L, Li X. Predicting therapeutic drugs for hepatocellular carcinoma based on tissue-specific pathways. PLoS Comput Biol 2021; 17:e1008696. [PMID: 33561121 PMCID: PMC7920387 DOI: 10.1371/journal.pcbi.1008696] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 05/12/2020] [Revised: 03/01/2021] [Accepted: 01/12/2021] [Indexed: 02/06/2023] Open
Abstract
Hepatocellular carcinoma (HCC) is a significant health problem worldwide with poor prognosis. Drug repositioning represents a profitable strategy to accelerate drug discovery in the treatment of HCC. In this study, we developed a new approach for predicting therapeutic drugs for HCC based on tissue-specific pathways and identified three newly predicted drugs that are likely to be therapeutic drugs for the treatment of HCC. We validated these predicted drugs by analyzing their overlapping drug indications reported in PubMed literature. By using the cancer cell line data in the database, we constructed a Connectivity Map (CMap) profile similarity analysis and KEGG enrichment analysis on their related genes. By experimental validation, we found securinine and ajmaline significantly inhibited cell viability of HCC cells and induced apoptosis. Among them, securinine has lower toxicity to normal liver cell line, which is worthy of further research. Our results suggested that the proposed approach was effective and accurate for discovering novel therapeutic options for HCC. This method also could be used to indicate unmarked drug-disease associations in the Comparative Toxicogenomics Database. Meanwhile, our method could also be applied to predict the potential drugs for other types of tumors by changing the database.
Affiliation(s)
- Liang Yu
- School of Computer Science and Technology, Xidian University, Shaanxi, China
| | - Meng Wang
- Shandong Provincial Key Laboratory of Animal Cell and Developmental Biology, School of Life Sciences, Advanced Medical Research Institute, Shandong University, 72, Jimo District, Qingdao, Shandong, China
| | - Yang Yang
- Shandong Provincial Key Laboratory of Animal Cell and Developmental Biology, School of Life Sciences, Advanced Medical Research Institute, Shandong University, 72, Jimo District, Qingdao, Shandong, China
| | - Fengdan Xu
- School of Computer Science and Technology, Xidian University, Shaanxi, China
| | - Xu Zhang
- Shandong Provincial Key Laboratory of Animal Cell and Developmental Biology, School of Life Sciences, Advanced Medical Research Institute, Shandong University, 72, Jimo District, Qingdao, Shandong, China
| | - Fei Xie
- Shandong Provincial Key Laboratory of Animal Cell and Developmental Biology, School of Life Sciences, Advanced Medical Research Institute, Shandong University, 72, Jimo District, Qingdao, Shandong, China
| | - Lin Gao
- School of Computer Science and Technology, Xidian University, Shaanxi, China
| | - Xiangzhi Li
- Shandong Provincial Key Laboratory of Animal Cell and Developmental Biology, School of Life Sciences, Advanced Medical Research Institute, Shandong University, 72, Jimo District, Qingdao, Shandong, China
| |
|
20
|
Liang G, Wu J, Xu L. A prognosis-related based method for miRNA selection on liver hepatocellular carcinoma prediction. Comput Biol Chem 2021; 91:107433. [PMID: 33540232 DOI: 10.1016/j.compbiolchem.2020.107433] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/24/2020] [Revised: 12/16/2020] [Accepted: 12/20/2020] [Indexed: 12/18/2022]
Abstract
Hepatocellular carcinoma (HCC) is considered the sixth most common cancer in the world, and it is also considered one of the causes of death. Moreover, the poor prognosis associated with recurrence after surgery and with metastasis is a major problem for human health. If the disease can be diagnosed earlier, the survival rate of patients will improve significantly. In the early stage of hepatocellular carcinoma, the expression of miRNAs is likely to become abnormal. In our work, the miRNA expression profiles of human HCC cancer tissue are compared with those of adjacent tissue samples collected from The Cancer Genome Atlas (TCGA) platform, and the genes with significant differences are then selected by the Limma test. The selected genes are used to predict miRNAs related to the prognosis of HCC patients. Finally, miRNAs regulated by target genes are selected by our method, and the experimental results demonstrate that our method is more efficient and less costly than wet-lab experimental methods.
Affiliation(s)
- Guangmin Liang
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, 518000, China
| | - Jin Wu
- School of Management, Shenzhen Polytechnic, Shenzhen, 518000, China.
| | - Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, 518000, China.
| |
|
21
|
Screening of Prospective Plant Compounds as H1R and CL1R Inhibitors and Its Antiallergic Efficacy through Molecular Docking Approach. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2021. [DOI: 10.1155/2021/6683407] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/16/2022]
Abstract
Allergens have the ability to enter the body and cause illness. Leukotriene is the widespread allergen which could stimulate mast cells to discharge histamine, which causes allergy symptoms. An effective strategy for treating leukotriene-induced allergy is to find inhibitors of leukotriene or histamine activity among phytochemicals. For this purpose, a library of 8,500 phytochemicals was generated using MOE software. The structures of histamine-1 receptor and cysteinyl leukotriene receptor-1 were predicted by the homology modeling method through SWISS-MODEL. The phytochemicals were docked with the predicted structures of histamine-1 and cysteinyl leukotriene receptor-1 in MOE software to determine the binding affinity of the phytochemicals against the targets. Moreover, chemoinformatics properties and ADMET of the phytochemicals were assessed to determine the drug-likeness of the compounds. Compound ID 10054216 has the lowest S-score value for the H-1 receptor, -18.9186 kcal/mol, which is lower than the standard's value of -15.167 kcal/mol. The other compounds 393471, 71448939, 10722577, and 442614 also showed better S-score values than the standard. Moreover, compound ID 11843082 has the lowest S-score value for CL1R, -15.481 kcal/mol, which is lower than the standard's value of -12.453 kcal/mol. The other compounds 72284, 5282102, 66559251, and 102506430 also showed better S-score values than the standard. In this research article, we performed molecular docking to find the best inhibitors against H1R and CL1R and their antiallergic efficacy. This in silico knowledge will be helpful in the near future for the design of novel, safe, and less costly H-1 receptor and CL1R inhibitors with the aim of improving human quality of life.
|
22
|
Lv Z, Ding H, Wang L, Zou Q. A Convolutional Neural Network Using Dinucleotide One-hot Encoder for identifying DNA N6-Methyladenine Sites in the Rice Genome. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.09.056] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 12/13/2022]
|
23
|
Nandakumar R, Dinu V. Developing a machine learning model to identify protein–protein interaction hotspots to facilitate drug discovery. PeerJ 2020; 8:e10381. [PMID: 33354416 PMCID: PMC7727375 DOI: 10.7717/peerj.10381] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/20/2020] [Accepted: 10/27/2020] [Indexed: 02/01/2023] Open
Abstract
Throughout the history of drug discovery, an enzymatic-based approach for identifying new drug molecules has been primarily utilized. Recently, protein–protein interfaces that can be disrupted to identify small molecules that could be viable targets for certain diseases, such as cancer and the human immunodeficiency virus, have been identified. Existing studies computationally identify hotspots on these interfaces, with most models attaining accuracies of ~70%. Many studies do not effectively integrate information relating to amino acid chains and other structural information relating to the complex. Herein, (1) a machine learning model has been created and (2) its ability to integrate multiple features, such as those associated with amino-acid chains, has been evaluated to enhance the ability to predict protein–protein interface hotspots. Virtual drug screening analysis of a set of hotspots determined on the EphB2-ephrinB2 complex has also been performed. The predictive capabilities of this model offer an AUROC of 0.842, sensitivity/recall of 0.833, and specificity of 0.850. Virtual screening of a set of hotspots identified by the machine learning model developed in this study has identified potential medications to treat diseases caused by the overexpression of the EphB2-ephrinB2 complex, including prostate, gastric, colorectal and melanoma cancers which are linked to EphB2 mutations. The efficacy of this model has been demonstrated through its successful ability to predict drug-disease associations previously identified in literature, including cimetidine, idarubicin, pralatrexate for these conditions. In addition, nadolol, a beta blocker, has also been identified in this study to bind to the EphB2-ephrinB2 complex, and the possibility of this drug treating multiple cancers is still relatively unexplored.
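For readers less familiar with the metrics quoted in this abstract, the following minimal sketch shows how sensitivity (recall) and specificity are derived from confusion-matrix counts. The counts used here are hypothetical and are not taken from the study; only the two formulas are standard.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of actual hotspots that the model labels as hotspots."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of actual non-hotspots that the model labels as non-hotspots."""
    return true_neg / (true_neg + false_pos)


# Hypothetical confusion-matrix counts, chosen only to illustrate the formulas.
example_sensitivity = sensitivity(true_pos=40, false_neg=10)
example_specificity = specificity(true_neg=45, false_pos=5)
```

AUROC, by contrast, is computed from ranked prediction scores rather than from a single confusion matrix, so it is not covered by this sketch.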
|
24
|
Cansaran-Duman D, Tanman Ü, Yangın S, Atakol O. The comparison of miRNAs that respond to anti-breast cancer drugs and usnic acid for the treatment of breast cancer. Cytotechnology 2020; 72:10.1007/s10616-020-00430-7. [PMID: 33128199 PMCID: PMC7695759 DOI: 10.1007/s10616-020-00430-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2020] [Accepted: 10/14/2020] [Indexed: 02/01/2023] Open
Abstract
This study was designed to compare usnic acid with anti-breast cancer drug molecules (A-BCDM) routinely used in the treatment of breast cancer. The miRNA information of 17 anti-breast cancer drugs used in breast cancer treatment was obtained from the Small Molecule-miRNA Network-Based Inference (SMIR-NBI) tool. We determined the common and differentially expressed miRNAs between the 17 A-BCDM and usnic acid, and classified them according to the common miRNAs to reveal molecular similarity. As a result of the bioinformatic analyses, 20 common miRNAs were determined between the 17 A-BCDM and usnic acid. The common miRNAs were analyzed with bioinformatic tools to determine pathways and targets. The most common miRNAs for 6 of the 17 A-BCDM and usnic acid were determined to be miR-374a-5p and miR-26a-5p. We compared the anti-proliferative effects of usnic acid and of tamoxifen, one of the 17 A-BCDM, on MDA-MB-231 triple-negative breast cancer cells with a real-time cell analysis system. A real-time PCR assay was carried out for miR-26a-5p to evaluate its expression level in MDA-MB-231 breast cancer cells and MCF-12A non-cancerous epithelial breast cells. As a result, usnic acid, as a novel candidate drug molecule, showed a high similarity ratio with the A-BCDM 5-Fluorouracil, Sulindac Sulfide, Curcumin and Cisplatin used in the treatment of breast cancer. miR-26a-5p, a common response miRNA of usnic acid and tamoxifen, showed a decreased level of expression, as validated by qRT-PCR assay. The results of the study indicate that, in addition to the 17 A-BCDM, usnic acid also has the potential to be used as a candidate molecule in the treatment of breast cancer. Moreover, miR-26a-5p might be used as a biomarker in the treatment of breast cancer, but further analysis is required.
Affiliation(s)
| | - Ümmügülsüm Tanman
- Ankara University, Biotechnology Institute, Keçiören, Ankara, Turkey
| | - Sevcan Yangın
- Ankara University, Biotechnology Institute, Keçiören, Ankara, Turkey
| | - Orhan Atakol
- Faculty of Science, Department of Chemistry, Ankara University, Tandoğan, Ankara, Turkey
| |
|
25
|
A Method for Identifying Vesicle Transport Proteins Based on LibSVM and MRMD. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2020; 2020:8926750. [PMID: 33133228 PMCID: PMC7591939 DOI: 10.1155/2020/8926750] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 08/04/2020] [Revised: 08/14/2020] [Accepted: 09/16/2020] [Indexed: 12/14/2022]
Abstract
With the development of computer technology, many machine learning algorithms have been applied to the field of biology, forming the discipline of bioinformatics. Protein function prediction is a classic research topic in this subject area. Though many scholars have made achievements in identifying proteins with different algorithms, they often extract a large number of feature types and use very complex classification methods to obtain little improvement in the classification effect, and this process is very time-consuming. In this research, we attempt to utilize as few features as possible to classify vesicular transport proteins while still obtaining a comparatively satisfactory classification result. We adopt CTDC, a submethod of the composition, transition, and distribution (CTD) method, to extract only 39 features from each sequence, and LibSVM is used as the classification method. We use the SMOTE method to deal with the problem of dataset imbalance. There are 11619 protein sequences in our dataset. We selected 4428 sequences to train our classification model and another 1832 sequences from our dataset to test the classification effect, and finally achieved an accuracy of 71.77%. After dimension reduction by MRMD, the accuracy is 72.16%.
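The SMOTE step named in this abstract can be pictured with the simplified sketch below: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is an illustrative reimplementation under stated assumptions (function name, `k`, seed, and toy points are all hypothetical), not the configuration or library the authors used.

```python
import random


def smote_like_samples(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    minority must contain at least two feature vectors of equal length.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class (hypothetical 2-D feature vectors).
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like_samples(minority_points, n_new=5)
```

Because every synthetic point is a convex combination of two real minority points, it stays inside the region already occupied by that class, which is the property that distinguishes this approach from simple duplication.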
|
26
|
Xu L, Liang G, Chen B, Tan X, Xiang H, Liao C. A Computational Method for the Identification of Endolysins and Autolysins. Protein Pept Lett 2020; 27:329-336. [PMID: 31577192 DOI: 10.2174/0929866526666191002104735] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/10/2019] [Revised: 06/27/2019] [Accepted: 09/03/2019] [Indexed: 12/21/2022]
Abstract
BACKGROUND Cell lytic enzymes are highly evolved proteins that can destroy the cell structure and kill bacteria. Unlike antibiotics, cell lytic enzymes do not cause serious drug resistance in pathogenic bacteria. The study of cell wall lytic enzymes therefore aims at finding an efficient way to cure bacterial infections. As the problem of drug resistance associated with antibiotics becomes more serious, cell lytic enzymes are a good choice for curing bacterial infections. Cell lytic enzymes include endolysins and autolysins, and the difference between them is the purpose of breaking the cell wall. Identifying the type of a cell lytic enzyme is meaningful for the study of cell wall enzymes. OBJECTIVE In this article, our motivation is to predict the type of cell lytic enzyme. Cell lytic enzymes help kill bacteria, so studying their type is meaningful. However, detecting the type of a cell lytic enzyme by experimental methods is time-consuming. Thus, an efficient computational method for predicting the type of cell lytic enzyme is proposed in our work. METHODS We propose a computational method for the prediction of endolysins and autolysins. First, a data set containing 27 endolysins and 41 autolysins is built. Then each protein is represented by its tripeptide composition. The features with the larger confidence degree are selected. Finally, a support vector machine classifier is trained on the labeled vectors. The learned classifier is used to predict the type of cell lytic enzyme. RESULTS Following the proposed method, the experimental results show that the overall accuracy can reach 97.06% when 44 features are selected. Compared with Ding's method, our method improves the overall accuracy by nearly 4.5% ((97.06-92.9)/92.9). The performance of our proposed method is stable when the number of selected features is between 40 and 70.
The overall accuracy of tripeptides optimal feature set is 94.12%, and the overall accuracy of Chou's amphiphilic PseAAC method is 76.2%. The experimental results also demonstrate that the overall accuracy is improved by nearly 18% when using the tripeptides optimal feature set. CONCLUSION The paper proposed an efficient method for identifying endolysin and autolysin. In this paper, support vector machine is used to predict the type of cell lytic enzyme. The experimental results show that the overall accuracy of the proposed method is 94.12%, which is better than some existing methods. In conclusion, the selected 44 features can improve the overall accuracy for identification of the type of cell lytic enzyme. Support vector machine performs better than other classifiers when using the selected feature set on the benchmark data set.
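The tripeptide-composition representation and the relative-improvement arithmetic quoted in the abstract can be sketched as follows. This is a minimal illustration under assumptions: the function names are hypothetical, and the sketch returns only the tripeptides that occur rather than the full 8000-dimensional vector.

```python
from collections import Counter


def tripeptide_composition(sequence):
    """Frequency of each overlapping tripeptide that occurs in a protein sequence."""
    seq = sequence.upper()
    windows = len(seq) - 2
    if windows <= 0:
        raise ValueError("sequence must contain at least three residues")
    counts = Counter(seq[i:i + 3] for i in range(windows))
    return {tripeptide: count / windows for tripeptide, count in counts.items()}


def relative_improvement(new_accuracy, old_accuracy):
    """Relative gain in percent, i.e. (new - old) / old * 100."""
    return (new_accuracy - old_accuracy) / old_accuracy * 100.0


composition = tripeptide_composition("ACDAC")   # windows: ACD, CDA, DAC
gain = relative_improvement(97.06, 92.9)        # the figures quoted in the abstract
```

With the two accuracies given in the abstract, the relative gain evaluates to about 4.48%, which matches the "nearly 4.5%" reported.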
Affiliation(s)
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
- Guangmin Liang
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
- Baowen Chen
- School of Software, Shenzhen Institute of Information Technology, Shenzhen, China
- Xu Tan
- School of Software, Shenzhen Institute of Information Technology, Shenzhen, China
- Huaikun Xiang
- School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen, China
- Changrui Liao
- Key Laboratory of Optoelectronic Devices and Systems of Ministry of Education and Guangdong Province, College of Optoelectronic Engineering, Shenzhen University, Shenzhen, China
|
27
|
Prediction of N7-methylguanosine sites in human RNA based on optimal sequence features. Genomics 2020; 112:4342-4347. [PMID: 32721444 DOI: 10.1016/j.ygeno.2020.07.035] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 07/04/2020] [Revised: 07/18/2020] [Accepted: 07/22/2020] [Indexed: 12/14/2022]
Abstract
N-7 methylguanosine (m7G) modification is a ubiquitous post-transcriptional RNA modification which is vital for maintaining RNA function and protein translation. Computational tools can help predict m7G sites in RNA sequences easily. In this work, we designed a sequence-based method to identify the modification site in human RNA sequences. First, several kinds of sequence features were extracted to encode m7G and non-m7G samples. Subsequently, we used mRMR, F-score, and Relief to obtain the optimal subset of features which could produce the maximum prediction accuracy. In 10-fold cross-validation, the highest accuracy for identifying m7G sites in the human genome was 94.67%, achieved by a support vector machine (SVM). In addition, we examined the performance of other algorithms and found that the SVM-based model outperformed them. The results indicated that the predictor could be a useful tool for studying m7G. A prediction model is available at https://github.com/MapFM/m7g_model.git.
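Of the three feature-ranking methods named in the abstract, the F-score is the simplest to show in a few lines. The sketch below uses the widely used two-class F-score definition (between-class separation of the means divided by the summed within-class sample variances); whether the paper uses exactly this variant is an assumption, and the function names and toy data are hypothetical.

```python
from statistics import mean, variance


def f_score(pos_values, neg_values):
    """Two-class F-score of one feature: larger means more discriminative."""
    overall_mean = mean(pos_values + neg_values)
    pos_mean, neg_mean = mean(pos_values), mean(neg_values)
    numerator = (pos_mean - overall_mean) ** 2 + (neg_mean - overall_mean) ** 2
    # statistics.variance is the sample variance (divides by n - 1).
    denominator = variance(pos_values) + variance(neg_values)
    return numerator / denominator if denominator else float("inf")


def rank_features(pos_rows, neg_rows):
    """Return feature indices sorted from highest to lowest F-score."""
    n_features = len(pos_rows[0])
    scores = [
        f_score([row[j] for row in pos_rows], [row[j] for row in neg_rows])
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Toy data: feature 0 separates the classes, feature 1 does not.
positive = [[1.0, 5.0], [2.0, 4.0], [3.0, 6.0]]
negative = [[4.0, 5.5], [5.0, 4.5], [6.0, 5.0]]
ranking = rank_features(positive, negative)
```

A ranking like this is typically combined with an incremental search, adding features in rank order and keeping the subset with the best cross-validated accuracy.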
|
28
|
Yu L, Shi Y, Zou Q, Wang S, Zheng L, Gao L. Exploring Drug Treatment Patterns Based on the Action of Drug and Multilayer Network Model. Int J Mol Sci 2020; 21:E5014. [PMID: 32708644 PMCID: PMC7404256 DOI: 10.3390/ijms21145014] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 06/28/2020] [Revised: 07/13/2020] [Accepted: 07/14/2020] [Indexed: 02/01/2023]
Abstract
Some drugs can be used to treat multiple diseases, suggesting potential patterns in drug treatment. Determination of drug treatment patterns can improve our understanding of the mechanisms of drug action, enabling drug repurposing. A drug can be associated with a multilayer tissue-specific protein-protein interaction (TSPPI) network for the diseases it is used to treat. Proteins usually interact with other proteins to achieve functions that cause diseases. Hence, studying drug treatment patterns is similar to studying common module structures in multilayer TSPPI networks. Therefore, we propose a network-based model to study the treatment patterns of drugs. The method was designated SDTP (studying drug treatment pattern) and was based on drug effects and a multilayer network model. To demonstrate the application of the SDTP method, we focused on analysis of trichostatin A (TSA) in leukemia, breast cancer, and prostate cancer. We constructed a TSPPI multilayer network and obtained candidate drug-target modules from the network. Gene ontology analysis provided insights into the significance of the drug-target modules and co-expression networks. Finally, two modules were obtained as potential treatment patterns for TSA. Through analysis of the significance, composition, and functions of the selected drug-target modules, we validated the feasibility and rationality of our proposed SDTP method for identifying drug treatment patterns. In summary, our novel approach used a multilayer network model to overcome the shortcomings of single-layer networks and combined the network with information on drug activity. Based on the discovered drug treatment patterns, we can predict the potential diseases that the drug can treat. That is, if a disease-related protein module has a similar structure, then the drug is likely to be a potential drug for the treatment of the disease.
Affiliation(s)
- Liang Yu
- School of Computer Science and Technology, Xidian University, Xi’an 710071, China; (Y.S.); (L.G.)
- Yayong Shi
- School of Computer Science and Technology, Xidian University, Xi’an 710071, China; (Y.S.); (L.G.)
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology, Chengdu 650004, China
- Shuhang Wang
- Department of Radiology, Massachusetts General Hospital, Boston, MA 02114, USA
- Liping Zheng
- School of Computer Science and Technology, Liaocheng University, Liaocheng 252000, China
- Lin Gao
- School of Computer Science and Technology, Xidian University, Xi’an 710071, China; (Y.S.); (L.G.)
|
29
|
Yuan L, Guo F, Wang L, Zou Q. Prediction of tumor metastasis from sequencing data in the era of genome sequencing. Brief Funct Genomics 2020; 18:412-418. [PMID: 31204784 DOI: 10.1093/bfgp/elz010] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/28/2019] [Revised: 02/22/2019] [Accepted: 04/26/2019] [Indexed: 02/01/2023]
Abstract
Tumor metastasis is the key reason for the high mortality rate of tumors. A growing number of scholars have begun to pay attention to research on tumor metastasis and have achieved satisfactory results in this field. The advent of the era of sequencing has enabled us to study cancer metastasis at the molecular level, which is essential for understanding the molecular mechanism of metastasis, identifying diagnostic markers and therapeutic targets and guiding clinical decision-making. We reviewed the metastasis-related studies using sequencing data, covering detection of metastasis origin sites, determination of metastasis potential and identification of distal metastasis sites. These findings include the discovery of relevant markers and the presentation of prediction tools. Finally, we discussed the challenges of studying metastasis, considering the difficulty of obtaining metastatic cancer data, the complexity of tumor heterogeneity and the uncertainty of sample labels.
Affiliation(s)
- Linlin Yuan
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Fei Guo
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Lei Wang
- College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
30
|
Chen SI, Tseng HT, Hsieh CC. Evaluating the impact of soy compounds on breast cancer using the data mining approach. Food Funct 2020; 11:4561-4570. [PMID: 32400770 DOI: 10.1039/c9fo00976k] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/21/2022]
Abstract
Accumulating evidence has shown that soy intake is associated with the promotion of health and prevention of cancers. However, the relationship between the intake of soy compounds and the risk of breast cancer is still debatable. In this study, we use mathematical models to assess the impact of soy phytoestrogens and protein/peptide intervention on breast cancer development using datasets acquired from a large number of published studies. We used data mining models, including the decision tree classification and association rule methods, to analyze 478 data points collected from 201 research papers. The results indicated that the intervention of soy proteins and peptides, especially lunasin (LUN) and Bowman-Birk protease inhibitor (BBI), has a positive impact on different types of breast cancer, while the effects of soy phytoestrogens are inconsistent in breast cancer development. Among soy phytoestrogens, daidzein (DAI) exhibited the highest negative impact on breast cancer, followed by coumestrol (COU), soysapogenol (SAP), genistein (GEN), and equol (EQ). With regard to the type of cancer, phytoestrogens should be carefully considered in estrogen receptor (ER)+ or progesterone receptor (PR)+ breast cancer. In the case of ER-, PR- or triple negative type, both soy categories can be used as auxiliary interventions. In summary, this is the first study to use data mining to explore the relationship between the intake of soy phytoestrogens or proteins/peptides and breast cancer development. Our findings indicate that soy intervention might reduce breast cancer development. However, the specific soy compound and cancer type should be considered before allocating a precise nutrient intervention.
Affiliation(s)
- Sheng-I Chen
- Department of Industrial Engineering and Management, National Chiao Tung University, Hsinchu 30010, Taiwan
|
31
|
Dao FY, Lv H, Yang YH, Zulfiqar H, Gao H, Lin H. Computational identification of N6-methyladenosine sites in multiple tissues of mammals. Comput Struct Biotechnol J 2020; 18:1084-1091. [PMID: 32435427 PMCID: PMC7229270 DOI: 10.1016/j.csbj.2020.04.015] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 02/23/2020] [Revised: 04/20/2020] [Accepted: 04/21/2020] [Indexed: 12/12/2022]
Abstract
N6-methyladenosine (m6A) is the methylation of the adenosine at the nitrogen-6 position; it is the most abundant RNA methylation modification and is involved in a series of important biological processes. Accurate genome-wide identification of m6A sites is invaluable for better understanding their biological functions. In this work, an ensemble predictor named iRNA-m6A was established to identify m6A sites in multiple tissues of human, mouse and rat based on data from high-throughput sequencing techniques. In the proposed predictor, RNA sequences were encoded by a physical-chemical property matrix, mono-nucleotide binary encoding and nucleotide chemical property. Subsequently, these features were optimized using the minimum Redundancy Maximum Relevance (mRMR) feature selection method. Based on the optimal feature subset, the best m6A classification models were trained by Support Vector Machine (SVM) with a 5-fold cross-validation test. Prediction results on an independent dataset showed that the proposed method has excellent generalization ability. We also established a user-friendly webserver called iRNA-m6A, which is freely accessible at http://lin-group.cn/server/iRNA-m6A. This tool will make it more convenient for users to study m6A modification in different tissues.
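Two of the encodings named in the abstract can be sketched briefly. The lookup tables below are assumptions: the one-hot column order is arbitrary, and the three-bit nucleotide chemical property (NCP) assignment follows the convention commonly seen in RNA-modification predictors (ring structure, functional group, hydrogen-bond strength). The helper name is hypothetical.

```python
# Assumed mono-nucleotide binary (one-hot) encoding; column order is arbitrary.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "U": (0, 0, 0, 1),
}

# Assumed NCP encoding: (purine ring, amino group, weak hydrogen bonding).
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode(sequence, table):
    """Concatenate the per-nucleotide codes of an RNA sequence into one flat vector."""
    vector = []
    for base in sequence.upper().replace("T", "U"):  # accept DNA-style input too
        vector.extend(table[base])
    return vector


binary_vector = encode("AC", ONE_HOT)
ncp_vector = encode("GU", NCP)
```

For a fixed-length window around a candidate site, both encodings give fixed-length vectors that can be concatenated before feature selection.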
Affiliation(s)
- Yu-He Yang
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hasan Zulfiqar
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Gao
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hao Lin
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
|
32
|
Juengpanich S, Topatana W, Lu C, Staiculescu D, Li S, Cao J, Lin J, Hu J, Chen M, Chen J, Cai X. Role of cellular, molecular and tumor microenvironment in hepatocellular carcinoma: Possible targets and future directions in the regorafenib era. Int J Cancer 2020; 147:1778-1792. [PMID: 32162677 DOI: 10.1002/ijc.32970] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 11/22/2019] [Revised: 03/02/2020] [Accepted: 03/09/2020] [Indexed: 12/12/2022]
Abstract
Hepatocellular carcinoma (HCC) remains one of the major causes of cancer-related mortality, despite the recent development of new therapeutic options. Regorafenib, an oral multikinase inhibitor, is the first systemic therapy that has a survival benefit for patients with advanced HCC who have a poor response to sorafenib. Even though regorafenib has been approved by the FDA, the clinical trial for regorafenib treatment does not show significant improvement in overall survival. The impaired efficacy of regorafenib caused by various resistance mechanisms, including epithelial-mesenchymal transitions, inflammation, angiogenesis, hypoxia, oxidative stress, fibrosis and autophagy, still needs to be resolved. In this review, we provide insight into the microenvironmental, molecular and cellular mechanisms and interactions of regorafenib in HCC treatment. The aim of this review is to help physicians select patients who would obtain the maximal benefit from regorafenib in HCC therapy.
Affiliation(s)
- Sarun Juengpanich
- Department of General Surgery, Sir Run-Run Shaw Hospital, Zhejiang University, Hangzhou, China; School of Medicine, Zhejiang University, Hangzhou, China
- Win Topatana
- Department of General Surgery, Sir Run-Run Shaw Hospital, Zhejiang University, Hangzhou, China; School of Medicine, Zhejiang University, Hangzhou, China
- Chen Lu
- Department of General Surgery, Sir Run-Run Shaw Hospital, Zhejiang University, Hangzhou, China; School of Medicine, Zhejiang University, Hangzhou, China
- Daniel Staiculescu
- Department of Radiation Oncology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Shijie Li
- Department of General Surgery, Sir Run-Run Shaw Hospital, Zhejiang University, Hangzhou, China
- Jiasheng Cao
- Department of General Surgery, Sir Run-Run Shaw Hospital, Zhejiang University, Hangzhou, China
- Jiacheng Lin
- School of Medicine, Zhejiang University, Hangzhou, China
- Jiahao Hu
- Department of General Surgery, Sir Run-Run Shaw Hospital, Zhejiang University, Hangzhou, China
- Mingyu Chen
- Department of General Surgery, Sir Run-Run Shaw Hospital, Zhejiang University, Hangzhou, China; School of Medicine, Zhejiang University, Hangzhou, China
- Jiang Chen
- Department of General Surgery, Sir Run-Run Shaw Hospital, Zhejiang University, Hangzhou, China; Department of Radiation Oncology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Xiujun Cai
- Department of General Surgery, Sir Run-Run Shaw Hospital, Zhejiang University, Hangzhou, China; School of Medicine, Zhejiang University, Hangzhou, China
|
33
|
Hou R, Wang L, Wu YJ. Predicting ATP-Binding Cassette Transporters Using the Random Forest Method. Front Genet 2020; 11:156. [PMID: 32269586 PMCID: PMC7109328 DOI: 10.3389/fgene.2020.00156] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/26/2019] [Accepted: 02/11/2020] [Indexed: 12/21/2022]
Abstract
ATP-binding cassette (ABC) proteins play important roles in a wide variety of species. These proteins are involved in absorbing nutrients, exporting toxic substances, and regulating potassium channels, and they contribute to drug resistance in cancer cells. Therefore, the identification of ABC transporters is an urgent task. The present study used 188D as the feature extraction method, which is based on sequence information and physicochemical properties. We also visualized the extracted features using t-Distributed Stochastic Neighbor Embedding (t-SNE), which suggested that samples represented by the 188D features can be separated. Further, random forest (RF) is an efficient classifier to identify proteins. Under 10-fold cross-validation of the proposed model on the training set, the average accuracy over the 10 training folds was 89.54%. We obtained values of 0.87 for specificity, 0.92 for sensitivity, and 0.79 for MCC. On the testing set, the accuracy was 89%. These results suggest that the model combining 188D with RF is an optimal tool to identify ABC transporters.
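The reported metrics follow directly from a binary confusion matrix, as the sketch below shows. The counts used in the example are hypothetical and chosen only so that sensitivity and specificity match the values quoted in the abstract; they are not the paper's data, and the function name is illustrative.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical balanced test set: 100 positives and 100 negatives.
metrics = binary_metrics(tp=92, fp=13, tn=87, fn=8)
```

On a balanced set like this one, a sensitivity of 0.92 and a specificity of 0.87 imply an MCC of about 0.79 and an accuracy of 0.895, which is in line with the figures in the abstract; with an imbalanced set the same sensitivity and specificity would give different values.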
Affiliation(s)
- Ruiyan Hou
- Laboratory of Molecular Toxicology, State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China; College of Life Science, University of Chinese Academy of Sciences, Beijing, China
- Lida Wang
- Department of Scientific Research, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
- Yi-Jun Wu
- Laboratory of Molecular Toxicology, State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
|
34
|
Dou L, Li X, Ding H, Xu L, Xiang H. Is There Any Sequence Feature in the RNA Pseudouridine Modification Prediction Problem? Molecular Therapy. Nucleic Acids 2020; 19:293-303. [PMID: 31865116 PMCID: PMC6931122 DOI: 10.1016/j.omtn.2019.11.014] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 09/08/2019] [Revised: 10/29/2019] [Accepted: 11/11/2019] [Indexed: 01/01/2023]
Abstract
Pseudouridine (Ψ) is the most abundant RNA modification and has been found in many kinds of RNAs, including snRNA, rRNA, tRNA, mRNA, and snoRNA. Thus, Ψ sites play a significant role in basic research and drug development. Although some experimental techniques have been developed to identify Ψ sites, they are expensive and time consuming, especially in the post-genomic era with the explosive growth of known RNA sequences. Thus, highly accurate computational methods are urgently required to quickly detect the Ψ sites on uncharacterized RNA sequences. Several predictors have been proposed using multifarious features, but their evaluated performances are still unsatisfactory. In this study, we first identified Ψ sites for H. sapiens, S. cerevisiae, and M. musculus using the sequence features from the bi-profile Bayes (BPB) method based on the random forest (RF) and support vector machine (SVM) algorithms, where the performances were evaluated using 5-fold cross-validation and independent tests. It was found that the SVM-based accuracies were 3.55% and 5.09% lower than the iPseU-CUU predictor for the H_990 and S_628 datasets, respectively. Almost the same-level results were obtained for M_994 and an independent H_200 dataset, even showing a 5.0% improvement for S_200. Then, three different kinds of features, including basic Kmer, general parallel correlation pseudo-dinucleotide composition (PC-PseDNC-General), and nucleotide chemical property (NCP) and nucleotide density (ND) from the iRNA-PseU method, were combined with BPB to show their comprehensive performances, where the effective features are selected by the max-relevance-max-distance (MRMD) method. The best evaluated accuracies of the combined features for the S_628 and M_994 datasets were achieved at 70.54% and 72.45%, which were 2.39% and 0.65% higher than iPseU-CUU. For the S_200 dataset, it was also improved 8% from 69% to 77%. However, there was no obvious improvement for H. sapiens, which was evaluated as approximately 63.23% and 72.0% for the H_990 and H_200 datasets, respectively. The overall performances for Ψ identification using BPB features as well as the combined features were not obviously improved. Although some kinds of feature extraction methods based on the RNA sequence information have been applied to construct the predictors in previous studies, the corresponding accuracies are generally in the range of 60%-70%. Thus, researchers need to reconsider whether there is any sequence feature in the RNA Ψ modification prediction problem.
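Two of the sequence features named in the abstract, basic Kmer frequencies and nucleotide density (ND), can be sketched as follows. This is a minimal illustration under assumptions: the function names are hypothetical, ND is taken to be the running frequency of each nucleotide up to its own position, and windows containing characters outside the alphabet are simply skipped.

```python
from itertools import product


def kmer_frequencies(sequence, k=2, alphabet="ACGU"):
    """Normalized frequency of every possible k-mer, in sorted k-mer order."""
    seq = sequence.upper().replace("T", "U")
    windows = len(seq) - k + 1
    if windows <= 0:
        raise ValueError("sequence is shorter than k")
    counts = {"".join(letters): 0 for letters in product(alphabet, repeat=k)}
    for i in range(windows):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows with ambiguous characters
            counts[kmer] += 1
    return [counts[key] / windows for key in sorted(counts)]


def nucleotide_density(sequence):
    """For each position i, the fraction of positions 1..i holding the same nucleotide."""
    seen = {}
    densities = []
    for position, base in enumerate(sequence.upper(), start=1):
        seen[base] = seen.get(base, 0) + 1
        densities.append(seen[base] / position)
    return densities


mono = kmer_frequencies("ACGU", k=1)
di = kmer_frequencies("ACGUAC", k=2)
density = nucleotide_density("AAC")
```

Such vectors would then be concatenated with the other encodings before a selection step such as MRMD.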
Affiliation(s)
- Lijun Dou
- School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Xiaoling Li
- Department of Oncology, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Hui Ding
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
- Huaikun Xiang
- School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen, China
|
35
|
Meng C, Zhang J, Ye X, Guo F, Zou Q. Review and comparative analysis of machine learning-based phage virion protein identification methods. Biochimica et Biophysica Acta - Proteins and Proteomics 2020; 1868:140406. [PMID: 32135196 DOI: 10.1016/j.bbapap.2020.140406] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 01/01/2020] [Revised: 02/14/2020] [Accepted: 02/27/2020] [Indexed: 02/01/2023]
Abstract
Phage virion protein (PVP) identification plays a key role in elucidating relationships between phages and hosts. Moreover, PVP identification can facilitate the design of related biochemical entities. Recently, several machine learning approaches have emerged for this purpose and have shown their potential. In this study, the proposed PVP identifiers are systematically reviewed, and the related algorithms and tools are comprehensively analyzed. We summarized the common framework of these PVP identifiers and constructed our own novel identifiers based upon the framework. Furthermore, we focus on a performance comparison of all PVP identifiers using a training dataset and an independent dataset. Highlighting the pros and cons of these identifiers demonstrates that g-gap DPC (dipeptide composition) features are capable of representing characteristics of PVPs. Moreover, SVM (support vector machine) is shown to be the more effective classifier for distinguishing PVPs from non-PVPs.
Affiliation(s)
- Chaolu Meng
- College of Intelligence and Computing, Tianjin University, Tianjin, China; College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China
- Jun Zhang
- Rehabilitation Department, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba, Science City, Japan
- Fei Guo
- College of Intelligence and Computing, Tianjin University, Tianjin, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
36
|
Lv Z, Zhang J, Ding H, Zou Q. RF-PseU: A Random Forest Predictor for RNA Pseudouridine Sites. Front Bioeng Biotechnol 2020; 8:134. [PMID: 32175316 PMCID: PMC7054385 DOI: 10.3389/fbioe.2020.00134] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Received: 01/17/2020] [Accepted: 02/10/2020] [Indexed: 12/21/2022]
Abstract
One of the ubiquitous chemical modifications in RNA, pseudouridine modification is crucial for various cellular biological and physiological processes. To gain more insight into the functional mechanisms involved, it is of fundamental importance to precisely identify pseudouridine sites in RNA. Several useful machine learning approaches have become available recently, with the increasing progress of next-generation sequencing technology; however, existing methods cannot predict sites with high accuracy. Thus, a more accurate predictor is required. In this study, a random forest-based predictor named RF-PseU is proposed for prediction of pseudouridylation sites. To optimize feature representation and obtain a better model, the light gradient boosting machine algorithm and incremental feature selection strategy were used to select the optimum feature space vector for training the random forest model RF-PseU. Compared with previous state-of-the-art predictors, the results on the same benchmark data sets of three species demonstrate that RF-PseU performs better overall. The integrated average leave-one-out cross-validation and independent testing accuracy scores were 71.4% and 74.7%, respectively, representing increments of 3.63% and 4.77% versus the best existing predictor. Moreover, the final RF-PseU model for prediction was built on leave-one-out cross-validation and provides a reliable and robust tool for identifying pseudouridine sites. A web server with a user-friendly interface is accessible at http://148.70.81.170:10228/rfpseu.
Affiliation(s)
- Zhibin Lv
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Jun Zhang
- Rehabilitation Department, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Hui Ding
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
37
|
Ru X, Wang L, Li L, Ding H, Ye X, Zou Q. Exploration of the correlation between GPCRs and drugs based on a learning to rank algorithm. Comput Biol Med 2020; 119:103660. [PMID: 32090901 DOI: 10.1016/j.compbiomed.2020.103660] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 12/18/2019] [Revised: 02/04/2020] [Accepted: 02/12/2020] [Indexed: 02/01/2023]
Abstract
Exploring the protein-drug correlation can help not only with selecting candidate compounds but also with related problems such as drug redirection and finding potential drug targets. Therefore, many researchers have proposed different machine learning methods for the prediction of protein-drug correlations. However, many existing models simply divide the protein-drug relationship into related or irrelevant categories and do not deeply explore the most relevant target (or drug) for a given drug (or target). To solve this problem, this paper applies the ranking concept to the prediction of the GPCR (G Protein-Coupled Receptor)-drug correlation. This study uses two different types of data sets to explore the candidate compound and potential target problems, and both sets achieved good results. In addition, this study found that the family to which a protein belongs is not an inherent factor that affects the ranking of GPCR-drug correlations; however, if the drug affects other family members of the protein, then the protein is likely to be a potential target of the drug. This study showed that the learning to rank algorithm is a good tool for exploring protein-drug correlations.
Collapse
Affiliation(s)
- Xiaoqing Ru
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; School of Information and Electrical Engineering, Hebei University of Engineering, Handan, China
| | - Lida Wang
- Scientific Research Department, Heilongjiang Agricultural Recalmation General Hospital, Harbin, China.
| | - Lihong Li
- School of Information and Electrical Engineering, Hebei University of Engineering, Handan, China
| | - Hui Ding
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba Science City, Japan
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.
| |
|
38
|
Yu L, Xu F, Gao L. Predict New Therapeutic Drugs for Hepatocellular Carcinoma Based on Gene Mutation and Expression. Front Bioeng Biotechnol 2020; 8:8. [PMID: 32047745 PMCID: PMC6997129 DOI: 10.3389/fbioe.2020.00008] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 11/21/2019] [Accepted: 01/07/2020] [Indexed: 02/01/2023] Open
Abstract
Hepatocellular carcinoma (HCC) is the fourth most common primary liver tumor and is an important medical problem worldwide. However, current therapies cannot cure HCC, and despite numerous attempts and clinical trials, there are few approved targeted treatments for HCC. It is therefore necessary to identify additional treatment strategies to prevent the growth of HCC tumors. We propose a systematic drug repositioning bioinformatics method to identify new drug candidates for the treatment of HCC, which considers not only aberrant genomic information but also changes in the transcriptional landscape. First, we screen the collection of HCC feature genes, i.e., kernel genes, which are frequently mutated in most HCC samples according to human mutation data. Then, the gene expression data of HCC in TCGA are combined to classify the kernel genes of HCC. Finally, the therapeutic score (TS) of each drug is calculated based on the Kolmogorov-Smirnov statistical method. Using this strategy, we identify five drugs associated with HCC, including three drugs that could treat HCC and two drugs that might have side effects on HCC. In addition, we perform Connectivity Map (CMap) profile similarity analysis and KEGG enrichment analysis on drug targets. All these findings suggest that our approach is effective for accurately predicting novel therapeutic options for HCC and can easily be extended to other tumors.
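The abstract above does not give the formula for the therapeutic score, so the following is only a minimal sketch of the plain two-sample Kolmogorov-Smirnov statistic that such scores are typically built on. The function name `ks_statistic` and the example rank lists are hypothetical and are not taken from the cited paper.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a_sorted = sorted(sample_a)
    b_sorted = sorted(sample_b)
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so check each one.
    for value in sorted(set(a_sorted) | set(b_sorted)):
        cdf_a = bisect_right(a_sorted, value) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, value) / len(b_sorted)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical ranks of two gene sets in a drug-induced expression profile.
up_gene_ranks = [1, 2, 3, 4]
down_gene_ranks = [3, 4, 5, 6]
gap = ks_statistic(up_gene_ranks, down_gene_ranks)
```

A statistic near 1 means the two gene sets occupy clearly different parts of the ranked profile; a value near 0 means they are interleaved.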
Affiliation(s)
- Liang Yu
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Fengdan Xu
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Lin Gao
- School of Computer Science and Technology, Xidian University, Xi'an, China
| |
|
39
|
Song X, Zhuang Y, Lan Y, Lin Y, Min X. Comprehensive Review and Comparison for Anticancer Peptides Identification Models. Curr Protein Pept Sci 2020; 22:CPPS-EPUB-103745. [PMID: 31957608 DOI: 10.2174/1389203721666200117162958] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/30/2019] [Revised: 05/16/2019] [Accepted: 05/30/2019] [Indexed: 11/22/2022]
Abstract
Anticancer peptides (ACPs) eliminate pathogenic bacteria and kill tumor cells while causing no hemolysis and no damage to normal human cells. This unique ability opens up the possibility of using ACPs for therapeutic delivery and in clinical therapy. Identifying ACPs is one of the most fundamental and central problems in new antitumor drug research. During the past decades, a number of machine learning-based prediction tools have been developed to solve this important task. However, the predictions produced by various tools are difficult to quantify and compare. Therefore, in this article, we provide a comprehensive review of existing machine learning methods for ACP prediction and a fair comparison of the predictors. To evaluate current prediction tools, we conducted a comparative study and analyzed the existing ACP predictors from 10 published studies. The comparative results suggest that a Support Vector Machine-based model with combined features provided a significant improvement in overall performance compared with the other machine learning-based prediction models.
|
40
|
Coban N, Pirim D, Erkan AF, Dogan B, Ekici B. Hsa-miR-584-5p as a novel candidate biomarker in Turkish men with severe coronary artery disease. Mol Biol Rep 2019; 47:1361-1369. [PMID: 31863331 DOI: 10.1007/s11033-019-05235-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 09/05/2019] [Accepted: 12/07/2019] [Indexed: 12/16/2022]
Abstract
Coronary artery disease (CAD) is still the leading cause of mortality and morbidity in the developed world. Identification of novel predictive and therapeutic biomarkers is crucial for accurate diagnosis, prognosis and treatment of CAD. The aim of this study was to detect a novel candidate miRNA biomarker that may be used in the management of CAD. We performed miRNA profiling in whole blood samples of Turkish men with angiographically confirmed CAD and non-CAD controls with insignificant coronary stenosis. Validation of microarray results was performed by qRT-PCR in a larger cohort of 62 samples. We subsequently assessed the diagnostic value of the miRNA and its correlations with clinical parameters. miRNA-target identification and network analyses were conducted with Ingenuity Pathway Analysis (IPA) software. Hsa-miR-584-5p was one of the most significantly dysregulated miRNAs observed in the miRNA microarray. Men-specific down-regulation (p = 0.040) of hsa-miR-584-5p was confirmed by qRT-PCR. ROC curve analysis highlighted the potential diagnostic value of hsa-miR-584-5p, with an area under the curve (AUC) of 0.714 in men and 0.643 in the total sample. The expression levels of hsa-miR-584-5p showed an inverse correlation with stenosis and Gensini scores. IPA revealed CDH13 as the only CAD-related predicted target for the miRNA with biological evidence of its involvement in CAD. This study suggests hsa-miR-584-5p, a known tumor suppressor miRNA, as a candidate biomarker for CAD and highlights its putative role in CAD pathogenesis. Validation of these results in larger samples, together with functional studies, is warranted.
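To make the reported AUC values easier to interpret, here is a minimal sketch of how an area under the ROC curve can be computed from two groups of marker scores, using the pairwise (Mann-Whitney) definition. The function name `roc_auc` and the scores are hypothetical illustrations, not data from the cited study. Because the abstract reports down-regulation in cases, a real analysis would likely use a score that increases as expression decreases; that orientation is an assumption here.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the probability that a random case scores above a random control.

    Ties between a case and a control count as half a win.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk scores (higher = more CAD-like), not real measurements.
cad_scores = [0.9, 0.8, 0.4]
control_scores = [0.7, 0.3, 0.4]
auc = roc_auc(cad_scores, control_scores)
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which gives a scale for reading values such as 0.714 and 0.643.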
Affiliation(s)
- Neslihan Coban
- Department of Genetics, Aziz Sancar Institute for Experimental Medicine, Istanbul University, Istanbul, Turkey.
| | - Dilek Pirim
- Faculty of Arts & Science, Department of Molecular Biology and Genetics, Bursa Uludag University, Bursa, Turkey
| | - Aycan Fahri Erkan
- Faculty of Medicine, Department of Cardiology, Ufuk University, Ankara, Turkey
| | - Berkcan Dogan
- Institute of Graduate Studies in Sciences, Department of Molecular Biology and Genetics, Istanbul University, Istanbul, Turkey
- Department of Medical Genetics, Bursa Uludag University, Bursa, Turkey
| | - Berkay Ekici
- Faculty of Medicine, Department of Cardiology, Ufuk University, Ankara, Turkey
| |
|
41
|
Ru X, Cao P, Li L, Zou Q. Selecting Essential MicroRNAs Using a Novel Voting Method. Mol Ther Nucleic Acids 2019; 18:16-23. [PMID: 31479921 PMCID: PMC6727015 DOI: 10.1016/j.omtn.2019.07.019] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 05/28/2019] [Revised: 06/20/2019] [Accepted: 07/08/2019] [Indexed: 02/06/2023]
Abstract
Among the large number of known microRNAs (miRNAs), some miRNAs play negligible roles in cell regulation. Therefore, selecting essential miRNAs is an important initial step for a deeper understanding of miRNAs and their functions. In this study, we generated 60 classification models by combining 12 representative feature extraction methods and 5 commonly used classification algorithms. The optimal model for essential miRNA classification that we obtained is based on the Mismatch feature extraction method combined with the random forest algorithm. The F-Measure, area under the curve, and accuracy values of this model were 93.2%, 96.7%, and 93.0%, respectively. We also found that the distribution of the positive and negative examples of the first few features greatly influenced the classification results. The feature extraction methods performed best when the differences between the positive and negative examples were obvious, and this led to better classification of essential miRNAs. Because each classifier's predictions for the same sample may be different, we employed a novel voting method to improve the accuracy of the classification of essential miRNAs. The performance results showed that the best classification results were obtained when five classification models were used in the voting. The five classification models were constructed based on the Mismatch, pseudo-distance structure status pair composition, Subsequence, Kmer, and Triplet feature extraction methods. The voting result was 95.3%. Our results suggest that the voting method can be an important tool for selecting essential miRNAs.
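The abstract describes combining the predictions of five classification models by voting, without spelling out the exact rule. As an illustration only, the sketch below assumes simple majority voting over binary labels; the function name `majority_vote` and the prediction lists are hypothetical and do not reproduce the authors' models or data.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample by majority vote."""
    n_samples = len(predictions[0])
    if any(len(model) != n_samples for model in predictions):
        raise ValueError("every model must predict every sample")
    combined = []
    # zip(*predictions) groups the labels that all models gave to one sample.
    for sample_labels in zip(*predictions):
        label, _count = Counter(sample_labels).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical outputs of five classifiers on four miRNAs (1 = essential, 0 = not).
model_outputs = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
]
voted = majority_vote(model_outputs)
```

Using an odd number of models with binary labels avoids ties, which may be one practical reason to vote over five models rather than four or six.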
Affiliation(s)
- Xiaoqing Ru
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; School of Information and Electrical Engineering, Hebei University of Engineering, Handan, China
| | - Peigang Cao
- Department of Cardiology, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
| | - Lihong Li
- School of Information and Electrical Engineering, Hebei University of Engineering, Handan, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.
| |
|
42
|
Pirim D, Dogan B. In silico identification of putative roles of food-derived xeno-mirs on diet-associated cancer. Nutr Cancer 2019; 72:481-488. [DOI: 10.1080/01635581.2019.1670854] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/19/2022]
Affiliation(s)
- Dilek Pirim
- Department of Molecular Biology and Genetics, Uludag University, Bursa, Turkey
| | - Berkcan Dogan
- Department of Biology and Genetics, Istanbul University Institute of Graduate Studies in Science, Istanbul, Turkey
| |
|
43
|
Cheng L, Zhao H, Wang P, Zhou W, Luo M, Li T, Han J, Liu S, Jiang Q. Computational Methods for Identifying Similar Diseases. Mol Ther Nucleic Acids 2019; 18:590-604. [PMID: 31678735 PMCID: PMC6838934 DOI: 10.1016/j.omtn.2019.09.019] [Citation(s) in RCA: 75] [Impact Index Per Article: 15.0] [Received: 04/25/2019] [Revised: 09/11/2019] [Accepted: 09/12/2019] [Indexed: 02/01/2023]
Abstract
Although our knowledge of human diseases has increased dramatically, the molecular basis, phenotypic traits, and therapeutic targets of most diseases still remain unclear. An increasing number of studies have observed that similar diseases often are caused by similar molecules, can be diagnosed by similar markers or phenotypes, or can be cured by similar drugs. Thus, the identification of diseases similar to known ones has attracted considerable attention worldwide. To this end, the associations between diseases at the molecular, phenotypic, and taxonomic levels were used to measure the pairwise similarity in diseases. The corresponding performance assessment strategies for these methods involving the terms “category-based,” “simulated-patient-based,” and “benchmark-data-based” were thus further emphasized. Then, frequently used methods were evaluated using a benchmark-data-based strategy. To facilitate the assessment of disease similarity scores, researchers have designed dozens of tools that implement these methods for calculating disease similarity. Currently, disease similarity has been advantageous in predicting noncoding RNA (ncRNA) function and therapeutic drugs for diseases. In this article, we review disease similarity methods, evaluation strategies, tools, and their applications in the biomedical community. We further evaluate the performance of these methods and discuss the current limitations and future trends for calculating disease similarity.
Affiliation(s)
- Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Hengqiang Zhao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Pingping Wang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
| | - Wenyang Zhou
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
| | - Meng Luo
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
| | - Tianxin Li
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
| | - Junwei Han
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
| | - Shulin Liu
- Systemomics Center, College of Pharmacy, and Genomics Research Center (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China), Harbin Medical University, Harbin, Heilongjiang, China; Department of Microbiology, Immunology and Infectious Diseases, University of Calgary, Calgary, AB, Canada.
| | - Qinghua Jiang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China.
| |
|
44
|
Joshi P, Katsushima K, Zhou R, Meoded A, Stapleton S, Jallo G, Raabe E, Eberhart CG, Perera RJ. The therapeutic and diagnostic potential of regulatory noncoding RNAs in medulloblastoma. Neurooncol Adv 2019; 1:vdz023. [PMID: 31763623 PMCID: PMC6859950 DOI: 10.1093/noajnl/vdz023] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/14/2022] Open
Abstract
Medulloblastoma, a central nervous system tumor that predominantly affects children, always requires aggressive therapy. Nevertheless, it frequently recurs as resistant disease and is associated with high morbidity and mortality. While recent efforts to subclassify medulloblastoma based on molecular features have advanced our basic understanding of medulloblastoma pathogenesis, optimal targets to increase therapeutic efficacy and reduce side effects remain largely undefined. Noncoding RNAs (ncRNAs) with known regulatory roles, particularly long noncoding RNAs (lncRNAs) and microRNAs (miRNAs), are now known to participate in medulloblastoma biology, although their functional significance remains obscure in many cases. Here we review the literature on regulatory ncRNAs in medulloblastoma. In providing a comprehensive overview of ncRNA studies, we highlight how different lncRNAs and miRNAs have oncogenic or tumor suppressive roles in medulloblastoma. These ncRNAs possess subgroup specificity that can be exploited to personalize therapy by acting as theranostic targets. Several of the already identified ncRNAs appear specific to medulloblastoma stem cells, the most difficult-to-treat component of the tumor that drives metastasis and acquired resistance, thereby providing opportunities for therapy in relapsing, disseminating, and therapy-resistant disease. Delivering ncRNAs to tumors remains challenging, but this limitation is gradually being overcome through the use of advanced technologies such as nanotechnology and rational biomaterial design.
Affiliation(s)
- Piyush Joshi
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, School of Medicine, Johns Hopkins University, Baltimore, Maryland; Cancer and Blood Disorders Institute, Johns Hopkins All Children's Hospital, St. Petersburg, Florida
| | - Keisuke Katsushima
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, School of Medicine, Johns Hopkins University, Baltimore, Maryland; Cancer and Blood Disorders Institute, Johns Hopkins All Children's Hospital, St. Petersburg, Florida
| | - Rui Zhou
- Cancer and Blood Disorders Institute, Johns Hopkins All Children's Hospital, St. Petersburg, Florida
| | - Avner Meoded
- Pediatric Neuroradiology, Johns Hopkins All Children's Hospital, St. Petersburg, Florida
| | - Stacie Stapleton
- Cancer and Blood Disorders Institute, Johns Hopkins All Children's Hospital, St. Petersburg, Florida
| | - George Jallo
- Institute for Brain Protection Sciences, Johns Hopkins All Children's Hospital, St. Petersburg, Florida
| | - Eric Raabe
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, School of Medicine, Johns Hopkins University, Baltimore, Maryland; Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland
| | - Charles G Eberhart
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, School of Medicine, Johns Hopkins University, Baltimore, Maryland; Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland
| | - Ranjan J Perera
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, School of Medicine, Johns Hopkins University, Baltimore, Maryland; Cancer and Blood Disorders Institute, Johns Hopkins All Children's Hospital, St. Petersburg, Florida; Sanford Burnham Prebys Medical Discovery Institute, NCI-Designated Cancer Center, La Jolla, California
| |
|
45
|
Zhang L, Luo B, Dang YW, He RQ, Peng ZG, Chen G, Feng ZB. Clinical Significance of microRNA-196b-5p in Hepatocellular Carcinoma and its Potential Molecular Mechanism. J Cancer 2019; 10:5355-5370. [PMID: 31632480 PMCID: PMC6775707 DOI: 10.7150/jca.29293] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 08/16/2018] [Accepted: 08/06/2019] [Indexed: 12/18/2022] Open
Abstract
Objective: To investigate the clinical significance and potential molecular mechanism of microRNA (miRNA)-196b-5p in hepatocellular carcinoma (HCC). Methods: Quantitative reverse transcription polymerase chain reaction (qRT-PCR) was used to examine the miR-196b-5p expression level in 67 paraffin-embedded HCC tissues and corresponding adjacent tissues. Correlations of the miR-196b-5p expression level with clinicopathological characteristics were analyzed. The expression level and clinical significance of miR-196b-5p in HCC were also evaluated in The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases. We predicted the target genes of miR-196b-5p with twelve online tools and then selected genes predicted by at least five of them. Subsequently, to obtain the potential target genes of miR-196b-5p, we overlapped the predicted target genes with the mRNAs down-regulated in HCC according to the TCGA database. Then, we performed Gene Ontology (GO) and Disease Ontology (DO) functional annotation, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis and Protein-Protein Interaction (PPI) network construction for those potential target genes. Results: A higher expression level of miR-196b-5p was seen in HCC tissues than in the corresponding adjacent tissues based on qRT-PCR (P = 0.0007). The expression level of miR-196b-5p was linked with tumor size (P = 0.03), tumor node (P = 0.024), vascular invasion (P = 0.029) and capsular invasion (P = 0.026) in HCC patients. A comprehensive meta-analysis of miR-196b-5p expression based on TCGA, GEO and qRT-PCR verified that a higher expression level of miR-196b-5p was observed in HCC tissues than in normal control liver tissues (SMD = 0.56, 95% CI: 0.39-0.72, P for heterogeneity = 0.275, I² = 18.3%).
GO annotation revealed that the top terms in biological process, cellular component and molecular function were single-organism catabolic process, neuronal cell body and transmembrane receptor protein kinase activity, respectively. The most relevant disease in DO annotation was arteriosclerosis. The tryptophan metabolism pathway ranked first in KEGG pathway enrichment analysis. The PPI network showed that IGF1, FOXO1, AR and FOS were the most likely core genes among the potential target genes of miR-196b-5p, although this requires further experimental validation. Conclusion: miR-196b-5p showed higher expression in HCC tissues than in normal control liver tissues. Moreover, the miR-196b-5p expression level correlated with clinicopathological parameters of HCC such as vascular invasion, but the molecular mechanisms of miR-196b-5p in HCC still need further elucidation and verification.
Affiliation(s)
- Lu Zhang
- Department of Pathology, First Affiliated Hospital of Guangxi Medical University, No. 6 Shuangyong Road, Nanning, Guangxi Zhuang Autonomous Region 530021, P. R. China
| | - Bin Luo
- Department of Medical Oncology, First Affiliated Hospital of Guangxi Medical University, No. 6 Shuangyong Road, Nanning, Guangxi Zhuang Autonomous Region 530021, P. R. China
| | - Yi-Wu Dang
- Department of Pathology, First Affiliated Hospital of Guangxi Medical University, No. 6 Shuangyong Road, Nanning, Guangxi Zhuang Autonomous Region 530021, P. R. China
| | - Rong-Quan He
- Department of Medical Oncology, First Affiliated Hospital of Guangxi Medical University, No. 6 Shuangyong Road, Nanning, Guangxi Zhuang Autonomous Region 530021, P. R. China
| | - Zhi-Gang Peng
- Department of Medical Oncology, First Affiliated Hospital of Guangxi Medical University, No. 6 Shuangyong Road, Nanning, Guangxi Zhuang Autonomous Region 530021, P. R. China
| | - Gang Chen
- Department of Pathology, First Affiliated Hospital of Guangxi Medical University, No. 6 Shuangyong Road, Nanning, Guangxi Zhuang Autonomous Region 530021, P. R. China
| | - Zhen-Bo Feng
- Department of Pathology, First Affiliated Hospital of Guangxi Medical University, No. 6 Shuangyong Road, Nanning, Guangxi Zhuang Autonomous Region 530021, P. R. China
| |
|
46
|
Wang C, Guo J, Zhao N, Liu Y, Liu X, Liu G, Guo M. A Cancer Survival Prediction Method Based on Graph Convolutional Network. IEEE Trans Nanobioscience 2019; 19:117-126. [PMID: 31443039 DOI: 10.1109/tnb.2019.2936398] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 02/01/2023]
Abstract
BACKGROUND AND OBJECTIVE: Cancer has always been one of the main threats to human life and health. Its high mortality is largely due to the complexity of the disease and the significant differences in clinical outcomes. Improving the accuracy of cancer survival prediction is therefore important and has become one of the main fields of cancer research. Many computational models for cancer survival prediction have been proposed, but most of them build prediction models from a single type of genomic data or from clinical data alone. Multiple genomic data and clinical data have not yet been integrated to give a comprehensive view of cancers and predict survival. METHOD: To effectively integrate multiple genomic data (including gene expression, copy number alteration, DNA methylation and exon expression) with clinical data and apply them to cancer survival prediction, the similarity network fusion (SNF) algorithm was used to integrate the genomic and clinical data and generate a sample similarity matrix; the min-redundancy max-relevance (mRMR) algorithm was used to select features from the genomic and clinical data of cancer samples and generate a sample feature matrix; and finally the two matrices were used for semi-supervised training of a graph convolutional network (GCN), yielding a cancer survival prediction method that integrates multiple genomic data and clinical data based on a graph convolutional network (GCGCN). RESULT: Performance indexes of the GCGCN model indicate that both multiple genomic data and clinical data play significant roles in accurately predicting the survival time of cancer patients.
Compared with existing survival prediction methods, GCGCN, which integrates multiple genomic data and clinical data, shows clearly better prediction performance. CONCLUSION: The results in this paper verify the effectiveness and superiority of GCGCN for cancer survival prediction.
|
47
|
Meng C, Wei L, Zou Q. SecProMTB: Support Vector Machine‐Based Classifier for Secretory Proteins Using Imbalanced Data Sets Applied to Mycobacterium tuberculosis. Proteomics 2019; 19:e1900007. [DOI: 10.1002/pmic.201900007] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 01/08/2019] [Revised: 03/25/2019] [Indexed: 11/08/2022]
Affiliation(s)
- Chaolu Meng
- College of Intelligence and Computing, Tianjin University, 300350 Tianjin, China
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, 010018 Hohhot, China
| | - Leyi Wei
- College of Intelligence and Computing, Tianjin University, 300350 Tianjin, China
| | - Quan Zou
- College of Intelligence and Computing, Tianjin University, 300350 Tianjin, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, 610054 Chengdu, China
- Center for Informational Biology, University of Electronic Science and Technology of China, 610054 Chengdu, China
| |
|
48
|
Guala D, Ogris C, Müller N, Sonnhammer ELL. Genome-wide functional association networks: background, data & state-of-the-art resources. Brief Bioinform 2019; 21:1224-1237. [PMID: 31281921 PMCID: PMC7373183 DOI: 10.1093/bib/bbz064] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 02/05/2019] [Revised: 04/29/2019] [Accepted: 05/04/2019] [Indexed: 02/06/2023] Open
Abstract
The vast amount of experimental data from recent advances in the field of high-throughput biology begs for integration into more complex data structures such as genome-wide functional association networks. Such networks have been used for elucidation of the interplay of intra-cellular molecules to make advances ranging from the basic science understanding of evolutionary processes to the more translational field of precision medicine. The allure of the field has resulted in rapid growth of the number of available network resources, each with unique attributes exploitable to answer different biological questions. Unfortunately, the high volume of network resources makes it impossible for the intended user to select an appropriate tool for their particular research question. The aim of this paper is to provide an overview of the underlying data and representative network resources as well as to mention methods of integration, allowing a customized approach to resource selection. Additionally, this report will provide a primer for researchers venturing into the field of network integration.
Affiliation(s)
- Dimitri Guala
- Science for Life Laboratory, Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Box 1031, 17121 Solna, Sweden
| | - Christoph Ogris
- Computational Cell Maps, Institute of Computational Biology, Helmholtz Center Munich, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
| | - Nikola Müller
- Computational Cell Maps, Institute of Computational Biology, Helmholtz Center Munich, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany
| | - Erik L L Sonnhammer
- Science for Life Laboratory, Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Box 1031, 17121 Solna, Sweden
| |
|
49
|
Qu K, Guo F, Liu X, Lin Y, Zou Q. Application of Machine Learning in Microbiology. Front Microbiol 2019; 10:827. [PMID: 31057526 PMCID: PMC6482238 DOI: 10.3389/fmicb.2019.00827] [Citation(s) in RCA: 95] [Impact Index Per Article: 19.0] [Received: 01/31/2019] [Accepted: 04/01/2019] [Indexed: 02/01/2023] Open
Abstract
Microorganisms are ubiquitous and closely related to people's daily lives. Since they were first discovered in the 19th century, researchers have shown great interest in microorganisms. Microorganisms have traditionally been studied through cultivation, but this method is expensive and time-consuming, and it cannot keep pace with the development of high-throughput sequencing technology. To deal with this problem, machine learning (ML) methods have been widely applied in the field of microbiology. Literature reviews have shown that ML can be used in many aspects of microbiology research, especially for classification problems and for exploring the interaction between microorganisms and the surrounding environment. In this study, we summarize the application of ML in microbiology.
Affiliation(s)
- Kaiyang Qu
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Fei Guo
- College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Xiangrong Liu
- School of Information Science and Technology, Xiamen University, Xiamen, China
| | - Yuan Lin
- School of Information Science and Technology, Xiamen University, Xiamen, China
- Department of System Integration, Sparebanken Vest, Bergen, Norway
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| |
|
50
|
Ru X, Li L, Wang C. Identification of Phage Viral Proteins With Hybrid Sequence Features. Front Microbiol 2019; 10:507. [PMID: 30972038 PMCID: PMC6443926 DOI: 10.3389/fmicb.2019.00507] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/24/2018] [Accepted: 02/27/2019] [Indexed: 02/01/2023] Open
Abstract
The uniqueness of bacteriophages plays an important role in bioinformatics research. In real applications, the function of the bacteriophage virion proteins is the main area of interest. Therefore, it is very important to classify bacteriophage virion proteins and non-phage virion proteins accurately. Extracting comprehensive and effective sequence features from proteins plays a vital role in protein classification. To represent protein information more fully, this paper combines the features extracted by a feature representation algorithm based on sequence information (CCPA) with those extracted by a feature representation algorithm based on sequence and structure information. After extracting features, the Max-Relevance-Max-Distance (MRMD) algorithm is used to select the optimal feature set, with the strongest correlation with the class labels and low redundancy between features. Given the randomness of the samples selected by the random forest classification algorithm and of the features used to produce each node variable, a random forest method is employed to perform 10-fold cross-validation on the bacteriophage protein classification. The accuracy of this model is as high as 93.5% in the classification of phage proteins in this study. This study also found that, among the eight physicochemical properties considered, the charge property has the greatest impact on the classification of bacteriophage proteins. These results indicate that the model discussed in this paper is an important tool in bacteriophage protein research.
Affiliation(s)
- Xiaoqing Ru
- School of Information and Electrical Engineering, Hebei University of Engineering, Handan, China
| | - Lihong Li
- School of Information and Electrical Engineering, Hebei University of Engineering, Handan, China
| | - Chunyu Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| |
|