1. Zhao H, Zhu B, Jiang T, Cui Z, Wu H. Identification of DNA-protein binding residues through integration of Transformer encoder and Bi-directional Long Short-Term Memory. Mathematical Biosciences and Engineering: MBE 2024; 21:170-185. [PMID: 38303418 DOI: 10.3934/mbe.2024008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]
Abstract
DNA-protein binding is crucial for the normal development and function of organisms. The significance of accurately identifying DNA-protein binding sites lies in its role in disease prevention and the development of innovative approaches to disease treatment. In the present study, we introduce a precise and robust identifier for DNA-protein binding residues. In the context of protein representation, we combine the evolutionary information of the protein, represented by its position-specific scoring matrix, with the spatial information of the protein's secondary structure, enriching the overall informational content. This approach initially employs a combination of Bi-directional Long Short-Term Memory and Transformer encoder to jointly extract the interdependencies among residues within the protein sequence. Subsequently, convolutional operations are applied to the resulting feature matrix to capture local features of the residues. Experimental results on the benchmark dataset demonstrate that our method exhibits a higher level of competitiveness when compared to contemporary classifiers. Specifically, our method achieved an MCC of 0.349, SP of 96.50%, SN of 44.03% and ACC of 94.59% on the PDNA-41 dataset.
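The four figures reported above (MCC, SP, SN and ACC) all derive from a binary confusion matrix over residues. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the function name `binary_metrics` and the counts passed to it are hypothetical and are not taken from the PDNA-41 dataset.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, accuracy and MCC for binary counts."""
    sn = tp / (tp + fn)                      # sensitivity (recall on binding residues)
    sp = tn / (tn + fp)                      # specificity (recall on non-binding residues)
    acc = (tp + tn) / (tp + fp + tn + fn)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}


# Hypothetical, illustrative counts only (not the paper's data).
metrics = binary_metrics(tp=40, fp=30, tn=900, fn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because binding residues are typically a small minority, accuracy and specificity can be high while sensitivity stays modest, which is why MCC is usually reported alongside them.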
Affiliation(s)
- Haipeng Zhao
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Baozhong Zhu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Zhiming Cui
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Hongjie Wu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
2. Arican OC, Gumus O. PredDRBP-MLP: Prediction of DNA-binding proteins and RNA-binding proteins by multilayer perceptron. Comput Biol Med 2023; 164:107317. [PMID: 37562328 DOI: 10.1016/j.compbiomed.2023.107317] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/02/2023] [Revised: 07/27/2023] [Accepted: 08/07/2023] [Indexed: 08/12/2023]
Abstract
Proteins interact with many molecules in order to maintain the vital activities in cells. Proteins that interact with DNA are called DNA-binding proteins (DBP), and proteins that interact with RNA are called RNA-binding proteins (RBP). Since DBPs and RBPs are involved in critical biological processes, their classification is quite important. Although the convolutional neural network and bidirectional long short-term memory hybrid model (CNN-BiLSTM) is very popular in DBP and RBP classification, it has drawbacks such as requiring high processing power and long training times. Therefore, a multilayer perceptron (MLP) based predictor, PredDRBP-MLP (Predictor of DNA-Binding Proteins and RNA-Binding Proteins - Multilayer Perceptron), was developed in this study. PredDRBP-MLP is an artificial learning model that performs multi-class classification of DBPs, RBPs and non-nucleic acid-binding proteins (NNABP). Compared to the existing predictors, PredDRBP-MLP achieved quite successful results on the independent dataset, specifically in the NNABP class, while requiring lower processing power and training more quickly than CNN-BiLSTM based predictors. In the NNABP class, the PredDRBP-MLP predictor achieved 0.578 precision, 0.522 recall and 0.549 F1-score, while the other multi-class predictor achieved 0.486 precision, 0.183 recall and 0.266 F1-score. A desktop application was developed for PredDRBP-MLP. The application is freely accessible at https://sourceforge.net/projects/preddrbp-mlp.
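The F1-scores quoted above follow from the reported precision and recall through the usual harmonic-mean formula. A minimal sketch that recomputes them is given below; the helper name `f1_score` and the dictionary labels are illustrative only, and the numeric inputs are the values stated in the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall for the NNABP class as reported in the abstract.
reported = {
    "PredDRBP-MLP": (0.578, 0.522),
    "other multi-class predictor": (0.486, 0.183),
}

for name, (precision, recall) in reported.items():
    print(f"{name}: F1 = {f1_score(precision, recall):.3f}")
# PredDRBP-MLP: F1 = 0.549
# other multi-class predictor: F1 = 0.266
```

Both recomputed values match the F1-scores given in the abstract to three decimal places.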
Affiliation(s)
- Ozgur Can Arican
  - Department of Health Bioinformatics, Ege University, 35100, Izmir, Turkey.
- Ozgur Gumus
  - Department of Computer Engineering, Ege University, 35100, Izmir, Turkey.
3. PAI-1 is a potential transcriptional silencer that supports bladder cancer cell activity. Sci Rep 2022; 12:12186. [PMID: 35842542 PMCID: PMC9288475 DOI: 10.1038/s41598-022-16518-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/11/2022] [Accepted: 07/11/2022] [Indexed: 12/12/2022]
Abstract
The extracellular activity of Plasminogen activator inhibitor-1 (PAI-1) is well described, acting as an inhibitor of tissue plasminogen activator and urokinase-type plasminogen activator, impacting fibrinolysis. Recent studies have revealed a pro-tumorigenic role of PAI-1 in human cancers, via the regulation of angiogenesis and tumor cell survival. In this study, immunohistochemical staining of 939 human bladder cancer specimens showed that PAI-1 expression levels correlated with tumor grade, tumor stage and overall survival. The typical subcellular localization of PAI-1 is cytoplasmic, but in approximately a quarter of the cases, PAI-1 was observed to be localized to both the tumor cell cytoplasm and the nucleus. To investigate the potential function of nuclear PAI-1 in tumor biology we applied chromatin immunoprecipitation (ChIP)-sequencing, gene expression profiling, and rapid immunoprecipitation mass spectrometry to a pair of bladder cancer cell lines. ChIP-sequencing revealed that PAI-1 can bind DNA at distal intergenic regions, suggesting a role as a transcriptional coregulator. The downregulation of PAI-1 in bladder cancer cell lines caused the upregulation of numerous genes, and the integration of ChIP-sequence and RNA-sequence data identified 57 candidate genes subject to PAI-1 regulation. Taken together, the data suggest that nuclear PAI-1 can influence gene expression programs and support malignancy.
4. Sepahdar Z, Saghiri R, Miroliaei M, Salimi M. In silico approach to probe the binding affinity between OMVs harboring the Z EGFR affibody and the EGF receptor. J Mol Model 2022; 28:113. [PMID: 35381900 DOI: 10.1007/s00894-022-05043-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2021] [Accepted: 01/25/2022] [Indexed: 11/27/2022]
Abstract
There is a growing interest in designing a nanocarrier containing an EGFR targeting affibody to direct toward cancer cells. Here, cytolysin A was cloned at the N-terminus of ZEGFR:1907 affibody to guarantee its surface presentation on the OMVs while targeting the epidermal growth factor receptors (EGFRs). A separate construct including a fusogenic peptide (GALA) was also designed for the endosomal escape of the nanocarrier. Binding of the two constructs ClyA-affiEGFR and ClyA-affiEGFR-GALA to domain III of EGFR was investigated using molecular docking and molecular dynamic simulations. The higher stability of the ClyA-affiEGFR-GALA/EGFR as compared to the ClyA-affiEGFR/EGFR complex was evident. The ClyA-affiEGFR-GALA structure showed a higher RMSD during the first half of the simulation time implying a much less stable behavior. Plateau state of the radius of gyration plot of ClyA-affiEGFR-GALA confirmed a well-folded structure in the presence of the GALA sequence. Solvent accessible surface area for both proteins was in the same range. The data obtained from hydrogen bond analysis revealed a more equilibrated and stable form of the ClyA-affiEGFR-GALA structure upon interaction with EGFR. The data provided here was a requisite for our biological evaluation of the synthesized constructs as a component of a novel drug delivery system.
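RMSD and radius of gyration, the two trajectory measures discussed above, have simple definitions. The sketch below illustrates them on toy coordinates; it is not the authors' analysis pipeline. It assumes equal atomic masses and assumes the two structures are already superimposed (no fitting step is performed), and the function names and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned coordinate sets."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must have the same number of atoms")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def radius_of_gyration(coords):
    """Radius of gyration about the centroid, assuming equal masses."""
    n = len(coords)
    centroid = tuple(sum(c[i] for c in coords) / n for i in range(3))
    squared = sum(math.dist(c, centroid) ** 2 for c in coords)
    return math.sqrt(squared / n)


# Toy coordinates (hypothetical): a frame shifted by one unit along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]

print(f"RMSD: {rmsd(reference, frame):.3f}")
print(f"Rg:   {radius_of_gyration(reference):.3f}")
```

In a simulation these quantities are evaluated per frame; a plateau in either curve is commonly read as a sign that the structure has equilibrated.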
Affiliation(s)
- Zahra Sepahdar
  - Department of Cell and Molecular Biology & Microbiology, Faculty of Biological Science and Technology, University of Isfahan, Isfahan, Iran
- Reza Saghiri
  - Biochemistry Department, Pasteur Institute of Iran, Tehran, Iran
- Mehran Miroliaei
  - Department of Cell and Molecular Biology & Microbiology, Faculty of Biological Science and Technology, University of Isfahan, Isfahan, Iran
- Mona Salimi
  - Physiology and Pharmacology Department, Pasteur Institute of Iran, Tehran, Iran
5. Pasternak Z, Chapnik N, Yosef R, Kopelman NM, Jurkevitch E, Segev E. Identifying protein function and functional links based on large-scale co-occurrence patterns. PLoS One 2022; 17:e0264765. [PMID: 35239724 PMCID: PMC8893610 DOI: 10.1371/journal.pone.0264765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Accepted: 02/16/2022] [Indexed: 11/23/2022]
Abstract
Objective: The vast majority of known proteins have not been experimentally tested even at the level of measuring their expression, and the function of many proteins remains unknown. In order to decipher protein function and examine functional associations, we developed "Cliquely", a software tool based on the exploration of co-occurrence patterns.
Computational model: Using a set of more than 23 million proteins divided into 404,947 orthologous clusters, we explored the co-occurrence graph of 4,742 fully sequenced genomes from the three domains of life. Edge weights in this graph represent co-occurrence probabilities. We use the Bron–Kerbosch algorithm to detect maximal cliques in this graph, fully-connected subgraphs that represent meaningful biological networks from different functional categories.
Main results: We demonstrate that Cliquely can successfully identify known networks from various pathways, including nitrogen fixation, glycolysis, methanogenesis, mevalonate and ribosome proteins. Identifying the virulence-associated type III secretion system (T3SS) network, Cliquely also added 13 previously uncharacterized novel proteins to the T3SS network, demonstrating the strength of this approach. Cliquely is freely available and open source. Users can employ the tool to explore co-occurrence networks using a protein of interest and a customizable level of stringency, either for the entire dataset or for one of the three domains: Archaea, Bacteria, or Eukarya.
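To illustrate the clique-detection step, here is a minimal sketch of the basic Bron–Kerbosch recursion (without pivoting) on a toy graph. It is not Cliquely's implementation. The node names, edge weights and the way the stringency is applied (dropping edges below a fixed `threshold` before searching) are assumptions made for the example.

```python
def bron_kerbosch(r, p, x, adj, out):
    """Collect every maximal clique extending r, with candidates p and excluded x."""
    if not p and not x:
        out.append(sorted(r))   # nothing left to add or exclude: r is maximal
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}             # v has been fully explored
        x = x | {v}             # exclude v from later cliques at this level


# Hypothetical co-occurrence probabilities between orthologous clusters.
weighted_edges = {
    ("A", "B"): 0.95, ("A", "C"): 0.90, ("B", "C"): 0.92,
    ("C", "D"): 0.85, ("D", "E"): 0.88, ("B", "D"): 0.30,
}
threshold = 0.8  # assumed stringency cut-off

# Keep only edges at or above the threshold and build an adjacency map.
adj = {}
for (u, v), weight in weighted_edges.items():
    if weight >= threshold:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

cliques = []
bron_kerbosch(set(), set(adj), set(), adj, cliques)
print(sorted(cliques))
# [['A', 'B', 'C'], ['C', 'D'], ['D', 'E']]
```

Raising the threshold removes weaker edges and therefore yields smaller, more conservative cliques; on graphs of realistic size a pivoting variant of the algorithm is usually preferred.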
Affiliation(s)
- Zohar Pasternak
  - Division of Identification and Forensic Science, Israel Police, Jerusalem, Israel
  - Faculty of Management of Technology, Holon Institute of Technology, Holon, Israel
- Noam Chapnik
  - Faculty of Management of Technology, Holon Institute of Technology, Holon, Israel
- Roy Yosef
  - Faculty of Management of Technology, Holon Institute of Technology, Holon, Israel
- Naama M. Kopelman
  - Faculty of Science, Holon Institute of Technology, Holon, Israel
- Edouard Jurkevitch
  - Department of Plant Pathology and Microbiology, The Hebrew University of Jerusalem, Jerusalem, Israel
- Elad Segev
  - Faculty of Science, Holon Institute of Technology, Holon, Israel
6. Cui F, Li S, Zhang Z, Sui M, Cao C, El-Latif Hesham A, Zou Q. DeepMC-iNABP: Deep learning for multiclass identification and classification of nucleic acid-binding proteins. Comput Struct Biotechnol J 2022; 20:2020-2028. [PMID: 35521556 PMCID: PMC9065708 DOI: 10.1016/j.csbj.2022.04.029] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 02/08/2022] [Revised: 04/06/2022] [Accepted: 04/20/2022] [Indexed: 11/29/2022]
Abstract
Nucleic acid-binding proteins (NABPs), including DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs), play vital roles in gene expression. Accurate identification of these proteins is crucial. However, there are two existing challenges: one is the problem of ignoring DNA- and RNA-binding proteins (DRBPs), and the other is a cross-predicting problem referring to DBP predictors predicting DBPs as RBPs, and vice versa. In this study, we proposed a computational predictor, called DeepMC-iNABP, with the goal of solving these difficulties by utilizing a multiclass classification strategy and deep learning approaches. DBPs, RBPs, DRBPs and non-NABPs as separate classes of data were used for training the DeepMC-iNABP model. The results on test data collected in this study and two independent test datasets showed that DeepMC-iNABP has a strong advantage in identifying the DRBPs and has the ability to alleviate the cross-prediction problem to a certain extent. The web-server of DeepMC-iNABP is freely available at http://www.deepmc-inabp.net/. The datasets used in this research can also be downloaded from the website.
Affiliation(s)
- Feifei Cui
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Shuang Li
  - Beidahuang Industry Group General Hospital, Harbin 150001, China
- Zilong Zhang
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Miaomiao Sui
  - Graduate School Agricultural and Life Science, The University of Tokyo, Tokyo 1138657, Japan
- Chen Cao
  - School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 211166, China
- Abd El-Latif Hesham
  - Genetics Department, Faculty of Agriculture, Beni-Suef University, Beni-Suef 62511, Egypt
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
  - Corresponding author at: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China.
7. Zaitzeff A, Leiby N, Motta FC, Haase SB, Singer JM. Improved datasets and evaluation methods for the automatic prediction of DNA-binding proteins. Bioinformatics 2021; 38:44-51. [PMID: 34415301 DOI: 10.1093/bioinformatics/btab603] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/09/2021] [Revised: 08/04/2021] [Accepted: 08/18/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION: Accurate automatic annotation of protein function relies on both innovative models and robust datasets. Due to their importance in biological processes, the identification of DNA-binding proteins directly from protein sequence has been the focus of many studies. However, the datasets used to train and evaluate these methods have suffered from substantial flaws. We describe some of the weaknesses of the datasets used in previous DNA-binding protein literature and provide several new datasets addressing these problems. We suggest new evaluative benchmark tasks that more realistically assess real-world performance for protein annotation models. We propose a simple new model for the prediction of DNA-binding proteins and compare its performance on the improved datasets to two previously published models. In addition, we provide extensive tests showing how the best models predict across taxa.
RESULTS: Our new gradient boosting model, which uses features derived from a published protein language model, outperforms the earlier models. Perhaps surprisingly, so does a baseline nearest neighbor model using BLAST percent identity. We evaluate the sensitivity of these models to perturbations of DNA-binding regions and control regions of protein sequences. The successful data-driven models learn to focus on DNA-binding regions. When predicting across taxa, the best models are highly accurate across species in the same kingdom and can provide some information when predicting across kingdoms.
AVAILABILITY AND IMPLEMENTATION: The data and results for this article can be found at https://doi.org/10.5281/zenodo.5153906. The code for this article can be found at https://doi.org/10.5281/zenodo.5153683. The code, data and results can also be found at https://github.com/AZaitzeff/tools_for_dna_binding_proteins.
Affiliation(s)
- Nicholas Leiby
  - Two Six Research, Two Six Technologies, Arlington, VA 22203, USA
- Francis C Motta
  - Department of Mathematical Sciences, Florida Atlantic University, Boca Raton, FL 33431, USA
- Steven B Haase
  - Department of Biology, Duke University, Durham, NC 27708, USA
8. Xie J, Zhang X, Zheng J, Hong X, Tong X, Liu X, Xue Y, Wang X, Zhang Y, Liu S. Two novel RNA-binding proteins identification through computational prediction and experimental validation. Genomics 2021; 114:149-160. [PMID: 34921931 DOI: 10.1016/j.ygeno.2021.12.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2021] [Revised: 08/05/2021] [Accepted: 12/13/2021] [Indexed: 11/16/2022]
Abstract
Because RBPs play important roles in the cell, finding new RBPs is particularly important. We performed iRIP-seq and CLIP-seq to verify whether two proteins predicted by RBPPred, CLIP1 and DMD, are RBPs. The experimental results confirm that both proteins have RNA-binding activity. We identified significantly enriched binding motifs UGGGGAGG, CUUCCG and CCCGU for CLIP1 (iRIP-seq), DMD (iRIP-seq) and DMD (CLIP-seq), respectively. Computational KEGG and GO analyses show that CLIP1 and DMD share some biological processes and functions. In addition, we found that the SNPs between DMD and its RNA partners may be associated with Becker muscular dystrophy, Duchenne muscular dystrophy, dilated cardiomyopathy 3B and cardiovascular phenotype. Across data from thirteen cancers, CLIP1 and another 300 oncogenes always co-occur, and 123 of these 300 genes interact with CLIP1. These cancers may be associated with mutations occurring in both CLIP1 and the genes it interacts with.
Affiliation(s)
- Juan Xie
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Xiaoli Zhang
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Jinfang Zheng
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Xu Hong
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Xiaoxue Tong
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Xudong Liu
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Yaqiang Xue
  - Laboratory for Genome Regulation and Human Health, ABLife Inc., Wuhan, Hubei 430075, China
- Xuelian Wang
  - ABLife BioBigData Institute, Wuhan, Hubei 430075, China
- Yi Zhang
  - ABLife BioBigData Institute, Wuhan, Hubei 430075, China
- Shiyong Liu
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
9. Etzion-Fuchs A, Todd DA, Singh M. dSPRINT: predicting DNA, RNA, ion, peptide and small molecule interaction sites within protein domains. Nucleic Acids Res 2021; 49:e78. [PMID: 33999210 PMCID: PMC8287948 DOI: 10.1093/nar/gkab356] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/12/2020] [Revised: 03/30/2021] [Accepted: 04/22/2021] [Indexed: 01/08/2023]
Abstract
Domains are instrumental in facilitating protein interactions with DNA, RNA, small molecules, ions and peptides. Identifying ligand-binding domains within sequences is a critical step in protein function annotation, and the ligand-binding properties of proteins are frequently analyzed based upon whether they contain one of these domains. To date, however, knowledge of whether and how protein domains interact with ligands has been limited to domains that have been observed in co-crystal structures; this leaves approximately two-thirds of human protein domain families uncharacterized with respect to whether and how they bind DNA, RNA, small molecules, ions and peptides. To fill this gap, we introduce dSPRINT, a novel ensemble machine learning method for predicting whether a domain binds DNA, RNA, small molecules, ions or peptides, along with the positions within it that participate in these types of interactions. In stringent cross-validation testing, we demonstrate that dSPRINT has an excellent performance in uncovering ligand-binding positions and domains. We also apply dSPRINT to newly characterize the molecular functions of domains of unknown function. dSPRINT's predictions can be transferred from domains to sequences, enabling predictions about the ligand-binding properties of 95% of human genes. The dSPRINT framework and its predictions for 6503 human protein domains are freely available at http://protdomain.princeton.edu/dsprint.
Affiliation(s)
- Anat Etzion-Fuchs
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Carl Icahn Laboratory, Princeton, NJ 08544, USA
- David A Todd
  - Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08544, USA
- Mona Singh
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Carl Icahn Laboratory, Princeton, NJ 08544, USA; Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08544, USA
10. Ma X, Zhao H, Yan H, Sheng M, Cao Y, Yang K, Xu H, Xu W, Gao Z, Su Z. Refinement of bamboo genome annotations through integrative analyses of transcriptomic and epigenomic data. Comput Struct Biotechnol J 2021; 19:2708-2718. [PMID: 34093986 PMCID: PMC8131310 DOI: 10.1016/j.csbj.2021.04.068] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/27/2020] [Revised: 04/19/2021] [Accepted: 04/26/2021] [Indexed: 01/07/2023]
Abstract
Bamboo, one of the most crucial nontimber forest resources worldwide, has the capacity for rapid growth. In recent years, the genome of moso bamboo (Phyllostachys edulis) has been decoded, and a large amount of transcriptome data has been published. In this study, we generated the genome-wide profiles of the histone modification H3K4me3 in leaf, stem, and root tissues of bamboo. The trends in the distribution patterns were similar to those in rice. We developed a processing pipeline for predicting novel transcripts to refine the structural annotation of the genome using H3K4me3 ChIP-seq data and 29 RNA-seq datasets. As a result, 12,460 novel transcripts were predicted in the bamboo genome. Compared with the transcripts in the newly released version 2.0 of the bamboo genome, these novel transcripts are tissue-specific and shorter, and most have a single exon. Some representative novel transcripts were validated by semiquantitative RT-PCR and qRT-PCR analyses. Furthermore, we put these novel transcripts back into the ChIP-seq analysis pipeline and discovered that the percentages of H3K4me3 in genic elements were increased. Overall, this work integrated transcriptomic data and epigenomic data to refine the annotation of the genome in order to discover more functional genes and study bamboo growth and development, and the application of this predicted pipeline may help refine the structural annotation of the genome in other species.
Affiliation(s)
- Xuelian Ma
  - State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Hansheng Zhao
  - Key Laboratory of National Forestry and Grassland Administration/Beijing for Bamboo & Rattan Science and Technology, Institute of Gene Science and Industrialization for Bamboo and Rattan Resources, International Center for Bamboo and Rattan, Beijing 100102, China
- Hengyu Yan
  - State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China; College of Agronomy, Qingdao Agricultural University, Qingdao, Shandong, China
- Minghao Sheng
  - State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Yaxin Cao
  - State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Kebin Yang
  - Key Laboratory of National Forestry and Grassland Administration/Beijing for Bamboo & Rattan Science and Technology, Institute of Gene Science and Industrialization for Bamboo and Rattan Resources, International Center for Bamboo and Rattan, Beijing 100102, China
- Hao Xu
  - Key Laboratory of National Forestry and Grassland Administration/Beijing for Bamboo & Rattan Science and Technology, Institute of Gene Science and Industrialization for Bamboo and Rattan Resources, International Center for Bamboo and Rattan, Beijing 100102, China
- Wenying Xu
  - State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Zhimin Gao
  - Key Laboratory of National Forestry and Grassland Administration/Beijing for Bamboo & Rattan Science and Technology, Institute of Gene Science and Industrialization for Bamboo and Rattan Resources, International Center for Bamboo and Rattan, Beijing 100102, China
- Zhen Su
  - State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
11. Girgis HZ, James BT, Luczak BB. Identity: rapid alignment-free prediction of sequence alignment identity scores using self-supervised general linear models. NAR Genom Bioinform 2021; 3:lqab001. [PMID: 33554117 PMCID: PMC7850047 DOI: 10.1093/nargab/lqab001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/14/2019] [Revised: 12/07/2020] [Accepted: 01/08/2021] [Indexed: 11/12/2022]
Abstract
Pairwise global alignment is a fundamental step in sequence analysis. Optimal alignment algorithms run in quadratic time, which makes them slow, especially on long sequences. In many applications that involve large sequence datasets, all that is needed is the identity score (the percentage of identical nucleotides in an optimal alignment, including gaps, of two sequences); there is no need to visualize how every two sequences are aligned. For these applications, we propose Identity, which produces global identity scores for a large number of pairs of DNA sequences using alignment-free methods and self-supervised general linear models. For the first time, the new tool can predict pairwise identity scores in linear time and space. On two large-scale sequence databases, Identity provided the best compromise between sensitivity and precision while being faster than BLAST, Mash, MUMmer4 and USEARCH by 2-80 times. Identity was the best performing tool when searching for low-identity matches. While constructing phylogenetic trees from about 6000 transcripts, the tree built from the scores reported by Identity was the closest to the reference tree (in contrast to andi, FSWM and Mash). Identity is capable of producing pairwise identity scores for bacterial genomes that are millions of nucleotides long; this task cannot be accomplished by any global-alignment-based tool. Availability: https://github.com/BioinformaticsToolsmith/Identity.
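For reference, the identity score defined above can be computed directly once an alignment is available. The sketch below does exactly that on a pre-aligned pair; it is not the alignment-free predictor that Identity implements. It assumes `-` marks a gap and that gap columns count toward the alignment length, and the function name and example sequences are hypothetical.

```python
def identity_score(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns holding the same nucleotide in both rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    # A column counts as a match only if both symbols agree and are not gaps.
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    return 100.0 * matches / len(aligned_a)


# Hypothetical pre-aligned pair: 9 columns, one gap and one mismatch.
score = identity_score("ACGT-ACGT", "ACGTTACCT")
print(f"{score:.2f}%")
# 77.78%
```

Producing the optimal alignment that this function takes as input is the quadratic-time step; the abstract's contribution is predicting the score without performing it.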
Affiliation(s)
- Hani Z Girgis
  - Bioinformatics Toolsmith Laboratory, Department of Electrical Engineering and Computer Science, Texas A&M University-Kingsville, 700 University Boulevard, Kingsville, TX 78363, USA
- Benjamin T James
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA 02139, USA
- Brian B Luczak
  - Department of Mathematics, Vanderbilt University, 1326 Stevenson Center Lane, Nashville, TN 3721, USA
12. Jin Y, Zhang M, Duan R, Yang J, Yang Y, Wang J, Jiang C, Yao B, Li L, Yuan H, Zha X, Ma C. Long noncoding RNA FGF14-AS2 inhibits breast cancer metastasis by regulating the miR-370-3p/FGF14 axis. Cell Death Discov 2020; 6:103. [PMID: 33083023 PMCID: PMC7548970 DOI: 10.1038/s41420-020-00334-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 08/05/2020] [Revised: 08/20/2020] [Accepted: 09/03/2020] [Indexed: 12/18/2022]
Abstract
Long noncoding RNAs (lncRNAs) have emerged as important regulators in cancers, including breast cancer. However, the overall biological roles and clinical significance of most lncRNAs are not fully understood. This study aimed to elucidate the potential role of a novel lncRNA FGF14-AS2 and the mechanisms underlying metastasis in breast cancer. The lncRNA FGF14-AS2 was significantly downregulated in breast cancer tissues; patients with lower FGF14-AS2 expression had advanced clinical stage. In vitro and in vivo assays of FGF14-AS2 alterations revealed a complex integrated phenotype affecting breast cancer cell migration, invasion, and tumor metastasis. Mechanistically, FGF14-AS2 functioned as a competing endogenous RNA of miR-370-3p, thereby leading to the activation of its coding counterpart, FGF14. Clinically, we observed increased miR-370-3p expression in breast cancer tissues, whereas FGF14 expression was decreased in breast cancer tissues compared to the adjacent normal breast tissues. FGF14-AS2 expression was significantly negatively correlated with miR-370-3p expression, and correlated positively to FGF14 expression. Collectively, our findings support a model in which the FGF14-AS2/miR-370-3p/FGF14 axis is a critical regulator in breast cancer metastasis, suggesting a new therapeutic direction in breast cancer.
Affiliation(s)
- Yucui Jin
- Jiangsu Key Laboratory of Xenotransplantation, Nanjing Medical University, Longmian Road 101, Nanjing, People's Republic of China.,Department of Medical Genetics, Nanjing Medical University, Longmian Road 101, Nanjing, People's Republic of China
| | - Ming Zhang
- Department of Medical Genetics, Nanjing Medical University, Longmian Road 101, Nanjing, People's Republic of China
| | - Rui Duan
- Department of Medical Genetics, Nanjing Medical University, Longmian Road 101, Nanjing, People's Republic of China
| | - Jiashu Yang
- Department of Medical Genetics, Nanjing Medical University, Longmian Road 101, Nanjing, People's Republic of China
| | - Ying Yang
- Department of Medical Genetics, Nanjing Medical University, Longmian Road 101, Nanjing, People's Republic of China
| | - Jue Wang
- Division of Breast Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, People's Republic of China
| | - Chaojun Jiang
- Division of Breast Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, People's Republic of China
| | - Bing Yao
- Department of Medical Genetics, Nanjing Medical University, Longmian Road 101, Nanjing, People's Republic of China
| | - Lingyun Li
- Jiangsu Key Laboratory of Xenotransplantation, Nanjing Medical University, Longmian Road 101, Nanjing, People's Republic of China; Department of Medical Genetics, Nanjing Medical University, Longmian Road 101, Nanjing, People's Republic of China
| | - Hongyan Yuan
- Department of Oncology and Lombardi Comprehensive Cancer Center, Washington, DC, USA
| | - Xiaoming Zha
- Division of Breast Surgery, The First Affiliated Hospital of Nanjing Medical University, Nanjing, People's Republic of China
| | - Changyan Ma
- Jiangsu Key Laboratory of Xenotransplantation, Nanjing Medical University, Longmian Road 101, Nanjing, People's Republic of China; Department of Medical Genetics, Nanjing Medical University, Longmian Road 101, Nanjing, People's Republic of China
|
13
|
Fibroblast Growth Factor-14 Acts as Tumor Suppressor in Lung Adenocarcinomas. Cells 2020; 9:cells9081755. [PMID: 32707902 PMCID: PMC7466013 DOI: 10.3390/cells9081755] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/29/2020] [Revised: 07/17/2020] [Accepted: 07/20/2020] [Indexed: 12/25/2022] Open
Abstract
Investigation of the molecular dynamics in lung cancer is crucial for the development of new treatment strategies. Fibroblast growth factor (FGF) 14 belongs to the FGF family, which might play a crucial role in cancer progression. We analyzed lung adenocarcinoma (LUAC) patient samples and found that FGF14 was downregulated, correlating with reduced survival and oncogenic mutation status. FGF14 overexpression in lung cancer cell lines resulted in decreased proliferation, colony formation, and migration, as well as increased expression of epithelial markers and decreased expression of mesenchymal markers, indicating a mesenchymal to epithelial transition in vitro. We verified these findings using small interfering RNA against FGF14 and further confirmed the suppressive effect of FGF14 in a NOD.Cg-Prkdcscid Il2rgtm1Wjl/SzJ immunodeficient xenograft tumor model. Moreover, RNA sequencing data from FGF14-overexpressing tumor cells suggest that genes affected by FGF14 were related to the extracellular matrix, playing a role in proliferation and migration. Notably, expression of the newly identified FGF14 target genes adenosine deaminase RNA specific B1 (ADARB1), collagen and calcium-binding epidermal growth factor domain-containing protein 1 (CCBE1), α1 chain of collagen XI (COL11A1), and mucin 16 (MUC16) was negatively correlated with overall survival when FGF14 was downregulated in LUAC. These findings led us to suggest that FGF14 regulates proliferation and migration in LUAC.
|
14
|
Su T, Huang L, Zhang N, Peng S, Li X, Wei G, Zhai E, Zeng Z, Xu L. FGF14 Functions as a Tumor Suppressor through Inhibiting PI3K/AKT/mTOR Pathway in Colorectal Cancer. J Cancer 2020; 11:819-825. [PMID: 31949485 PMCID: PMC6959027 DOI: 10.7150/jca.36316] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/04/2019] [Accepted: 11/03/2019] [Indexed: 12/26/2022] Open
Abstract
We identified that Fibroblast Growth Factor 14 (FGF14) was preferentially methylated in colorectal cancer (CRC). In this study, we aimed to investigate the epigenetic regulation, biological function and molecular mechanism of FGF14 in CRC. The expression of FGF14 in CRC cell lines, a normal human colon epithelial cell line, CRC tissues and paired adjacent normal tissues was detected by PCR and Western blot. The biological function of FGF14 in CRC was interrogated by cell viability assay, colony formation, flow cytometry, cell invasion and migration assay, as well as an in vivo study. We found that FGF14 was downregulated or silenced in all (10/10) CRC cell lines, while it was expressed in normal colonic tissues and the normal human colon epithelial cell line. The expression of FGF14 was lower in primary CRCs than in their adjacent normal tissues. Significantly higher methylation of FGF14 was observed in CRCs than in normal tissues based on data from the TCGA database. The loss of FGF14 gene expression was restored by treatment with the DNA methyltransferase inhibitor 5-Aza. Re-expression of FGF14 in CRC cell lines inhibited cell viability and colony formation, and induced cell apoptosis. FGF14 induced mitochondrial apoptosis and inhibited the PI3K/AKT/mTOR pathway. In a xenograft mouse model, overexpression of FGF14 significantly reduced tumor growth (P<0.001). In conclusion, FGF14 is a novel tumor suppressor, which suppresses cell proliferation and induces cell apoptosis via mediating the PI3K/AKT/mTOR pathway.
Affiliation(s)
- Tianhong Su
- Department of Liver Surgery, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China
| | - Linlin Huang
- Department of Gastroenterology and Hepatology, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China; Department of Gastroenterology and Hepatology, Guangdong Provincial People's Hospital/Guangdong Academy of Medical Sciences, Guangzhou, Guangdong, China
| | - Ning Zhang
- Department of Gastroenterology and Hepatology, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China
| | - Sui Peng
- Department of Gastroenterology and Hepatology, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China; Clinical Trials Unit, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China; Precision Medicine Institute, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China
| | - Xiaoxing Li
- Precision Medicine Institute, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China
| | - Guangyan Wei
- Department of Liver Surgery, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China
| | - Ertao Zhai
- Department of Gastrointestinal Surgery, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China
| | - Zhirong Zeng
- Department of Gastroenterology and Hepatology, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China
| | - Lixia Xu
- Precision Medicine Institute, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China; Department of Oncology, The First Affiliated Hospital, Sun Yat-sen University, Guangdong, China
|
15
|
Šimčíková D, Heneberg P. Refinement of evolutionary medicine predictions based on clinical evidence for the manifestations of Mendelian diseases. Sci Rep 2019; 9:18577. [PMID: 31819097 PMCID: PMC6901466 DOI: 10.1038/s41598-019-54976-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/28/2018] [Accepted: 11/21/2019] [Indexed: 12/28/2022] Open
Abstract
Prediction methods have become an integral part of biomedical and biotechnological research. However, their clinical interpretations are largely based on biochemical or molecular data rather than clinical data. Here, we focus on improving the reliability and clinical applicability of prediction algorithms. We assembled and curated two large non-overlapping databases of clinical phenotypes. These phenotypes were caused by missense variations in 44 and 63 genes associated with Mendelian diseases. We used these databases to establish and validate the model, allowing us to improve the predictions obtained from EVmutation, SNAP2 and PoPMuSiC 2.1. The predictions of clinical effects suffered from a lack of specificity, which appears to be the common constraint of all recently used prediction methods, although predictions mediated by these methods are associated with nearly absolute sensitivity. We introduced evidence-based tailoring of the default settings of the prediction methods; this tailoring substantially improved the prediction outcomes. Additionally, the comparisons of the clinically observed and theoretical variations led to the identification of large previously unreported pools of variations that were under negative selection during molecular evolution. The evolutionary variation analysis approach described here is the first to enable the highly specific identification of likely disease-causing missense variations that have not yet been associated with any clinical phenotype.
Affiliation(s)
- Daniela Šimčíková
- Charles University, Third Faculty of Medicine, Prague, Czech Republic
| | - Petr Heneberg
- Charles University, Third Faculty of Medicine, Prague, Czech Republic
|
16
|
Bonetta R, Valentino G. Machine learning techniques for protein function prediction. Proteins 2019; 88:397-413. [PMID: 31603244 DOI: 10.1002/prot.25832] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 03/19/2019] [Revised: 07/05/2019] [Accepted: 09/17/2019] [Indexed: 12/17/2022]
Abstract
Proteins play important roles in living organisms, and their function is directly linked with their structure. Due to the growing gap between the number of proteins being discovered and their functional characterization (in particular as a result of experimental limitations), reliable prediction of protein function through computational means has become crucial. This paper reviews the machine learning techniques used in the literature, following their evolution from simple algorithms such as logistic regression to more advanced methods like support vector machines and modern deep neural networks. Hyperparameter optimization methods adopted to boost prediction performance are presented. In parallel, the metamorphosis in the features used by these algorithms, from classical physicochemical properties and amino acid composition up to text-derived features from biomedical literature and learned feature representations using autoencoders, is also reviewed, together with feature selection and dimensionality reduction techniques. The success stories in the application of these techniques to both general and specific protein function prediction are discussed.
Affiliation(s)
- Rosalin Bonetta
- Centre for Molecular Medicine and Biobanking, University of Malta, Msida, Malta
| | - Gianluca Valentino
- Department of Communications and Computer Engineering, University of Malta, Msida, Malta
|
17
|
Hong J, Luo Y, Zhang Y, Ying J, Xue W, Xie T, Tao L, Zhu F. Protein functional annotation of simultaneously improved stability, accuracy and false discovery rate achieved by a sequence-based deep learning. Brief Bioinform 2019; 21:1437-1447. [PMID: 31504150 PMCID: PMC7412958 DOI: 10.1093/bib/bbz081] [Citation(s) in RCA: 90] [Impact Index Per Article: 18.0] [Received: 05/06/2019] [Revised: 05/27/2019] [Accepted: 06/10/2019] [Indexed: 11/12/2022] Open
Abstract
Functional annotation of protein sequences with high accuracy has become one of the most important issues in modern biomedical studies, and computational approaches that significantly accelerate the analysis process and enhance accuracy are greatly desired. Although a variety of methods have been developed to elevate protein annotation accuracy, their ability to control false annotation rates remains either limited or not systematically evaluated. In this study, a protein encoding strategy, together with a deep learning algorithm, was proposed to control the false discovery rate in protein function annotation, and its performance was systematically compared with that of the traditional similarity-based and de novo approaches. Based on a comprehensive assessment from multiple perspectives, the proposed strategy and algorithm were found to perform better in both prediction stability and annotation accuracy compared with other de novo methods. Moreover, an in-depth assessment revealed that it possessed an improved capacity for controlling the false discovery rate compared with traditional methods. All in all, this study not only provided a comprehensive analysis of the performance of the newly proposed strategy but also provided a tool for researchers in the field of protein function annotation.
Affiliation(s)
- Jiajun Hong
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China; College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Yongchao Luo
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Yang Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China; School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
| | - Junbiao Ying
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
| | - Weiwei Xue
- School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
| | - Tian Xie
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China
| | - Lin Tao
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China
| | - Feng Zhu
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China; College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
|
18
|
Nimrod G, Fischman S, Austin M, Herman A, Keyes F, Leiderman O, Hargreaves D, Strajbl M, Breed J, Klompus S, Minton K, Spooner J, Buchanan A, Vaughan TJ, Ofran Y. Computational Design of Epitope-Specific Functional Antibodies. Cell Rep 2018; 25:2121-2131.e5. [DOI: 10.1016/j.celrep.2018.10.081] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 01/04/2018] [Revised: 09/14/2018] [Accepted: 10/23/2018] [Indexed: 12/12/2022] Open
|
19
|
Improving conditional random field model for prediction of protein-RNA residue-base contacts. QUANTITATIVE BIOLOGY 2018. [DOI: 10.1007/s40484-018-0136-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/16/2022]
|