1
Nana Teukam YG, Kwate Dassi L, Manica M, Probst D, Schwaller P, Laino T. Language models can identify enzymatic binding sites in protein sequences. Comput Struct Biotechnol J 2024; 23:1929-1937. [PMID: 38736695 PMCID: PMC11087710 DOI: 10.1016/j.csbj.2024.04.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 04/05/2024] [Accepted: 04/05/2024] [Indexed: 05/14/2024] Open
Abstract
Recent advances in language modeling have had a tremendous impact on how we handle sequential data in science. Language architectures have emerged as a hotbed of innovation and creativity in natural language processing over the last decade, and have since gained prominence in modeling proteins and chemical processes, elucidating structural relationships from textual/sequential data. Surprisingly, some of these relationships refer to three-dimensional structural features, raising important questions on the dimensionality of the information encoded within sequential data. Here, we demonstrate that the unsupervised application of a language model architecture to a language representation of bio-catalyzed chemical reactions can capture the signal underlying the atomic interactions at the substrate-binding site. This allows us to identify the three-dimensional binding site position in unknown protein sequences. The language representation comprises a reaction-simplified molecular-input line-entry system (SMILES) for substrate and products, and amino acid sequence information for the enzyme. This approach can recover, with no supervision, 52.13% of the binding site when considering co-crystallized substrate-enzyme structures as ground truth, vastly outperforming other attention-based models.
Affiliation(s)
- Loïc Kwate Dassi
  - IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
- Matteo Manica
  - IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
- Daniel Probst
  - IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), Switzerland
- Philippe Schwaller
  - IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), Switzerland
- Teodoro Laino
  - IBM Research Europe, Säumerstrasse 4, 8803 Rüschlikon, Switzerland
  - National Center for Competence in Research-Catalysis (NCCR-Catalysis), Switzerland
2
Zhang E, Li Z, Dong L, Feng Y, Sun G, Xu X, Wang Z, Cui C, Wang W, Yang J. Exploration of Molecular Mechanisms of Immunity in the Pacific Oyster (Crassostrea gigas) in Response to Vibrio alginolyticus Invasion. Animals (Basel) 2024; 14:1707. [PMID: 38891754 PMCID: PMC11171025 DOI: 10.3390/ani14111707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2024] [Revised: 05/31/2024] [Accepted: 06/04/2024] [Indexed: 06/21/2024] Open
Abstract
Over the years, oysters have faced recurring mass mortality issues during the summer breeding season, with Vibrio infection emerging as a significant contributing factor. Tubules of gill filaments were confirmed to be in the hematopoietic position in Crassostrea gigas, which produce hemocytes with immune defense capabilities. Additionally, the epithelial cells of oyster gills produce immune effectors to defend against pathogens. In light of this, we performed a transcriptome analysis of gill tissues obtained from C. gigas infected with Vibrio alginolyticus for 12 h and 48 h. Through this analysis, we identified 1024 differentially expressed genes (DEGs) at 12 h post-injection and 1079 DEGs at 48 h post-injection. Enrichment analysis of these DEGs revealed a significant association with immune-related Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. To further investigate the immune response, we constructed a protein-protein interaction (PPI) network using the DEGs enriched in immune-associated KEGG pathways. This network provided insights into the interactions and relationships among these genes, shedding light on the underlying mechanisms of the innate immune defense mechanism in oyster gills. To ensure the accuracy of our findings, we validated 16 key genes using quantitative RT-PCR. Overall, this study represents the first exploration of the innate immune defense mechanism in oyster gills using a PPI network approach. The findings provide valuable insights for future research on oyster pathogen control and the development of oysters with enhanced antimicrobial resistance.
Affiliation(s)
- Enshuo Zhang
  - School of Agriculture, Ludong University, Yantai 264025, China (Z.L.); (X.X.); (C.C.)
- Zan Li
  - School of Agriculture, Ludong University, Yantai 264025, China (Z.L.); (X.X.); (C.C.)
  - Yantai Haiyu Marine Technology Co., Ltd., Yantai 264000, China
- Luyao Dong
  - College of Fisheries and Life Science, Shanghai Ocean University, Shanghai 201306, China
- Yanwei Feng
  - School of Agriculture, Ludong University, Yantai 264025, China (Z.L.); (X.X.); (C.C.)
  - Yantai Haiyu Marine Technology Co., Ltd., Yantai 264000, China
- Guohua Sun
  - School of Agriculture, Ludong University, Yantai 264025, China (Z.L.); (X.X.); (C.C.)
  - Yantai Haiyu Marine Technology Co., Ltd., Yantai 264000, China
- Xiaohui Xu
  - School of Agriculture, Ludong University, Yantai 264025, China (Z.L.); (X.X.); (C.C.)
  - Yantai Haiyu Marine Technology Co., Ltd., Yantai 264000, China
- Zhongping Wang
  - Yantai Kongtong Island Industrial Co., Ltd., Yantai 264000, China
- Cuiju Cui
  - School of Agriculture, Ludong University, Yantai 264025, China (Z.L.); (X.X.); (C.C.)
- Weijun Wang
  - School of Agriculture, Ludong University, Yantai 264025, China (Z.L.); (X.X.); (C.C.)
  - Yantai Haiyu Marine Technology Co., Ltd., Yantai 264000, China
  - College of Fisheries and Life Science, Shanghai Ocean University, Shanghai 201306, China
  - Yantai Kongtong Island Industrial Co., Ltd., Yantai 264000, China
- Jianmin Yang
  - School of Agriculture, Ludong University, Yantai 264025, China (Z.L.); (X.X.); (C.C.)
  - Yantai Haiyu Marine Technology Co., Ltd., Yantai 264000, China
  - Yantai Kongtong Island Industrial Co., Ltd., Yantai 264000, China
3
Liu Y, Wu S, Lan K, Wang Q, Ye T, Jin H, Hu T, Xie T, Wei Q, Yin X. An Investigation of the JAZ Family and the CwMYC2-like Protein to Reveal Their Regulation Roles in the MeJA-Induced Biosynthesis of β-Elemene in Curcuma wenyujin. Int J Mol Sci 2023; 24:15004. [PMID: 37834452 PMCID: PMC10573570 DOI: 10.3390/ijms241915004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2023] [Revised: 09/28/2023] [Accepted: 10/03/2023] [Indexed: 10/15/2023] Open
Abstract
β-Elemene (C15H24), a sesquiterpenoid compound isolated from the volatile oil of Curcuma wenyujin, has been proven to be effective against multiple cancers and is widely used in clinical treatment. Unfortunately, the β-elemene content in C. wenyujin is very low and cannot meet market demand. Our previous research showed that methyl jasmonate (MeJA) induced the accumulation of β-elemene in C. wenyujin. However, the regulatory mechanism is unclear. In this study, 20 jasmonate ZIM-domain (JAZ) proteins, which are the core regulatory factors of the JA signaling pathway, were identified in C. wenyujin. Then, the conserved domains, motif composition, and evolutionary relationships of the CwJAZs were analyzed comprehensively and systematically. The interaction analysis indicated that CwJAZs can form homodimers or heterodimers. Fifteen out of twenty CwJAZs were significantly induced by MeJA treatment. As the master switch of the JA signaling pathway, the CwMYC2-like protein was also identified and shown to interact with CwJAZ2/3/4/5/7/15/17/20. Further research found that overexpression of the CwMYC2-like gene increased the accumulation of β-elemene in C. wenyujin leaves. Simultaneously, the expression of HMGR, HMGS, DXS, DXR, MCT, HDS, HDR, and FPPS, which are related to β-elemene biosynthesis, was also up-regulated by the CwMYC2-like protein. These results indicate that CwJAZs and the CwMYC2-like protein respond to the JA signal to regulate the biosynthesis of β-elemene in C. wenyujin.
Affiliation(s)
- Yuyang Liu
  - School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China; (Y.L.); (S.W.); (K.L.); (Q.W.); (T.Y.); (H.J.); (T.H.); (T.X.)
  - Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, Engineering Laboratory of Development and Application of Traditional Chinese Medicines, Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, China
- Shiyi Wu
  - School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China; (Y.L.); (S.W.); (K.L.); (Q.W.); (T.Y.); (H.J.); (T.H.); (T.X.)
- Kaer Lan
  - School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China; (Y.L.); (S.W.); (K.L.); (Q.W.); (T.Y.); (H.J.); (T.H.); (T.X.)
- Qian Wang
  - School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China; (Y.L.); (S.W.); (K.L.); (Q.W.); (T.Y.); (H.J.); (T.H.); (T.X.)
- Tingyu Ye
  - School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China; (Y.L.); (S.W.); (K.L.); (Q.W.); (T.Y.); (H.J.); (T.H.); (T.X.)
- Huanan Jin
  - School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China; (Y.L.); (S.W.); (K.L.); (Q.W.); (T.Y.); (H.J.); (T.H.); (T.X.)
  - Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, Engineering Laboratory of Development and Application of Traditional Chinese Medicines, Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, China
- Tianyuan Hu
  - School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China; (Y.L.); (S.W.); (K.L.); (Q.W.); (T.Y.); (H.J.); (T.H.); (T.X.)
  - Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, Engineering Laboratory of Development and Application of Traditional Chinese Medicines, Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, China
- Tian Xie
  - School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China; (Y.L.); (S.W.); (K.L.); (Q.W.); (T.Y.); (H.J.); (T.H.); (T.X.)
  - Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, Engineering Laboratory of Development and Application of Traditional Chinese Medicines, Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, China
- Qiuhui Wei
  - School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China; (Y.L.); (S.W.); (K.L.); (Q.W.); (T.Y.); (H.J.); (T.H.); (T.X.)
  - Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, Engineering Laboratory of Development and Application of Traditional Chinese Medicines, Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, China
- Xiaopu Yin
  - School of Pharmacy, Hangzhou Normal University, Hangzhou 311121, China; (Y.L.); (S.W.); (K.L.); (Q.W.); (T.Y.); (H.J.); (T.H.); (T.X.)
  - Key Laboratory of Elemene Class Anti-Cancer Chinese Medicines, Engineering Laboratory of Development and Application of Traditional Chinese Medicines, Collaborative Innovation Center of Traditional Chinese Medicines of Zhejiang Province, Hangzhou Normal University, Hangzhou 311121, China
4
Saikat ASM. Computational approaches for molecular characterization and structure-based functional elucidation of a hypothetical protein from Mycobacterium tuberculosis. Genomics Inform 2023; 21:e25. [PMID: 37415455 PMCID: PMC10326535 DOI: 10.5808/gi.23001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Revised: 05/04/2023] [Accepted: 05/04/2023] [Indexed: 07/08/2023] Open
Abstract
The mutual adaptation of pathogens and hosts has led intracellular pathogens to adopt several metabolic mechanisms to combat host defense responses and the lack of fuel during infection. Human tuberculosis, caused by Mycobacterium tuberculosis (MTB), is the world's leading cause of mortality attributable to a single disease. This study aims to characterize a hypothetical protein of MTB and to predict its potential antigenic characteristics as a promising vaccine candidate, using computational strategies. The protein is associated with the catalysis of dithiol oxidation and/or disulfide reduction because of its predicted disulfide oxidoreductase properties. This investigation analyzed the protein's physicochemical characteristics, protein-protein interactions, subcellular locations, predicted active sites, secondary and tertiary structures, allergenicity, antigenicity, and toxicity properties. The protein has significant active amino acid residues with no allergenicity, elevated antigenicity, and no toxicity.
Affiliation(s)
- Abu Saim Mohammad Saikat
  - Department of Biochemistry and Molecular Biology, Life Science Faculty, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj 8100, Bangladesh
5
Figueiredo PR, Santos SFG, Almeida BC, Simões I, Carvalho ATP. Introduction of a Glycine Linker Connecting the Heavy and Light Chains in Synthetic Cardosin B-Derived Rennet Changes the Specificity of Subpocket S3'. J Phys Chem B 2021; 125:4368-4374. [PMID: 33905253 DOI: 10.1021/acs.jpcb.1c01826] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Abstract
The development of plant-based synthetic rennets is of high commercial interest, due to the current great consumer demand for animal product alternatives. A previously developed recombinant form of the aspartic protease cardosin B with a three-glycine linker showed great potential due to its good performance in milk coagulation. This enzyme was found to be more specific and less proteolytically active than the native form for milk clotting, but the underlying structural causes for these activity changes were not completely clear. Here, we have performed molecular dynamics simulations with the recombinant enzyme with and without the linker. Our results showed that the introduction of the linker changes the subpocket S3', which is located more than 4 nm away. These results showcase how small modifications in proteins can have significant effects in distant regions in the protein structure that affect their biotechnological applications.
Affiliation(s)
- Pedro R Figueiredo
  - CNC-Center for Neuroscience and Cell Biology, Institute for Interdisciplinary Research (IIIUC), University of Coimbra, 3004-504 Coimbra, Portugal
- Sónia F G Santos
  - CNC-Center for Neuroscience and Cell Biology, Institute for Interdisciplinary Research (IIIUC), University of Coimbra, 3004-504 Coimbra, Portugal
- Beatriz C Almeida
  - CNC-Center for Neuroscience and Cell Biology, Institute for Interdisciplinary Research (IIIUC), University of Coimbra, 3004-504 Coimbra, Portugal
- Isaura Simões
  - CNC-Center for Neuroscience and Cell Biology, Institute for Interdisciplinary Research (IIIUC), University of Coimbra, 3004-504 Coimbra, Portugal
- Alexandra T P Carvalho
  - CNC-Center for Neuroscience and Cell Biology, Institute for Interdisciplinary Research (IIIUC), University of Coimbra, 3004-504 Coimbra, Portugal
6
Hayes M, Mora L. Alternative Proteins as a Source of Bioactive Peptides: The Edible Snail and Generation of Hydrolysates Containing Peptides with Bioactive Potential for Use as Functional Foods. Foods 2021; 10:276. [PMID: 33573120 PMCID: PMC7912061 DOI: 10.3390/foods10020276] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/23/2020] [Revised: 01/06/2021] [Accepted: 01/08/2021] [Indexed: 11/16/2022] Open
Abstract
Members of the Phylum Mollusca include shellfish such as oysters and squid but also the edible garden snail known as Helix aspersa. This snail species is consumed as a delicacy in countries including France (where they are known as petit-gris), southern Spain (where they are known as Bobe), Nigeria, Greece, Portugal and Italy but is not a traditional food in many other countries. However, it is considered an excellent protein source with a balanced amino acid profile and an environmentally friendly, sustainable protein source. The aim of this work was to develop a different dietary form of snail protein by generating protein hydrolysate ingredients from the edible snail using enzyme technology. A second aim was to assess the bioactive peptide content and potential health benefits of these hydrolysates. H. aspersa hydrolysates were made using the enzyme Alcalase® and the nutritional profile of these hydrolysates was determined. In addition, the bioactive peptide content of developed hydrolysates was identified using mass spectrometry. The potential heart health benefits of developed snail hydrolysates were measured in vitro using the Angiotensin-I-converting Enzyme (ACE-1; EC 3.4.15.1) inhibition assay, and the ACE-1 inhibitory drug Captopril© was used as a positive control. The generated H. aspersa hydrolysates were found to inhibit ACE-1 by 95.60% (±0.011) when assayed at a concentration of 1 mg/mL (n = 9) compared to the positive control Captopril© which inhibited ACE-1 by 96.53% (±0.0156) when assayed at a concentration of 0.005 mg/mL (n = 3). A total of 113 unique peptide sequences were identified following MS analysis with peptides identified ranging from 628.35 Da (peptide GGGLVGGI-protein accession number sp|P54334|XKDO_BACSU) to 2343.14 Da (peptide GPAGVPGLPGAKGDHGFPGSSGRRGD-protein accession number sp|Q7SIB2|CO4A1_BOVIN) in size using the BIOPEP-UWM database.
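The peptide sizes quoted in this abstract can be related to sequence with a short calculation. The sketch below is not taken from the article; it assumes an unmodified linear peptide and uses standard monoisotopic residue masses, and the function name `peptide_mass` is hypothetical. Under those assumptions it gives a value consistent with the reported 628.35 Da for GGGLVGGI.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the free N- and C-termini


def peptide_mass(sequence: str) -> float:
    """Monoisotopic mass of a linear, unmodified peptide (assumption)."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


# Smallest peptide named in the abstract.
print(round(peptide_mass("GGGLVGGI"), 2))  # → 628.35
```

Modified residues or average (rather than monoisotopic) masses would give different values, so this is only a consistency check on the smallest reported peptide.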
Affiliation(s)
- Maria Hayes
  - Teagasc Food Research Centre, Food BioSciences Department, Ashtown, Dublin 15, Ireland
- Leticia Mora
  - Instituto de Agroquímica y Tecnología de Alimentos, Burjassot CSIC, 46980 Valencia, Spain
7
Soybean (Glycine max) Protein Hydrolysates as Sources of Peptide Bitter-Tasting Indicators: An Analysis Based on Hybrid and Fragmentomic Approaches. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10072514] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/14/2022]
Abstract
The aim of this study was to analyze soybean proteins as sources of peptides likely to be bitter using fragmentomic and hybrid approaches involving in silico and in vitro studies. The bitterness of peptides (called parent peptides) was theoretically estimated based on the presence of bitter-tasting motifs, particularly those defined as bitter-tasting indicators. They were selected based on previously published multilinear stepwise regression results. Bioinformatic-assisted analyses covered the hydrolysis of five major soybean-originating protein sequences using bromelain, ficin, papain, and proteinase K. Verification of the results in experimental conditions included soy protein concentrate (SPC) hydrolysis, RP-HPLC (for monitoring the proteolysis), and identification of peptides using RP-HPLC-MS/MS. Discrepancies between in silico and in vitro results were observed when identifying parent peptide SPC hydrolysate samples. However, both analyses revealed that conglycinins were the most abundant sources of parent peptides likely to taste bitter. The compatibility percentage of the in silico and in vitro results was 3%. Nine parent peptides with the following sequences were identified in SPC hydrolysates: LSVISPK, DVLVIPLG, LIVILNG, NPFLFG, ISSTIV, PQMIIV, PFPSIL, DDFFL, and FFEITPEK (indicators are in bold). The fragmentomic idea of research might provide a supportive method for predicting the bitterness of hydrolysates. However, this statement needs to be confirmed experimentally.
8
BIOPEP-UWM Database of Bioactive Peptides: Current Opportunities. Int J Mol Sci 2019; 20:ijms20235978. [PMID: 31783634 PMCID: PMC6928608 DOI: 10.3390/ijms20235978] [Citation(s) in RCA: 377] [Impact Index Per Article: 75.4] [Received: 10/25/2019] [Revised: 11/21/2019] [Accepted: 11/25/2019] [Indexed: 12/11/2022] Open
Abstract
The BIOPEP-UWM™ database of bioactive peptides (formerly BIOPEP) has recently become a popular tool in research on bioactive peptides, especially those derived from foods and forming part of diets that prevent the development of chronic diseases. The database is continuously updated and modified. The addition of new peptides and the introduction of new information about existing ones (e.g., chemical codes and references to other databases) are in progress. New opportunities include the possibility of annotating peptides containing D-enantiomers of amino acids, a batch processing option, conversion of amino acid sequences into SMILES code, new quantitative parameters characterizing the presence of bioactive fragments in protein sequences, and finding proteinases that release particular peptides.
9
Kaiser F, Labudde D. Unsupervised Discovery of Geometrically Common Structural Motifs and Long-Range Contacts in Protein 3D Structures. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:671-680. [PMID: 29990265 DOI: 10.1109/tcbb.2017.2786250] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 06/08/2023]
Abstract
The essential role of small evolutionarily conserved structural units in proteins has been extensively researched and validated. A popular example is the serine proteases, where the peptide cleavage reaction is realized by a configuration of only three residues. Brought into spatial proximity during the protein folding process, such structural motifs are often long-range contacts and usually hard to detect at the sequence level. Due to the constantly increasing resource of protein 3D structure data, the computational identification of structural motifs can contribute significantly to the understanding of protein fold and function. Thus, we propose a method to discover structural motifs of high geometrical similarity and desired sequence separation in protein 3D structure data. By utilizing methods originating from data mining, no a priori knowledge is required. The applicability of the method is demonstrated by the identification of the catalytic unit of serine proteases and the ion-coordination center of cupredoxins. Furthermore, large-scale analysis of the entire Protein Data Bank points towards the presence of ubiquitous structural motifs, independent of any specific fold or function. We envision that our method is suitable to uncover functional mechanisms and to derive fingerprint libraries of structural motifs, which could be used to assess protein family association.
10
Ma L, Wang DD, Zou B, Yan H. An Eigen-Binding Site Based Method for the Analysis of Anti-EGFR Drug Resistance in Lung Cancer Treatment. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:1187-1194. [PMID: 27187970 DOI: 10.1109/tcbb.2016.2568184] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/05/2023]
Abstract
We explore the drug resistance mechanism in non-small cell lung cancer treatment by characterizing the drug-binding site of a protein mutant based on local surface and energy features. These features are transformed to an eigen-binding site space and used for drug resistance level prediction and analysis.
11
Liu ZP, Liu S, Chen R, Huang X, Wu LY. Structure alignment-based classification of RNA-binding pockets reveals regional RNA recognition motifs on protein surfaces. BMC Bioinformatics 2017; 18:27. [PMID: 28077065 PMCID: PMC5225598 DOI: 10.1186/s12859-016-1410-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 07/21/2016] [Accepted: 12/07/2016] [Indexed: 11/23/2022] Open
Abstract
Background: Many critical biological processes are strongly related to protein-RNA interactions. Revealing the protein structure motifs for RNA-binding will provide valuable information for deciphering protein-RNA recognition mechanisms and benefit complementary structural design in bioengineering. RNA-binding events often take place at pockets on protein surfaces. The structural classification of local binding pockets determines the major patterns of RNA recognition. Results: In this work, we provide a novel framework for systematically identifying the structure motifs of protein-RNA binding sites in the form of pockets on regional protein surfaces via a structure alignment-based method. We first construct a similarity network of RNA-binding pockets based on a non-sequential-order structure alignment method for local structure alignment. By using network community decomposition, the RNA-binding pockets on protein surfaces are clustered into groups with structural similarity. With a multiple structure alignment strategy, the consensus RNA-binding pockets in each group are identified. The crucial recognition patterns, as well as the protein-RNA binding motifs, are then identified and analyzed. Conclusions: Large-scale RNA-binding pockets on protein surfaces are grouped by measuring their structural similarities. This similarity network-based framework provides a convenient method for modeling the structural relationships of functional pockets. The local structural patterns identified serve as structure motifs for the recognition with RNA on protein surfaces.
Affiliation(s)
- Zhi-Ping Liu
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong, 250061, China
- Shutang Liu
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong, 250061, China
- Ruitang Chen
  - Department of Computer Science, Stanford University, Stanford, CA, 94305, USA
- Xiaopeng Huang
  - Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China
  - National Center for Mathematics and Interdisciplinary Sciences, Chinese Academy of Sciences, Beijing, 100190, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Ling-Yun Wu
  - Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China
  - National Center for Mathematics and Interdisciplinary Sciences, Chinese Academy of Sciences, Beijing, 100190, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
12
gDNA-Prot: Predict DNA-binding proteins by employing support vector machine and a novel numerical characterization of protein sequence. J Theor Biol 2016; 406:8-16. [PMID: 27378005 DOI: 10.1016/j.jtbi.2016.06.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 01/27/2016] [Revised: 05/19/2016] [Accepted: 06/01/2016] [Indexed: 11/24/2022]
Abstract
DNA-binding proteins are functional proteins in cells that play an important role in various essential biological activities. In this paper, an effective and fast computational method, gDNA-Prot, is proposed to predict DNA-binding proteins; it combines a support vector machine classifier with a novel kind of feature called graphical representation. The DNA-binding protein sequence information was described with the 20 probabilities of amino acids and 23 new numerical graphical representation features of a protein sequence, based on 23 physicochemical properties of the 20 amino acids. Principal Components Analysis (PCA) was employed as the feature selection method for removing irrelevant features and reducing redundant features. The Sigmoid function and Min-max normalization methods for PCA were applied to accelerate training and obtain higher accuracy. Experiments demonstrated that Principal Components Analysis with the Sigmoid function generated the best performance. The gDNA-Prot method was also compared with DNAbinder, iDNA-Prot and DNA-Prot. The results suggested that gDNA-Prot outperformed DNAbinder and iDNA-Prot. Although DNA-Prot outperformed gDNA-Prot, gDNA-Prot was faster and more convenient for predicting DNA-binding proteins. Additionally, the proposed gDNA-Prot method is available at http://sourceforge.net/projects/gdnaprot.
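Two of the ingredients named in this abstract, the 20 amino acid probabilities and the Sigmoid and Min-max scalings, can be illustrated briefly. The sketch below is an assumption-laden illustration and not the gDNA-Prot code: the function names are hypothetical, the 23 graphical representation features, the PCA step, and the SVM classifier are omitted, and the exact form of scaling used in the paper is not stated in the abstract.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequence: str) -> list[float]:
    """Relative frequency of each standard amino acid in the sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence)
    return [counts[aa] / len(sequence) for aa in AMINO_ACIDS]


def min_max(values: list[float]) -> list[float]:
    """Rescale values linearly onto [0, 1]; constant input maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def sigmoid(values: list[float]) -> list[float]:
    """Squash each value into (0, 1) with the logistic function."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


# Toy sequence (hypothetical, not from the paper's data set).
features = composition("MKKLAAKR")
print(len(features), round(sum(features), 6))
print(min_max([1.0, 2.0, 3.0]), sigmoid([0.0]))
```

Either scaling keeps every feature in a bounded range, which is the property the abstract credits with faster training.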
13
Minkiewicz P, Darewicz M, Iwaniak A, Sokołowska J, Starowicz P, Bucholska J, Hrynkiewicz M. Common Amino Acid Subsequences in a Universal Proteome--Relevance for Food Science. Int J Mol Sci 2015; 16:20748-73. [PMID: 26340620 PMCID: PMC4613229 DOI: 10.3390/ijms160920748] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 07/07/2015] [Revised: 08/18/2015] [Accepted: 08/24/2015] [Indexed: 02/06/2023] Open
Abstract
A common subsequence is a fragment of the amino acid chain that occurs in more than one protein. Common subsequences may be an object of interest for food scientists as biologically active peptides, epitopes, and/or protein markers that are used in comparative proteomics. An individual bioactive fragment, in particular the shortest fragment containing two or three amino acid residues, may occur in many protein sequences. An individual linear epitope may also be present in multiple sequences of precursor proteins. Although recent recommendations for prediction of allergenicity and cross-reactivity include not only sequence identity, but also similarities in secondary and tertiary structures surrounding the common fragment, local sequence identity may be used to screen protein sequence databases for potential allergens in silico. The main weakness of the screening process is that it overlooks allergens and cross-reactivity cases without identical fragments corresponding to linear epitopes. A single peptide may also serve as a marker of a group of allergens that belong to the same family and, possibly, reveal cross-reactivity. This review article discusses the benefits for food scientists that follow from the common subsequences concept.
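The definition that opens this abstract, a fragment occurring in more than one protein, translates directly into a small search. The sketch below is a minimal illustration under stated assumptions: fragments have a fixed length `k`, matching is exact, and the protein names and sequences are made-up toy data rather than entries from any database.

```python
from collections import defaultdict


def common_subsequences(proteins: dict[str, str], k: int) -> dict[str, set[str]]:
    """Return every length-k fragment found in more than one protein,
    mapped to the names of the proteins that contain it."""
    owners: dict[str, set[str]] = defaultdict(set)
    for name, sequence in proteins.items():
        for start in range(len(sequence) - k + 1):
            owners[sequence[start:start + k]].add(name)
    return {frag: names for frag, names in owners.items() if len(names) > 1}


# Toy sequences (hypothetical).
proteins = {"p1": "MKVLAAG", "p2": "GGVLAKM", "p3": "TTTT"}
print(common_subsequences(proteins, 3))  # only "VLA" is shared, by p1 and p2
```

With `k` set to 2 or 3 this corresponds to the shortest bioactive fragments mentioned in the abstract; exact matching also reproduces the stated weakness, since related epitopes without an identical fragment are missed.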
Collapse
Affiliation(s)
- Piotr Minkiewicz
- Department of Food Biochemistry, University of Warmia and Mazury in Olsztyn, Plac Cieszyński 1, Olsztyn-Kortowo 10-726, Poland.
| | - Małgorzata Darewicz
- Department of Food Biochemistry, University of Warmia and Mazury in Olsztyn, Plac Cieszyński 1, Olsztyn-Kortowo 10-726, Poland.
| | - Anna Iwaniak
- Department of Food Biochemistry, University of Warmia and Mazury in Olsztyn, Plac Cieszyński 1, Olsztyn-Kortowo 10-726, Poland.
| | - Jolanta Sokołowska
- Department of Food Biochemistry, University of Warmia and Mazury in Olsztyn, Plac Cieszyński 1, Olsztyn-Kortowo 10-726, Poland.
| | - Piotr Starowicz
- Department of Food Biochemistry, University of Warmia and Mazury in Olsztyn, Plac Cieszyński 1, Olsztyn-Kortowo 10-726, Poland.
| | - Justyna Bucholska
- Department of Food Biochemistry, University of Warmia and Mazury in Olsztyn, Plac Cieszyński 1, Olsztyn-Kortowo 10-726, Poland.
| | - Monika Hrynkiewicz
- Department of Food Biochemistry, University of Warmia and Mazury in Olsztyn, Plac Cieszyński 1, Olsztyn-Kortowo 10-726, Poland.
|
14
|
Meysman P, Zhou C, Cule B, Goethals B, Laukens K. Mining the entire Protein DataBank for frequent spatially cohesive amino acid patterns. BioData Min 2015; 8:4. [PMID: 25657820 PMCID: PMC4318390 DOI: 10.1186/s13040-015-0038-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 07/09/2014] [Accepted: 01/18/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The three-dimensional structure of a protein is an essential aspect of its functionality. Despite the large diversity in protein structures and functionality, it is known that there are common patterns and preferences in the contacts between amino acid residues, or between residues and other biomolecules, such as DNA. The discovery and characterization of these patterns is an important research topic within structural biology as it can give fundamental insight into protein structures and can aid in the prediction of unknown structures. RESULTS: Here we apply an efficient spatial pattern miner to search for sets of amino acids that occur frequently in close spatial proximity in the protein structures of the Protein DataBank. This allowed us to mine for a new class of amino acid patterns, that we term FreSCOs (Frequent Spatially Cohesive Component sets), which feature synergetic combinations. To demonstrate the relevance of these FreSCOs, they were compared in relation to the thermostability of the protein structure and the interaction preferences of DNA-protein complexes. In both cases, the results matched well with prior investigations using more complex methods on smaller data sets. CONCLUSIONS: The currently characterized protein structures feature a diverse set of frequent amino acid patterns that can be related to the stability of the protein molecular structure and that are independent from protein function or specific conserved domains.
Affiliation(s)
- Pieter Meysman
- Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
| | - Cheng Zhou
- Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
| | - Boris Cule
- Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
| | - Bart Goethals
- Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
| | - Kris Laukens
- Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
|
15
|
Wang W, Liu J, Xiong Y, Zhu L, Zhou X. Analysis and classification of DNA-binding sites in single-stranded and double-stranded DNA-binding proteins using protein information. IET Syst Biol 2014; 8:176-83. [PMID: 25075531 DOI: 10.1049/iet-syb.2013.0048] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022] Open
Abstract
Single-stranded DNA-binding proteins (SSBs) and double-stranded DNA-binding proteins (DSBs) play different roles in biological processes when they bind to single-stranded DNA (ssDNA) or double-stranded DNA (dsDNA). However, the underlying binding mechanisms of SSBs and DSBs are not yet fully understood. Here, the authors first constructed two groups of ssDNA- and dsDNA-specific binding sites from two non-redundant sets of SSBs and DSBs. They further analysed the relationship between the two classes of binding sites and a newly proposed set of features (residue charge distribution, secondary structure and spatial shape). To assess and utilise the predictive power of these features, they trained a classification model using a support vector machine to make predictions about the ssDNA and the dsDNA binding sites. The authors' analysis and prediction results indicated that the two classes of binding sites can be distinguished by the three types of features, and the final classifier using all the features achieved satisfactory performance. In conclusion, the proposed features will deepen the understanding of the specificity of proteins that bind to ssDNA or dsDNA.
Affiliation(s)
- Wei Wang
- School of Computer, Wuhan University, Wuhan, Hubei, People's Republic of China
| | - Juan Liu
- School of Computer, Wuhan University, Wuhan, Hubei, People's Republic of China.
| | - Yi Xiong
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana 47907, USA
| | - Lida Zhu
- School of Computer, Wuhan University, Wuhan, Hubei, People's Republic of China
| | - Xionghui Zhou
- School of Computer, Wuhan University, Wuhan, Hubei, People's Republic of China
|
16
|
newDNA-Prot: Prediction of DNA-binding proteins by employing support vector machine and a comprehensive sequence representation. Comput Biol Chem 2014; 52:51-9. [PMID: 25240115 DOI: 10.1016/j.compbiolchem.2014.09.002] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 02/11/2014] [Revised: 09/05/2014] [Accepted: 09/06/2014] [Indexed: 11/21/2022]
Abstract
Identification of DNA-binding proteins is essential in studying cellular activities, as DNA-binding proteins play a pivotal role in gene regulation. In this study, we propose newDNA-Prot, a DNA-binding protein predictor that employs a support vector machine classifier and a comprehensive feature representation. The sequence representations are categorized into 6 groups: primary sequence based, evolutionary profile based, predicted secondary structure based, predicted relative solvent accessibility based, physicochemical property based and biological function based features. The mRMR, wrapper and two-stage feature selection methods are employed for removing irrelevant features and reducing redundant features. Experiments demonstrate that the two-stage method performs better than the mRMR and wrapper methods. We also perform a statistical analysis on the selected features, and the results show that more than 95% of the selected features are statistically significant and that they cover all 6 feature groups. The newDNA-Prot method is compared with several state-of-the-art algorithms, including iDNA-Prot, DNAbinder and DNA-Prot. The results demonstrate that newDNA-Prot outperforms the iDNA-Prot, DNAbinder and DNA-Prot methods. More specifically, newDNA-Prot improves on the runner-up method, DNA-Prot, by around 10% on several evaluation measures. The proposed newDNA-Prot method is available at http://sourceforge.net/projects/newdnaprot/
|
17
|
Zhou C, Meysman P, Cule B, Laukens K, Goethals B. Discovery of Spatially Cohesive Itemsets in Three-Dimensional Protein Structures. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2014; 11:814-825. [PMID: 26356855 DOI: 10.1109/tcbb.2014.2311795] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]
Abstract
In this paper we present a cohesive structural itemset miner aiming to discover interesting patterns in a set of data objects within a multidimensional spatial structure by combining the cohesion and the support of the pattern. We propose two ways to build the itemset miner, VertexOne and VertexAll, in an attempt to find a balance between accuracy and run-times. The experiments show that VertexOne performs better, and finds almost the same itemsets as VertexAll in a much shorter time. The usefulness of the method is demonstrated by applying it to find interesting patterns of amino acids in spatial proximity within a set of proteins based on their atomic coordinates in the protein molecular structure. Several patterns found by the cohesive structural itemset miner contain amino acids that frequently co-occur in the spatial structure, even if they are distant in the primary protein sequence and only brought together by protein folding. Furthermore, several indications were found that some of the discovered patterns represent common underlying support structures within the proteins.
|
18
|
Wang L, Zhang W, Gao Q, Xiong C. Prediction of hot spots in protein interfaces using extreme learning machines with the information of spatial neighbour residues. IET Syst Biol 2014; 8:184-90. [DOI: 10.1049/iet-syb.2013.0049] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022] Open
Affiliation(s)
- Lin Wang
- School of Computer Science and Information Engineering, Tianjin University of Science and Technology, Tianjin 300222, People's Republic of China
| | - Wenjuan Zhang
- Faculty of Fundamental Courses, Tianjin Foreign Studies University, Tianjin 300204, People's Republic of China
| | - Qiang Gao
- Key Lab of Industrial Fermentation Microbiology, Ministry of Education & Tianjin City, College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, People's Republic of China
| | - Congcong Xiong
- School of Computer Science and Information Engineering, Tianjin University of Science and Technology, Tianjin 300222, People's Republic of China
|
19
|
Zou C, Gong J, Li H. An improved sequence based prediction protocol for DNA-binding proteins using SVM and comprehensive feature analysis. BMC Bioinformatics 2013; 14:90. [PMID: 23497329 PMCID: PMC3602657 DOI: 10.1186/1471-2105-14-90] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Received: 09/29/2012] [Accepted: 03/04/2013] [Indexed: 11/10/2022] Open
Abstract
Background: DNA-binding proteins (DNA-BPs) play a pivotal role in both eukaryotic and prokaryotic proteomes. Several computational methods have been proposed in the literature to deal with DNA-BPs, and many informative features and properties have been used and shown to have a significant impact on this problem. However, the ultimate goal of bioinformatics is to be able to predict DNA-BPs directly from primary sequence. Results: In this work, the focus is on how to transform these informative features into a uniform numeric representation appropriately and improve the prediction accuracy of our SVM-based classifier for DNA-BPs. A systematic representation of some selected features known to perform well is investigated here. Firstly, four kinds of protein properties are obtained and used to describe the protein sequence. Secondly, three different feature transformation methods (OCTD, AC and SAA) are adopted to obtain numeric feature vectors from three main levels of the protein sequence (Global, Nonlocal and Local), and their performances are exhaustively investigated. Finally, the mRMR-IFS feature selection method and an ensemble learning approach are utilized to determine the best prediction model. In addition, the optimal features selected by mRMR-IFS are illustrated based on the observed results, which may provide useful insights for revealing the mechanisms of protein-DNA interactions. For five-fold cross-validation over the DNAdset and DNAaset, we obtained an overall accuracy of 0.940 and 0.811, and MCC of 0.881 and 0.614, respectively. Conclusions: The good results suggest that an entirely sequence-based protocol, which transforms and integrates informative features from different scales for use by an SVM, can be developed efficiently to predict DNA-BPs accurately. Moreover, a novel systematic framework for sequence descriptor-based protein function prediction is proposed here.
Affiliation(s)
- Chuanxin Zou
- Shanghai Key Laboratory of New Drug Design, State Key Laboratory of Bioreactor Engineering, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
|
20
|
Maurer-Stroh S, Gao H, Han H, Baeten L, Schymkowitz J, Rousseau F, Zhang L, Eisenhaber F. Motif discovery with data mining in 3D protein structure databases: discovery, validation and prediction of the U-shape zinc binding ("Huf-Zinc") motif. J Bioinform Comput Biol 2013; 11:1340008. [DOI: 10.1142/s0219720013400088] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Data mining in protein databases, derivatives of more fundamental protein 3D structure and sequence databases, has considerable untapped potential for the discovery of sequence motif–structural motif–function relationships, as the finding of the U-shape (Huf-Zinc) motif, originally a small student project, exemplifies. The metal ion zinc is critically involved in universal biological processes, ranging from protein-DNA complexes and transcription regulation to enzymatic catalysis and metabolic pathways. Proteins have evolved a series of motifs to specifically recognize and bind zinc ions. Many of these so-called zinc fingers are structurally independent globular domains with discontinuous binding motifs made up of residues mostly far apart in sequence. Through a systematic approach starting from the BRIX structure fragment database, we discovered that there exists another predictable subset of zinc-binding motifs that not only have a conserved continuous sequence pattern but also share a characteristic local conformation, despite being included in totally different overall folds. While this does not allow general prediction of all Zn binding motifs, an HMM-based web server, Huf-Zinc, is available for prediction of these novel, as well as conventional, zinc finger motifs in protein sequences. The Huf-Zinc web server can be freely accessed at http://mendel.bii.a-star.edu.sg/METHODS/hufzinc/.
Affiliation(s)
- Sebastian Maurer-Stroh
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
- School of Biological Sciences (SBS), Nanyang Technological University (NTU), 60 Nanyang Drive, 637551, Singapore
| | - He Gao
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
- NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore, Centre for Life Sciences, #05-01, 28 Medical Drive, Singapore 117456, Singapore
| | - Hao Han
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
| | - Lies Baeten
- VIB Switch Laboratory, Katholieke Universiteit Leuven, Herestraat 49, Box 802, 3000 Leuven, Belgium
| | - Joost Schymkowitz
- VIB Switch Laboratory, Katholieke Universiteit Leuven, Herestraat 49, Box 802, 3000 Leuven, Belgium
| | - Frederic Rousseau
- VIB Switch Laboratory, Katholieke Universiteit Leuven, Herestraat 49, Box 802, 3000 Leuven, Belgium
| | - Louxin Zhang
- Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore 119076, Singapore
| | - Frank Eisenhaber
- Bioinformatics Institute (BII), Agency for Science and Technology (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
- Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive 4, 117597, Singapore
- School of Computer Engineering (SCE), Nanyang Technological University (NTU), 50 Nanyang Drive, 637553, Singapore
|
21
|
Darewicz M, Dziuba B, Minkiewicz P, Dziuba J. The Preventive Potential of Milk and Colostrum Proteins and Protein Fragments. Food Reviews International 2011. [DOI: 10.1080/87559129.2011.563396] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 12/16/2022]
|
22
|
Daniluk P, Lesyng B. A novel method to compare protein structures using local descriptors. BMC Bioinformatics 2011; 12:344. [PMID: 21849047 PMCID: PMC3179968 DOI: 10.1186/1471-2105-12-344] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 04/02/2011] [Accepted: 08/17/2011] [Indexed: 11/15/2022] Open
Abstract
Background: Protein structure comparison is one of the most widely performed tasks in bioinformatics. However, currently used methods have problems with the so-called "difficult similarities", including considerable shifts and distortions of structure, sequential swaps and circular permutations. There is a demand for efficient and automated systems capable of overcoming these difficulties, which may lead to the discovery of previously unknown structural relationships. Results: We present a novel method for protein structure comparison based on the formalism of local descriptors of protein structure - DEscriptor Defined Alignment (DEDAL). Local similarities identified by pairs of similar descriptors are extended into global structural alignments. We demonstrate the method's capability by aligning structures in difficult benchmark sets: curated alignments in the SISYPHUS database, as well as SISY and RIPC sets, including non-sequential and non-rigid-body alignments. On the most difficult RIPC set of sequence alignment pairs the method achieves an accuracy of 77% (the second best method tested achieves 60% accuracy). Conclusions: DEDAL is fast enough to be used in whole proteome applications, and by lowering the threshold of detectable structure similarity it may shed additional light on molecular evolution processes. It is well suited to improving automatic classification of structure domains, helping analyze protein fold space, or to improving protein classification schemes. DEDAL is available online at http://bioexploratorium.pl/EP/DEDAL.
Affiliation(s)
- Paweł Daniluk
- Faculty of Physics, Department of Biophysics and CoE BioExploratorium, University of Warsaw, Żwirki i Wigury 93, Warsaw, Poland
|
23
|
Liu ZP, Wu LY, Wang Y, Zhang XS, Chen L. Prediction of protein-RNA binding sites by a random forest method with combined features. Bioinformatics 2010; 26:1616-22. [PMID: 20483814 DOI: 10.1093/bioinformatics/btq253] [Citation(s) in RCA: 120] [Impact Index Per Article: 8.6] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Protein-RNA interactions play a key role in a number of biological processes, such as protein synthesis, mRNA processing, mRNA assembly, ribosome function and eukaryotic spliceosomes. As a result, reliable identification of the RNA binding sites of a protein is important for functional annotation and site-directed mutagenesis. Accumulated data of experimental protein-RNA interactions reveal that an RNA binding residue with different neighbor amino acids often exhibits different preferences for its RNA partners, which in turn can be assessed by the interacting interdependence of the amino acid fragment and RNA nucleotide. RESULTS: In this work, we propose a novel classification method to identify the RNA binding sites in proteins by combining a new interacting feature (interaction propensity) with other sequence- and structure-based features. Specifically, the interaction propensity represents the binding specificity of a protein residue to the interacting RNA nucleotide by considering its two-side neighborhood in a protein residue triplet. The sequence- and structure-based features of the residues are combined to discriminate the interaction propensity of amino acids with RNA. We predict RNA interacting residues in proteins by implementing a well-built random forest classifier. The experiments show that our method is able to detect the annotated protein-RNA interaction sites with high accuracy. Our method achieves an accuracy of 84.5%, F-measure of 0.85 and AUC of 0.92 in predicting the RNA binding residues for a dataset containing 205 non-homologous RNA binding proteins, and also outperforms several existing RNA binding residue predictors, such as RNABindR, BindN, RNAProB and PPRint, and some alternative machine learning methods, such as support vector machine, naive Bayes and neural network, in the comparison study. Furthermore, we provide some biological insights into the roles of sequences and structures in protein-RNA interactions by both evaluating the importance of features for their contributions to predictive accuracy and analyzing the binding patterns of interacting residues. AVAILABILITY: All the source data and code are available at http://www.aporc.org/doc/wiki/PRNA or http://www.sysbio.ac.cn/datatools.asp CONTACT: lnchen@sibs.ac.cn SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zhi-Ping Liu
- Key Laboratory of Systems Biology, SIBS-Novo Nordisk Translational Research Centre for PreDiabetes, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
|
24
|
Proteomic analysis reveals altered expression of proteins related to glutathione metabolism and apoptosis in the small intestine of zinc oxide-supplemented piglets. Amino Acids 2009; 37:209-18. [DOI: 10.1007/s00726-009-0242-y] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.1] [Received: 01/05/2009] [Accepted: 01/12/2009] [Indexed: 10/21/2022]
|
25
|
Wang J, Wu G, Zhou H, Wang F. Emerging technologies for amino acid nutrition research in the post-genome era. Amino Acids 2008; 37:177-86. [DOI: 10.1007/s00726-008-0193-8] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Received: 09/17/2008] [Accepted: 10/05/2008] [Indexed: 12/30/2022]
|