1. Pilla SP, Bahadur RP. Residue conservation elucidates the evolution of r-proteins in ribosomal assembly and function. Int J Biol Macromol 2019; 140:323-329. [PMID: 31421176 DOI: 10.1016/j.ijbiomac.2019.08.127] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/23/2019] [Revised: 08/14/2019] [Accepted: 08/14/2019] [Indexed: 02/08/2023]
Abstract
Ribosomes are the translational machineries having two unequal subunits, small subunit (SSU) and large subunit (LSU) across all the domains of life. Origin and evolution of ribosome are encoded in its structure, and the core of the ribosome is highly conserved. Here, we have used Shannon entropy to analyze the evolution of ribosomal proteins (r-proteins) across the three domains of life. Moreover, we have analyzed the residue conservation at protein-protein (PP) and protein-RNA (PR) interfaces in SSU and LSU. Furthermore, we have studied the evolution of early, intermediate and late binding r-proteins. We show that the r-proteins of Thermus thermophilus are better conserved during the evolution. Furthermore, we find the late binders are better conserved than the early and the intermediate binders. The residues at the interior of the r-proteins are the most conserved followed by those at the interface and the solvent accessible surface. Additionally, we show that the residues at the PP interfaces are better conserved than those at the PR interfaces. However, between PR and PP interfaces, the multi-interface residues at the former are better conserved than those at the latter ones. Our findings may provide insights into the evolution of r-proteins in ribosomal assembly and function.
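The abstract above names Shannon entropy as its conservation measure but does not give the exact formulation. The following is a minimal sketch of per-column Shannon entropy over a multiple sequence alignment, assuming base-2 logarithms, no normalization, and gaps ignored; these choices, the function names, and the toy alignment are illustrative assumptions and not the authors' pipeline. Lower entropy means a more conserved position.

```python
import math
from collections import Counter


def column_entropy(column, ignore_gaps=True):
    """Shannon entropy (in bits) of one alignment column.

    Assumption: gaps ('-') are skipped rather than treated as a 21st symbol.
    """
    residues = [r for r in column if not (ignore_gaps and r == "-")]
    if not residues:
        return 0.0
    total = len(residues)
    probs = [n / total for n in Counter(residues).values()]
    # H = -sum(p * log2 p); 0 bits means the column is fully conserved.
    return sum(-p * math.log2(p) for p in probs)


def alignment_entropies(sequences):
    """Entropy of every column of an alignment given as equal-length strings."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy(col) for col in zip(*sequences)]


# Hypothetical toy alignment, not data from the paper.
toy_alignment = ["MKVLA", "MKILA", "MRVLG", "MKV-A"]
entropies = alignment_entropies(toy_alignment)
```

In this toy case the first column (all `M`) has zero entropy, while a column with a 3:1 residue split scores about 0.81 bits. Comparing the mean entropy of interior, interface, and surface positions would follow the same pattern once those residue classes are defined.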
Affiliation(s)
- Smita P Pilla
- Computational Structural Biology Laboratory, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
- Ranjit Prasad Bahadur
- Computational Structural Biology Laboratory, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
2. Pilla SP, R B, Bahadur RP. Dissecting protein-protein interactions in proteasome assembly: Implication to its self-assembly. J Mol Recognit 2019; 32:e2784. [DOI: 10.1002/jmr.2784] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/09/2018] [Revised: 03/07/2019] [Accepted: 03/19/2019] [Indexed: 01/18/2023]
Affiliation(s)
- Smita P. Pilla
- Computational Structural Biology Laboratory, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India
- Babu R
- Computational Structural Biology Laboratory, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India
- Ranjit P. Bahadur
- Computational Structural Biology Laboratory, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India
3. Raza K. Protein Features Identification for Machine Learning-Based Prediction of Protein-Protein Interactions. Communications in Computer and Information Science 2017:305-317. [DOI: 10.1007/978-981-10-6544-6_28] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/29/2023]
4. Barik A, Nithin C, Karampudi NBR, Mukherjee S, Bahadur RP. Probing binding hot spots at protein-RNA recognition sites. Nucleic Acids Res 2015; 44:e9. [PMID: 26365245 PMCID: PMC4737170 DOI: 10.1093/nar/gkv876] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 01/20/2015] [Accepted: 08/23/2015] [Indexed: 01/30/2023]
Abstract
We use evolutionary conservation derived from structure alignment of polypeptide sequences along with structural and physicochemical attributes of protein–RNA interfaces to probe the binding hot spots at protein–RNA recognition sites. We find that the degree of conservation varies across the RNA binding proteins; some evolve rapidly compared to others. Additionally, irrespective of the structural class of the complexes, residues at the RNA binding sites are evolutionary better conserved than those at the solvent exposed surfaces. For recognitions involving duplex RNA, residues interacting with the major groove are better conserved than those interacting with the minor groove. We identify multi-interface residues participating simultaneously in protein–protein and protein–RNA interfaces in complexes where more than one polypeptide is involved in RNA recognition, and show that they are better conserved compared to any other RNA binding residues. We find that the residues at water preservation site are better conserved than those at hydrated or at dehydrated sites. Finally, we develop a Random Forests model using structural and physicochemical attributes for predicting binding hot spots. The model accurately predicts 80% of the instances of experimental ΔΔG values in a particular class, and provides a stepping-stone towards the engineering of protein–RNA recognition sites with desired affinity.
Affiliation(s)
- Amita Barik
- Computational Structural Biology Laboratory, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur-721302, India
- Chandran Nithin
- Computational Structural Biology Laboratory, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur-721302, India
- Sunandan Mukherjee
- Computational Structural Biology Laboratory, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur-721302, India
- Ranjit Prasad Bahadur
- Computational Structural Biology Laboratory, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur-721302, India; Advanced Technology Development Centre, Indian Institute of Technology Kharagpur, Kharagpur-721302, India
5. Yan J, Friedrich S, Kurgan L. A comprehensive comparative review of sequence-based predictors of DNA- and RNA-binding residues. Brief Bioinform 2015; 17:88-105. [DOI: 10.1093/bib/bbv023] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Received: 12/19/2014] [Indexed: 01/07/2023]
6. Aumentado-Armstrong TT, Istrate B, Murgita RA. Algorithmic approaches to protein-protein interaction site prediction. Algorithms Mol Biol 2015; 10:7. [PMID: 25713596 PMCID: PMC4338852 DOI: 10.1186/s13015-015-0033-9] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 08/23/2014] [Accepted: 01/07/2015] [Indexed: 12/19/2022]
Abstract
Interaction sites on protein surfaces mediate virtually all biological activities, and their identification holds promise for disease treatment and drug design. Novel algorithmic approaches for the prediction of these sites have been produced at a rapid rate, and the field has seen significant advancement over the past decade. However, the most current methods have not yet been reviewed in a systematic and comprehensive fashion. Herein, we describe the intricacies of the biological theory, datasets, and features required for modern protein-protein interaction site (PPIS) prediction, and present an integrative analysis of the state-of-the-art algorithms and their performance. First, the major sources of data used by predictors are reviewed, including training sets, evaluation sets, and methods for their procurement. Then, the features employed and their importance in the biological characterization of PPISs are explored. This is followed by a discussion of the methodologies adopted in contemporary prediction programs, as well as their relative performance on the datasets most recently used for evaluation. In addition, the potential utility that PPIS identification holds for rational drug design, hotspot prediction, and computational molecular docking is described. Finally, an analysis of the most promising areas for future development of the field is presented.
7. Touw WG, Baakman C, Black J, te Beek TAH, Krieger E, Joosten RP, Vriend G. A series of PDB-related databanks for everyday needs. Nucleic Acids Res 2014; 43:D364-8. [PMID: 25352545 PMCID: PMC4383885 DOI: 10.1093/nar/gku1028] [Citation(s) in RCA: 599] [Impact Index Per Article: 59.9] [Indexed: 12/02/2022]
Abstract
We present a series of databanks (http://swift.cmbi.ru.nl/gv/facilities/) that hold information that is computationally derived from Protein Data Bank (PDB) entries and that might augment macromolecular structure studies. These derived databanks run parallel to the PDB, i.e. they have one entry per PDB entry. Several of the well-established databanks such as HSSP, PDBREPORT and PDB_REDO have been updated and/or improved. The software that creates the DSSP databank, for example, has been rewritten to better cope with π-helices. A large number of databanks have been added to aid computational structural biology; some examples are lists of residues that make crystal contacts, lists of contacting residues using a series of contact definitions or lists of residue accessibilities. PDB files are not the optimal presentation of the underlying data for many studies. We therefore made a series of databanks that hold PDB files in an easier to use or more consistent representation. The BDB databank holds X-ray PDB files with consistently represented B-factors. We also added several visualization tools to aid the users of our databanks.
Affiliation(s)
- Wouter G Touw
- Centre for Molecular and Biomolecular Informatics, CMBI, Radboud university medical center, Geert Grooteplein Zuid 26-28 6525 GA Nijmegen, The Netherlands
- Coos Baakman
- Centre for Molecular and Biomolecular Informatics, CMBI, Radboud university medical center, Geert Grooteplein Zuid 26-28 6525 GA Nijmegen, The Netherlands
- Jon Black
- Centre for Molecular and Biomolecular Informatics, CMBI, Radboud university medical center, Geert Grooteplein Zuid 26-28 6525 GA Nijmegen, The Netherlands
- Tim A H te Beek
- Bio-Prodict BV, Nieuwe Marktstraat 54E, 6511 AA Nijmegen, The Netherlands
- E Krieger
- Centre for Molecular and Biomolecular Informatics, CMBI, Radboud university medical center, Geert Grooteplein Zuid 26-28 6525 GA Nijmegen, The Netherlands
- Robbie P Joosten
- Centre for Molecular and Biomolecular Informatics, CMBI, Radboud university medical center, Geert Grooteplein Zuid 26-28 6525 GA Nijmegen, The Netherlands; Department of Biochemistry, Netherlands Cancer Institute, Plesmanlaan 121, Amsterdam 1066 CX, The Netherlands
- Gert Vriend
- Centre for Molecular and Biomolecular Informatics, CMBI, Radboud university medical center, Geert Grooteplein Zuid 26-28 6525 GA Nijmegen, The Netherlands
8. Papanikolaou N, Pavlopoulos GA, Pafilis E, Theodosiou T, Schneider R, Satagopam VP, Ouzounis CA, Eliopoulos AG, Promponas VJ, Iliopoulos I. BioTextQuest(+): a knowledge integration platform for literature mining and concept discovery. Bioinformatics 2014; 30:3249-56. [PMID: 25100685 DOI: 10.1093/bioinformatics/btu524] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 01/09/2023]
Abstract
SUMMARY The iterative process of finding relevant information in biomedical literature and performing bioinformatics analyses might result in an endless loop for an inexperienced user, considering the exponential growth of scientific corpora and the plethora of tools designed to mine PubMed(®) and related biological databases. Herein, we describe BioTextQuest(+), a web-based interactive knowledge exploration platform with significant advances to its predecessor (BioTextQuest), aiming to bridge processes such as bioentity recognition, functional annotation, document clustering and data integration towards literature mining and concept discovery. BioTextQuest(+) enables PubMed and OMIM querying, retrieval of abstracts related to a targeted request and optimal detection of genes, proteins, molecular functions, pathways and biological processes within the retrieved documents. The front-end interface facilitates the browsing of document clustering per subject, the analysis of term co-occurrence, the generation of tag clouds containing highly represented terms per cluster and at-a-glance popup windows with information about relevant genes and proteins. Moreover, to support experimental research, BioTextQuest(+) addresses integration of its primary functionality with biological repositories and software tools able to deliver further bioinformatics services. The Google-like interface extends beyond simple use by offering a range of advanced parameterization for expert users. We demonstrate the functionality of BioTextQuest(+) through several exemplary research scenarios including author disambiguation, functional term enrichment, knowledge acquisition and concept discovery linking major human diseases, such as obesity and ageing. AVAILABILITY The service is accessible at http://bioinformatics.med.uoc.gr/biotextquest. CONTACT g.pavlopoulos@gmail.com or georgios.pavlopoulos@esat.kuleuven.be SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Nikolas Papanikolaou, Georgios A Pavlopoulos, Evangelos Pafilis, Theodosios Theodosiou, Reinhard Schneider, Venkata P Satagopam, Christos A Ouzounis, Aristides G Eliopoulos, Vasilis J Promponas, Ioannis Iliopoulos
- Division of Basic Sciences, University of Crete, Medical School, Heraklion 71110, Greece; Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece; Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, 7, avenue des Hauts-Fourneaux, L-4362 Esch sur Alzette, Luxembourg; Biological Computation & Process Laboratory (BCPL), Chemical Process & Energy Resources Institute (CPERI), Centre for Research & Technology Hellas (CERTH), PO Box 361, GR-57001 Thessalonica, Greece; Donnelly Centre for Cellular & Biomolecular Research, University of Toronto, Toronto, Ontario, Canada; Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology Hellas, 70013 Heraklion, Crete, Greece; and Department of Biological Sciences, Bioinformatics Research Laboratory, University of Cyprus, PO Box 20537, CY 1678, Nicosia, Cyprus
9. Park K, Kim KB. miRTar Hunter: a prediction system for identifying human microRNA target sites. Mol Cells 2013; 35:195-201. [PMID: 23475422 PMCID: PMC3887917 DOI: 10.1007/s10059-013-2165-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 06/21/2012] [Revised: 02/08/2013] [Accepted: 02/08/2013] [Indexed: 10/27/2022]
Abstract
MicroRNAs (miRNAs) are important regulators of gene expression and play crucial roles in many biological processes including apoptosis, differentiation, development, and tumorigenesis. Recent estimates suggest that more than 50% of human protein coding genes may be regulated by miRNAs and that each miRNA may bind to 300-400 target genes. Approximately 1,000 human miRNAs have been identified so far with each having up to hundreds of unique target mRNAs. However, the targets for a majority of these miRNAs have not been identified due to the lack of large-scale experimental detection techniques. Experimental detection of miRNA target sites is a costly and time-consuming process, even though identification of miRNA targets is critical to unraveling their functions in various biological processes. To identify miRNA targets, we developed miRTar Hunter, a novel computational approach for predicting target sites regardless of the presence or absence of a seed match or evolutionary sequence conservation. Our approach is based on a dynamic programming algorithm that incorporates more sequence-specific features and reflects the properties of various types of target sites that determine diverse aspects of complementarities between miRNAs and their targets. We evaluated the performance of our algorithm on 532 known human miRNA:target pairs and 59 experimentally-verified negative miRNA:target pairs, and also compared our method with three popular programs for 481 miRNA:target pairs. miRTar Hunter outperformed three popular existing algorithms in terms of recall and precision, indicating that our unique scheme to quantify the determinants of complementary sites is effective at detecting miRNA targets. miRTar Hunter is now available at http://203.230.194.162/~kbkim.
Affiliation(s)
- Ki-Bong Kim
- Department of Biomedical Technology, Sangmyung University, Cheonan 330–720, Korea
10. Noivirt-Brik O, Hazan G, Unger R, Ofran Y. Non-local residue–residue contacts in proteins are more conserved than local ones. Bioinformatics 2012. [DOI: 10.1093/bioinformatics/bts694] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022]
11. Chen R, Chen W, Yang S, Wu D, Wang Y, Tian Y, Shi Y. Rigorous assessment and integration of the sequence and structure based features to predict hot spots. BMC Bioinformatics 2011; 12:311. [PMID: 21798070 PMCID: PMC3176265 DOI: 10.1186/1471-2105-12-311] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 11/03/2010] [Accepted: 07/29/2011] [Indexed: 12/02/2022]
Abstract
Background Systematic mutagenesis studies have shown that only a few interface residues termed hot spots contribute significantly to the binding free energy of protein-protein interactions. Therefore, hot spots prediction becomes increasingly important for well understanding the essence of proteins interactions and helping narrow down the search space for drug design. Currently many computational methods have been developed by proposing different features. However comparative assessment of these features and furthermore effective and accurate methods are still in pressing need. Results In this study, we first comprehensively collect the features to discriminate hot spots and non-hot spots and analyze their distributions. We find that hot spots have lower relASA and larger relative change in ASA, suggesting hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts including hydrogen bonds, salt bridges, and atomic contacts, which favor complexes formation. Interestingly, we find that conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in Ab+ dataset (all complexes). While in Ab- dataset (antigen-antibody complexes are excluded), there are significant differences in two features between hot spots and non-hot spots. Secondly, we explore the predictive ability for each feature and the combinations of features by support vector machines (SVMs). The results indicate that sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method to predict hot spots of two protein complexes.
Conclusion Experimental results show that support vector machine classifiers are quite effective in predicting hot spots based on sequence features. Hot spots cannot be fully predicted through simple analysis based on physicochemical characteristics, but there is reason to believe that integration of features and machine learning methods can remarkably improve the predictive performance for hot spots.
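The precision, recall, and F1 values quoted in this abstract are related by the standard harmonic-mean formula. The sketch below shows that relationship and checks that the reported figures are mutually consistent. The function names and the example counts are illustrative assumptions; the abstract does not report the underlying confusion matrix.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract: precision 0.69, recall 0.68.
reported_f1 = f1_score(0.69, 0.68)

# Hypothetical counts, for illustration only.
example_precision, example_recall = precision_recall(tp=8, fp=2, fn=2)
```

With precision 0.69 and recall 0.68 the formula gives roughly 0.685, which agrees with the reported F1 of 0.68 at two decimal places.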
Affiliation(s)
- Ruoying Chen
- College of Life Sciences, Graduate University of Chinese Academy of Sciences, Beijing 100049, China
12. Shirdel EA, Xie W, Mak TW, Jurisica I. NAViGaTing the micronome – using multiple microRNA prediction databases to identify signalling pathway-associated microRNAs. PLoS One 2011; 6:e17429. [PMID: 21364759 PMCID: PMC3045450 DOI: 10.1371/journal.pone.0017429] [Citation(s) in RCA: 166] [Impact Index Per Article: 12.8] [Received: 11/28/2010] [Accepted: 02/02/2011] [Indexed: 02/07/2023]
Abstract
Background MicroRNAs are a class of small RNAs known to regulate gene expression at the transcript level, the protein level, or both. Since microRNA binding is sequence-based but possibly structure-specific, work in this area has resulted in multiple databases storing predicted microRNA:target relationships computed using diverse algorithms. We integrate prediction databases, compare predictions to in vitro data, and use cross-database predictions to model the microRNA:transcript interactome – referred to as the micronome – to study microRNA involvement in well-known signalling pathways as well as associations with disease. We make this data freely available with a flexible user interface as our microRNA Data Integration Portal — mirDIP (http://ophid.utoronto.ca/mirDIP). Results mirDIP integrates prediction databases to elucidate accurate microRNA:target relationships. Using NAViGaTOR to produce interaction networks implicating microRNAs in literature-based, KEGG-based and Reactome-based pathways, we find these signalling pathway networks have significantly more microRNA involvement compared to chance (p<0.05), suggesting microRNAs co-target many genes in a given pathway. Further examination of the micronome shows two distinct classes of microRNAs; universe microRNAs, which are involved in many signalling pathways; and intra-pathway microRNAs, which target multiple genes within one signalling pathway. We find universe microRNAs to have more targets (p<0.0001), to be more studied (p<0.0002), and to have higher degree in the KEGG cancer pathway (p<0.0001), compared to intra-pathway microRNAs. Conclusions Our pathway-based analysis of mirDIP data suggests microRNAs are involved in intra-pathway signalling. We identify two distinct classes of microRNAs, suggesting a hierarchical organization of microRNAs co-targeting genes both within and between pathways, and implying differential involvement of universe and intra-pathway microRNAs at the disease level.
Affiliation(s)
- Elize A. Shirdel
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario
- Ontario Cancer Institute, Princess Margaret Hospital/University Health Network and The Campbell Family Institute for Cancer Research, Toronto, Ontario, Canada
- Wing Xie
- Ontario Cancer Institute, Princess Margaret Hospital/University Health Network and The Campbell Family Institute for Cancer Research, Toronto, Ontario, Canada
- Tak W. Mak
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario
- Campbell Family Institute for Breast Cancer Research, Ontario Cancer Institute, Princess Margaret Hospital/University Health Network, Toronto, Ontario, Canada
- Igor Jurisica
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario
- Ontario Cancer Institute, Princess Margaret Hospital/University Health Network and The Campbell Family Institute for Cancer Research, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- * E-mail:
|
13
|
Joosten RP, te Beek TAH, Krieger E, Hekkelman ML, Hooft RWW, Schneider R, Sander C, Vriend G. A series of PDB related databases for everyday needs. Nucleic Acids Res 2010; 39:D411-9. [PMID: 21071423 PMCID: PMC3013697 DOI: 10.1093/nar/gkq1105] [Citation(s) in RCA: 526] [Impact Index Per Article: 37.6] [Indexed: 11/29/2022] Open
Abstract
The Protein Data Bank (PDB) is the world-wide repository of macromolecular structure information. We present a series of databases that run parallel to the PDB. Each database holds one entry, if possible, for each PDB entry. DSSP holds the secondary structure of the proteins. PDBREPORT holds reports on the structure quality and lists errors. HSSP holds a multiple sequence alignment for all proteins. The PDBFINDER holds easy to parse summaries of the PDB file content, augmented with essentials from the other systems. PDB_REDO holds re-refined, and often improved, copies of all structures solved by X-ray. WHY_NOT summarizes why certain files could not be produced. All these systems are updated weekly. The data sets can be used for the analysis of properties of protein structures in areas ranging from structural genomics, to cancer biology and protein design.
Affiliation(s)
- Robbie P Joosten
- Department of Biochemistry, Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
|
14
|
Zhang J, Gunner MR. Multiconformation continuum electrostatics analysis of the effects of a buried Asp introduced near heme a in Rhodobacter sphaeroides cytochrome c oxidase. Biochemistry 2010; 49:8043-52. [PMID: 20701325 DOI: 10.1021/bi100663u] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/30/2022]
Abstract
Cytochrome c oxidase (CcO) reduces O2 to water via a series of proton-coupled electron transfers, generating a transmembrane electrochemical gradient. Coupling electron and proton transfer requires changing the pKa values of buried residues at each stage in the reaction cycle. Heme a is a key cofactor in the CcO electron transfer chain. Mutation of Ser44 to Asp has been reported [Mills, D. A., et al. (2008) Biochemistry 47, 11499-11509], changing the hydrogen bond acceptor from His102, the heme a axial ligand in Rhodobacter sphaeroides CcO. This adds an acidic residue to the CcO interior. The electrochemical behavior of heme a in wild-type and S44D CcO is compared using the continuum electrostatics program MCCE. The introduced, deeply buried Asp remains ionized at physiological pH only when the nearby heme is oxidized. Heme a reduction is now calculated to be strongly coupled to Asp proton binding, while with Ser44, it is weakly coupled to small protonation shifts at multiple sites, increasing the pH dependence in the mutant. At pH 7, the partially ionized Asp44 is calculated to lower the heme redox potential by 50 mV, as expected given the thermodynamics of coupled electron and proton transfers. This highlights a curious finding in the experimental results, where a low Asp pKa is found together with a stabilized reduced heme. For a model complex, the stabilization of heme oxidation by a hydrogen bond to the axial His ligand calculated with continuum electrostatics was in good agreement with that calculated with density functional theory.
Affiliation(s)
- Jun Zhang
- Physics Department, J-419, City College of New York, 160 Convent Avenue, New York, New York 10031, USA
|
15
|
Protein Secondary Structure Prediction with Bidirectional Recurrent Neural Nets: Can Weight Updating for Each Residue Enhance Performance? ACTA ACUST UNITED AC 2010. [DOI: 10.1007/978-3-642-16239-8_19] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1]
|
16
|
Deng L, Guan J, Dong Q, Zhou S. Prediction of protein-protein interaction sites using an ensemble method. BMC Bioinformatics 2009; 10:426. [PMID: 20015386 PMCID: PMC2808167 DOI: 10.1186/1471-2105-10-426] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.9] [Received: 08/15/2009] [Accepted: 12/16/2009] [Indexed: 01/23/2023] Open
Abstract
Background: Prediction of protein-protein interaction sites is one of the most challenging and intriguing problems in the field of computational biology. Although much progress has been achieved by using various machine learning methods and a variety of available features, the problem is still far from being solved.
Results: In this paper, an ensemble method is proposed that combines a bootstrap resampling technique, SVM-based fusion classifiers and a weighted voting strategy to overcome the class imbalance problem and effectively utilize a wide variety of features. We evaluate the ensemble classifier using a dataset extracted from 99 polypeptide chains with 10-fold cross-validation and obtain an AUC score of 0.86, with a sensitivity of 0.76 and a specificity of 0.78, which are better than those of the existing methods. To improve the usefulness of the proposed method, two special ensemble classifiers are designed to handle the cases of missing homologues and missing structural information, respectively, and the performance is still encouraging. The robustness of the ensemble method is also evaluated by effectively classifying interaction sites from surface residues as well as from all residues in proteins. Moreover, we demonstrate the applicability of the proposed method to identify interaction sites from the non-structural proteins (NS) of the influenza A virus, which may be utilized as potential drug target sites.
Conclusion: Our experimental results show that the ensemble classifiers are quite effective in predicting protein interaction sites. The Sub-EnClassifiers with the resampling technique can alleviate the class imbalance problem, and the combination of Sub-EnClassifiers with a wide variety of feature groups can significantly improve prediction performance.
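The combination of bootstrap resampling and weighted voting described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a one-dimensional threshold learner stands in for the SVM-based fusion classifiers, each member's vote is weighted by its training accuracy (an assumption, since the abstract does not state the weighting), and all function names and scores are hypothetical.

```python
import random

def balanced_bootstrap(positives, negatives, rng):
    """Draw a bootstrap sample with as many negatives as positives."""
    n = len(positives)
    pos = [rng.choice(positives) for _ in range(n)]
    neg = [rng.choice(negatives) for _ in range(n)]
    return pos, neg

def train_threshold(pos, neg):
    """Toy base learner (stand-in for an SVM): midpoint between the class means."""
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(threshold, positives, negatives):
    """Fraction of the full training set classified correctly by one member."""
    correct = sum(x > threshold for x in positives)
    correct += sum(x <= threshold for x in negatives)
    return correct / (len(positives) + len(negatives))

def train_ensemble(positives, negatives, n_members=5, seed=0):
    """Train each member on its own balanced bootstrap sample."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        pos, neg = balanced_bootstrap(positives, negatives, rng)
        t = train_threshold(pos, neg)
        # Assumed weighting scheme: vote weight = training accuracy.
        members.append((t, accuracy(t, positives, negatives)))
    return members

def predict(members, x):
    """Weighted vote: +weight for 'interface', -weight for 'non-interface'."""
    score = sum(w if x > t else -w for t, w in members)
    return score > 0

# Hypothetical per-residue scores; negatives outnumber positives 3:1.
positives = [0.8, 0.9, 0.7, 0.85]
negatives = [0.1, 0.2, 0.3, 0.15, 0.25, 0.05, 0.35, 0.2, 0.1, 0.3, 0.4, 0.22]
ensemble = train_ensemble(positives, negatives)
print(predict(ensemble, 0.95), predict(ensemble, 0.05))
```

Resampling the majority class down to the size of the minority class keeps any single member from being dominated by non-interface residues, which is the imbalance the abstract refers to.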
Affiliation(s)
- Lei Deng
- Department of Computer Science and Technology, Tongji University, Shanghai 201804, China.
|
17
|
Walsh I, Martin AJM, Mooney C, Rubagotti E, Vullo A, Pollastri G. Ab initio and homology based prediction of protein domains by recursive neural networks. BMC Bioinformatics 2009; 10:195. [PMID: 19558651 PMCID: PMC2711945 DOI: 10.1186/1471-2105-10-195] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 10/17/2008] [Accepted: 06/26/2009] [Indexed: 11/10/2022] Open
Abstract
Background: Proteins, especially larger ones, are often composed of individual evolutionary units, domains, which have their own function and structural fold. Predicting domains is an important intermediate step in protein analyses, including the prediction of protein structures.
Results: We describe novel systems for the prediction of protein domain boundaries powered by Recursive Neural Networks. The systems rely on a combination of primary sequence and evolutionary information, predictions of structural features such as secondary structure, solvent accessibility and residue contact maps, and structural templates, both annotated for domains (from the SCOP dataset) and unannotated (from the PDB). We gauge the contribution of contact maps, and PDB and SCOP templates independently and for different ranges of template quality. We find that accurately predicted contact maps are informative for the prediction of domain boundaries, while the same is not true for contact maps predicted ab initio. We also find that gap information from PDB templates is informative, but, not surprisingly, less than SCOP annotations. We test both systems trained on templates of all qualities, and systems trained only on templates of marginal similarity to the query (less than 25% sequence identity). While the first batch of systems produces near perfect predictions in the presence of fair to good templates, the second batch outperforms or matches ab initio predictors down to essentially any level of template quality. We test all systems in 5-fold cross-validation on a large non-redundant set of multi-domain and single domain proteins. The final predictors are state-of-the-art, with a template-less prediction boundary recall of 50.8% (precision 38.7%) within ± 20 residues and a single domain recall of 80.3% (precision 78.1%). The SCOP-based predictors achieve a boundary recall of 74% (precision 77.1%), again within ± 20 residues, and classify single domain proteins as such in over 85% of cases, when we allow a mix of bad and good quality templates. If we only allow marginal templates (max 25% sequence identity to the query) the scores remain high, with boundary recall and precision of 59% and 66.3%, and 80% of all single domain proteins predicted correctly.
Conclusion: The systems presented here may prove useful in large-scale annotation of protein domains in proteins of unknown structure. The methods are available as public web servers at the address: and we plan on running them on a multi-genomic scale and making the results public in the near future.
Affiliation(s)
- Ian Walsh
- School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland.
|
18
|
Identification of DNA-binding proteins using structural, electrostatic and evolutionary features. J Mol Biol 2009; 387:1040-53. [PMID: 19233205 DOI: 10.1016/j.jmb.2009.02.023] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.3] [Received: 12/08/2008] [Revised: 02/12/2009] [Accepted: 02/12/2009] [Indexed: 11/22/2022]
Abstract
DNA-binding proteins (DBPs) participate in various crucial processes in the life-cycle of the cells, and the identification and characterization of these proteins is of great importance. We present here a random forests classifier for identifying DBPs among proteins with known 3D structures. First, clusters of evolutionarily conserved regions (patches) on the surface of proteins were detected using the PatchFinder algorithm; earlier studies showed that these regions are typically the functionally important regions of proteins. Next, we trained a classifier using features like the electrostatic potential, cluster-based amino acid conservation patterns and the secondary structure content of the patches, as well as features of the whole protein, including its dipole moment. Using 10-fold cross-validation on a dataset of 138 DBPs and 110 proteins that do not bind DNA, the classifier achieved a sensitivity and a specificity of 0.90, which is overall better than the performance of published methods. Furthermore, when we tested five different methods on 11 new DBPs that did not appear in the original dataset, only our method annotated all correctly. The resulting classifier was applied to a collection of 757 proteins of known structure and unknown function. Of these proteins, 218 were predicted to bind DNA, and we anticipate that some of them interact with DNA using new structural motifs. The use of complementary computational tools supports the notion that at least some of them do bind DNA.
|
19
|
Nimrod G, Schushan M, Steinberg DM, Ben-Tal N. Detection of functionally important regions in "hypothetical proteins" of known structure. Structure 2009; 16:1755-63. [PMID: 19081051 DOI: 10.1016/j.str.2008.10.017] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Received: 04/16/2008] [Revised: 10/16/2008] [Accepted: 10/19/2008] [Indexed: 10/21/2022]
Abstract
Structural genomics initiatives provide ample structures of "hypothetical proteins" (i.e., proteins of unknown function) at an ever increasing rate. However, without function annotation, this structural goldmine is of little use to biologists who are interested in particular molecular systems. To this end, we used (an improved version of) the PatchFinder algorithm for the detection of functional regions on the protein surface, which could mediate its interactions with, e.g., substrates, ligands, and other proteins. Examination, using a data set of annotated proteins, showed that PatchFinder outperforms similar methods. We collected 757 structures of hypothetical proteins and their predicted functional regions in the N-Func database. Inspection of several of these regions demonstrated that they are useful for function prediction. For example, we suggested an interprotein interface and a putative nucleotide-binding site. A web-server implementation of PatchFinder and the N-Func database are available at http://patchfinder.tau.ac.il/.
Affiliation(s)
- Guy Nimrod
- Department of Biochemistry, George S. Wise Faculty of Life Sciences, Tel Aviv University, 69978 Tel Aviv, Israel
|
20
|
Chen XW, Jeong JC. Sequence-based prediction of protein interaction sites with an integrative method. Bioinformatics 2009; 25:585-91. [DOI: 10.1093/bioinformatics/btp039] [Citation(s) in RCA: 112] [Impact Index Per Article: 7.5] [Indexed: 11/14/2022] Open
|
21
|
Reva B, Antipin Y, Sander C. Determinants of protein function revealed by combinatorial entropy optimization. Genome Biol 2008; 8:R232. [PMID: 17976239 PMCID: PMC2258190 DOI: 10.1186/gb-2007-8-11-r232] [Citation(s) in RCA: 222] [Impact Index Per Article: 13.9] [Received: 07/18/2007] [Accepted: 11/01/2007] [Indexed: 11/10/2022] Open
Abstract
We use a new algorithm (combinatorial entropy optimization [CEO]) to identify specificity residues and functional subfamilies in sets of proteins related by evolution. Specificity residues are conserved within a subfamily but differ between subfamilies, and they typically encode functional diversity. We obtain good agreement between predicted specificity residues and experimentally known functional residues in protein interfaces. Such predicted functional determinants are useful for interpreting the functional consequences of mutations in natural evolution and disease.
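The definition of a specificity residue given here (conserved within a subfamily, different between subfamilies) can be illustrated with per-column Shannon entropy. This is a minimal sketch and not the CEO algorithm itself: it assumes the subfamilies are already known, whereas CEO infers them, and the toy alignment and the function names `column_entropy` and `specificity_columns` are hypothetical.

```python
import math
from collections import Counter

def column_entropy(residues):
    """Shannon entropy (in bits) of the residue distribution in one column."""
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def specificity_columns(subfamilies, tol=1e-9):
    """Indices of columns invariant within every subfamily but variable overall.

    `subfamilies` maps a subfamily name to a list of equal-length aligned sequences.
    """
    all_seqs = [s for seqs in subfamilies.values() for s in seqs]
    hits = []
    for i in range(len(all_seqs[0])):
        # Largest within-subfamily entropy at this column.
        within = max(
            column_entropy([s[i] for s in seqs]) for seqs in subfamilies.values()
        )
        # Entropy of the same column over the whole family.
        overall = column_entropy([s[i] for s in all_seqs])
        if within <= tol and overall > tol:
            hits.append(i)
    return hits

# Hypothetical toy alignment with two predefined subfamilies.
toy = {
    "A": ["MKDLA", "MKDLG", "MKDLA"],
    "B": ["MRELA", "MRELS", "MRELA"],
}
print(specificity_columns(toy))  # columns 1 and 2 (K/R and D/E) -> [1, 2]
```

Column 0 and column 3 are conserved across the whole family, so they carry no subfamily information, and column 4 varies inside each subfamily, so it is excluded as well.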
Affiliation(s)
- Boris Reva
- Computational Biology Center, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, NY 10065, USA.
|
22
|
Abstract
MOTIVATION: Thousands of proteins are known to bind to DNA; for most of them the mechanism of action and the residues that bind to DNA, i.e. the binding sites, are as yet unknown. Experimental identification of binding sites requires expensive and laborious methods such as mutagenesis and binding assays. Hence, such studies are not applicable on a large scale. If the 3D structure of a protein is known, it is often possible to predict DNA-binding sites in silico. However, for most proteins, such knowledge is not available.
RESULTS: It has been shown that DNA-binding residues have distinct biophysical characteristics. Here we demonstrate that these characteristics are so distinct that they enable accurate prediction of the residues that bind DNA directly from amino acid sequence, without requiring any additional experimental or structural information. In a cross-validation based on the largest non-redundant dataset of high-resolution protein-DNA complexes available today, we found that 89% of our predictions are confirmed by experimental data. Thus, it is now possible to identify DNA-binding sites on a proteomic scale even in the absence of any experimental data or 3D-structural information.
AVAILABILITY: http://cubic.bioc.columbia.edu/services/disis.
Affiliation(s)
- Yanay Ofran
- Department of Biochemistry and Molecular Biophysics, Columbia University, 630 West 168th Street, New York, NY 10032, USA.
|
23
|
Ofran Y, Rost B. Protein-protein interaction hotspots carved into sequences. PLoS Comput Biol 2007; 3:e119. [PMID: 17630824 PMCID: PMC1914369 DOI: 10.1371/journal.pcbi.0030119] [Citation(s) in RCA: 177] [Impact Index Per Article: 10.4] [Received: 12/15/2006] [Accepted: 05/11/2007] [Indexed: 11/24/2022] Open
Abstract
Protein-protein interactions, a key to almost any biological process, are mediated by molecular mechanisms that are not entirely clear. The study of these mechanisms often focuses on all residues at protein-protein interfaces. However, only a small subset of all interface residues is actually essential for recognition or binding. Commonly referred to as "hotspots," these essential residues are defined as residues that impede protein-protein interactions if mutated. While no in silico tool identifies hotspots in unbound chains, numerous prediction methods were designed to identify all the residues in a protein that are likely to be a part of protein-protein interfaces. These methods typically identify successfully only a small fraction of all interface residues. Here, we analyzed the hypothesis that the two subsets correspond (i.e., that in silico methods may predict few residues because they preferentially predict hotspots). We demonstrate that this is indeed the case and that we can therefore predict directly from the sequence of a single protein which residues are interaction hotspots (without knowledge of the interaction partner). Our results suggested that most protein complexes are stabilized by similar basic principles. The ability to accurately and efficiently identify hotspots from sequence enables the annotation and analysis of protein-protein interaction hotspots in entire organisms and thus may benefit function prediction and drug development. The server for prediction is available at http://www.rostlab.org/services/isis.
Affiliation(s)
- Yanay Ofran
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, USA.
|
24
|
Tung CH, Huang JW, Yang JM. Kappa-alpha plot derived structural alphabet and BLOSUM-like substitution matrix for rapid search of protein structure database. Genome Biol 2007; 8:R31. [PMID: 17335583 PMCID: PMC1868941 DOI: 10.1186/gb-2007-8-3-r31] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.8] [Received: 11/21/2006] [Revised: 01/05/2007] [Accepted: 03/03/2007] [Indexed: 11/23/2022] Open
Abstract
We present 3D-BLAST, a novel protein structure database search tool that is useful for analyzing novel structures and can return a list of aligned structures ordered according to E-values. This tool has the features of BLAST (for example, a robust statistical basis, and effective and reliable search capabilities) and employs a kappa-alpha (κ, α) plot derived structural alphabet and a new substitution matrix. 3D-BLAST searches more than 12,000 protein structures in 1.2 s and yields good results in zones with low sequence similarity.
Affiliation(s)
- Chi-Hua Tung
- Institute of Bioinformatics, National Chiao Tung University, 75 Po-Ai Street, Hsinchu, 30050, Taiwan
- Jhang-Wei Huang
- Institute of Bioinformatics, National Chiao Tung University, 75 Po-Ai Street, Hsinchu, 30050, Taiwan
- Jinn-Moon Yang
- Institute of Bioinformatics, National Chiao Tung University, 75 Po-Ai Street, Hsinchu, 30050, Taiwan
- Department of Biological Science and Technology, National Chiao Tung University, 75 Po-Ai Street, Hsinchu, 30050, Taiwan
- Core Facility for Structural Bioinformatics, National Chiao Tung University, 75 Po-Ai Street, Hsinchu, Taiwan
|
25
|
Abstract
MOTIVATION: Large-scale experiments reveal pairs of interacting proteins but leave the residues involved in the interactions unknown. These interface residues are essential for understanding the mechanism of interaction and are often desired drug targets. Reliable identification of residues that reside in protein-protein interfaces typically requires analysis of protein structure. Therefore, for the vast majority of proteins, for which there is no high-resolution structure, there is no effective way of identifying interface residues.
RESULTS: Here we present a machine learning-based method that identifies interacting residues from sequence alone. Although the method is developed using transient protein-protein interfaces from complexes of experimentally known 3D structures, it never explicitly uses 3D information. Instead, we combine predicted structural features with evolutionary information. The strongest predictions of the method reached over 90% accuracy in a cross-validation experiment. Our results suggest that despite the significant diversity in the nature of protein-protein interactions, they all share common basic principles and that these principles are identifiable from sequence alone.
Affiliation(s)
- Yanay Ofran
- CUBIC & North-East Structural Genomics Consortium, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA.
|
26
|
Predicting residue contacts using pragmatic correlated mutations method: reducing the false positives. BMC Bioinformatics 2006; 7:503. [PMID: 17109752 PMCID: PMC1654194 DOI: 10.1186/1471-2105-7-503] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.3] [Received: 05/19/2006] [Accepted: 11/16/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Predicting residue contacts using the primary amino acid sequence alone is an important task that can guide 3D structure modeling and can verify the quality of predicted 3D structures. The correlated mutations (CM) method serves as the most promising approach, and it has been used to predict amino acid pairs that are distant in the primary sequence but form contacts in the native 3D structure of homologous proteins.
RESULTS: Here we report a new implementation of the CM method with an added set of selection rules (filters). The parameters of the algorithm were optimized against fifteen high resolution crystal structures with an optimization criterion that maximized the confidence of the predictions. The optimization resulted in a true positive ratio (TPR) of 0.08 for the CM without filters and a TPR of 0.14 for the CM with filters. The protocol was further benchmarked against 65 high resolution structures that were not included in the optimization test. The benchmarking resulted in a TPR of 0.07 for the CM without filters and a TPR of 0.09 for the CM with filters.
CONCLUSION: Thus, the inclusion of selection rules resulted in an overall improvement of 30%. In addition, the pair-wise comparison of TPR for each protein without and with filters resulted in an average improvement of 1.7. The methodology was implemented in a web server, http://www.ces.clemson.edu/compbio/recon, that is freely available to the public. The purpose of this implementation is to provide 3D structure predictors with a tool that can help with ranking alternative models by satisfying the largest number of predicted contacts, and that can also provide a confidence score for contacts in cases where the structure is known.
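As a quick check on the figures quoted in this abstract, the relative gain in TPR from adding filters can be worked out directly. This is only a sketch of the arithmetic; the abstract does not state how the 30% figure was computed, so reading it as the relative gain on the benchmark set is an assumption, and the helper name `relative_improvement` is hypothetical.

```python
def relative_improvement(before, after):
    """Fractional gain of `after` over `before`."""
    return (after - before) / before

# TPR values quoted in the abstract (CM without filters -> CM with filters).
optimization = relative_improvement(0.08, 0.14)  # 15 optimization structures
benchmark = relative_improvement(0.07, 0.09)     # 65 benchmark structures

print(f"optimization set: {optimization:.0%}")
print(f"benchmark set: {benchmark:.0%}")
```

On the benchmark set the gain comes to about 29%, which is close to the 30% improvement reported in the conclusion; on the optimization set it is about 75%.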
|
27
|
Neshich G, Borro LC, Higa RH, Kuser PR, Yamagishi MEB, Franco EH, Krauchenco JN, Fileto R, Ribeiro AA, Bezerra GBP, Velludo TM, Jimenez TS, Furukawa N, Teshima H, Kitajima K, Bava A, Sarai A, Togawa RC, Mancini AL. The Diamond STING server. Nucleic Acids Res 2005; 33:W29-35. [PMID: 15980473 PMCID: PMC1160158 DOI: 10.1093/nar/gki397] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 03/11/2005] [Revised: 03/14/2005] [Accepted: 03/14/2005] [Indexed: 11/21/2022] Open
Abstract
Diamond STING is a new version of the STING suite of programs for a comprehensive analysis of the relationship between protein sequence, structure, function and stability. We have added a number of new functionalities, both by providing more structure parameters to the STING Database and by improving and expanding the interface for enhanced data handling. The integration among the STING components has also been improved. A new key feature is the ability of the STING server to handle local files containing protein structures (either modeled or not yet deposited to the Protein Data Bank) so that they can be used by the principal STING components: (Java)Protein Dossier ((J)PD) and STING Report. The current capabilities of the new STING version and a couple of biologically relevant applications are described here. We have provided an example where Diamond STING identifies the active site amino acids and folding essential amino acids (both previously determined by experiments) by filtering out all but those residues, selecting the numerical values/ranges for a set of corresponding parameters. This is the fundamental step toward a more interesting endeavor: the prediction of such residues. Diamond STING is freely accessible at http://sms.cbi.cnptia.embrapa.br and http://trantor.bioc.columbia.edu/SMS.
Affiliation(s)
- Goran Neshich
- Núcleo de Bioinformática Estrutural, Embrapa/Informática Agropecuária Campinas, Brazil.
|
28
|
Neshich G, Mancini AL, Yamagishi MEB, Kuser PR, Fileto R, Pinto IP, Palandrani JF, Krauchenco JN, Baudet C, Montagner AJ, Higa RH. STING Report: convenient web-based application for graphic and tabular presentations of protein sequence, structure and function descriptors from the STING database. Nucleic Acids Res 2005; 33:D269-74. [PMID: 15608194 PMCID: PMC540065 DOI: 10.1093/nar/gki111] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Received: 08/04/2004] [Revised: 10/18/2004] [Accepted: 10/18/2004] [Indexed: 11/25/2022] Open
Abstract
The Sting Report is a versatile web-based application for extraction and presentation of detailed information about any individual amino acid of a protein structure stored in the STING Database. The extracted information is presented as a series of GIF images and tables, containing the values of up to 125 sequence/structure/function descriptors/parameters. The GIF images are generated by the Gold STING modules. The HTML page resulting from the STING Report query can be printed and, most importantly, it can be composed and visualized on a computer platform with an elementary configuration. Using the STING Report, a user can generate a collection of customized reports for amino acids of specific interest. Such a collection comes as an ideal match for a demand for the rapid and detailed consultation and documentation of data about structure/function. The inclusion of information generated with STING Report in a research report or even a textbook, allows for the increased density of its contents. STING Report is freely accessible within the Gold STING Suite at http://www.cbi.cnptia.embrapa.br, http://www.es.embnet.org/SMS/, http://gibk26.bse.kyutech.ac.jp/SMS/ and http://trantor.bioc.columbia.edu/SMS (option: STING Report).
Affiliation(s)
- Goran Neshich
- Núcleo de Bioinformática Estrutural, Embrapa/Informática Agropecuária, 13083-886 Campinas, Brazil.
|
29
|
Higa RH, Togawa RC, Montagner AJ, Palandrani JCF, Okimoto IKS, Kuser PR, Yamagishi MEB, Mancini AL, Neshich G. STING Millennium Suite: integrated software for extensive analyses of 3d structures of proteins and their complexes. BMC Bioinformatics 2004; 5:107. [PMID: 15301693 PMCID: PMC514601 DOI: 10.1186/1471-2105-5-107] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Received: 01/26/2004] [Accepted: 08/09/2004] [Indexed: 11/12/2022] Open
Abstract
BACKGROUND: The integration of many aspects of protein/DNA structure analysis is an important requirement for software products in the general area of structural bioinformatics. In fact, there are too few software packages on the internet that can be described as successful in this respect. What is still missing is publicly available, web-based software for interactive analysis of the sequence/structure/function of proteins and their complexes with DNA and ligands. Some existing software packages do have a certain level of integration and do offer analysis of several structure-related parameters, but not to the extent generally demanded by users.
RESULTS: We report a new version of the Sting Millennium Suite (SMS), a fully accessible (including for local files at the client end), web-based software for molecular structure and sequence/structure/function analysis. The new SMS client version now also runs on Linux and works with non-public PDB-formatted files (structures not deposited at the RCSB/PDB), eliminating the earlier requirement for registration if SMS components were to be used with the user's local files. The new SMS also offers some important additions and improvements, such as a link to ProTherm and a significant re-engineering of the SMS component ConSSeq. In addition, we have added three new SMS mirror sites to the existing network of global SMS servers: Argentina, Japan and Spain.
CONCLUSION: SMS is an already established software package, and many key database and software servers worldwide either link to or host it. SMS is web-based, publicly available software developed to aid researchers in their quest to translate information about the structures of macromolecules into knowledge. SMS allows a user to interactively analyze molecular structures, cross-referencing visualized information with correlated information available across the internet. SMS is already used as a didactic tool by some universities. SMS analysis is now possible on Linux and with no requirement for registration when using local files.
Collapse
Affiliation(s)
- Roberto H Higa
- Núcleo de Bioinformática, Centro Nacional de Pesquisa Agropecuária, Empresa Brasileira de Pesquisa Agropecuária, Campinas, SP, Brazil
- Roberto C Togawa
- Laboratório de Bioinformática, Embrapa/Recursos Genéticos e Biotecnologia, Empresa Brasileira de Pesquisa Agropecuária, Brasília, DF, Brazil
- Arnaldo J Montagner
- Núcleo de Bioinformática, Centro Nacional de Pesquisa Agropecuária, Empresa Brasileira de Pesquisa Agropecuária, Campinas, SP, Brazil
- Juliana CF Palandrani
- Núcleo de Bioinformática, Centro Nacional de Pesquisa Agropecuária, Empresa Brasileira de Pesquisa Agropecuária, Campinas, SP, Brazil
- Igor KS Okimoto
- Núcleo de Bioinformática, Centro Nacional de Pesquisa Agropecuária, Empresa Brasileira de Pesquisa Agropecuária, Campinas, SP, Brazil
- Paula R Kuser
- Núcleo de Bioinformática, Centro Nacional de Pesquisa Agropecuária, Empresa Brasileira de Pesquisa Agropecuária, Campinas, SP, Brazil
- Michel EB Yamagishi
- Núcleo de Bioinformática, Centro Nacional de Pesquisa Agropecuária, Empresa Brasileira de Pesquisa Agropecuária, Campinas, SP, Brazil
- Adauto L Mancini
- Núcleo de Bioinformática, Centro Nacional de Pesquisa Agropecuária, Empresa Brasileira de Pesquisa Agropecuária, Campinas, SP, Brazil
- Goran Neshich
- Núcleo de Bioinformática, Centro Nacional de Pesquisa Agropecuária, Empresa Brasileira de Pesquisa Agropecuária, Campinas, SP, Brazil
|
30
|
Neshich G, Rocchia W, Mancini AL, Yamagishi MEB, Kuser PR, Fileto R, Baudet C, Pinto IP, Montagner AJ, Palandrani JF, Krauchenco JN, Torres RC, Souza S, Togawa RC, Higa RH. JavaProtein Dossier: a novel web-based data visualization tool for comprehensive analysis of protein structure. Nucleic Acids Res 2004; 32:W595-601. [PMID: 15215458 PMCID: PMC441618 DOI: 10.1093/nar/gkh480] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Received: 02/13/2004] [Revised: 04/15/2004] [Accepted: 05/04/2004] [Indexed: 11/13/2022]
Abstract
JavaProtein Dossier ((J)PD) is a new concept, database and visualization tool providing one of the largest collections of the physicochemical parameters describing proteins' structure, stability, function and interaction with other macromolecules. By collecting as many descriptors/parameters as possible within a single database, we can achieve a better use of the available data and information. Furthermore, data grouping allows us to generate different parameters with the potential to provide new insights into the sequence-structure-function relationship. In (J)PD, residue selection can be performed according to multiple criteria. (J)PD can simultaneously display and analyze all the physicochemical parameters of any pair of structures, using precalculated structural alignments, allowing direct parameter comparison at corresponding amino acid positions among homologous structures. In order to focus on the physicochemical (and consequently pharmacological) profile of proteins, visualization tools (showing the structure and structural parameters) also had to be optimized. Our response to this challenge was the use of Java technology with its exceptional level of interactivity. (J)PD is freely accessible (within the Gold Sting Suite) at http://sms.cbi.cnptia.embrapa.br, http://mirrors.rcsb.org/SMS, http://trantor.bioc.columbia.edu/SMS and http://www.es.embnet.org/SMS/ (Option: (Java)Protein Dossier).
Affiliation(s)
- Goran Neshich
- Núcleo de Bioinformática Estrutural, Embrapa/Informática Agropecuária, Campinas, Brazil.
|
31
|
Enright AJ, John B, Gaul U, Tuschl T, Sander C, Marks DS. MicroRNA targets in Drosophila. Genome Biol 2003; 5:R1. [PMID: 14709173 PMCID: PMC395733 DOI: 10.1186/gb-2003-5-1-r1] [Citation(s) in RCA: 2496] [Impact Index Per Article: 118.9] [Received: 10/13/2003] [Revised: 11/14/2003] [Accepted: 11/21/2003] [Indexed: 12/15/2022]
Abstract
A computational method for whole-genome prediction of microRNA target genes is presented. Application of this method to the Drosophila melanogaster, Drosophila pseudoobscura and Anopheles gambiae genomes identifies several hundred target genes potentially regulated by one or more known microRNAs. BACKGROUND The recent discoveries of microRNA (miRNA) genes and characterization of the first few target genes regulated by miRNAs in Caenorhabditis elegans and Drosophila melanogaster have set the stage for elucidation of a novel network of regulatory control. We present a computational method for whole-genome prediction of miRNA target genes. The method is validated using known examples. For each miRNA, target genes are selected on the basis of three properties: sequence complementarity using a position-weighted local alignment algorithm, free energies of RNA-RNA duplexes, and conservation of target sites in related genomes. Application to the D. melanogaster, Drosophila pseudoobscura and Anopheles gambiae genomes identifies several hundred target genes potentially regulated by one or more known miRNAs. RESULTS These potential targets are rich in genes that are expressed at specific developmental stages and that are involved in cell fate specification, morphogenesis and the coordination of developmental processes, as well as genes that are active in the mature nervous system. High-ranking target genes are enriched in transcription factors two-fold and include genes already known to be under translational regulation. Our results reaffirm the thesis that miRNAs have an important role in establishing the complex spatial and temporal patterns of gene activity necessary for the orderly progression of development and suggest additional roles in the function of the mature organism. In addition the results point the way to directed experiments to determine miRNA functions.
CONCLUSIONS The emerging combinatorics of miRNA target sites in the 3' untranslated regions of messenger RNAs are reminiscent of transcriptional regulation in promoter regions of DNA, with both one-to-many and many-to-one relationships between regulator and target. Typically, more than one miRNA regulates one message, indicative of cooperative translational control. Conversely, one miRNA may have several target genes, reflecting target multiplicity. As a guide to focused experiments, we provide detailed online information about likely target genes and binding sites in their untranslated regions, organized by miRNA or by gene and ranked by likelihood of match. The target prediction algorithm is freely available and can be applied to whole genome sequences using identified miRNA sequences.
Affiliation(s)
- Anton J Enright
- Computational Biology Center, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, NY 10021, USA
- Bino John
- Computational Biology Center, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, NY 10021, USA
- Ulrike Gaul
- Laboratory of Developmental Neurogenetics, The Rockefeller University, 1230 York Avenue, New York, NY 10021, USA
- Thomas Tuschl
- Laboratory of RNA Molecular Biology, The Rockefeller University, 1230 York Avenue, New York, NY 10021, USA
- Chris Sander
- Computational Biology Center, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, NY 10021, USA
- Debora S Marks
- Columbia Genome Center, Russ Berrie Pavilion, 1150 St. Nicholas Avenue, New York, NY 10032, USA
|
32
|
Abstract
Enzyme function conservation has been used to derive the threshold of sequence identity necessary to transfer function from a protein of known function to an unknown protein. Using pairwise sequence comparison, several studies suggested that when the sequence identity is above 40%, enzyme function is well conserved. In contrast, Rost argued that because of database bias, the results from such simple pairwise comparisons might be misleading. Thus, by grouping enzyme sequences into families based on sequence similarity and selecting representative sequences for comparison, he showed that enzyme function starts to diverge quickly when the sequence identity is below 70%. Here, we employ a strategy similar to Rost's to reduce the database bias; however, we classify enzyme families based not only on sequence similarity, but also on functional similarity, i.e. sequences in each family must have the same four digits or the same first three digits of the enzyme commission (EC) number. Furthermore, instead of selecting representative sequences for comparison, we calculate the function conservation of each enzyme family and then average the degree of enzyme function conservation across all enzyme families. Our analysis suggests that for functional transferability, 40% sequence identity can still be used as a confident threshold to transfer the first three digits of an EC number; however, to transfer all four digits of an EC number, above 60% sequence identity is needed to have at least 90% accuracy. Moreover, when PSI-BLAST is used, the magnitude of the E-value is found to be weakly correlated with the extent of enzyme function conservation in the third iteration of PSI-BLAST. As a result, functional annotation based on the E-values from PSI-BLAST should be used with caution. 
We also show that by employing an enzyme family-specific sequence identity threshold above which 100% functional conservation is required, functional inference of unknown sequences can be accurately accomplished. However, this comes at a cost: those true positive sequences below this threshold cannot be uniquely identified.
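The thresholds reported in this abstract (40% identity for the first three EC digits, above 60% for all four) can be illustrated with a minimal Python sketch. The function names, the gap-excluding definition of percent identity, and the use of "-" for an untransferred digit are assumptions made for illustration; the abstract does not specify how identity was computed, and the family-specific thresholds it mentions are not modelled here.

```python
from typing import Optional


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: gaps are written as '-' and excluded from the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def transferable_ec_digits(identity: float) -> int:
    """Number of EC digits to transfer, using the global thresholds in the abstract."""
    if identity > 60.0:   # "above 60%" for all four digits
        return 4
    if identity >= 40.0:  # 40% as the threshold for the first three digits
        return 3
    return 0


def transfer_ec(ec_number: str, identity: float) -> Optional[str]:
    """Return the (possibly truncated) EC number to annotate, or None if below 40%."""
    digits = transferable_ec_digits(identity)
    if digits == 0:
        return None
    parts = ec_number.split(".")
    return ".".join(parts[:digits] + ["-"] * (4 - digits))


identity = percent_identity("ACDE", "ACDF")
annotation = transfer_ec("3.2.1.21", identity)
```

A single global cutoff is the simplest reading of the result; the family-specific thresholds described at the end of the abstract would replace the two constants with a per-family lookup.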
Affiliation(s)
- Weidong Tian
- Center of Excellence in Bioinformatics, University at Buffalo, The State University of New York, 901 Washington Street, Buffalo, NY 14203, USA
|
33
|
Aloy P, Ceulemans H, Stark A, Russell RB. The relationship between sequence and interaction divergence in proteins. J Mol Biol 2003; 332:989-98. [PMID: 14499603 DOI: 10.1016/j.jmb.2003.07.006] [Citation(s) in RCA: 262] [Impact Index Per Article: 12.5] [Indexed: 11/19/2022]
Abstract
There is currently a gap in knowledge between complexes of known three-dimensional structure and those known from other experimental methods such as affinity purifications or the two-hybrid system. This gap can sometimes be bridged by methods that extrapolate interaction information from one complex structure to homologues of the interacting proteins. To do this, it is important to know if and when proteins of the same type (e.g. family, superfamily or fold) interact in the same way. Here, we study interactions of known structure to address this question. We found all instances within the structural classification of proteins database of the same domain pairs interacting in different complexes, and then compared them with a simple measure (interaction RMSD). When plotted against sequence similarity we find that close homologues (30-40% or higher sequence identity) almost invariably interact the same way. Conversely, similarity only in fold (i.e. without additional evidence for a common ancestor) is only rarely associated with a similarity in interaction. The results suggest that there is a twilight zone of sequence similarity where it is not possible to say whether or not domains will interact similarly. We also discuss the rare instances of fold similarities interacting the same way, and those where obviously homologous proteins interact differently.
Affiliation(s)
- Patrick Aloy
- Structural and Computational Biology Programme, EMBL Heidelberg, Meyerhofstrasse 1, 69117, Heidelberg, Germany
|
34
|
Abstract
Compensated frameshift mutation is a modification of the reading frame of a gene that takes place by way of various molecular events. It appears to be a widespread event that is only observed when homologous amino acid and nucleotide sequences are compared. To identify these mutation events, the sequence analysis rationale was based on the search for short regions that would have much lower degrees of conservation in protein, but not in DNA, in well-conserved beta-glucosidase families. We have restricted our study to a seed set of sequences of O-glycoside hydrolase families 1 and 3. We found compensated frameshift mutations in the family 1 beta-glucosidases for the Erwinia herbicola, Cellulomonas fimi, and (non-cyanogenic) Trifolium repens gene sequences, and in the family 3 beta-glucosidases for the Clostridium thermocellum and Clostridium stercorarium gene sequences. By computational treatment, the observed mutation events in the gene frameshifting sub-sequence have been neutralised: each nucleotide insertion must be eliminated and each nucleotide deletion must be substituted by the symbol N (any nucleotide). When the frameshifting fragments of the amino acid sequences were substituted by the computationally neutralised subsequences, the beta-glucosidase alignments were improved. We also discuss the structural implications of the compensated frameshift mutation events.
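The neutralisation rule stated in this abstract (drop each inserted nucleotide, put N where a nucleotide was deleted) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name and the way events are passed in (0-based indices into the observed sequence) are hypothetical, and locating the events, which is the hard part of the study, is taken as already done.

```python
from typing import Iterable


def neutralise_frameshifts(dna: str,
                           insertions: Iterable[int],
                           deletions: Iterable[int]) -> str:
    """Restore the reading frame of an observed gene sub-sequence.

    insertions: indices of nucleotides in `dna` that were inserted (removed here).
    deletions:  indices in `dna` before which one nucleotide is missing
                (an 'N' is placed there); len(dna) means "at the very end".
    Both conventions are assumptions made for this sketch.
    """
    inserted = set(insertions)
    missing_before = {}
    for pos in deletions:
        missing_before[pos] = missing_before.get(pos, 0) + 1

    out = []
    for i, base in enumerate(dna):
        out.append("N" * missing_before.get(i, 0))  # fill deleted positions
        if i not in inserted:                       # skip inserted nucleotides
            out.append(base)
    out.append("N" * missing_before.get(len(dna), 0))
    return "".join(out)


# Toy case: ancestral frame ATG GCT AAA TTT, with a C inserted at index 3
# and one A lost further downstream, giving the observed sequence below.
observed = "ATGCGCTAATTT"
restored = neutralise_frameshifts(observed, insertions=[3], deletions=[9])
```

In the toy case the restored sequence has the ancestral length again, so translation downstream of the two events returns to the original frame, with one ambiguous codon containing N.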
Affiliation(s)
- Antonio Rojas
- Evolutionary Genomics Group, Department of Biochemistry and Biotechnology, Rovira i Virgili University, Pl. Imperial Tàrraco, 1. E-43005, Catalonia, Tarragona, Spain
|
35
|
Neshich G, Togawa RC, Mancini AL, Kuser PR, Yamagishi MEB, Pappas G, Torres WV, Fonseca e Campos T, Ferreira LL, Luna FM, Oliveira AG, Miura RT, Inoue MK, Horita LG, de Souza DF, Dominiquini F, Alvaro A, Lima CS, Ogawa FO, Gomes GB, Palandrani JF, dos Santos GF, de Freitas EM, Mattiuz AR, Costa IC, de Almeida CL, Souza S, Baudet C, Higa RH. STING Millennium: A web-based suite of programs for comprehensive and simultaneous analysis of protein structure and sequence. Nucleic Acids Res 2003; 31:3386-92. [PMID: 12824333 PMCID: PMC168984 DOI: 10.1093/nar/gkg578] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.8] [Received: 02/14/2003] [Revised: 04/02/2003] [Accepted: 04/02/2003] [Indexed: 12/20/2022]
Abstract
STING Millennium Suite (SMS) is a new web-based suite of programs and databases providing visualization and a complex analysis of molecular sequence and structure for the data deposited at the Protein Data Bank (PDB). SMS operates with a collection of both publicly available data (PDB, HSSP, Prosite) and its own data (contacts, interface contacts, surface accessibility). Biologists find SMS useful because it provides a variety of algorithms and validated data, wrapped up in a user-friendly web interface. Using SMS it is now possible to analyze sequence to structure relationships, the quality of the structure, nature and volume of atomic contacts of intra and inter chain type, relative conservation of amino acids at the specific sequence position based on multiple sequence alignment, indications of folding essential residue (FER) based on the relationship of the residue conservation to the intra-chain contacts and Calpha-Calpha and Cbeta-Cbeta distance geometry. Specific emphasis in SMS is given to interface forming residues (IFR): amino acids that define the interactive portion of the protein surfaces. SMS may simultaneously display and analyze previously superimposed structures. PDB updates trigger SMS updates in a synchronized fashion. SMS is freely accessible for public data at http://www.cbi.cnptia.embrapa.br, http://mirrors.rcsb.org/SMS and http://trantor.bioc.columbia.edu/SMS.
Affiliation(s)
- Goran Neshich
- Núcleo de Bioinformática Estrutural, Embrapa/Informática Agropecuária, Campinas, Brazil.
|
36
|
Jackson DB, Minch E, Munro RE. Bioinformatics. EXS 2003:31-69. [PMID: 12613171 DOI: 10.1007/978-3-0348-7997-2_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2023]
|
37
|
Armon A, Graur D, Ben-Tal N. ConSurf: an algorithmic tool for the identification of functional regions in proteins by surface mapping of phylogenetic information. J Mol Biol 2001; 307:447-63. [PMID: 11243830 DOI: 10.1006/jmbi.2000.4474] [Citation(s) in RCA: 351] [Impact Index Per Article: 15.3] [Indexed: 11/22/2022]
Abstract
Experimental approaches for the identification of functionally important regions on the surface of a protein involve mutagenesis, in which exposed residues are replaced one after another while the change in binding to other proteins or changes in activity are recorded. However, practical considerations limit the use of these methods to small-scale studies, precluding a full mapping of all the functionally important residues on the surface of a protein. We present here an alternative approach involving the use of evolutionary data in the form of multiple-sequence alignment for a protein family to identify hot spots and surface patches that are likely to be in contact with other proteins, domains, peptides, DNA, RNA or ligands. The underlying assumption in this approach is that key residues that are important for binding should be conserved throughout evolution, just like residues that are crucial for maintaining the protein fold, i.e. buried residues. A main limitation in the implementation of this approach is that the sequence space of a protein family may be unevenly sampled, e.g. mammals may be overly represented. Thus, a seemingly conserved position in the alignment may reflect a taxonomically uneven sampling, rather than being indicative of structural or functional importance. To avoid this problem, we present here a novel methodology based on evolutionary relations among proteins as revealed by inferred phylogenetic trees, and demonstrate its capabilities for mapping binding sites in SH2 and PTB signaling domains. A computer program that implements these ideas is available freely at: http://ashtoret.tau.ac.il/~rony
Affiliation(s)
- A Armon
- Department of Biochemistry, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv, Israel
|
38
|
Bystroff C, Baker D. Prediction of local structure in proteins using a library of sequence-structure motifs. J Mol Biol 1998; 281:565-77. [PMID: 9698570 DOI: 10.1006/jmbi.1998.1943] [Citation(s) in RCA: 246] [Impact Index Per Article: 9.5] [Indexed: 11/22/2022]
Abstract
We describe a new method for local protein structure prediction based on a library of short sequence patterns that correlate strongly with protein three-dimensional structural elements. The library was generated using an automated method for finding correlations between protein sequence and local structure, and contains most previously described local sequence-structure correlations as well as new relationships, including a diverging type-II beta-turn, a frayed helix, and a proline-terminated helix. The query sequence is scanned for segments 7 to 19 residues in length that strongly match one of the 82 patterns in the library. Matching segments are assigned the three-dimensional structure characteristic of the corresponding sequence pattern, and backbone torsion angles for the entire query sequence are then predicted by piecing together mutually compatible segment predictions. In predictions of local structure in a test set of 55 proteins, about 50% of all residues, and 76% of residues covered by high-confidence predictions, were found in eight-residue segments within 1.4 Å of their true structures. The predictions are complementary to traditional secondary structure predictions because they are considerably more specific in turn regions, and may contribute to ab initio tertiary structure prediction and fold recognition.
Affiliation(s)
- C Bystroff
- Department of Biochemistry, University of Washington, Seattle, WA, 98195-7350, USA.
|
39
|
Kolaskar AS, Joshi RR. Molecular dynamics simulation of a 13-mer duplex DNA: a PvuII substrate. J Biomol Struct Dyn 1998; 15:1155-65. [PMID: 9669560 DOI: 10.1080/07391102.1998.10509009] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 02/08/2023]
Abstract
A parallel version of AMBER 4.1 was ported and optimised on the Indian parallel supercomputer PARAM OpenFrame, built around Sun Ultra Sparc processors. This version of the AMBER program was then used to carry out molecular dynamics (MD) simulations on 5'-TGACCAGCTGGTC-3', a substrate for the PvuII enzyme. MD simulations in water were carried out under the following conditions: (i) unconstrained at 300 K (230 ps); (ii) unconstrained at 283 K (500 ps); (iii) Watson-Crick basepair constrained at 283 K (1 ns); and (iv) Watson-Crick basepair constrained with ions at 283 K (1.2 ns). In all these simulation studies, the molecule was observed to bend, and the maximum distortion in the double helix was seen around the G7:C7' basepair, the site of the phosphodiester bond that is cleaved by PvuII. Analysis of the 1.2 ns MD simulation with ions also showed that the conformation of the double helix alternates between one close to B-form and one close to A-form. It is argued that a bent non-standard conformation is recognised by the PvuII enzyme. The maximum bend occurs at the G7:C7' region, weakening the phosphodiester bond and allowing His48 to be positioned so as to permit scission through a general base mechanism. The bending and distortion observed are a property of the sequence which acts as a substrate for the PvuII enzyme. This is confirmed by MD studies on Dickerson's sequence d(CGCGAATTCGCG)2 as a reference molecule, which shows practically no bending or deformation.
Affiliation(s)
- A S Kolaskar
- Bioinformatics Centre, University of Pune, Ganeshkhind, India.
|
40
|
Abstract
The interconnected nature of interactions in protein structures appears to be the major hurdle in preventing the construction of accurate comparative models. We present an algorithm that uses graph theory to handle this problem. Each possible conformation of a residue in an amino acid sequence is represented using the notion of a node in a graph. Each node is given a weight based on the degree of the interaction between its side-chain atoms and the local main-chain atoms. Edges are then drawn between pairs of residue conformations/nodes that are consistent with each other (i.e. clash-free and satisfying geometrical constraints). The edges are weighted based on the interactions between the atoms of the two nodes. Once the entire graph is constructed, all the maximal sets of completely connected nodes (cliques) are found using a clique-finding algorithm. The cliques with the best weights represent the optimal combinations of the various main-chain and side-chain possibilities, taking the respective environments into account. The algorithm is used in a comparative modeling scenario to build side-chains, regions of main chain, and mix and match between different homologs in a context-sensitive manner. The predictive power of this method is assessed by applying it to cases where the experimental structure is not known in advance.
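The graph formulation in this abstract can be made concrete with a small Python sketch: nodes are residue conformations with weights, edges join mutually compatible conformations with their own weights, and the best-scoring maximal clique is selected. Several things here are assumptions rather than the authors' method: the abstract does not name its clique-finding algorithm, so plain Bron-Kerbosch is used as a stand-in; the additive score and the "higher is better" convention are illustrative; and the node names and weights are invented toy values.

```python
from itertools import combinations


def maximal_cliques(adjacency):
    """Yield every maximal clique using Bron-Kerbosch without pivoting."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        for v in list(p):
            yield from expand(r | {v}, p & adjacency[v], x & adjacency[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adjacency), set())


def clique_score(clique, node_weight, edge_weight):
    """Sum of node weights plus weights of all edges inside the clique."""
    total = sum(node_weight[n] for n in clique)
    for u, v in combinations(sorted(clique), 2):
        total += edge_weight[frozenset((u, v))]
    return total


def best_clique(node_weight, edge_weight):
    """Return (clique, score) for the highest-scoring maximal clique."""
    adjacency = {n: set() for n in node_weight}
    for pair in edge_weight:
        u, v = tuple(pair)
        adjacency[u].add(v)
        adjacency[v].add(u)
    scored = [(c, clique_score(c, node_weight, edge_weight))
              for c in maximal_cliques(adjacency)]
    return max(scored, key=lambda item: item[1])


# Hypothetical toy data: "A0" means conformation 0 of residue A, and so on.
# No edge joins two conformations of the same residue, so a clique can
# contain at most one conformation per residue.
node_weight = {"A0": 1.0, "A1": 0.5, "B0": 0.8, "B1": 1.2, "C0": 0.7}
edge_weight = {
    frozenset(("A0", "B0")): 0.5,
    frozenset(("A0", "C0")): 0.4,
    frozenset(("B0", "C0")): 0.3,
    frozenset(("A1", "B1")): 0.9,
    frozenset(("B1", "C0")): 0.2,
}
best, best_score = best_clique(node_weight, edge_weight)
```

Enumerating all maximal cliques is exponential in the worst case, so a sketch like this only suits small graphs; a real modelling run would need pruning or a pivoting variant.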
Affiliation(s)
- R Samudrala
- Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, Rockville 20850, USA
|
41
|
García-Vallvé S, Rojas A, Palau J, Romeu A. Circular permutants in beta-glucosidases (family 3) within a predicted double-domain topology that includes a (beta/alpha)8-barrel. Proteins 1998; 31:214-23. [PMID: 9593194 DOI: 10.1002/(sici)1097-0134(19980501)31:2<214::aid-prot10>3.0.co;2-j] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/07/2022]
Abstract
By predicting the general secondary structure for beta-glucosidases (family 3), in conjunction with existing knowledge of the circular permutants present in B. fibrisolvens and R. albus, we were able to find the canonical elements of the secondary structure. The way these elements are linked suggests that there is a double-domain topology made up of a (beta/alpha)8-barrel domain and a "mainly all-beta" domain. A number of already known conserved motifs are located within (or near) the C-terminal part of the putative parallel beta-strands of the (beta/alpha)8-barrel, which is consistent with what is known about the location of catalytic sites for enzymes that have this domain topology. Within the circular permutants, two beta/alpha units are located at the N-terminal part of the molecule, whereas the other six beta/alpha units are located at the C-terminal end. In this way, the circular permutants can be seen to have a putative discontinuous double-domain topology.
Affiliation(s)
- S García-Vallvé
- University Rovira i Virgili, Department of Biochemistry and Biotechnology, Tarragona, Spain
|
42
|
Benton D. Integrated access to genomic and other bioinformation: an essential ingredient of the drug discovery process. SAR QSAR Environ Res 1998; 8:121-155. [PMID: 9522473 DOI: 10.1080/10629369808039138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2023]
Abstract
Due to the high rate of data production and the need of researchers to have rapid access to new data, public databases have become the major medium through which genome mapping and sequencing data as well as macromolecular structural data are published. There are now more than 250 databases of biomolecular, structural, genetic, or phenotypic data, many of which are doubling in size annually. These databases, many of which were created and are maintained by experimentalists for their own research use, provide valuable collections of organized, validated data. However, the very number and diversity of databases now make efficient data resource discovery as important as effective data resource use. Existing autonomous biological databases contain related data which are more valuable when interconnected than when isolated. Political and scientific realities dictate that these databases will be built by different teams, in different locations, for different purposes, and using different data models and supporting DBMSs. As a consequence, connecting the related data they contain is not straightforward. Experience with existing biological databases indicates that it is possible to form useful queries across these databases, but that doing so usually requires expertise in the semantic structure of each source database. Advancing to the next level of integration among biological information resources poses significant technical and sociological challenges.
Affiliation(s)
- D Benton
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892-6050, USA
|
43
|
Dandekar T, König R. Computational methods for the prediction of protein folds. Biochim Biophys Acta 1997; 1343:1-15. [PMID: 9428653 DOI: 10.1016/s0167-4838(97)00132-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.2] [Indexed: 02/05/2023]
|
44
|
|
45
|
Abstract
Databases of multiple sequence alignments are a valuable aid to protein sequence classification and analysis. One of the main challenges when constructing such a database is to simultaneously satisfy the conflicting demands of completeness on the one hand and quality of alignment and domain definitions on the other. The latter properties are best dealt with by manual approaches, whereas completeness in practice is only amenable to automatic methods. Herein we present a database based on hidden Markov model profiles (HMMs), which combines high quality and completeness. Our database, Pfam, consists of parts A and B. Pfam-A is curated and contains well-characterized protein domain families with high quality alignments, which are maintained by using manually checked seed alignments and HMMs to find and align all members. Pfam-B contains sequence families that were generated automatically by applying the Domainer algorithm to cluster and align the remaining protein sequences after removal of Pfam-A domains. By using Pfam, a large number of previously unannotated proteins from the Caenorhabditis elegans genome project were classified. We have also identified many novel family memberships in known proteins, including new kazal, Fibronectin type III, and response regulator receiver domains. Pfam-A families have permanent accession numbers and form a library of HMMs available for searching and automatic annotation of new protein sequences.
Affiliation(s)
- E L Sonnhammer
- Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
|
46
|
Karplus K, Sjölander K, Barrett C, Cline M, Haussler D, Hughey R, Holm L, Sander C. Predicting protein structure using hidden Markov models. Proteins 1997. [DOI: 10.1002/(sici)1097-0134(1997)1+<134::aid-prot18>3.0.co;2-p] [Citation(s) in RCA: 77] [Impact Index Per Article: 2.9] [Indexed: 11/06/2022]
|
47
|
Abstract
A protein sequence folds into a unique three-dimensional protein structure. Different sequences, though, can fold into similar structures. How stable is a protein structure with respect to sequence changes? What percentage of the sequence is 'anchor' residues, that is, residues crucial for protein structure and function? Here, answers to these questions are pursued by analyzing large numbers of structurally homologous protein pairs. Most pairs of similar structures have sequence identity as low as expected from randomly related sequences (8-9%). On average, only 3-4% of all residues are 'anchor' residues. The symmetric shape of the distribution at low sequence identity suggests that for most structures, four billion years of evolution was sufficient to reach an equilibrium. The mean identities for convergent (different ancestor) and divergent (same ancestor) evolution of proteins to similar structures are quite close and hence, in most cases, it is difficult to distinguish between the two effects. In particular, low levels of sequence identity appear not to be indicative of convergent evolution.
Affiliation(s)
- B Rost
- EMBL, Heidelberg, Germany.
|
48
|
Abstract
The materials of bioinformatics are biological data, and its methods are derived from a wide variety of computational techniques. Recent years have seen an explosive growth in biological data, and the development of novel computational methods. These methods have become essential to research progress in structural biology, genomics, structure-based drug design and molecular evolution. The development and maintenance of a robust infrastructure of biological data is of equal importance if biotechnology is to take maximum advantage of research advances in a wide variety of fields. While bioinformatics has already made important contributions, it faces significant challenges as it matures.
Affiliation(s)
- D Benton
- National Center for Human Genome Research, National Institutes of Health, Bethesda, MD 20892-6050, USA. benton@extra.nchgr.nih.gov
|
49
|
Abstract
Every sequence comparison method requires a set of scores. For aligning protein sequences, substitution scores are based on models of amino acid conservation and properties, and matrices of these scores have substantially improved in recent years. Position-specific scoring matrices provide representations of sequence families that are capable of detecting subtle similarities. Comprehensive evaluations can effectively guide the choice of scores for sequence alignment and searching applications, including those that aid in the prediction of protein structures.
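As a rough illustration of the position-specific scoring matrices mentioned in this abstract, the Python sketch below builds log-odds scores from the columns of an ungapped alignment. It is a textbook-style construction under stated assumptions, not the scheme evaluated in the review: the function names, the flat pseudocount, the four-letter toy alphabet and the uniform background frequencies are all hypothetical.

```python
import math
from collections import Counter


def build_pssm(alignment, background, pseudocount=1.0):
    """Per-column log2-odds scores: log2(observed frequency / background frequency).

    alignment:  equal-length, ungapped sequences (assumption of this sketch).
    background: dict mapping each residue to its background frequency.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    alphabet = sorted(background)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] in background)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        column = {}
        for residue in alphabet:
            freq = (counts[residue] + pseudocount) / total
            column[residue] = math.log2(freq / background[residue])
        pssm.append(column)
    return pssm


def score_sequence(pssm, seq):
    """Sum the column scores for a sequence of the same length as the matrix."""
    if len(seq) != len(pssm):
        raise ValueError("sequence length must match the matrix length")
    return sum(column[residue] for column, residue in zip(pssm, seq))


# Toy four-letter alphabet with a uniform background, for illustration only.
toy_background = {"A": 0.25, "C": 0.25, "D": 0.25, "E": 0.25}
toy_alignment = ["ACDA", "ACDC", "ACEA"]
pssm = build_pssm(toy_alignment, toy_background)
family_like = score_sequence(pssm, "ACDA")
unrelated = score_sequence(pssm, "CAAD")
```

Residues seen often in a column score above zero and unseen ones below zero, which is what lets a matrix of this kind pick up family members that a single pairwise comparison would miss.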
Affiliation(s)
- S Henikoff
- Howard Hughes Medical Institute, Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98104, USA.
|