1
Ferrer Florensa A, Almagro Armenteros J, Nielsen H, Aarestrup F, Clausen P. SpanSeq: similarity-based sequence data splitting method for improved development and assessment of deep learning projects. NAR Genom Bioinform 2024; 6:lqae106. [PMID: 39157582 PMCID: PMC11327874 DOI: 10.1093/nargab/lqae106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Revised: 07/26/2024] [Accepted: 08/05/2024] [Indexed: 08/20/2024]
Abstract
The use of deep learning models in computational biology has increased massively in recent years, and this trend is expected to continue with current advances in fields such as natural language processing. Although these models can capture complex relations between input and target, they are also prone to learning noisy deviations from the pool of data used during their development. To assess their performance on unseen data (their capacity to generalize), it is common to split the available data randomly into development (train/validation) and test sets. This procedure, although standard, has been shown to produce dubious assessments of generalization because of the similarity between samples in the databases used. In this work, we present SpanSeq, a database partitioning method for machine learning that scales to most biological sequences (genes, proteins and genomes) and avoids data leakage between sets. We also explore the effect of not restricting similarity between sets by reproducing the development of two state-of-the-art bioinformatics models, not only confirming the consequences of random database splitting for model assessment but also extending those repercussions to model development. SpanSeq is available at https://github.com/genomicepidemiology/SpanSeq.
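The leakage problem described above can be made concrete with a small check. The sketch below is not SpanSeq's method; it is a hypothetical illustration that splits sequences at random and then flags test sequences that have a close relative in the training set. As an assumption, it uses `difflib.SequenceMatcher.ratio()` as a crude similarity proxy instead of alignment-based sequence identity, and the 0.8 threshold is arbitrary.

```python
import random
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude similarity proxy in [0, 1] (assumption: not alignment-based identity)."""
    return SequenceMatcher(None, a, b).ratio()


def random_split(seqs, test_fraction=0.25, seed=0):
    """Randomly split sequences into (train, test) lists, ignoring similarity."""
    rng = random.Random(seed)
    shuffled = list(seqs)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def leaked_test_items(train, test, threshold=0.8):
    """Return test sequences with at least one training sequence at or above threshold."""
    return [t for t in test if any(similarity(t, s) >= threshold for s in train)]


if __name__ == "__main__":
    # Toy data (hypothetical): two families of near-identical sequences plus unrelated ones.
    seqs = ["ACDEFGHIK", "ACDEFGHIR", "ACDEFGHIL",
            "MNPQRSTVW", "MNPQRSTVY", "WWYYWWYYW", "KKLLKKLLK", "GGPPGGPPG"]
    train, test = random_split(seqs)
    leaks = leaked_test_items(train, test)
    print(f"{len(leaks)} of {len(test)} test sequences have a close relative in train")
```

Whenever a random split places members of the same family on both sides, the flagged test sequences are the ones whose scores would tend to overstate generalization.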
Affiliation(s)
- Alfred Ferrer Florensa
- Research Group for Genomic Epidemiology, DTU National Food Institute, Technical University of Denmark, Anker Engelunds Vej 1, 2800 Kongens Lyngby, Denmark
- Jose Juan Almagro Armenteros
- Informatics and Predictive Sciences Research, Bristol Myers Squibb Company, Calle Isaac Newton 4, 41092 Sevilla, Spain
- Henrik Nielsen
- Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, Anker Engelunds Vej 1, 2800 Kongens Lyngby, Denmark
- Frank Møller Aarestrup
- Research Group for Genomic Epidemiology, DTU National Food Institute, Technical University of Denmark, Anker Engelunds Vej 1, 2800 Kongens Lyngby, Denmark
- Philip Thomas Lanken Conradsen Clausen
- Research Group for Genomic Epidemiology, DTU National Food Institute, Technical University of Denmark, Anker Engelunds Vej 1, 2800 Kongens Lyngby, Denmark
2
Teufel F, Gíslason MH, Almagro Armenteros JJ, Johansen A, Winther O, Nielsen H. GraphPart: homology partitioning for biological sequence analysis. NAR Genom Bioinform 2023; 5:lqad088. [PMID: 37850036 PMCID: PMC10578201 DOI: 10.1093/nargab/lqad088] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/25/2023] [Revised: 08/25/2023] [Accepted: 09/19/2023] [Indexed: 10/19/2023]
Abstract
When splitting biological sequence data for the development and testing of predictive models, it is necessary to prevent pairs of too closely related sequences from ending up in different partitions. If this is ignored, the performance of prediction methods will tend to be overestimated. Several algorithms have been proposed for homology reduction, in which sequences are removed until no too closely related pairs remain. We present GraphPart, an algorithm for homology partitioning that divides the data such that closely related sequences always end up in the same partition, while keeping as many sequences as possible in the dataset. Evaluation of GraphPart on protein, DNA and RNA datasets shows that it retains a larger number of sequences per dataset while providing homology separation on a par with reduction approaches.
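The constraint at the heart of homology partitioning, that closely related sequences must never be split across partitions, can be sketched as a graph problem: treat sequences as nodes, join too-similar pairs with edges, and assign whole connected components to partitions. This is a simplified illustration and not GraphPart's actual algorithm; the precomputed `similar_pairs` input (index pairs judged too similar) and the greedy largest-first balancing are assumptions made for the sketch.

```python
from collections import defaultdict


def find(parent, i):
    """Union-find lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def homology_partition(n_items, similar_pairs, k):
    """Split item indices 0..n_items-1 into k partitions so that no similar pair is separated."""
    parent = list(range(n_items))
    for i, j in similar_pairs:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[ri] = rj  # merge the two clusters
    clusters = defaultdict(list)
    for i in range(n_items):
        clusters[find(parent, i)].append(i)
    partitions = [[] for _ in range(k)]
    # Place the largest clusters first, each into the currently smallest partition.
    for members in sorted(clusters.values(), key=len, reverse=True):
        smallest = min(range(k), key=lambda p: len(partitions[p]))
        partitions[smallest].extend(members)
    return partitions


if __name__ == "__main__":
    print(homology_partition(6, [(0, 1), (1, 2), (3, 4)], k=2))
```

Because whole components move together, a single very large family can leave partitions badly unbalanced, a limitation this naive sketch does not address.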
Affiliation(s)
- Felix Teufel
- Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark
- Digital Science & Innovation, Novo Nordisk A/S, 2760 Måløv, Denmark
- Magnús Halldór Gíslason
- Department of Genomic Medicine, Copenhagen University Hospital/Rigshospitalet, 2100 Copenhagen, Denmark
- José Juan Almagro Armenteros
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Ole Winther
- Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark
- Department of Genomic Medicine, Copenhagen University Hospital/Rigshospitalet, 2100 Copenhagen, Denmark
- Department of Applied Mathematics and Computer Science, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Henrik Nielsen
- Department of Health Technology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
3
Plonski AP, Reed SM. Assessing protein homology models with docking reproducibility. J Mol Graph Model 2023; 121:108430. [PMID: 36812741 DOI: 10.1016/j.jmgm.2023.108430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2022] [Revised: 02/08/2023] [Accepted: 02/10/2023] [Indexed: 02/12/2023]
Abstract
Results of the recent Critical Assessment of protein Structure Prediction (CASP) competitions demonstrate that protein backbones can be predicted with very high accuracy. In particular, the artificial intelligence methods of AlphaFold 2 from DeepMind produced structures so similar to experimental structures that many described the protein structure prediction problem as solved. However, using such structures in drug docking studies also requires precise placement of side-chain atoms. Here we built a library of 1334 small molecules and examined how reproducibly they bound to the same site on a protein using QuickVina-W, a variant of AutoDock optimized for blind searches. We found that the higher the backbone quality of the homology model, the more similar the small-molecule docking results were between the experimental and modeled structures. Furthermore, specific subsets of this library were particularly useful for identifying small differences among the very best modeled structures: as the number of rotatable bonds in the small molecule increased, differences in binding sites became more apparent.
4
Rahman J, Newton MAH, Islam MKB, Sattar A. Enhancing protein inter-residue real distance prediction by scrutinising deep learning models. Sci Rep 2022; 12:787. [PMID: 35039537 PMCID: PMC8764118 DOI: 10.1038/s41598-021-04441-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/25/2021] [Accepted: 12/17/2021] [Indexed: 12/29/2022]
Abstract
Protein structure prediction (PSP) has recently made significant progress through the prediction of inter-residue distances with deep learning models and the use of these predictions during conformational search. In this context, predicting large inter-residue distances, and distances between residues far apart in the protein sequence, remains challenging. To address these challenges, state-of-the-art inter-residue distance prediction algorithms have used large sets of coevolutionary and non-coevolutionary features. In this paper, we argue that the more feature types are used, the more kinds of noise are introduced, which the deep learning model must then overcome to improve prediction accuracy. Also, multiple features capturing similar underlying characteristics do not necessarily have a significantly better cumulative effect. We therefore scrutinise the feature space to reduce the number of feature types used while striving to improve prediction accuracy. Consequently, we propose a deep learning model for inter-residue real distance prediction, named scrutinised distance predictor (SDP), which uses only 2 coevolutionary and 3 non-coevolutionary features. On several sets of benchmark proteins, SDP improves mean Local Distance Difference Test (LDDT) scores by at least 10% over existing state-of-the-art methods. The SDP program and its data are available from https://gitlab.com/mahnewton/sdp.
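To make the LDDT metric mentioned above more concrete, here is a simplified, Cα-only sketch. It follows the commonly described definition (residue pairs within a 15 Å inclusion radius in the reference, distance differences checked against 0.5, 1, 2 and 4 Å, fractions averaged over the four thresholds), but it omits the all-atom details and optional stereochemistry checks of full lDDT implementations, so treat it as an approximation rather than the score SDP reports.

```python
import math

THRESHOLDS = (0.5, 1.0, 2.0, 4.0)  # Å tolerances on distance differences


def ca_lddt(ref, model, inclusion_radius=15.0):
    """Simplified C-alpha LDDT in [0, 1] for two equally long coordinate lists."""
    preserved = [0] * len(THRESHOLDS)
    total = 0
    n = len(ref)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d_ref = math.dist(ref[i], ref[j])
            if d_ref >= inclusion_radius:
                continue  # only local pairs, as defined by the reference structure
            diff = abs(math.dist(model[i], model[j]) - d_ref)
            total += 1
            for k, t in enumerate(THRESHOLDS):
                if diff < t:
                    preserved[k] += 1
    if total == 0:
        return 0.0
    # Average, over the thresholds, of the fraction of preserved distances
    return sum(p / total for p in preserved) / len(THRESHOLDS)


if __name__ == "__main__":
    ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (9.1, 0.0, 0.0)]  # last residue off by 1.5 Å
    print(round(ca_lddt(ref, model), 3))
```

Because the score is built from distance differences rather than superposition, it is insensitive to rigid-body placement of the model, which is one reason it suits distance-based predictors.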
Affiliation(s)
- Julia Rahman
- School of Information and Communication Technology, Griffith University, Southport, Australia
- M A Hakim Newton
- Institute of Integrated and Intelligent Systems, Griffith University, Southport, Australia
- Md Khaled Ben Islam
- School of Information and Communication Technology, Griffith University, Southport, Australia
- Abdul Sattar
- School of Information and Communication Technology, Griffith University, Southport, Australia
- Institute of Integrated and Intelligent Systems, Griffith University, Southport, Australia
5
Schaap-Johansen AL, Vujović M, Borch A, Hadrup SR, Marcatili P. T Cell Epitope Prediction and Its Application to Immunotherapy. Front Immunol 2021; 12:712488. [PMID: 34603286 PMCID: PMC8479193 DOI: 10.3389/fimmu.2021.712488] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 05/20/2021] [Accepted: 07/12/2021] [Indexed: 12/13/2022]
Abstract
T cells play a crucial role in controlling and driving the immune response through their ability to discriminate between peptides derived from healthy and from pathogenic proteins. In this review, we focus on currently available computational tools for epitope prediction, in particular tools aimed at identifying neoepitopes (i.e. cancer-specific peptides), and on their potential use in immunotherapy for cancer treatment. The review covers how these tools work, what kind of data they use, and the pros and cons of their respective applications.
Affiliation(s)
- Milena Vujović
- Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
- Annie Borch
- Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
- Sine Reker Hadrup
- Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
- Paolo Marcatili
- Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
6
Liu Y, Zhou Z, Wang F, Kewes G, Wen S, Burger S, Ebrahimi Wakiani M, Xi P, Yang J, Yang X, Benson O, Jin D. Axial localization and tracking of self-interference nanoparticles by lateral point spread functions. Nat Commun 2021; 12:2019. [PMID: 33795675 PMCID: PMC8016974 DOI: 10.1038/s41467-021-22283-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/07/2020] [Accepted: 02/19/2021] [Indexed: 11/20/2022]
Abstract
Sub-diffraction-limited localization of fluorescent emitters is a key goal of microscopy imaging. Here, we report that single upconversion nanoparticles, which contain multiple emission centres with random orientations, can generate a series of unique, bright and position-sensitive patterns in the spatial domain when placed on top of a mirror. Supported by numerical simulation, we attribute this effect to the sum of each single emitter's interference with its own mirror image. As a result, this configuration generates a series of sophisticated far-field point spread functions (PSFs), e.g. Gaussian, doughnut and archery-target shapes, that depend strongly on the phase difference between the emitter and its image. In this way, the axial locations of nanoparticles are transferred into far-field patterns. We demonstrate a real-time distance sensing technology with a localization accuracy of 2.8 nm according to atomic force microscope (AFM) characterization values, smaller than 1/350 of the excitation wavelength. Here, the authors show that single upconversion nanoparticles can generate position-sensitive patterns in the spatial domain when placed on a mirror. They attribute this to the single emitter's interference with its own mirror image and show how this can be used to obtain axial localization of the particle.
Affiliation(s)
- Yongtao Liu
- Institute for Biomedical Materials and Devices (IBMD), Faculty of Science, University of Technology Sydney, Sydney, NSW, 2007, Australia
- Zhiguang Zhou
- Institute for Biomedical Materials and Devices (IBMD), Faculty of Science, University of Technology Sydney, Sydney, NSW, 2007, Australia
- Fan Wang
- Institute for Biomedical Materials and Devices (IBMD), Faculty of Science, University of Technology Sydney, Sydney, NSW, 2007, Australia
- School of Electrical and Data Engineering, Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, NSW, 2007, Australia
- Günter Kewes
- AG Nanooptik, Institut für Physik & IRIS Adlershof, Humboldt Universität zu Berlin, Newtonstraße 15, 12489, Berlin, Germany
- Shihui Wen
- Institute for Biomedical Materials and Devices (IBMD), Faculty of Science, University of Technology Sydney, Sydney, NSW, 2007, Australia
- Sven Burger
- JCMwave GmbH, Bolivarallee 22, 14050, Berlin, Germany
- Zuse Institute Berlin, Takustraße 7, 14195, Berlin, Germany
- Majid Ebrahimi Wakiani
- Institute for Biomedical Materials and Devices (IBMD), Faculty of Science, University of Technology Sydney, Sydney, NSW, 2007, Australia
- School of Biomedical Engineering, Faculty of Science, University of Technology, Sydney, NSW, 2007, Australia
- Peng Xi
- Institute for Biomedical Materials and Devices (IBMD), Faculty of Science, University of Technology Sydney, Sydney, NSW, 2007, Australia
- Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China
- UTS-SUStech Joint Research Centre for Biomedical Materials & Devices, Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen, Guangdong, 518055, PR China
- Jiong Yang
- Institute for Biomedical Materials and Devices (IBMD), Faculty of Science, University of Technology Sydney, Sydney, NSW, 2007, Australia
- School of Chemical Engineering, University of New South Wales (UNSW), Sydney Campus, Sydney, NSW, 2052, Australia
- Xusan Yang
- Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China
- School of Applied and Engineering Physics, Cornell University, Ithaca, NY, 14853, USA
- Oliver Benson
- AG Nanooptik, Institut für Physik & IRIS Adlershof, Humboldt Universität zu Berlin, Newtonstraße 15, 12489, Berlin, Germany
- Dayong Jin
- Institute for Biomedical Materials and Devices (IBMD), Faculty of Science, University of Technology Sydney, Sydney, NSW, 2007, Australia
- UTS-SUStech Joint Research Centre for Biomedical Materials & Devices, Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen, Guangdong, 518055, PR China
7
Lemke T, Berg A, Jain A, Peter C. EncoderMap(II): Visualizing Important Molecular Motions with Improved Generation of Protein Conformations. J Chem Inf Model 2019; 59:4550-4560. [PMID: 31647645 DOI: 10.1021/acs.jcim.9b00675] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/17/2022]
Abstract
Dimensionality reduction can be used to project high-dimensional molecular data into a simplified, low-dimensional map. One feature of our recently introduced dimensionality reduction technique EncoderMap, which combines an autoencoder with multidimensional scaling, is its ability to do the reverse: it can generate conformations for any selected point in the low-dimensional map. This transfers the simplified, low-dimensional map back into the high-dimensional conformational space. Although the output is again high-dimensional, certain aspects of the simplification are preserved: the generated conformations mirror only the most dominant conformational differences that determine the positions of conformational states in the low-dimensional map. This makes it possible to depict such differences and, consequently, to visualize molecular motions, giving a unique perspective on high-dimensional conformational data. In our previous work, protein conformations described in backbone dihedral angle space were used as the input for EncoderMap, and conformations were also generated in this space. For large proteins, however, this approach generates conformations inaccurately because of the local character of backbone dihedral angles. Here, we present an improved variant of EncoderMap that can generate large protein conformations that are accurate in both short-range and long-range order. This is achieved by differentiable reconstruction of Cartesian coordinates from the generated dihedrals, which allows adding a term to the cost function that monitors the accuracy of all pairwise distances between the Cα atoms of the generated conformations. The improved capability to generate conformations of large, even multidomain, proteins is demonstrated for two examples: diubiquitin and a part of the yeast Hsp70 chaperone Ssa1.
We show that the improved variant of EncoderMap can visualize motions of protein domains relative to each other and can also highlight important conformational changes within the individual domains.
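Reconstructing Cartesian coordinates from dihedrals is commonly done by placing each new atom from the three preceding atoms plus a bond length, a bond angle and a torsion (often called the NeRF construction). The sketch below is plain Python without automatic differentiation, so it shows only the geometry and not EncoderMap's own differentiable implementation; the helper names and the mean-squared pairwise-distance loss are illustrative assumptions standing in for the distance term described above.

```python
import math
from itertools import combinations


def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def normalize(u):
    norm = math.sqrt(sum(x * x for x in u))
    return tuple(x / norm for x in u)


def place_atom(a, b, c, bond_length, bond_angle, torsion):
    """Place atom d so that |cd| = bond_length, angle(b, c, d) = bond_angle and
    dihedral(a, b, c, d) = torsion (radians); torsion 0 puts d cis to a."""
    bc = normalize(sub(c, b))
    n = normalize(cross(sub(b, a), bc))  # normal of the a-b-c plane
    m = cross(n, bc)                     # completes the local orthonormal frame
    x = -bond_length * math.cos(bond_angle)
    y = bond_length * math.sin(bond_angle) * math.cos(torsion)
    z = bond_length * math.sin(bond_angle) * math.sin(torsion)
    return tuple(c[k] + x * bc[k] + y * m[k] + z * n[k] for k in range(3))


def pairwise_distance_loss(generated, reference):
    """Mean squared difference over all pairwise distances (e.g. between C-alpha atoms)."""
    pairs = list(combinations(range(len(reference)), 2))
    return sum((math.dist(generated[i], generated[j])
                - math.dist(reference[i], reference[j])) ** 2
               for i, j in pairs) / len(pairs)


if __name__ == "__main__":
    a, b, c = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    print(place_atom(a, b, c, 1.0, math.pi / 2, math.pi))  # trans placement
```

In a trainable model the same operations would be written with tensor primitives so that gradients of the distance loss flow back to the predicted dihedrals.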
Affiliation(s)
- Tobias Lemke
- Theoretical Chemistry, University of Konstanz, 78547 Konstanz, Baden-Württemberg, Germany
- Andrej Berg
- Theoretical Chemistry, University of Konstanz, 78547 Konstanz, Baden-Württemberg, Germany
- Alok Jain
- Theoretical Chemistry, University of Konstanz, 78547 Konstanz, Baden-Württemberg, Germany
- Department of Biotechnology, National Institute of Pharmaceutical Education and Research Ahmedabad, Gandhinagar, Gujarat 382355, India
- Christine Peter
- Theoretical Chemistry, University of Konstanz, 78547 Konstanz, Baden-Württemberg, Germany
8
Syomin BV, Ilyin YV. Virus-Like Particles as an Instrument of Vaccine Production. Mol Biol 2019; 53:323-334. [PMID: 32214478 PMCID: PMC7088979 DOI: 10.1134/s0026893319030154] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Received: 09/28/2018] [Revised: 12/19/2018] [Accepted: 12/24/2018] [Indexed: 12/13/2022]
Abstract
This paper discusses the techniques currently used for vaccine production based on virus-like particles (VLPs). The factors that determine how VLP monomers assemble are described in detail. Analysis of the literature shows that advances in VLP production and in the immobilization of target antigens on the particle surface have led to universal platforms on which virtually any known antigen can be displayed in a highly concentrated form. As a result, attention has shifted from approaches to VLP production toward developing a precise interface between the organism's immune system and the peptides that induce a strong immune response against pathogens or the organism's own pathological cells. Immunome-based methods for vaccine design and the prospects of immunoprophylaxis are discussed, and selected examples of vaccines against viral diseases and cancers are considered.
Affiliation(s)
- B. V. Syomin
- Institute for Statistical Studies and Economics of Knowledge (ISSEK), National Research University Higher School of Economics, 101000 Moscow, Russia
- Y. V. Ilyin
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
9
Abstract
Since the 1980s, deep learning and biomedical data have been coevolving and feeding each other. The breadth, complexity, and rapidly expanding size of biomedical data have stimulated the development of novel deep learning methods, and the application of these methods to biomedical data has led to scientific discoveries and practical solutions. This overview provides technical and historical pointers to the field and surveys current applications of deep learning to biomedical data, organized around five subareas in roughly increasing order of spatial scale: chemoinformatics, proteomics, genomics and transcriptomics, biomedical imaging, and health care. The black box problem of deep learning methods is also briefly discussed.
Affiliation(s)
- Pierre Baldi
- Department of Computer Science, Institute for Genomics and Bioinformatics, and Center for Machine Learning and Intelligent Systems, University of California, Irvine, California 92697, USA
10
Mwangi HN, Wagacha P, Mathenge P, Sijenyi F, Mulaa F. Structure of the 40S ribosomal subunit of Plasmodium falciparum by homology and de novo modeling. Acta Pharm Sin B 2017; 7:97-105. [PMID: 28119814 PMCID: PMC5237758 DOI: 10.1016/j.apsb.2016.10.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 04/27/2016] [Revised: 06/13/2016] [Accepted: 09/05/2016] [Indexed: 01/21/2023]
Abstract
Generation of three-dimensional structures of macromolecules using in silico structural modeling technologies such as homology and de novo modeling has improved dramatically, increasing the speed at which tertiary structures can be generated. This is especially the case when a homologous crystal structure is already available. High-resolution structures can be rapidly created using only sequence information as input, a process that has the potential to accelerate scientific discovery. In this study, homology modeling and structure prediction tools such as RNA123 and SWISS-MODEL were used to generate the 40S ribosomal subunit of Plasmodium falciparum. The structure was modeled using the published crystal structure from Tetrahymena thermophila, a homologous eukaryote. In the absence of a crystal structure of the P. falciparum 40S ribosomal subunit, the model accurately depicts the global topology and the secondary and tertiary connections, and gives an overall root mean square deviation (RMSD) of 3.9 Å relative to the template's crystal structure. Deviations are somewhat larger in regions with no homology between the templates. These results demonstrate that this approach can identify motifs of interest in RNA and potential drug targets in macromolecules whose crystal structures are unknown. The results also show the utility of RNA homology modeling software for structure determination and lay the groundwork for applying this approach to larger and more complex eukaryotic ribosomes and other RNA-protein complexes. Structures generated in this study can be used in in silico screening experiments and could lead to the determination of structures of target/hit complexes.
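The RMSD values quoted in these abstracts follow the usual definition: the square root of the mean squared distance between corresponding atoms. The minimal sketch below assumes the two coordinate sets are already paired atom by atom and optimally superposed (for example by a separate Kabsch alignment, not shown); it does not perform that alignment itself.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of paired (x, y, z) coordinates.

    Assumes the structures are already superposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # One of two atoms displaced by 5 Å: sqrt(25 / 2) ≈ 3.54 Å
    print(rmsd([(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0), (0.0, 0.0, 0.0)]))
```

Skipping the superposition step would inflate the value for any model that is merely rotated or translated relative to the template, so reported RMSDs normally refer to the fitted structures.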
11
Kukic P, Mirabello C, Tradigo G, Walsh I, Veltri P, Pollastri G. Toward an accurate prediction of inter-residue distances in proteins using 2D recursive neural networks. BMC Bioinformatics 2014; 15:6. [PMID: 24410833 PMCID: PMC3893389 DOI: 10.1186/1471-2105-15-6] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 01/28/2013] [Accepted: 12/20/2013] [Indexed: 11/21/2022]
Abstract
Background: Protein inter-residue contact maps provide a translation- and rotation-invariant topological representation of a protein and can be used as an intermediate step in protein structure prediction. However, contact map prediction is an unbalanced problem, because a protein structure contains far fewer contacts than non-contacts. In this study we explore the possibility of completely eliminating this imbalance by predicting real-valued distances between residues. Predicting full inter-residue distance maps and applying them in protein structure prediction has been relatively unexplored in the past. Results: We first demonstrate that native-like distance maps can reproduce 3D structures almost identical to the targets, with an average RMSD of 0.5 Å. In addition, physical maps corrupted with a random error of ±6 Å can reconstruct the targets to within an average RMSD of 2 Å. Having demonstrated the reconstruction potential of distance maps, we develop two classes of predictors using two-dimensional recursive neural networks: an ab initio predictor that relies only on the protein sequence and evolutionary information, and a template-based predictor that is additionally given structural homology information. We find that the ab initio predictor reproduces distances with an RMSD of 6 Å, regardless of the evolutionary content provided. Furthermore, we show that the template-based predictor exploits both sequence and structure information even in cases of dubious homology and outperforms the best template hit by a clear margin of up to 3.7 Å. Lastly, we demonstrate that the two predictors can reconstruct CASP9 targets shorter than 200 residues, producing results similar to those of the state-of-the-art machine learning approach implemented in the Distill server.
Conclusions: The methodology presented here, if complemented by more complex reconstruction protocols, may offer a path to improving machine learning algorithms for 3D protein structure prediction. Moreover, it can be used as an intermediate step in protein structure prediction, either on its own or complemented by NMR restraints.
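The class imbalance that motivates predicting real-valued distances becomes visible as soon as a distance map is thresholded into a contact map. The sketch below uses a toy, fully extended chain with 3.8 Å spacing between consecutive Cα atoms and an 8 Å contact cutoff; both values are assumptions for illustration (this abstract does not fix a cutoff), and real, compact proteins have more long-range contacts than this toy.

```python
import math


def distance_map(coords):
    """Full inter-residue distance matrix from (x, y, z) coordinates."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def contact_counts(dmap, cutoff=8.0, min_separation=1):
    """Count (contacts, non_contacts) over residue pairs i < j with j - i >= min_separation."""
    contacts = non_contacts = 0
    n = len(dmap)
    for i in range(n):
        for j in range(i + min_separation, n):
            if dmap[i][j] < cutoff:
                contacts += 1
            else:
                non_contacts += 1
    return contacts, non_contacts


if __name__ == "__main__":
    chain = [(3.8 * i, 0.0, 0.0) for i in range(20)]  # toy extended chain, 20 residues
    print(contact_counts(distance_map(chain)))  # (37, 153)
```

A binary classifier trained on such maps sees roughly four non-contacts per contact even in this tiny example, whereas a distance regressor receives a target value for every pair.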
Affiliation(s)
- Predrag Kukic
- School of Computer Science and Informatics, Complex and Adaptive Systems Laboratory, University College Dublin, Belfield, Dublin 4, Ireland.
12
Ding W, Xie J, Dai D, Zhang H, Xie H, Zhang W. CNNcon: improved protein contact maps prediction using cascaded neural networks. PLoS One 2013; 8:e61533. [PMID: 23626696 PMCID: PMC3634008 DOI: 10.1371/journal.pone.0061533] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 12/17/2012] [Accepted: 03/11/2013] [Indexed: 11/18/2022]
Abstract
Background: Despite continuing progress in X-ray crystallography and high-field NMR spectroscopy for determining three-dimensional protein structures, the number of unsolved and newly discovered sequences grows much faster than the number of determined structures. With advances in computational science, protein modeling methods may help bridge this huge sequence-structure gap. A grand challenge is to predict three-dimensional protein structure from the primary structure (residue sequence) alone. Predicting residue contact maps is a crucial and promising intermediate step toward final three-dimensional structure prediction. Better predictions of local and non-local contacts between residues can turn protein sequence alignments into structure alignments, which can ultimately improve template-based three-dimensional protein structure predictors considerably. Methods: In this paper we developed CNNcon, an improved contact map predictor based on multiple neural networks, consisting of six sub-networks and one final cascade network. Both the sub-networks and the final cascade network were trained and tested with their corresponding data sets. For testing, the target protein was first encoded and then input to its corresponding sub-networks for prediction; the intermediate results were then input to the cascade network to produce the final prediction. Results: On average, CNNcon accurately predicts 58.86% of contacts at a distance cutoff of 8 Å for proteins with lengths ranging from 51 to 450 residues. The comparison shows that the method performs better than the state-of-the-art predictors it was compared with. In particular, prediction accuracy remains steady as protein sequence length increases, indicating that CNNcon overcomes the thin density problem that troubles other current predictors. This advantage makes the method valuable for predicting contacts in long proteins.
As a result, CNNcon could make effective prediction for long proteins possible.
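The abstract reports that CNNcon predicts 58.86% of contacts correctly but does not spell out the formula. Two common ways to score a predicted contact list against the true contacts at a given cutoff are precision (the fraction of predicted contacts that are real) and recall (the fraction of real contacts that were predicted). The sketch below computes both; which of them corresponds to the reported figure is an assumption left open here.

```python
def _normalize(pairs):
    """Store each residue pair as (i, j) with i < j so that (5, 1) equals (1, 5)."""
    return {tuple(sorted(p)) for p in pairs}


def contact_scores(predicted, true_contacts):
    """Return (precision, recall) of predicted residue-residue contacts."""
    pred, true = _normalize(predicted), _normalize(true_contacts)
    hits = len(pred & true)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(true) if true else 0.0
    return precision, recall


if __name__ == "__main__":
    predicted = [(1, 5), (9, 2), (3, 7), (4, 10)]
    true_contacts = [(5, 1), (3, 7), (6, 12)]
    print(contact_scores(predicted, true_contacts))  # (0.5, 0.6666666666666666)
```

Reporting both numbers guards against a predictor that looks accurate simply by predicting very few, very confident contacts.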
Affiliation(s)
- Wang Ding
- School of Computer Engineering and Science, Shanghai University, Shanghai, People’s Republic of China
- Jiang Xie
- School of Computer Engineering and Science, Shanghai University, Shanghai, People’s Republic of China
- Institute of Systems Biology, Shanghai University, Shanghai, People’s Republic of China
- Department of Mathematics, University of California Irvine, Irvine, California, United States of America
- Dongbo Dai
- School of Computer Engineering and Science, Shanghai University, Shanghai, People’s Republic of China
- Huiran Zhang
- School of Computer Engineering and Science, Shanghai University, Shanghai, People’s Republic of China
- Hao Xie
- College of Stomatology, Wuhan University, Wuhan, People’s Republic of China
- Wu Zhang
- School of Computer Engineering and Science, Shanghai University, Shanghai, People’s Republic of China
- Institute of Systems Biology, Shanghai University, Shanghai, People’s Republic of China
13
Uncovering the molecular machinery of the human spindle--an integration of wet and dry systems biology. PLoS One 2012; 7:e31813. [PMID: 22427808 PMCID: PMC3302876 DOI: 10.1371/journal.pone.0031813] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 07/07/2011] [Accepted: 01/18/2012] [Indexed: 11/19/2022]
Abstract
The mitotic spindle is an essential molecular machine involved in cell division, whose composition has been studied extensively by detailed cell biology, high-throughput proteomics, and RNA interference experiments. However, because of its dynamic organization and complex regulation, it is difficult to obtain a complete description of its molecular composition. We implemented an integrated computational approach to characterize novel human spindle components and analysed in detail the individual candidates predicted to be spindle proteins, as well as the network of predicted relations connecting known and putative spindle proteins. Subsequent experimental validation of a number of predicted novel proteins confirmed not only their association with the spindle apparatus but also their role in mitosis. We found that 75% of the proteins we tested localize to the spindle apparatus, compared with a success rate of 35% when expert knowledge alone was used. We compared our results with the previously published MitoCheck study and found that our approach validates some of that consortium's findings. Further, we predict so-called "hidden spindle hubs": proteins whose network of interactions is still poorly characterised experimentally and which are thought to influence the functionality of the mitotic spindle on a large scale. Our analyses suggest that we are still far from knowing the complete repertoire of functionally important components of the human spindle network. Combining integrated bio-computational approaches with single-gene experimental follow-ups could be key to exploring the still hidden regions of the human spindle system.
14
Kishimura H. Enzymatic properties of starfish phospholipase A2 and its application. ADVANCES IN FOOD AND NUTRITION RESEARCH 2012; 65:437-456. [PMID: 22361205 DOI: 10.1016/b978-0-12-416003-3.00029-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 05/31/2023]
Abstract
Industrial phospholipase A2 (PLA2), mainly produced from porcine pancreas, is used for the production of lysolecithin, which is well known as an excellent natural emulsifier for the food, cosmetic, and pharmaceutical industries. Because of the outbreak of bovine spongiform encephalopathy (BSE) and religious traditions, it is hoped that new non-mammalian sources of PLA2, as well as of other enzymes and proteins, will be developed. Against this background, we studied PLA2 from marine organisms and found that PLA2 from the starfish Asterina pectinifera possesses extremely high activity and a characteristic polar-group specificity compared with commercially available PLA2 from porcine pancreas. These results suggest that the starfish A. pectinifera is a potential source of PLA2, and that its PLA2 can be used as an alternative to mammalian PLA2.
15
Wei Y, Floudas CA. Enhanced Inter-helical Residue Contact Prediction in Transmembrane Proteins. Chem Eng Sci 2011; 66:4356-4369. [PMID: 21892227 PMCID: PMC3164537 DOI: 10.1016/j.ces.2011.04.033] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 12/31/2022]
Abstract
In this paper, building on recent work by McAllister and Floudas, who developed a mathematical optimization model to predict contacts in transmembrane alpha-helical proteins from a limited protein data set [1], we have enhanced this method by 1) building a more comprehensive data set of transmembrane alpha-helical proteins, which is then used to construct the probability sets MIN-1N and MIN-2N for residue contact prediction, 2) enhancing the mathematical model by modifying several important physical constraints, and 3) applying to different protein sets a new blind contact prediction scheme, derived from an analysis of contact prediction on 65 proteins from Fuchs et al. [2]. The blind contact prediction scheme has been tested on two different membrane protein sets. First, it is applied to five carefully selected proteins from the training set. The contact prediction for these five proteins uses probability sets built by excluding the target protein from the training set, and an average accuracy of 56% was obtained. Second, it is applied to six independent membrane proteins with complicated topologies, and the prediction accuracies are 73% for 2ZY9A, 21% for 3KCUA, 46% for 2W1PA, 64% for 3CN5A, 77% for 3IXZA and 83% for 3K3FA. The average prediction accuracy for the six proteins is 60.7%. The proposed approach is also compared with a support vector machine method (TMhit [3]) and is shown to achieve better prediction accuracy.
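As a quick arithmetic check, the reported 60.7% is consistent with an unweighted mean of the six per-protein accuracies listed above. That the authors used an unweighted mean (rather than one weighted by the number of contacts per protein) is an assumption; the abstract does not say.

```python
# Per-protein contact prediction accuracies (%) reported in the abstract.
accuracies = {
    "2ZY9A": 73,
    "3KCUA": 21,
    "2W1PA": 46,
    "3CN5A": 64,
    "3IXZA": 77,
    "3K3FA": 83,
}

# Assumption: simple unweighted mean over proteins (not weighted by contact count).
mean_accuracy = sum(accuracies.values()) / len(accuracies)
print(f"{mean_accuracy:.1f}%")  # prints 60.7%
```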
Affiliation(s)
- Y. Wei
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544-5263, U.S.A
- C. A. Floudas
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544-5263, U.S.A
16
Altshuler EP, Serebryanaya DV, Katrukha AG. Generation of recombinant antibodies and means for increasing their affinity. BIOCHEMISTRY (MOSCOW) 2011; 75:1584-605. [PMID: 21417996 DOI: 10.1134/s0006297910130067] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Indexed: 12/14/2022]
Abstract
Highly specific interaction with foreign molecules is a unique feature of antibodies. Since 1975, when Köhler and Milstein proposed the hybridoma technique and prepared mouse monoclonal antibodies, many antibodies specific to various antigens have been obtained. Recent development of methods for preparing recombinant DNA libraries and of in silico bioinformatics approaches for protein structure analysis has made it possible to prepare antibodies using genetic engineering. Genetic engineering methods have allowed the creation of recombinant antibodies and the improvement of existing antibodies, which significantly extends the applicability of antibodies. By changing their amino acid sequences, it is possible to modify the biochemical and immunochemical properties of antibodies and create antibodies with properties optimal for specific tasks. For example, the application of recombinant technologies has produced antibodies whose affinity significantly exceeds that of the original natural antibodies. In this review we summarize information about the structure, modes of preparation, and application of recombinant antibodies and their fragments, and also consider the main approaches used to increase antibody affinity.
Affiliation(s)
- E P Altshuler
- Department of Biochemistry, Faculty of Biology, Lomonosov Moscow State University, Russia
17
Abstract
Web-based protein structure databases come in a wide variety of types and levels of information content. Those having the most general interest are the various atlases that describe each experimentally determined protein structure and provide useful links, analyses and schematic diagrams relating to its 3D structure and biological function. Also of great interest are the databases that classify 3D structures by their folds, as these can reveal evolutionary relationships which may be hard to detect from sequence comparison alone. Related to these are the numerous servers that compare folds, particularly useful for newly solved structures, and especially those of unknown function. Beyond these there are a vast number of databases for the most specialized user, dealing with specific families, diseases, structural features and so on.
18
Petersen B, Lundegaard C, Petersen TN. NetTurnP--neural network prediction of beta-turns by use of evolutionary information and predicted protein sequence features. PLoS One 2010; 5:e15079. [PMID: 21152409 PMCID: PMC2994801 DOI: 10.1371/journal.pone.0015079] [Citation(s) in RCA: 78] [Impact Index Per Article: 5.6] [Received: 08/02/2010] [Accepted: 10/19/2010] [Indexed: 11/30/2022]
Abstract
β-turns are the most common type of non-repetitive structure and constitute on average 25% of the amino acids in proteins. The formation of β-turns plays an important role in protein folding, protein stability and molecular recognition processes. In this work we present the neural network method NetTurnP, for prediction of two-class β-turns and prediction of the individual β-turn types, by use of evolutionary information and predicted protein sequence features. It has been evaluated against a commonly used dataset, BT426, and achieves a Matthews correlation coefficient of 0.50, which is the highest reported performance on a two-class prediction of β-turn and not-β-turn. Furthermore, NetTurnP shows improved performance on some of the specific β-turn types. In the present work, neural network methods have been trained to predict β-turn or not and individual β-turn types from the primary amino acid sequence. The individual β-turn types I, I', II, II', VIII, VIa1, VIa2, VIba and IV have been predicted based on classifications by PROMOTIF, and the two-class prediction of β-turn or not is a superset composed of all β-turn types. The performance is evaluated using a gold-standard set of non-homologous sequences known as BT426. Our two-class prediction method achieves a performance of: MCC = 0.50, Qtotal = 82.1%, sensitivity = 75.6%, PPV = 68.8% and AUC = 0.864. We have compared our performance to eleven other prediction methods that obtain Matthews correlation coefficients in the range of 0.17–0.47. For the type-specific β-turn predictions, only types I and II can be predicted with reasonable Matthews correlation coefficients, where we obtain performance values of 0.36 and 0.31, respectively. Conclusion The NetTurnP method has been implemented as a webserver, which is freely available at http://www.cbs.dtu.dk/services/NetTurnP/. NetTurnP is the only available webserver that allows submission of multiple sequences.
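The two-class scores quoted above (MCC, Qtotal, sensitivity, PPV) are standard functions of the confusion-matrix counts. A minimal Python sketch using the textbook definitions; the function name `binary_metrics` and the counts in the usage line are hypothetical and are not NetTurnP's data (AUC needs per-residue scores and is omitted).

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard two-class metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)   # recall on the positive (beta-turn) class
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    q_total = (tp + tn) / total    # fraction of residues classified correctly
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"MCC": mcc, "Qtotal": q_total, "sensitivity": sensitivity, "PPV": ppv}


# Hypothetical counts, for illustration only.
m = binary_metrics(tp=50, fp=10, tn=30, fn=10)
print(m)
```

MCC is the headline number in the abstract because, unlike Qtotal, it stays informative when β-turn residues are a minority class.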
Affiliation(s)
- Bent Petersen
- Department of Systems Biology, Center for Biological Sequence Analysis (CBS), Technical University of Denmark, Lyngby, Denmark
- Claus Lundegaard
- Department of Systems Biology, Center for Biological Sequence Analysis (CBS), Technical University of Denmark, Lyngby, Denmark
- Thomas Nordahl Petersen
- Department of Systems Biology, Center for Biological Sequence Analysis (CBS), Technical University of Denmark, Lyngby, Denmark
19
Rajgaria R, Wei Y, Floudas CA. Contact prediction for beta and alpha-beta proteins using integer linear optimization and its impact on the first principles 3D structure prediction method ASTRO-FOLD. Proteins 2010; 78:1825-46. [PMID: 20225257 PMCID: PMC2858251 DOI: 10.1002/prot.22696] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Indexed: 12/24/2022]
Abstract
An integer linear optimization model is presented to predict residue contacts in beta, alpha + beta, and alpha/beta proteins. The total energy of a protein is expressed as the sum of a Cα-Cα distance-dependent contact energy contribution and a hydrophobic contribution. The model selects the contacts that assign the lowest energy to the protein structure while satisfying a set of constraints that enforce certain physically observed topological information. A new method based on hydrophobicity is proposed to find the beta-sheet alignments. These beta-sheet alignments are used as constraints for contacts between residues of beta-sheets. This model was tested on three independent protein test sets and the CASP8 test proteins, consisting of beta, alpha + beta, and alpha/beta proteins, and it was found to perform very well. The average accuracy of the predictions (for residues separated by at least six residues) was approximately 61%. The average true positive and false positive distances were also calculated for each of the test sets and are 7.58 Å and 15.88 Å, respectively. Residue contact prediction can be directly used to facilitate protein tertiary structure prediction. The proposed residue contact prediction model is incorporated into the first principles protein tertiary structure prediction approach, ASTRO-FOLD. The effectiveness of the contact prediction model was further demonstrated by the improvement in the quality of the protein structure ensemble generated using the predicted residue contacts for a test set of 10 proteins.
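Contact-prediction accuracy in this abstract is reported only for residue pairs separated by at least six positions in sequence. A minimal sketch of such a score, assuming "accuracy" means the fraction of predicted contacts that are also native contacts; the helper `contact_accuracy` and the toy residue pairs are hypothetical, not taken from the paper.

```python
def contact_accuracy(predicted, native, min_separation=6):
    """Fraction of predicted contacts that are native contacts.

    Only residue pairs (i, j) with |i - j| >= min_separation are scored,
    and (i, j) and (j, i) are treated as the same contact.
    """
    def normalize(pairs):
        return {
            (min(i, j), max(i, j))
            for i, j in pairs
            if abs(i - j) >= min_separation
        }

    pred = normalize(predicted)
    true = normalize(native)
    if not pred:
        return 0.0  # no scorable predictions
    return len(pred & true) / len(pred)


# Toy example: (5, 8) is dropped because the residues are only 3 apart.
predicted = [(1, 10), (3, 20), (5, 8), (12, 30)]
native = [(10, 1), (12, 30), (2, 40)]
print(contact_accuracy(predicted, native))  # 2 of the 3 scored predictions are native
```

The separation filter matters because contacts between near neighbours in sequence are almost always present and would inflate the score.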
Affiliation(s)
- R. Rajgaria
- Department of Chemical Engineering, Princeton University, Princeton, NJ 08544-5263, U.S.A
- Y. Wei
- Department of Chemical Engineering, Princeton University, Princeton, NJ 08544-5263, U.S.A
- C. A. Floudas
- Department of Chemical Engineering, Princeton University, Princeton, NJ 08544-5263, U.S.A
20
Karakaş M, Woetzel N, Meiler J. BCL::contact-low confidence fold recognition hits boost protein contact prediction and de novo structure determination. J Comput Biol 2010; 17:153-68. [PMID: 19772383 DOI: 10.1089/cmb.2009.0030] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022]
Abstract
Knowledge of all residue-residue contacts within a protein allows determination of the protein fold. Accurate prediction of even a subset of long-range contacts (contacts between amino acids far apart in sequence) can be instrumental for determining tertiary structure. Here we present BCL::Contact, a novel contact prediction method that utilizes artificial neural networks (ANNs) and specializes in the prediction of medium- to long-range contacts. BCL::Contact comes in two modes: sequence-based and structure-based. The sequence-based mode uses only sequence information and has individual ANNs specialized for helix-helix, helix-strand, strand-helix, strand-strand, and sheet-sheet contacts. The structure-based mode combines results from 32 fold recognition methods with sequence information into a consensus prediction. The two methods were presented in the 6th and 7th Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiments. The present work focuses on elucidating the impact of fold recognition results on contact prediction via a direct comparison of both methods on a joint benchmark set of proteins. The sequence-based mode predicted contacts with 42% accuracy (7% false positive rate), while the structure-based mode achieved 45% accuracy (2% false positive rate). Predictions by both modes of BCL::Contact were supplied as input to the protein tertiary structure prediction program Rosetta for a benchmark of 17 proteins with no close sequence homologs in the Protein Data Bank (PDB). Rosetta created higher-accuracy models, signified by an average improvement of 1.3 Å in root mean square deviation (RMSD), when driven by the predicted contacts. Further, filtering Rosetta models by agreement with the predicted contacts enriches for native-like fold topologies.
Affiliation(s)
- Mert Karakaş
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, USA
21
Prediction of protein long-range contacts using an ensemble of genetic algorithm classifiers with sequence profile centers. BMC STRUCTURAL BIOLOGY 2010; 10 Suppl 1:S2. [PMID: 20487509 PMCID: PMC2873825 DOI: 10.1186/1472-6807-10-s1-s2] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022]
Abstract
Background Prediction of long-range inter-residue contacts is an important topic in bioinformatics research. It is helpful for determining protein structures and understanding protein folding, and therefore for advancing the annotation of protein functions. Results In this paper, we propose a novel ensemble of genetic algorithm classifiers (GaCs) to address the long-range contact prediction problem. Our method is based on a key idea called sequence profile centers (SPCs). Each SPC is the average sequence profile of the residue pairs belonging to the same contact class or non-contact class. GaCs train on multiple different pairs of long-range contact data (positive data) and long-range non-contact data (negative data). The negative data sets, having roughly the same sizes as the positive ones, are constructed by random sampling over the original imbalanced negative data. As a result, about 21.5% of long-range contacts are correctly predicted. We also found that the ensemble of GaCs indeed improves accuracy by around 5.6% over a single GaC. Conclusions Classifiers using sequence profile centers may advance long-range contact prediction. In line with this approach, key structural features in proteins could be determined with high efficiency and accuracy.
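A rough sketch of the two data-handling ideas described above: averaging residue-pair profiles within each class into a center, and building several balanced training sets by undersampling the negatives. How the pair profiles are encoded and how the genetic algorithm classifiers use the centers is not specified in the abstract, so the 2-dimensional feature vectors below are toy assumptions and both function names are hypothetical.

```python
import numpy as np


def sequence_profile_centers(pair_features, labels):
    """Mean feature vector (profile center) of residue pairs in each class."""
    return {
        int(c): pair_features[labels == c].mean(axis=0)
        for c in np.unique(labels)
    }


def balanced_negative_subsets(negative_idx, n_positive, n_subsets, seed=0):
    """Draw several negative subsets, each as large as the positive set,
    by random sampling without replacement from the imbalanced negatives."""
    rng = np.random.default_rng(seed)
    return [
        rng.choice(negative_idx, size=n_positive, replace=False)
        for _ in range(n_subsets)
    ]


# Toy data: 4 residue pairs with 2-dimensional profile features.
features = np.array([[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [2.0, 2.0]])
labels = np.array([1, 1, 0, 0])  # 1 = contact, 0 = non-contact
centers = sequence_profile_centers(features, labels)
print(centers[1], centers[0])  # contact center [2, 1], non-contact center [1, 3]

# Three balanced negative sets of 10 drawn from 100 hypothetical negative pairs.
subsets = balanced_negative_subsets(np.arange(100), n_positive=10, n_subsets=3)
```

Each balanced subset, paired with the full positive set, would train one ensemble member; drawing different subsets is what gives the ensemble its diversity.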
22
Kahlem P, Clegg A, Reisinger F, Xenarios I, Hermjakob H, Orengo C, Birney E. ENFIN--A European network for integrative systems biology. C R Biol 2010; 332:1050-8. [PMID: 19909926 DOI: 10.1016/j.crvi.2009.09.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/19/2022]
Abstract
Integration of biological data of various types and the development of adapted bioinformatics tools represent critical objectives to enable research at the systems level. The European Network of Excellence ENFIN is engaged in developing an adapted infrastructure to connect databases and platforms, enabling both the generation of new bioinformatics tools and the experimental validation of computational predictions. With the aim of bridging the gap between standard wet laboratories and bioinformatics, the ENFIN Network runs integrative research projects that bring the latest computational techniques to bear directly on systems biology questions in the wet laboratory environment. The Network maintains close internal collaboration between experimental and computational research, enabling a continuous cycle of experimental validation and improvement of computational prediction methods. The computational work includes the development of a database infrastructure (EnCORE), bioinformatics analysis methods and a novel platform for protein function analysis, FuncNet.
Affiliation(s)
- Pascal Kahlem
- EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.
23
Zhang Q, Li D, Wei P, Zhang J, Wan J, Ren Y, Chen Z, Liu D, Yu Z, Feng L. Structure-Based Rational Screening of Novel Hit Compounds with Structural Diversity for Cytochrome P450 Sterol 14α-Demethylase from Penicillium digitatum. J Chem Inf Model 2010; 50:317-25. [DOI: 10.1021/ci900425t] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Indexed: 11/28/2022]
Affiliation(s)
- Qingye Zhang
- Key Laboratory of Pesticide & Chemical Biology (CCNU), Ministry of Education; College of Chemistry, Central China Normal University, Wuhan 430079, P R China, and State Key Laboratory for Agricultural Microbiology, National Engineering Research Centre of Microbial Pesticides, Huazhong Agricultural University, Wuhan 430070, P R China
- Ding Li
- Key Laboratory of Pesticide & Chemical Biology (CCNU), Ministry of Education; College of Chemistry, Central China Normal University, Wuhan 430079, P R China, and State Key Laboratory for Agricultural Microbiology, National Engineering Research Centre of Microbial Pesticides, Huazhong Agricultural University, Wuhan 430070, P R China
- Pei Wei
- Key Laboratory of Pesticide & Chemical Biology (CCNU), Ministry of Education; College of Chemistry, Central China Normal University, Wuhan 430079, P R China, and State Key Laboratory for Agricultural Microbiology, National Engineering Research Centre of Microbial Pesticides, Huazhong Agricultural University, Wuhan 430070, P R China
- Jie Zhang
- Key Laboratory of Pesticide & Chemical Biology (CCNU), Ministry of Education; College of Chemistry, Central China Normal University, Wuhan 430079, P R China, and State Key Laboratory for Agricultural Microbiology, National Engineering Research Centre of Microbial Pesticides, Huazhong Agricultural University, Wuhan 430070, P R China
- Jian Wan
- Key Laboratory of Pesticide & Chemical Biology (CCNU), Ministry of Education; College of Chemistry, Central China Normal University, Wuhan 430079, P R China, and State Key Laboratory for Agricultural Microbiology, National Engineering Research Centre of Microbial Pesticides, Huazhong Agricultural University, Wuhan 430070, P R China
- Yangliang Ren
- Key Laboratory of Pesticide & Chemical Biology (CCNU), Ministry of Education; College of Chemistry, Central China Normal University, Wuhan 430079, P R China, and State Key Laboratory for Agricultural Microbiology, National Engineering Research Centre of Microbial Pesticides, Huazhong Agricultural University, Wuhan 430070, P R China
- Zhigang Chen
- Key Laboratory of Pesticide & Chemical Biology (CCNU), Ministry of Education; College of Chemistry, Central China Normal University, Wuhan 430079, P R China, and State Key Laboratory for Agricultural Microbiology, National Engineering Research Centre of Microbial Pesticides, Huazhong Agricultural University, Wuhan 430070, P R China
- Deli Liu
- Key Laboratory of Pesticide & Chemical Biology (CCNU), Ministry of Education; College of Chemistry, Central China Normal University, Wuhan 430079, P R China, and State Key Laboratory for Agricultural Microbiology, National Engineering Research Centre of Microbial Pesticides, Huazhong Agricultural University, Wuhan 430070, P R China
- Ziniu Yu
- Key Laboratory of Pesticide & Chemical Biology (CCNU), Ministry of Education; College of Chemistry, Central China Normal University, Wuhan 430079, P R China, and State Key Laboratory for Agricultural Microbiology, National Engineering Research Centre of Microbial Pesticides, Huazhong Agricultural University, Wuhan 430070, P R China
- Lingling Feng
- Key Laboratory of Pesticide & Chemical Biology (CCNU), Ministry of Education; College of Chemistry, Central China Normal University, Wuhan 430079, P R China, and State Key Laboratory for Agricultural Microbiology, National Engineering Research Centre of Microbial Pesticides, Huazhong Agricultural University, Wuhan 430070, P R China
24
Laskowski RA. Protein structure databases. Methods Mol Biol 2010; 609:59-82. [PMID: 20221913 DOI: 10.1007/978-1-60327-241-4_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/28/2023]
Abstract
Web-based protein structure databases come in a wide variety of types and levels of information content. Those having the most general interest are the various atlases that describe each experimentally determined protein structure and provide useful links, analyses, and schematic diagrams relating to its 3D structure and biological function. Also of great interest are the databases that classify 3D structures by their folds, as these can reveal evolutionary relationships which may be hard to detect from sequence comparison alone. Related to these are the numerous servers that compare folds, particularly useful for newly solved structures, and especially those of unknown function. Beyond these there are a vast number of databases for the more specialized user, dealing with specific families, diseases, structural features, and so on.
Affiliation(s)
- Roman A Laskowski
- EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
25
Tegge AN, Wang Z, Eickholt J, Cheng J. NNcon: improved protein contact map prediction using 2D-recursive neural networks. Nucleic Acids Res 2009; 37:W515-8. [PMID: 19420062 PMCID: PMC2703959 DOI: 10.1093/nar/gkp305] [Citation(s) in RCA: 110] [Impact Index Per Article: 7.3] [Received: 01/30/2009] [Revised: 04/13/2009] [Accepted: 04/16/2009] [Indexed: 11/13/2022]
Abstract
Protein contact map prediction is useful for protein folding rate prediction, model selection and 3D structure prediction. Here we describe NNcon, a fast and reliable contact map prediction server and software. NNcon was ranked among the most accurate residue contact predictors in the Eighth Critical Assessment of Techniques for Protein Structure Prediction (CASP8), 2008. Both NNcon server and software are available at http://casp.rnet.missouri.edu/nncon.html.
Affiliation(s)
- Jianlin Cheng
- Computer Science Department, Informatics Institute, University of Missouri, Columbia, MO 65213, USA
26
Markoff A, Gerke V, Bogdanova N. Combined homology modelling and evolutionary significance evaluation of missense mutations in blood clotting factor VIII to highlight aspects of structure and function. Haemophilia 2009; 15:932-41. [PMID: 19473423 DOI: 10.1111/j.1365-2516.2009.02009.x] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Indexed: 11/27/2022]
Abstract
Most small lesions in the factor VIII (FVIII) gene that cause haemophilia A (HA) are single nucleotide substitutions resulting in amino acid replacing (missense) mutations and leading to various phenotypes, ranging from mild to severe. We took a combined approach of homology modelling and quantitative evaluation of the evolutionary significance of amino acid replacing alterations using the Grantham Matrix Score (GMS) to assess their structural effects and the significance of their pathological expression. Comparative homology models of all amino acid substitutions summarized in the FVIII mutations database, plus those identified and reported recently by us or by our collaborators, were evaluated. Altogether, 640 amino acid replacing mutations were scored for potential distant or local conformation changes, influence on molecular stability and predicted contact residues, using available FVIII domain models. The average propensity to substitute amino acid residues by mutation was found to be comparable to the overall probability of de novo mutations. Missense changes reported with various HA phenotypes were all confirmed as significant using GMS. The fraction of these comprising residues apparently involved in intermolecular interactions exceeds the average proportion of such residues in FVIII. Predicted contact residues changed through mutation were visualized on the surface of FVIII domains, and their possible functional implications were verified from the literature and are discussed in light of available structural information. Our predictive modelling adds to the current view of molecular contacts at domain interfaces. This structural insight could, in part, aid the design of engineered FVIII constructs for therapy, possibly enhancing their stability and prolonging their circulating lifetime.
Affiliation(s)
- A Markoff
- Institut für Medizinische Biochemie, ZMBE, WWU Münster, Von Esmarch Str. 56, Münster 48149, Germany.
27
Rajgaria R, McAllister SR, Floudas CA. Towards accurate residue-residue hydrophobic contact prediction for alpha helical proteins via integer linear optimization. Proteins 2009; 74:929-47. [PMID: 18767158 DOI: 10.1002/prot.22202] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 12/22/2022]
Abstract
A new optimization-based method is presented to predict the hydrophobic residue contacts in alpha-helical proteins. The proposed approach uses a high resolution distance dependent force field to calculate the interaction energy between different residues of a protein. The formulation predicts the hydrophobic contacts by minimizing the sum of these contact energies. These residue contacts are highly useful in narrowing down the conformational space searched by protein structure prediction algorithms. The proposed algorithm also offers the algorithmic advantage of producing a rank ordered list of the best contact sets. This model was tested on four independent alpha-helical protein test sets and was found to perform very well. The average accuracy of the predictions (separated by at least six residues) obtained using the presented method was approximately 66% for single domain proteins. The average true positive and false positive distances were also calculated for each protein test set and they are 8.87 Å and 14.67 Å, respectively.
Affiliation(s)
- R Rajgaria
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA
28
Walsh I, Baù D, Martin AJM, Mooney C, Vullo A, Pollastri G. Ab initio and template-based prediction of multi-class distance maps by two-dimensional recursive neural networks. BMC STRUCTURAL BIOLOGY 2009; 9:5. [PMID: 19183478 PMCID: PMC2654788 DOI: 10.1186/1472-6807-9-5] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Received: 07/29/2008] [Accepted: 01/30/2009] [Indexed: 11/17/2022]
Abstract
Background Prediction of protein structures from their sequences is still one of the open grand challenges of computational biology. Some approaches to protein structure prediction, especially ab initio ones, rely to some extent on the prediction of residue contact maps. Residue contact map predictions have been assessed at the CASP competition for several years now. Although it has been shown that exact contact maps generally yield correct three-dimensional structures, this is true only at a relatively low resolution (3–4 Å from the native structure). Another known weakness of contact maps is that they are generally predicted ab initio, that is, without exploiting information about potential homologues of known structure. Results We introduce a new class of distance restraints for protein structures: multi-class distance maps. We show that Cα trace reconstructions based on 4-class native maps are significantly better than those from residue contact maps. We then build two predictors of 4-class maps based on recursive neural networks: one ab initio, relying on the sequence and on evolutionary information; and one template-based, in which homology information to known structures is provided as a further input. We show that virtually any level of sequence similarity to structural templates (down to less than 10%) yields more accurate 4-class maps than the ab initio predictor. We show that template-based predictions by recursive neural networks are consistently better than the best template and than a number of combinations of the best available templates. We also extract binary residue contact maps at an 8 Å threshold (as per CASP assessment) from the 4-class predictors and show that the template-based version is also more accurate than the best template and consistently better than the ab initio one, down to very low levels of sequence identity to structural templates.
Furthermore, we test both ab-initio and template-based 8 Å predictions on the CASP7 targets using a pre-CASP7 PDB, and find that both predictors are state-of-the-art, with the template-based one far outperforming the best CASP7 systems if templates with sequence identity to the query of 10% or better are available. Although this is not the main focus of this paper we also report on reconstructions of Cα traces based on both ab initio and template-based 4-class map predictions, showing that the latter are generally more accurate even when homology is dubious. Conclusion Accurate predictions of multi-class maps may provide valuable constraints for improved ab initio and template-based prediction of protein structures, naturally incorporate multiple templates, and yield state-of-the-art binary maps. Predictions of protein structures and 8 Å contact maps based on the multi-class distance map predictors described in this paper are freely available to academic users at the url .
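To make the two representations concrete, here is a minimal sketch that derives both a binary 8 Å contact map and a 4-class distance map from C-alpha coordinates. The bin edges of 8, 13 and 19 Å, the use of C-alpha atoms, and the strict `<` comparison at the threshold are assumptions for illustration, not necessarily the paper's exact definitions; the function names are hypothetical.

```python
import numpy as np


def distance_matrix(ca):
    """Pairwise Euclidean distances (Å) between C-alpha coordinates, shape (L, 3)."""
    diff = ca[:, None, :] - ca[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def binary_contact_map(ca, threshold=8.0):
    """True where two residues are closer than `threshold` Å."""
    return distance_matrix(ca) < threshold


def multiclass_distance_map(ca, bin_edges=(8.0, 13.0, 19.0)):
    """Assign each residue pair a class index 0..len(bin_edges) by distance bin.

    With the default (assumed) edges: 0 = d < 8, 1 = 8 <= d < 13,
    2 = 13 <= d < 19, 3 = d >= 19.
    """
    return np.digitize(distance_matrix(ca), bin_edges)


# Toy C-alpha trace: four residues placed on a straight line.
ca = np.array([[0.0, 0, 0], [5.0, 0, 0], [15.0, 0, 0], [40.0, 0, 0]])
contacts = binary_contact_map(ca)
classes = multiclass_distance_map(ca)
print(classes)
```

Collapsing the multi-class map back to a binary map is simply `classes == 0` here, which mirrors how the abstract extracts 8 Å contact maps from the 4-class predictors.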
Affiliation(s)
- Ian Walsh
- School of Computer Science and Informatics, University College Dublin, Dublin, Ireland.
29
Makkar P, Metpally RPR, Sangadala S, Reddy BVB. Modeling and analysis of MH1 domain of Smads and their interaction with promoter DNA sequence motif. J Mol Graph Model 2008; 27:803-12. [PMID: 19157940 DOI: 10.1016/j.jmgm.2008.12.003] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 05/03/2008] [Revised: 12/09/2008] [Accepted: 12/12/2008] [Indexed: 01/11/2023]
Abstract
The Smads are a group of related intracellular proteins critical for transmitting signals from the transforming growth factor-beta (TGF-beta) superfamily of proteins at the cell surface to the nucleus. The prototypic members of the Smad family, Mad and Sma, were first described in Drosophila and Caenorhabditis elegans, respectively. Related proteins in Xenopus, humans, mice and rats were subsequently identified and are now known as Smads. Smad protein family members act downstream in the TGF-beta signaling pathway, mediating various biological processes, including cell growth, differentiation, matrix production, apoptosis and development. Smads range from about 400 to 500 amino acids in length and are grouped into the receptor-regulated Smads (R-Smads), the common Smads (Co-Smads) and the inhibitory Smads (I-Smads). There are eight Smads in mammals: Smad1/5/8 (bone morphogenetic protein regulated) and Smad2/3 (TGF-beta/activin regulated) are termed R-Smads, Smad4 is denoted as the Co-Smad, and Smad6/7 are inhibitory Smads. A typical Smad consists of a conserved N-terminal Mad Homology 1 (MH1) domain and a C-terminal Mad Homology 2 (MH2) domain connected by a proline-rich linker. The MH1 domain plays a key role in DNA recognition and also facilitates the binding of Smad4 to the phosphorylated C-terminus of R-Smads to form an activated complex. The MH2 domain exhibits transcriptional activation properties. To understand the structural basis of the interaction of various Smads with their target proteins and the promoter DNA, we modeled the MH1 domains of the remaining mammalian Smads based on the known crystal structure of the Smad3 MH1 domain bound to the GTCT Smad box DNA sequence (1OZJ). We generated a B-DNA structure using average base-pair step parameters (Twist, Tilt, Roll and Slide). We then modeled the interaction pose of the MH1 domain of Smad1/5/8 with their corresponding DNA sequence motif GCCG.
These models provide a structural basis for understanding functional similarities and differences among various Smads.
Affiliation(s)
- Pooja Makkar
- Graduate Center Biochemistry Department and Laboratory of Bioinformatics & in silico Drug Design, Queens College of City University of New York, 65-30 Kissena Blvd, Flushing, NY 11367, USA
30
Ngan SC, Hung LH, Liu T, Samudrala R. Scoring functions for de novo protein structure prediction revisited. Methods Mol Biol 2008; 413:243-81. [PMID: 18075169 DOI: 10.1007/978-1-59745-574-9_10] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 04/08/2023]
Abstract
De novo protein structure prediction methods attempt to predict tertiary structures from sequences based on general principles that govern protein folding energetics and/or statistical tendencies of conformational features that native structures acquire, without the use of explicit templates. A general paradigm for de novo prediction involves sampling the conformational space, guided by scoring functions and other sequence-dependent biases, such that a large set of candidate ("decoy") structures is generated, and then selecting native-like conformations from those decoys using scoring functions as well as conformer clustering. High-resolution refinement is sometimes used as a final step to fine-tune native-like structures. There are two major classes of scoring functions. Physics-based functions are based on mathematical models describing aspects of the known physics of molecular interaction. Knowledge-based functions are formed with statistical models capturing aspects of the properties of native protein conformations. In this chapter, we discuss the implementation and use of some of the scoring functions from these two classes for de novo structure prediction.
Affiliation(s)
- Shing-Chung Ngan
- Department of Microbiology, University of Washington School of Medicine, Seattle, WA, USA
31
Faure G, Bornot A, de Brevern AG. Protein contacts, inter-residue interactions and side-chain modelling. Biochimie 2008; 90:626-39. [DOI: 10.1016/j.biochi.2007.11.007] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.5] [Received: 06/14/2007] [Accepted: 11/22/2007] [Indexed: 10/22/2022]
32
Mechanism for oxidation of high-molecular-weight substrates by a fungal versatile peroxidase, MnP2. Appl Environ Microbiol 2008; 74:2873-81. [PMID: 18326680 DOI: 10.1128/aem.02080-07] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Indexed: 11/20/2022]
Abstract
Unlike general peroxidases, Pleurotus ostreatus MnP2 was reported to have the unique property of directly oxidizing high-molecular-weight compounds, such as Poly R-478 and RNase A. To elucidate the mechanism for oxidation of polymeric substrates by MnP2, a series of mutant enzymes were produced by using a homologous gene expression system, and their reactivities were characterized. A mutant enzyme with an Ala substituting for an exposed Trp (W170A) lost most of its oxidation activity for veratryl alcohol (VA), Poly R-478, and RNase A, whereas its kinetic properties for Mn²⁺ and H₂O₂ were substantially unchanged. These results demonstrated that, in addition to VA, the high-molecular-weight substrates are directly oxidized by MnP2 at W170. Moreover, in the mutants Q266F and V166/168L, amino acid substitution(s) around W170 resulted in decreased activity only for the high-molecular-weight substrates. These results, along with three-dimensional modeling of the mutants, suggested that the mutations sterically hindered the access of the polymeric substrates to W170. Another mutant, R263N, contained a newly generated N-glycosylation site and showed a higher molecular mass in sodium dodecyl sulfate-polyacrylamide gel electrophoresis analysis. Interestingly, the R263N mutant exhibited increased reactivity with VA and high-molecular-weight substrates. The existence of an additional carbohydrate modification and the catalytic properties of this mutant are discussed. This is the first study of a direct mechanism for oxidation of high-molecular-weight substrates by a fungal peroxidase using a homologous gene expression system.
33
34
Cheng J, Baldi P. Improved residue contact prediction using support vector machines and a large feature set. BMC Bioinformatics 2007; 8:113. [PMID: 17407573 PMCID: PMC1852326 DOI: 10.1186/1471-2105-8-113] [Citation(s) in RCA: 142] [Impact Index Per Article: 8.4] [Received: 12/28/2006] [Accepted: 04/02/2007] [Indexed: 11/12/2022]
Abstract
BACKGROUND Predicting protein residue-residue contacts is an important 2D prediction task. It is useful for ab initio structure prediction and understanding protein folding. In spite of steady progress over the past decade, contact prediction remains largely unsolved. RESULTS Here we develop a new contact map predictor (SVMcon) that uses support vector machines to predict medium- and long-range contacts. SVMcon integrates profiles, secondary structure, relative solvent accessibility, contact potentials, and other useful features. On the same test data set, SVMcon's accuracy is 4% higher than that of the latest version of the CMAPpro contact map predictor. SVMcon recently participated in the seventh edition of the Critical Assessment of Techniques for Protein Structure Prediction (CASP7) experiment and was evaluated along with seven other contact map predictors. SVMcon was ranked as one of the top predictors, yielding the second best coverage and accuracy for contacts with sequence separation ≥ 12 on 13 de novo domains. CONCLUSION We describe SVMcon, a new contact map predictor that uses SVMs and a large set of informative features. SVMcon yields good performance on medium- to long-range contact predictions and can be modularly incorporated into a structure prediction pipeline.
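The two figures of merit quoted above, accuracy and coverage for contacts at sequence separation ≥ 12, can be made concrete with a minimal sketch. It assumes the usual definitions (accuracy is the fraction of predicted contacts that are true; coverage is the fraction of true contacts that are predicted) and represents contacts as hypothetical 0-based residue-index pairs. It does not reproduce SVMcon or the CASP7 evaluation pipeline.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[int, int]


def _far_enough(pairs: Iterable[Pair], min_sep: int) -> Set[Pair]:
    """Normalise pairs to (i, j) with i < j and keep |i - j| >= min_sep."""
    return {(min(i, j), max(i, j)) for i, j in pairs if abs(i - j) >= min_sep}


def accuracy_and_coverage(predicted: Iterable[Pair],
                          true: Iterable[Pair],
                          min_sep: int = 12) -> Tuple[float, float]:
    """Return (accuracy, coverage) over pairs with separation >= min_sep.

    accuracy = TP / number of predicted contacts
    coverage = TP / number of true contacts
    """
    pred = _far_enough(predicted, min_sep)
    real = _far_enough(true, min_sep)
    tp = len(pred & real)
    accuracy = tp / len(pred) if pred else 0.0
    coverage = tp / len(real) if real else 0.0
    return accuracy, coverage


if __name__ == "__main__":
    # (30, 2) is normalised to (2, 30); (5, 8) and (6, 7) are too close.
    predicted = [(1, 20), (30, 2), (5, 8), (10, 40)]
    true = [(1, 20), (10, 40), (3, 50), (4, 60), (6, 7)]
    acc, cov = accuracy_and_coverage(predicted, true)
    print(f"accuracy={acc:.3f} coverage={cov:.3f}")  # accuracy=0.667 coverage=0.500
```

Filtering both sets by separation before intersecting keeps short-range pairs, which are comparatively easy to predict, from inflating either score.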
Affiliation(s)
- Jianlin Cheng
- School of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL 32816-2362, USA
- Pierre Baldi
- School of Information and Computer Sciences, University of California Irvine, Irvine, CA 92617, USA
35
Lee VSY, Tu WC, Jinn TR, Peng CC, Lin LJ, Tzen JTC. Molecular cloning of the precursor polypeptide of mastoparan B and its putative processing enzyme, dipeptidyl peptidase IV, from the black-bellied hornet, Vespa basalis. Insect Mol Biol 2007; 16:231-7. [PMID: 17298553 DOI: 10.1111/j.1365-2583.2006.00718.x] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 05/13/2023]
Abstract
Mastoparan B, a cationic toxin, is the major peptide component in the venom of Vespa basalis. Molecular cloning of its cDNA fragment revealed that this toxin is initially synthesized as a precursor polypeptide containing an N-terminal signal sequence, a prosequence, the mature toxin, and an appended glycine at the C-terminus. Sequence alignment between the precursors of mastoparan B and melittin from honeybee venom showed significant conservation in the prosequence. In both prosequences, alternate positions were occupied by either proline or alanine, residues known as potential cleavage sites for dipeptidyl peptidase IV. Subsequently, a putative dipeptidyl peptidase IV cDNA fragment was cloned from the Vespa basalis venom gland. The prosequence may be removed via sequential liberation of dipeptides during the processing of mastoparan B.
Affiliation(s)
- V S Y Lee
- Graduate Institute of Biotechnology, National Chung Hsing University, Taichung, Taiwan
36
Scior T, Luna F, Koch W, Sánchez-Ruiz J. In silico analysis identifies a C3HC4-RING finger domain of a putative E3 ubiquitin-protein ligase located at the C-terminus of a polyglutamine-containing protein. Braz J Med Biol Res 2007. [DOI: 10.1590/s0100-879x2006005000075] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/21/2022]
Affiliation(s)
- T. Scior
- Benemérita Universidad Autónoma de Puebla, México
- F. Luna
- Benemérita Universidad Autónoma de Puebla, México
- W. Koch
- Facultad de Estudios Superiores, México
37
Haste Andersen P, Nielsen M, Lund O. Prediction of residues in discontinuous B-cell epitopes using protein 3D structures. Protein Sci 2006; 15:2558-67. [PMID: 17001032 PMCID: PMC2242418 DOI: 10.1110/ps.062405906] [Citation(s) in RCA: 413] [Impact Index Per Article: 22.9] [Indexed: 01/19/2023]
Abstract
Discovery of discontinuous B-cell epitopes is a major challenge in vaccine design. Previous epitope prediction methods have mostly been based on protein sequences and are not very effective. Here, we present DiscoTope, a novel method for discontinuous epitope prediction that uses protein three-dimensional structural data. The method is based on amino acid statistics, spatial information, and surface accessibility in a compiled data set of discontinuous epitopes determined by X-ray crystallography of antibody/antigen protein complexes. DiscoTope is the first method to focus explicitly on discontinuous epitopes. We show that the new structure-based method has a better performance for predicting residues of discontinuous epitopes than methods based solely on sequence information, and that it can successfully predict epitope residues that have been identified by different techniques. DiscoTope detects 15.5% of residues located in discontinuous epitopes with a specificity of 95%. At this level of specificity, the conventional Parker hydrophilicity scale for predicting linear B-cell epitopes identifies only 11.0% of residues located in discontinuous epitopes. Predictions by the DiscoTope method can guide experimental epitope mapping in both rational vaccine design and development of diagnostic tools, and may lead to more efficient epitope identification.
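The result above (15.5% of epitope residues detected at 95% specificity) is a sensitivity measured at a fixed specificity. The sketch below shows one generic way to compute such a figure from per-residue scores and binary labels by scanning candidate thresholds. The scores and labels are hypothetical, and this is not DiscoTope's scoring function or its published evaluation code.

```python
import math
from typing import Sequence


def sensitivity_at_specificity(scores: Sequence[float],
                               labels: Sequence[bool],
                               target: float = 0.95) -> float:
    """Best sensitivity over thresholds t whose specificity is >= target.

    A residue is called positive when its score is >= t. Candidate
    thresholds are every observed score plus +inf (call nothing positive).
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    best = 0.0
    for t in sorted(set(scores)) + [math.inf]:
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if not y and s < t)
        if tn / n_neg >= target:  # specificity constraint
            best = max(best, tp / n_pos)
    return best


if __name__ == "__main__":
    # 20 hypothetical non-epitope residues scored 0..19, three epitope residues.
    scores = [float(i) for i in range(20)] + [19.5, 18.5, 5.0]
    labels = [False] * 20 + [True] * 3
    print(sensitivity_at_specificity(scores, labels))  # 0.6666666666666666
```

The quadratic scan keeps the logic obvious; for large residue sets, sorting once and sweeping the threshold would give the same answer more efficiently.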
Affiliation(s)
- Pernille Haste Andersen
- Center for Biological Sequence Analysis, BioCentrum, Technical University of Denmark, DK-2800 Lyngby, Denmark
38
39
Hinsby AM, Kiemer L, Karlberg EO, Lage K, Fausbøll A, Juncker AS, Andersen JS, Mann M, Brunak S. A Wiring of the Human Nucleolus. Mol Cell 2006; 22:285-95. [PMID: 16630896 DOI: 10.1016/j.molcel.2006.03.012] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.9] [Received: 11/30/2005] [Revised: 01/31/2006] [Accepted: 03/07/2006] [Indexed: 11/22/2022]
Abstract
Recent proteomic efforts have created an extensive inventory of the human nucleolar proteome. However, approximately 30% of the identified proteins lack functional annotation. We present an approach for assigning function to uncharacterized nucleolar proteins by data integration coupled to a machine-learning method. By assembling protein complexes, we present a first draft of the human ribosome biogenesis pathway encompassing 74 proteins and thereby assign function to 49 previously uncharacterized proteins. Moreover, the functional diversity of the nucleolus is underlined by the identification of a number of protein complexes with functions beyond ribosome biogenesis. Finally, we were able to obtain experimental evidence of nucleolar localization of 11 proteins, which were predicted by our platform to be associates of nucleolar complexes. We believe other biological organelles or systems could be "wired" in a similar fashion, integrating different types of data with high-throughput proteomics, followed by a detailed biological analysis and experimental validation.
Affiliation(s)
- Anders M Hinsby
- Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark, DK-2800 Lyngby
40
Arnold K, Bordoli L, Kopp J, Schwede T. The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling. Bioinformatics 2005; 22:195-201. [PMID: 16301204 DOI: 10.1093/bioinformatics/bti770] [Citation(s) in RCA: 5532] [Impact Index Per Article: 291.2] [Indexed: 12/16/2022]
Abstract
MOTIVATION Homology models of proteins are of great interest for planning and analysing biological experiments when no experimental three-dimensional structures are available. Building homology models requires specialized programs and up-to-date sequence and structural databases. Integrating all required tools, programs and databases into a single web-based workspace facilitates access to homology modelling from any computer with a web connection, without the need to download and install large program packages and databases. RESULTS SWISS-MODEL workspace is a web-based integrated service dedicated to protein structure homology modelling. It assists and guides the user in building protein homology models at different levels of complexity. A personal working environment is provided for each user, where several modelling projects can be carried out in parallel. Protein sequence and structure databases necessary for modelling are accessible from the workspace and are updated at regular intervals. Tools for template selection, model building and structure quality evaluation can be invoked from within the workspace. Workflow and usage of the workspace are illustrated by modelling human Cyclin A1 and human Transmembrane Protease 3. AVAILABILITY The SWISS-MODEL workspace can be accessed freely at http://swissmodel.expasy.org/workspace/
41
Bendtsen JD, Kiemer L, Fausbøll A, Brunak S. Non-classical protein secretion in bacteria. BMC Microbiol 2005; 5:58. [PMID: 16212653 PMCID: PMC1266369 DOI: 10.1186/1471-2180-5-58] [Citation(s) in RCA: 534] [Impact Index Per Article: 28.1] [Received: 06/21/2005] [Accepted: 10/07/2005] [Indexed: 11/10/2022]
Abstract
BACKGROUND We present an overview of bacterial non-classical secretion and a prediction method for identification of proteins following signal peptide independent secretion pathways. We have compiled a list of proteins found extracellularly despite the absence of a signal peptide. Some of these proteins also have known roles in the cytoplasm, which means they could be so-called "moonlighting" proteins having more than one function. RESULTS A thorough literature search was conducted to compile a list of currently known bacterial non-classically secreted proteins. Pattern finding methods were applied to the sequences in order to identify putative signal sequences or motifs responsible for their secretion. We have found no signal or motif characteristic of a majority of the proteins in the compiled list of non-classically secreted proteins, and conclude that these proteins, indeed, seem to be secreted in a novel fashion. However, we also show that the apparently non-classically secreted proteins are still distinguished from cellular proteins by properties such as amino acid composition, secondary structure and disordered regions. Specifically, prediction of disorder reveals that bacterial secretory proteins are more structurally disordered than their cytoplasmic counterparts. Finally, artificial neural networks were used to construct protein feature based methods for identification of non-classically secreted proteins in both Gram-positive and Gram-negative bacteria. CONCLUSION We present a publicly available prediction method capable of discriminating between this group of proteins and other proteins, thus allowing for the identification of novel non-classically secreted proteins. We suggest candidates for non-classically secreted proteins in Escherichia coli and Bacillus subtilis. The prediction method is available online.
Affiliation(s)
- Jannick D Bendtsen
- Center for Biological Sequence Analysis, BioCentrum-DTU, Building 208, Technical University of Denmark, DK-2800 Lyngby, Denmark
- Lars Kiemer
- Center for Biological Sequence Analysis, BioCentrum-DTU, Building 208, Technical University of Denmark, DK-2800 Lyngby, Denmark
- Anders Fausbøll
- Center for Biological Sequence Analysis, BioCentrum-DTU, Building 208, Technical University of Denmark, DK-2800 Lyngby, Denmark
- Søren Brunak
- Center for Biological Sequence Analysis, BioCentrum-DTU, Building 208, Technical University of Denmark, DK-2800 Lyngby, Denmark
42
Pogozheva ID, Przydzial MJ, Mosberg HI. Homology modeling of opioid receptor-ligand complexes using experimental constraints. AAPS J 2005; 7:E434-48. [PMID: 16353922 PMCID: PMC2750980 DOI: 10.1208/aapsj070243] [Citation(s) in RCA: 72] [Impact Index Per Article: 3.8] [Indexed: 11/30/2022]
Abstract
Opioid receptors interact with a variety of ligands, including endogenous peptides, opiates, and thousands of synthetic compounds with different structural scaffolds. In the absence of experimental structures of opioid receptors, theoretical modeling remains an important tool for structure-function analysis. The combination of experimental studies and modeling approaches allows development of realistic models of ligand-receptor complexes helpful for elucidation of the molecular determinants of ligand affinity and selectivity and for understanding mechanisms of functional agonism or antagonism. In this review we provide a brief critical assessment of the status of such theoretical modeling and describe some common problems and their possible solutions. Currently, there are no reliable theoretical methods to generate the models in a completely automatic fashion. Models of higher accuracy can be produced if homology modeling, based on the rhodopsin X-ray template, is supplemented by experimental structural constraints appropriate for the active or inactive receptor conformations, together with receptor-specific and ligand-specific interactions. The experimental constraints can be derived from mutagenesis and cross-linking studies, correlative replacements of ligand and receptor groups, and incorporation of metal binding sites between residues of receptors or receptors and ligands. This review focuses on the analysis of similarities and differences among the refined homology models of mu-, delta-, and kappa-opioid receptors in active and inactive states, emphasizing the molecular details of interaction of the receptors with some representative peptide and nonpeptide ligands, highlighting the multiple modes of binding of small opiates, and the differences in binding modes of agonists and antagonists, and of peptides and alkaloids.
Affiliation(s)
- Irina D Pogozheva
- Department of Medicinal Chemistry, College of Pharmacy, University of Michigan, Ann Arbor, MI 48109, USA
43
Bogdanova N, Markoff A, Pollmann H, Nowak-Göttl U, Eisert R, Wermes C, Todorova A, Eigel A, Dworniczak B, Horst J. Spectrum of molecular defects and mutation detection rate in patients with severe hemophilia A. Hum Mutat 2005; 26:249-54. [PMID: 16086318 DOI: 10.1002/humu.20208] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.5] [Indexed: 01/17/2023]
Abstract
Hemophilia A is the most frequently occurring X-linked bleeding disorder, affecting one to two out of 10,000 males worldwide. Various types of mutations in the F8 gene are causative for this condition. It is well known that the most common mutation in severely affected patients is the intron 22 inversion, which accounts for about 45% of cases with F8 residual activity of less than 1%. Therefore, the aim of the present study was to determine the spectrum and distribution of mutations in the F8 gene in a large group of patients with severe hemophilia A who previously tested negative for the common intron 22 inversion. Here we report on a mutation analysis of 86 patients collected under the above-mentioned criterion. The pathogenic molecular defect was identified in all patients, and thus our detection rate was virtually 100%. Thirty-four of the identified mutations are described for the first time. The newly detected amino acid substitutions were scored for potential gross or local conformational changes and influence on molecular stability for every single F8 domain with available structures, using homology modeling.
44
Combining a binary input encoding scheme with RBFNN for globulin protein inter-residue contact map prediction. Pattern Recognit Lett 2005. [DOI: 10.1016/j.patrec.2005.01.005] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Indexed: 11/21/2022]
45
Abstract
MOTIVATION Despite the continuing advance in the experimental determination of protein structures, the gap between the number of known protein sequences and structures continues to increase. Prediction methods can bridge this sequence-structure gap only partially. Better predictions of non-local contacts between residues could improve comparative modeling and fold recognition, and could assist in experimental structure determination. RESULTS Here, we introduce PROFcon, a novel contact prediction method that combines information from alignments, from predictions of secondary structure and solvent accessibility, from the region between two residues and from the average properties of the entire protein. In contrast to some other methods, PROFcon predicted short and long proteins at similar levels of accuracy. As expected, PROFcon was clearly less accurate when tested on sparse evolutionary profiles, that is, on families with few homologs. Prediction accuracy was highest for proteins belonging to the SCOP alpha/beta class. PROFcon compared favorably with state-of-the-art prediction methods at the CASP6 meeting. While the performance may still be perceived as low, our method clearly pushed the mark higher. Furthermore, predictions are already accurate enough to seed predictions of global features of protein structure.
Affiliation(s)
- Marco Punta
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University 650 West 168th Street BB217, New York, NY 10032, USA.
46
Li YC, Yang YC, Hsu JSF, Wu DJ, Wu HH, Tzen JTC. Cloning and immunolocalization of an antifungal chitinase in jelly fig (Ficus awkeotsang) achenes. Phytochemistry 2005; 66:879-886. [PMID: 15845406 DOI: 10.1016/j.phytochem.2005.02.015] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 09/15/2004] [Revised: 12/13/2004] [Indexed: 05/24/2023]
Abstract
A 30-kDa protein extracted from the pericarpial portion of jelly fig (Ficus awkeotsang Makino) achenes has been identified as a thermostable chitinase based on its enzymatic activity. A cDNA fragment encoding the precursor protein (including a cleavable signal sequence) of this chitinase was obtained by PCR cloning, and subsequently confirmed by immunological recognition of its overexpressed protein in Escherichia coli. Homology modeling predicted that this thermostable chitinase in jelly fig achenes comprises a stable (βα)₈ barrel fold with three disulfide bonds. Immunostaining indicated that this chitinase was exclusively localized in the pericarpial region but not in the seed cells, where bulky protein bodies and massive oil bodies had accumulated. Spore germination of Colletotrichum gloeosporioides, a common post-harvest pathogen infecting ripening fruit of jelly fig and many other fruits, was inhibited by this chitinase purified from achenes. It is suggested that the biological function of the thermostable chitinase in the pericarp of jelly fig achenes is to protect the nutritive seeds from fungal attack during fruit ripening.
Affiliation(s)
- Yu-Ching Li
- Graduate Institute of Biotechnology, National Chung-Hsing University, Taichung 40227, Taiwan, ROC
47
Hysi P, Kabesch M, Moffatt MF, Schedel M, Carr D, Zhang Y, Boardman B, von Mutius E, Weiland SK, Leupold W, Fritzsch C, Klopp N, Musk AW, James A, Nunez G, Inohara N, Cookson WOC. NOD1 variation, immunoglobulin E and asthma. Hum Mol Genet 2005; 14:935-41. [PMID: 15718249 DOI: 10.1093/hmg/ddi087] [Citation(s) in RCA: 204] [Impact Index Per Article: 10.7] [Indexed: 01/01/2023]
Abstract
Asthma is a familial inflammatory disease of the airways of the lung. Microbial exposures in childhood protect against asthma through unknown mechanisms. The innate immune system is able to identify microbial components through a variety of pattern-recognition receptors (PRRs). NOD1 is an intracellular PRR that initiates inflammation in response to bacterial diaminopimelic acid (iE-DAP). The NOD1 gene is on chromosome 7p14, in a region that has been genetically linked to asthma. We carried out a systematic search for polymorphism in the gene. We found an insertion-deletion polymorphism (ND(1)+32656) near the beginning of intron IX that accounted for approximately 7% of the variation in IgE in two panels of families (P<0.0005 in each). Allele*2 (the insertion) was associated with high IgE levels. The same allele was strongly associated with asthma in an independent study of 600 asthmatic children and 1194 super-normal controls [odds ratio (OR) 6.3; 95% confidence interval (CI) 1.4-28.3, dominant model]. Differential binding of a nuclear protein from the Calu-3 epithelial cell line to the two ND(1)+32656 alleles was observed. In an accompanying study, the deletion allele (ND(1)+32656*1) was found to be associated with inflammatory bowel disease. The results indicate that intracellular recognition of specific bacterial products affects the presence of childhood asthma.
Affiliation(s)
- Pirro Hysi
- Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK
48
Tang SN, Huang JF. Evolution of different oligomeric glycyl-tRNA synthetases. FEBS Lett 2005; 579:1441-5. [PMID: 15733854 DOI: 10.1016/j.febslet.2005.01.045] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Received: 09/22/2004] [Revised: 01/11/2005] [Accepted: 01/11/2005] [Indexed: 10/25/2022]
Abstract
There are two oligomeric types of glycyl-tRNA synthetases (GlyRSs) in genomes: the alpha2beta2 tetramer and the alpha2 dimer. Here, we show that the anticodon-binding domains (ABDs) of dimeric and tetrameric GlyRSs are non-homologous, although their catalytic central domains (CCDs) are homologous. During evolution, the dimeric GlyRS_ABD became fused to the C-terminus of the CCD in the alpha subunit, whereas the tetrameric GlyRS_ABD is located at the C-terminus of the beta subunit. Generally, a species contains only one oligomeric type of GlyRS, but both oligomeric types, with multiple homologous domains, are observed in the Magnetospirillum magnetotacticum genome; these homologous domains, however, probably originate from different genomes.
Affiliation(s)
- Su-Ni Tang
- Key Laboratory of Cellular and Molecular Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, 32 Eastern Jiaochang Road, Kunming, Yunnan 650223, PR China
49
Linear Response Properties Required to Simulate Vibrational Spectra of Biomolecules in Various Media: (R)-Phenyloxirane (A Comparative Theoretical and Spectroscopic Vibrational Study). Adv Quantum Chem 2005. [DOI: 10.1016/s0065-3276(05)50006-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/19/2023]
50
Mitra S. Computational Intelligence in Bioinformatics. Transactions on Rough Sets III 2005. [DOI: 10.1007/11427834_6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 01/24/2023]