1
|
Mufassirin MMM, Newton MAH, Sattar A. Artificial intelligence for template-free protein structure prediction: a comprehensive review. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10350-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
|
2
|
Hall D, Basu G, Ito N. Computational biophysics and structural biology of proteins - a Special Issue in honor of Prof. Haruki Nakamura's 70th birthday. Biophys Rev 2022; 14:1211-1222. [PMID: 36620377 PMCID: PMC9809522 DOI: 10.1007/s12551-022-01039-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Accepted: 12/08/2022] [Indexed: 01/05/2023] Open
Abstract
Receiving his initial training jointly in theoretical and applied physics at the University of Tokyo, Professor Haruki Nakamura has had a long and eventful scientific career, along the way helping to shape the way that biophysics is carried out in Japan. Concentrating his research efforts on the simulation of protein structure and function, he has, over his career arc, acted as director of the Institute for Protein Research (Osaka, Japan), director of the Protein Data Bank of Japan (PDBj), president of the Biophysical Society of Japan (BSJ), president of the Protein Science Society of Japan (PSSJ), and group leader and professor of Bioinformatics and Computational Structural Biology at Osaka University. In 2022, Prof. Haruki Nakamura turned 70 years old, and to mark this occasion, his scientific colleagues from around the world have combined their efforts to produce this Festschrift Issue of the IUPAB Biophysical Reviews journal around the theme of the computational biophysics and structural biology of proteins.
Affiliation(s)
- Damien Hall
  - WPI Nano Life Science Institute, Kanazawa University, Kakumamachi, Kanazawa, Ishikawa 920-1164 Japan
  - Department of Applied Physics, Aalto University, 00076 Aalto, Finland
- Gautam Basu
  - Department of Biophysics, Bose Institute, Centenary Campus, P-1/12 C.I.T. Scheme VII-M, Kolkata, 700054 India
- Nobutoshi Ito
  - Medical Research Institute, Tokyo Medical and Dental University (TMDU), Yushima, Bunkyo-Ku, Tokyo, 113-8510 Japan
|
3
|
Kinjo AR. Cooperative "folding transition" in the sequence space facilitates function-driven evolution of protein families. J Theor Biol 2018; 443:18-27. [PMID: 29355538 DOI: 10.1016/j.jtbi.2018.01.019] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/27/2017] [Revised: 01/16/2018] [Accepted: 01/17/2018] [Indexed: 12/23/2022]
Abstract
In the protein sequence space, natural proteins form clusters of families which are characterized by their unique native folds whereas the great majority of random polypeptides are neither clustered nor foldable to unique structures. Since a given polypeptide can be either foldable or unfoldable, a kind of "folding transition" is expected at the boundary of a protein family in the sequence space. By Monte Carlo simulations of a statistical mechanical model of protein sequence alignment that coherently incorporates both short-range and long-range interactions as well as variable-length insertions to reproduce the statistics of the multiple sequence alignment of a given protein family, we demonstrate the existence of such transition between natural-like sequences and random sequences in the sequence subspaces for 15 domain families of various folds. The transition was found to be highly cooperative and two-state-like. Furthermore, enforcing or suppressing consensus residues on a few of the well-conserved sites enhanced or diminished, respectively, the natural-like pattern formation over the entire sequence. In most families, the key sites included ligand binding sites. These results suggest some selective pressure on the key residues, such as ligand binding activity, may cooperatively facilitate the emergence of a protein family during evolution. From a more practical aspect, the present results highlight an essential role of long-range effects in precisely defining protein families, which are absent in conventional sequence models.
Affiliation(s)
- Akira R Kinjo
  - Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita, Osaka 565-0871, Japan.
|
4
|
Kinjo AR. A unified statistical model of protein multiple sequence alignment integrating direct coupling and insertions. Biophys Physicobiol 2016; 13:45-62. [PMID: 27924257 PMCID: PMC5042171 DOI: 10.2142/biophysico.13.0_45] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 10/21/2015] [Accepted: 03/18/2016] [Indexed: 12/01/2022] Open
Abstract
The multiple sequence alignment (MSA) of a protein family provides a wealth of information in terms of the conservation pattern of amino acid residues not only at each alignment site but also between distant sites. In order to statistically model the MSA incorporating both short-range and long-range correlations as well as insertions, I have derived a lattice gas model of the MSA based on the principle of maximum entropy. The partition function, obtained by the transfer matrix method with a mean-field approximation, accounts for all possible alignments with all possible sequences. The model parameters for short-range and long-range interactions were determined by a self-consistent condition and by a Gaussian approximation, respectively. Using this model with and without long-range interactions, I analyzed the globin and V-set domains by increasing the “temperature” and by “mutating” a site. The correlations between residue conservation and various measures of the system’s stability indicate that the long-range interactions make the conservation pattern more specific to the structure, and increasingly stabilize better conserved residues.
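As a rough illustration of the transfer matrix method mentioned in this abstract, the sketch below computes the partition function of a toy one-dimensional lattice gas with nearest-neighbor coupling and checks it against brute-force enumeration. It is not the author's MSA model: the energy function, the parameter names (`coupling`, `chem_potential`, `beta`) and the open boundary are assumptions chosen for brevity, and the mean-field treatment of long-range interactions and insertions is left out.

```python
import itertools
import math


def partition_function_transfer(n_sites, coupling, chem_potential, beta=1.0):
    """Partition function of a toy 1D lattice gas via transfer-matrix recursion.

    Assumed energy: E = -coupling * sum(n_i * n_{i+1}) - chem_potential * sum(n_i),
    with occupancies n_i in {0, 1} and open boundaries.
    """
    # weights[a]: summed Boltzmann weight of all partial chains ending in state a
    weights = [math.exp(beta * chem_potential * a) for a in (0, 1)]
    for _ in range(n_sites - 1):
        # Multiply by the 2x2 transfer matrix T[a][b] = exp(beta*(J*a*b + mu*b))
        weights = [
            sum(
                weights[a] * math.exp(beta * (coupling * a * b + chem_potential * b))
                for a in (0, 1)
            )
            for b in (0, 1)
        ]
    return sum(weights)


def partition_function_brute(n_sites, coupling, chem_potential, beta=1.0):
    """Same quantity by explicit enumeration of all 2**n_sites configurations."""
    total = 0.0
    for config in itertools.product((0, 1), repeat=n_sites):
        energy = -chem_potential * sum(config)
        energy -= coupling * sum(
            config[i] * config[i + 1] for i in range(n_sites - 1)
        )
        total += math.exp(-beta * energy)
    return total


if __name__ == "__main__":
    print(partition_function_transfer(8, 0.5, -0.2))
    print(partition_function_brute(8, 0.5, -0.2))
```

The recursion needs a number of steps proportional to the chain length, whereas the enumeration visits every one of the 2^N configurations, which is why the transfer-matrix form is the practical one for alignment-length systems.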
Affiliation(s)
- Akira R Kinjo
  - Institute for Protein Research, Osaka University, Suita, Osaka 565-0871, Japan
|
5
|
Specific non-local interactions are not necessary for recovering native protein dynamics. PLoS One 2014; 9:e91347. [PMID: 24625758 PMCID: PMC3953337 DOI: 10.1371/journal.pone.0091347] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/22/2013] [Accepted: 02/11/2014] [Indexed: 11/25/2022] Open
Abstract
The elastic network model (ENM) is a widely used method to study native protein dynamics by normal mode analysis (NMA). In ENM we need information about all pairwise distances, and the distance between contacting atoms is restrained to the native value. Therefore, ENM requires O(N^2) information to realize its dynamics for a protein consisting of N amino acid residues. To see if (or to what extent) such a large amount of specific structural information is required to realize native protein dynamics, here we introduce a novel model based on only O(N) restraints. This model, named the ‘contact number diffusion’ model (CND), includes specific distance restraints for only local (along the amino acid sequence) atom pairs, and semi-specific non-local restraints imposed on each atom, rather than atom pairs. The semi-specific non-local restraints are defined in terms of the non-local contact numbers of atoms. The CND model exhibits the dynamic characteristics comparable to ENM and more correlated with the explicit-solvent molecular dynamics simulation than ENM. Moreover, unrealistic surface fluctuations often observed in ENM were suppressed in CND. On the other hand, in some ligand-bound structures CND showed larger fluctuations of buried protein atoms interacting with the ligand compared to ENM. In addition, fluctuations from CND and ENM show comparable correlations with the experimental B-factor. Although there are some indications of the importance of some specific non-local interactions, the semi-specific non-local interactions are mostly sufficient for reproducing the native protein dynamics.
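To make the contrast between pairwise restraints and per-atom contact numbers concrete, here is a minimal sketch. It only counts things: `pair_contacts` lists the atom pairs within a cutoff (the kind of pair-specific information an ENM-style model restrains), and `nonlocal_contact_numbers` reduces the non-local pairs to a single number per atom. The hard distance cutoff, the `min_separation` threshold for "non-local", and both function names are assumptions for illustration; the abstract does not give the paper's actual definitions.

```python
import math


def pair_contacts(coords, cutoff):
    """Return all index pairs (i, j), i < j, whose distance is below cutoff."""
    pairs = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                pairs.append((i, j))
    return pairs


def nonlocal_contact_numbers(coords, cutoff, min_separation=3):
    """Per-atom count of contacts with atoms at least min_separation apart in sequence.

    This yields one number per atom, instead of one restraint per atom pair.
    """
    counts = [0] * len(coords)
    for i, j in pair_contacts(coords, cutoff):
        if j - i >= min_separation:
            counts[i] += 1
            counts[j] += 1
    return counts


if __name__ == "__main__":
    # Toy "chain": four points on the corners of a unit square.
    square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
    print(pair_contacts(square, cutoff=1.1))
    print(nonlocal_contact_numbers(square, cutoff=1.1))
```

In this toy case the only non-local contact is between the first and last points, so only those two atoms receive a non-zero contact number.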
|
6
|
Hu L, Cui W, He Z, Shi X, Feng K, Ma B, Cai YD. Cooperativity among short amyloid stretches in long amyloidogenic sequences. PLoS One 2012; 7:e39369. [PMID: 22761773 PMCID: PMC3382238 DOI: 10.1371/journal.pone.0039369] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 03/25/2012] [Accepted: 05/18/2012] [Indexed: 12/29/2022] Open
Abstract
Amyloid fibrillar aggregates of polypeptides are associated with many neurodegenerative diseases. Short peptide segments in protein sequences may trigger aggregation. Identifying these stretches and examining their behavior in longer protein segments is critical for understanding these diseases and obtaining potential therapies. In this study, we combined machine learning and structure-based energy evaluation to examine and predict amyloidogenic segments. Our feature selection method discovered that windows consisting of long amino acid segments of ~30 residues, instead of the commonly used short hexapeptides, provided the highest accuracy. Weighted contributions of an amino acid at each position in a 27-residue window revealed three cooperative regions of short stretches, resembling the β-strand-turn-β-strand motif in Aβ peptide amyloid and the β-solenoid structure of HET-s(218-289) prion (C). Using an in-house energy evaluation algorithm, the interaction energy between two short stretches in a long segment is computed and incorporated as an additional feature. The algorithm successfully predicted and classified amyloid segments with an overall accuracy of 75%. Our study revealed that genome-wide amyloid segments are not only dependent on short high-propensity stretches, but also on nearby residues.
Affiliation(s)
- Lele Hu
  - Institute of Systems Biology, Shanghai University, Shanghai, People’s Republic of China
  - Department of Chemistry, College of Sciences, Shanghai University, Shanghai, People’s Republic of China
- Weiren Cui
  - CAS-MPG Partner Institute of Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People’s Republic of China
- Zhisong He
  - CAS-MPG Partner Institute of Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People’s Republic of China
- Xiaohe Shi
  - Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Shanghai Jiao Tong University School of Medicine, Shanghai, People’s Republic of China
- Kaiyan Feng
  - Shanghai Center for Bioinformation Technology, Shanghai, China
- Buyong Ma
  - Basic Science Program, SAIC – Frederick, Center for Cancer Research Nanobiology Program, National Cancer Institute-Frederick, National Institutes of Health, Frederick, Maryland, United States of America
- Yu-Dong Cai
  - Institute of Systems Biology, Shanghai University, Shanghai, People’s Republic of China
|
7
|
Colafranceschi M, Giuliani A, Andersen Ø, Brix O, De Rosa MC, Giardina B, Colosimo A. Hydrophobicity patterns and biological adaptation: an exemplary case from fish hemoglobins. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2010; 14:275-81. [PMID: 20450440 DOI: 10.1089/omi.2010.0007] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]
Abstract
The dissection of phylogenetic and environmental components in biological evolution is one of the main themes of general biology. Here we propose an approach to this theme that relies on comparing a phylogenetically oriented metric spanning the hemoglobin beta chains of different fishes with a more physiologically oriented metric defining the same sequences in terms of the dynamical features of their hydrophobic distributions. By analyzing the set of sequences most similar to the Gadus morhua (Atlantic cod) hemoglobin beta chain, we provide a proof of concept that the phylogenetic and environmental (evolutionary convergence) components can be discriminated by comparative analysis of the Clustal W (phylogenetics first) and Recurrence Quantification Analysis (physiology first) metrics in which the sequences were embedded. The use of a molecular system like hemoglobin, which plays a crucial role in fish adaptation to environmental cues, allowed us to span different levels of biological variability by means of the same paradigm. Starting from the reconstruction of the general taxonomy of vertebrate groups, we proceeded to examine the peculiar role played by the Met55Val and Lys62Ala polymorphisms in the beta1 hemoglobin chain of the Atlantic cod, which can influence the geographical distribution of its various stocks.
|
8
|
Zimmermann K, Gibrat JF. Amino acid "little Big Bang": representing amino acid substitution matrices as dot products of Euclidian vectors. BMC Bioinformatics 2010; 11:4. [PMID: 20047649 PMCID: PMC3098074 DOI: 10.1186/1471-2105-11-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 08/26/2009] [Accepted: 01/04/2010] [Indexed: 11/15/2022] Open
Abstract
BACKGROUND: Sequence comparisons make use of a one-letter representation for amino acids, the necessary quantitative information being supplied by the substitution matrices. This paper deals with the problem of finding a representation that provides a comprehensive description of amino acid intrinsic properties consistent with the substitution matrices. RESULTS: We present a Euclidian vector representation of the amino acids, obtained by the singular value decomposition of the substitution matrices. The substitution matrix entries correspond to the dot product of amino acid vectors. We apply this vector encoding to the study of the relative importance of various amino acid physicochemical properties upon the substitution matrices. We also characterize and compare the PAM and BLOSUM series substitution matrices. CONCLUSIONS: This vector encoding introduces a Euclidian metric in the amino acid space, consistent with substitution matrices. Such a numerical description of the amino acid is useful when intrinsic properties of amino acids are necessary, for instance, building sequence profiles or finding consensus sequences, using machine learning algorithms such as Support Vector Machine and Neural Networks algorithms.
Affiliation(s)
- Karel Zimmermann
  - Université Pierre et Marie Curie (Paris VI), France
  - INRA, Mathématique, Informatique et Génome UR1077, F-78352 Jouy-en-Josas, France
- Jean-François Gibrat
  - INRA, Mathématique, Informatique et Génome UR1077, F-78352 Jouy-en-Josas, France
|
9
|
Song J, Tan H, Mahmood K, Law RHP, Buckle AM, Webb GI, Akutsu T, Whisstock JC. Prodepth: predict residue depth by support vector regression approach from protein sequences only. PLoS One 2009; 4:e7072. [PMID: 19759917 PMCID: PMC2742725 DOI: 10.1371/journal.pone.0007072] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Received: 04/14/2009] [Accepted: 08/20/2009] [Indexed: 11/24/2022] Open
Abstract
Residue depth (RD) is a solvent exposure measure that complements the information provided by conventional accessible surface area (ASA) and describes to what extent a residue is buried in the protein structure space. Previous studies have established that RD is correlated with several protein properties, such as protein stability, residue conservation and amino acid types. Accurate prediction of RD has many potentially important applications in the field of structural bioinformatics, for example, facilitating the identification of functionally important residues, or residues in the folding nucleus, or enzyme active sites from sequence information. In this work, we introduce an efficient approach that uses support vector regression to quantify the relationship between RD and protein sequence. We systematically investigated eight different sequence encoding schemes including both local and global sequence characteristics and examined their respective prediction performances. For the objective evaluation of our approach, we used 5-fold cross-validation to assess the prediction accuracies and showed that the overall best performance could be achieved with a correlation coefficient (CC) of 0.71 between the observed and predicted RD values and a root mean square error (RMSE) of 1.74, after incorporating the relevant multiple sequence features. The results suggest that residue depth could be reliably predicted solely from protein primary sequences: local sequence environments are the major determinants, while global sequence features could influence the prediction performance marginally. We highlight two examples as a comparison in order to illustrate the applicability of this approach. We also discuss the potential implications of this new structural parameter in the field of protein structure prediction and homology modeling. This method might prove to be a powerful tool for sequence analysis.
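The two figures of merit quoted in this abstract, the correlation coefficient (CC) and the root mean square error (RMSE) under 5-fold cross-validation, can be sketched with the standard library alone. This shows only the evaluation side, not the support vector regression itself. The function names and the interleaved fold assignment in `k_fold_indices` are assumptions for illustration; the abstract does not say how the authors partitioned their data.

```python
import math


def pearson_cc(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def k_fold_indices(n_items, k=5):
    """Yield (train, test) index lists; fold f tests every k-th item starting at f."""
    for fold in range(k):
        test = list(range(fold, n_items, k))
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.1, 1.9, 3.2, 3.8]
    print(pearson_cc(observed, predicted), rmse(observed, predicted))
    for train, test in k_fold_indices(10, k=5):
        print(train, test)
```

In a cross-validated setting, a model would be fitted on each `train` subset and the two metrics computed on the held-out `test` predictions, either per fold or pooled over all folds.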
Affiliation(s)
- Jiangning Song
  - Department of Biochemistry and Molecular Biology, Monash University, Clayton, Melbourne, Victoria, Australia
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto, Japan
  - * E-mail: (JS); (JCW)
- Hao Tan
  - Department of Biochemistry and Molecular Biology, Monash University, Clayton, Melbourne, Victoria, Australia
- Khalid Mahmood
  - Department of Biochemistry and Molecular Biology, Monash University, Clayton, Melbourne, Victoria, Australia
  - ARC Centre of Excellence for Structural and Functional Microbial Genomics, Monash University, Clayton, Melbourne, Victoria, Australia
- Ruby H. P. Law
  - Department of Biochemistry and Molecular Biology, Monash University, Clayton, Melbourne, Victoria, Australia
- Ashley M. Buckle
  - Department of Biochemistry and Molecular Biology, Monash University, Clayton, Melbourne, Victoria, Australia
- Geoffrey I. Webb
  - Faculty of Information Technology, Monash University, Clayton, Melbourne, Victoria, Australia
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto, Japan
- James C. Whisstock
  - Department of Biochemistry and Molecular Biology, Monash University, Clayton, Melbourne, Victoria, Australia
  - ARC Centre of Excellence for Structural and Functional Microbial Genomics, Monash University, Clayton, Melbourne, Victoria, Australia
  - * E-mail: (JS); (JCW)
|
10
|
Kinjo AR. Profile conditional random fields for modeling protein families with structural information. Biophysics (Nagoya-shi) 2009; 5:37-44. [PMID: 27857577 PMCID: PMC5036637 DOI: 10.2142/biophysics.5.37] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 04/17/2009] [Accepted: 05/12/2009] [Indexed: 12/01/2022] Open
Abstract
A statistical model of protein families, called profile conditional random fields (CRFs), is proposed. This model may be regarded as an integration of the profile hidden Markov model (HMM) and the Finkelstein-Reva (FR) theory of protein folding. While the model structure of the profile CRF is almost identical to the profile HMM, it can incorporate arbitrary correlations in the sequences to be aligned to the model. In addition, like in the FR theory, the profile CRF can incorporate long-range pair-wise interactions between model states via mean-field-like approximations. We give the detailed formulation of the model, self-consistent approximations for treating long-range interactions, and algorithms for computing partition functions and marginal probabilities. We also outline the methods for the global optimization of model parameters as well as a Bayesian framework for parameter learning and selection of optimal alignments.
Affiliation(s)
- Akira R Kinjo
  - Institute for Protein Research, Osaka University, Suita, Osaka, 565-0871, Japan
|