1
|
Kakade P, Ojha H, Raimi OG, Shaw A, Waddell AD, Ault JR, Burel S, Brockmann K, Kumar A, Ahangar MS, Krysztofinska EM, Macartney T, Bayliss R, Fitzgerald JC, Muqit MMK. Mapping of a N-terminal α-helix domain required for human PINK1 stabilization, Serine228 autophosphorylation and activation in cells. Open Biol 2022; 12:210264. [PMID: 35042401 PMCID: PMC8767193 DOI: 10.1098/rsob.210264] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 09/07/2021] [Accepted: 11/25/2021] [Indexed: 12/11/2022] Open
Abstract
Autosomal recessive mutations in the PINK1 gene are causal for Parkinson's disease (PD). PINK1 encodes a mitochondrial localized protein kinase that is a master-regulator of mitochondrial quality control pathways. Structural studies to date have elaborated the mechanism of how mutations located within the kinase domain disrupt PINK1 function; however, the molecular mechanism of PINK1 mutations located upstream and downstream of the kinase domain is unknown. We have employed mutagenesis studies to define the minimal region of human PINK1 required for optimal ubiquitin phosphorylation, beginning at residue Ile111. Inspection of the AlphaFold human PINK1 structure model predicts a conserved N-terminal α-helical extension (NTE) domain forming an intramolecular interaction with the C-terminal extension (CTE), which we corroborate using hydrogen/deuterium exchange mass spectrometry of recombinant insect PINK1 protein. Cell-based analysis of human PINK1 reveals that PD-associated mutations (e.g. Q126P), located within the NTE : CTE interface, markedly inhibit stabilization of PINK1, autophosphorylation at Serine228 (Ser228) and Ubiquitin Serine65 (Ser65) phosphorylation. Furthermore, we provide evidence that NTE and CTE domain mutants disrupt PINK1 stabilization at the mitochondrial Translocase of outer membrane complex. The clinical relevance of our findings is supported by the demonstration of defective stabilization and activation of endogenous PINK1 in human fibroblasts of a patient with early-onset PD due to homozygous PINK1 Q126P mutations. Overall, we define a functional role of the NTE : CTE interface towards PINK1 stabilization and activation and show that loss of NTE : CTE interactions is a major mechanism of PINK1-associated mutations linked to PD.
Affiliation(s)
- Poonam Kakade
  - MRC Protein Phosphorylation and Ubiquitylation Unit, University of Dundee, Dundee DD1 5EH, UK
- Hina Ojha
  - MRC Protein Phosphorylation and Ubiquitylation Unit, University of Dundee, Dundee DD1 5EH, UK
- Olawale G. Raimi
  - MRC Protein Phosphorylation and Ubiquitylation Unit, University of Dundee, Dundee DD1 5EH, UK
- Andrew Shaw
  - MRC Protein Phosphorylation and Ubiquitylation Unit, University of Dundee, Dundee DD1 5EH, UK
- Andrew D. Waddell
  - MRC Protein Phosphorylation and Ubiquitylation Unit, University of Dundee, Dundee DD1 5EH, UK
- James R. Ault
  - Astbury Centre for Structural Molecular Biology, School of Molecular and Cellular Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, UK
- Sophie Burel
  - MRC Protein Phosphorylation and Ubiquitylation Unit, University of Dundee, Dundee DD1 5EH, UK
- Kathrin Brockmann
  - Hertie Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany
  - The German Centre for Neurodegenerative Diseases (DZNE), Tübingen, Germany
- Atul Kumar
  - MRC Protein Phosphorylation and Ubiquitylation Unit, University of Dundee, Dundee DD1 5EH, UK
  - Division of Gene Regulation and Expression, School of Life Sciences, University of Dundee, Dundee DD1 5EH, UK
- Mohd Syed Ahangar
  - Astbury Centre for Structural Molecular Biology, School of Molecular and Cellular Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, UK
- Ewelina M. Krysztofinska
  - MRC Protein Phosphorylation and Ubiquitylation Unit, University of Dundee, Dundee DD1 5EH, UK
  - Astex Pharmaceuticals, 436 Cambridge Science Park, Milton Road, Cambridge CB4 0QA, UK
- Thomas Macartney
  - MRC Protein Phosphorylation and Ubiquitylation Unit, University of Dundee, Dundee DD1 5EH, UK
- Richard Bayliss
  - Astbury Centre for Structural Molecular Biology, School of Molecular and Cellular Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, UK
- Julia C. Fitzgerald
  - Hertie Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany
- Miratul M. K. Muqit
  - MRC Protein Phosphorylation and Ubiquitylation Unit, University of Dundee, Dundee DD1 5EH, UK
|
2
|
In Silico Analysis for Determination and Validation of Human CD20 Antigen 3D Structure. Int J Pept Res Ther 2017. [DOI: 10.1007/s10989-017-9654-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/18/2022]
|
3
|
Xie S, Li Z, Hu H. Protein secondary structure prediction based on the fuzzy support vector machine with the hyperplane optimization. Gene 2017; 642:74-83. [PMID: 29104167 DOI: 10.1016/j.gene.2017.11.005] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 09/08/2017] [Revised: 10/29/2017] [Accepted: 11/02/2017] [Indexed: 11/30/2022]
Abstract
The prediction of protein secondary structure is a crucial problem in bioinformatics and related fields. In recent years, machine learning methods have become a valuable tool, achieving satisfactory results. However, the prediction accuracy needs further improvement. This paper proposes a new method based on an improved fuzzy support vector machine (FSVM) for the prediction of the secondary structure of proteins. Unlike traditional methods of setting the membership function, it first constructs an approximately optimal separating hyperplane by iterating the class centers in the feature space. Sample points close to this hyperplane are then assigned large membership values, while outliers are assigned small membership values according to the K-nearest neighbor method. Some sample points with low membership values are then removed, reducing the training time and improving the prediction accuracy. To optimize the prediction results, our method also exploits information on sequence-based structural similarity. We used three databases (RS126, CB513 and data1199) to test this method, achieving Q3 accuracies of 94.2%, 93.1% and 96.7% and SOV values of 91.7%, 89.7% and 94.1% for the three datasets, respectively. Overall, our results are comparable to or often better than those of commonly used methods (Magnan & Baldi, 2014; Sheng et al., 2016) for secondary structure prediction.
Affiliation(s)
- Shangxin Xie
  - School of Science, Zhejiang Sci-Tech University, Hangzhou, Zhejiang, 310018, China
- Zhong Li
  - School of Science, Zhejiang Sci-Tech University, Hangzhou, Zhejiang, 310018, China
- Hailong Hu
  - School of Science, Zhejiang Sci-Tech University, Hangzhou, Zhejiang, 310018, China
  - School of Science, Zhejiang A&F University, Lin'an, Zhejiang 311300, China
|
4
|
Protein secondary structure prediction: A survey of the state of the art. J Mol Graph Model 2017; 76:379-402. [DOI: 10.1016/j.jmgm.2017.07.015] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Received: 06/05/2017] [Revised: 07/14/2017] [Accepted: 07/17/2017] [Indexed: 11/21/2022]
|
5
|
Abstract
The limitation of most HMMs is their inherent high dimensionality. We therefore developed several variations of low-complexity models that can be applied even to protein families with few members. In this chapter we present these variations. All of them use a hidden Markov model (HMM) with a small number of states (called a reduced state-space HMM), which is trained with both the amino acid sequence and the secondary structure of proteins whose 3D structure is known and is used for protein fold classification. We used data from the Protein Data Bank and annotation from the SCOP database for training and evaluation of the proposed HMM variations on a number of protein folds that belong to major structural classes. Results indicate that the variations perform similarly to, or in some cases better than, SAM, a widely used HMM-based method for protein classification. The major advantage of the proposed variations is that they employ a small number of states, and the algorithms used for training and scoring are of low complexity and thus relatively fast. The main variations examined include a version of the reduced state-space HMM with seven states (7-HMM), a version of the reduced state-space HMM with three states (3-HMM) and an optimized version of the reduced state-space HMM with three states, where an optimization process is applied to its scores (optimized 3-HMM).
Affiliation(s)
- Christos Lampros
  - Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, University Campus of Ioannina, GR45110, Ioannina, Greece
- Costas Papaloukas
  - Department of Biological Applications and Technology, University of Ioannina, Ioannina, Greece
- Themis Exarchos
  - Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, University Campus of Ioannina, GR45110, Ioannina, Greece
- Dimitrios I Fotiadis
  - Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, University Campus of Ioannina, GR45110, Ioannina, Greece
|
6
|
Exploring the Feasibility of the Sec Route to Secrete Proteins Using the Tat Route in Streptomyces lividans. Mol Biotechnol 2016. [PMID: 26202494 DOI: 10.1007/s12033-015-9883-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 10/23/2022]
Abstract
Streptomyces lividans uses mainly two pathways to target secretory proteins to the cytoplasmic membrane. The major pathway (Sec pathway) transports pre-proteins using the signal recognition particle, and the minor Tat pathway is responsible for the secretion, in a folded conformation, of a relatively small number of proteins. The signal peptides of the Sec-dependent alpha-amylase and the Tat-dependent agarase were interchanged and fused in-frame to the corresponding mature part of the other enzyme. Alpha-amylase was unable to use the Tat route when fused to the agarase signal peptide, while agarase used the Sec route when it was targeted by the alpha-amylase signal peptide. In addition to the signal peptide, some as yet unidentified parts of the secreted proteins may play a role in selecting the secretory route. Structure predictions for the Tat- and Sec-dependent proteins suggest that less structured proteins are more likely to be candidates for the Tat route.
|
7
|
Roterman-Konieczna I, Fabian P, Stąpor K. A method of predicting the secondary protein structure based on dictionaries. Bio-Algorithms and Med-Systems 2015. [DOI: 10.1515/bams-2015-0019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
The shape of a protein chain may be analyzed at different levels of detail. The ultimate shape description contains three-dimensional coordinates of all atoms in the chain. In many cases, a description of the local shape, namely secondary structure, is enough to determine some properties of proteins. Although obtaining the full three-dimensional (3D) information also defines the secondary structure, the problem of finding this precise 3D shape (tertiary structure) given only the amino acid sequence is very complex. However, the secondary structure may be found even without having the full 3D information. Many methods have been developed for this purpose. Most of them are based on similarities of the analyzed protein chain to other proteins that are already analyzed and have a known secondary structure. The presented paper proposes a method based on dictionaries of known structures for predicting the secondary structure from either the primary structure or the so-called structural code. Accuracies of up to 79% have been achieved.
|
8
|
Spencer M, Eickholt J, Cheng J. A Deep Learning Network Approach to ab initio Protein Secondary Structure Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2015; 12:103-12. [PMID: 25750595 PMCID: PMC4348072 DOI: 10.1109/tcbb.2014.2343960] [Citation(s) in RCA: 138] [Impact Index Per Article: 15.3] [Indexed: 05/12/2023]
Abstract
Ab initio protein secondary structure (SS) predictions are utilized to generate tertiary structure predictions, which are increasingly demanded due to the rapid discovery of proteins. Although recent developments have slightly exceeded previous methods of SS prediction, accuracy has stagnated around 80 percent and many wonder if prediction cannot be advanced beyond this ceiling. Disciplines that have traditionally employed neural networks are experimenting with novel deep learning techniques in attempts to stimulate progress. Since neural networks have historically played an important role in SS prediction, we wanted to determine whether deep learning could contribute to the advancement of this field as well. We developed an SS predictor that makes use of the position-specific scoring matrix generated by PSI-BLAST and deep learning network architectures, which we call DNSS. Graphical processing units and CUDA software optimize the deep network architecture and efficiently train the deep networks. Optimal parameters for the training process were determined, and a workflow comprising three separately trained deep networks was constructed in order to make refined predictions. This deep learning network approach was used to predict SS for a fully independent test dataset of 198 proteins, achieving a Q3 accuracy of 80.7 percent and a Sov accuracy of 74.2 percent.
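The Q3 figure quoted in this abstract (and in several others in this list) is conventionally the percentage of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. A minimal sketch of that calculation follows; the function name `q3_accuracy` and the example strings are illustrative assumptions, not taken from any of the cited papers.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the percentage of positions where the predicted three-state
    label (H, E or C) equals the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("sequences must not be empty")
    # Count residue positions where the two label strings agree.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical 10-residue example: 8 of the 10 labels agree.
observed = "HHHHCCEEEC"
predicted = "HHHCCCEEEE"
print(f"Q3 = {q3_accuracy(predicted, observed):.1f}%")
```

Per-residue agreement of this kind says nothing about whether whole segments are recovered, which is why segment-overlap (SOV) scores are usually reported alongside Q3.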
Affiliation(s)
- Matt Spencer
  - Informatics Institute, University of Missouri, Columbia, MO 65211
- Jesse Eickholt
  - Department of Computer Science, Central Michigan University, Mount Pleasant, MI 48859
- Jianlin Cheng
  - Department of Computer Science, University of Missouri, Columbia, MO 65211
|
9
|
Khor BY, Tye GJ, Lim TS, Noordin R, Choong YS. The structure and dynamics of BmR1 protein from Brugia malayi: in silico approaches. Int J Mol Sci 2014; 15:11082-99. [PMID: 24950179 PMCID: PMC4100200 DOI: 10.3390/ijms150611082] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 01/28/2014] [Revised: 03/25/2014] [Accepted: 06/04/2014] [Indexed: 12/27/2022] Open
Abstract
Brugia malayi is a filarial nematode that causes lymphatic filariasis in humans. In 1995, the disease was identified by the World Health Organization (WHO) as one of the second leading causes of permanent and long-term disability, and it is thus targeted for elimination by the year 2020. Therefore, accurate filariasis diagnosis is important for management and elimination programs. A recombinant antigen (BmR1) from the Bm17DIII gene product was used for antibody-based filariasis diagnosis in "Brugia Rapid". However, the structure and dynamics of the BmR1 protein are yet to be elucidated. Here we study the three-dimensional structure and dynamics of the BmR1 protein using comparative modeling, threading and ab initio protein structure prediction. The best predicted structure, obtained via an ab initio method (Rosetta), was further refined and minimized. A total of 5 ns of molecular dynamics simulation was performed to investigate the packing of the protein. We also identified three epitopes as potential antibody binding sites from the molecular dynamics average structure. The structure and epitopes obtained from this study can be used to design a binder specific against BmR1, thus aiding future development of antigen-based filariasis diagnostics to complement the current diagnostics.
Affiliation(s)
- Bee Yin Khor
  - Institute for Research in Molecular Medicine, Universiti Sains Malaysia, Minden, Penang 11800, Malaysia
- Gee Jun Tye
  - Institute for Research in Molecular Medicine, Universiti Sains Malaysia, Minden, Penang 11800, Malaysia
- Theam Soon Lim
  - Institute for Research in Molecular Medicine, Universiti Sains Malaysia, Minden, Penang 11800, Malaysia
- Rahmah Noordin
  - Institute for Research in Molecular Medicine, Universiti Sains Malaysia, Minden, Penang 11800, Malaysia
- Yee Siew Choong
  - Institute for Research in Molecular Medicine, Universiti Sains Malaysia, Minden, Penang 11800, Malaysia
|
10
|
Carrascoza F, Zaric S, Silaghi-Dumitrescu R. Computational study of protein secondary structure elements: Ramachandran plots revisited. J Mol Graph Model 2014; 50:125-33. [DOI: 10.1016/j.jmgm.2014.04.001] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 10/11/2013] [Revised: 04/01/2014] [Accepted: 04/02/2014] [Indexed: 11/28/2022]
|
11
|
Joseph AP, de Brevern AG. From local structure to a global framework: recognition of protein folds. J R Soc Interface 2014; 11:20131147. [PMID: 24740960 DOI: 10.1098/rsif.2013.1147] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 12/21/2022] Open
Abstract
Protein folding has been a major area of research for many years. Nonetheless, the mechanisms leading to the formation of an active biological fold are still not fully understood. The huge amount of available sequence and structural information provides hints to identify the putative fold for a given sequence. Indeed, protein structures prefer a limited number of local backbone conformations, some being characterized by preferences for certain amino acids. These preferences largely depend on the local structural environment. The prediction of local backbone conformations has become an important factor in correctly identifying the global protein fold. Here, we review the developments in the field of local structure prediction and especially their implications for protein fold recognition.
Affiliation(s)
- Agnel Praveen Joseph
  - Science and Technology Facilities Council, Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
|
12
|
Saatchi M, Garrick DJ, Tait RG, Mayes MS, Drewnoski M, Schoonmaker J, Diaz C, Beitz DC, Reecy JM. Genome-wide association and prediction of direct genomic breeding values for composition of fatty acids in Angus beef cattle. BMC Genomics 2013; 14:730. [PMID: 24156620 PMCID: PMC3819509 DOI: 10.1186/1471-2164-14-730] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Received: 05/16/2013] [Accepted: 10/21/2013] [Indexed: 01/27/2023] Open
Abstract
BACKGROUND As consumers continue to request food products that have health advantages, it will be important for the livestock industry to supply a product that meets these demands. One such class of nutrients is fatty acids, which have been implicated as playing a role in cardiovascular disease. Therefore, the objective of this study was to determine the extent to which molecular markers could account for variation in fatty acid composition of skeletal muscle and identify genomic regions that harbor genetic variation. RESULTS Subsets of markers on the Illumina 54K bovine SNPchip were able to account for up to 57% of the variance observed in fatty acid composition. In addition, these markers could be used to calculate direct genomic breeding values (DGV) for a given fatty acid with an accuracy (measured as simple correlations between DGV and phenotype) ranging from -0.06 to 0.57. Furthermore, 57 1-Mb regions were identified that were associated with at least one fatty acid with a posterior probability of inclusion greater than 0.90. 1-Mb regions on BTA19, BTA26 and BTA29, which harbored fatty acid synthase, stearoyl-CoA desaturase and thyroid hormone responsive candidate genes, respectively, explained a high percentage of genetic variance in more than one fatty acid. It was also observed that the correlation between DGV for different fatty acids at a given 1-Mb window ranged from almost 1 to -1. CONCLUSIONS Further investigations are needed to identify the causal variants harbored within the identified 1-Mb windows. For the first time, Angus breeders have a tool whereby they could select for altered fatty acid composition. Furthermore, these reported results could improve our understanding of the biology of fatty acid metabolism and deposition.
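The abstract measures DGV accuracy as the simple correlation between predicted breeding values and observed phenotypes. As a rough illustration, assuming "simple correlation" means the Pearson coefficient, this can be computed as below; the function name and the numeric values are hypothetical and are not data from the study.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the sample means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical DGV predictions and measured phenotypes for four animals.
dgv = [0.10, 0.40, 0.35, 0.80]
phenotype = [1.0, 2.0, 1.5, 3.0]
print(f"accuracy = {pearson_correlation(dgv, phenotype):.2f}")
```

Because a correlation can be negative, an accuracy range such as -0.06 to 0.57 is possible under this definition.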
Affiliation(s)
- Mahdi Saatchi
  - Department of Animal Science, Iowa State University, 2255 Kildee Hall, Ames, IA 50011, USA
- Dorian J Garrick
  - Department of Animal Science, Iowa State University, 2255 Kildee Hall, Ames, IA 50011, USA
  - Institute of Veterinary, Animal and Biomedical Sciences, Massey University, Palmerston North 4442, New Zealand
- Richard G Tait
  - Department of Animal Science, Iowa State University, 2255 Kildee Hall, Ames, IA 50011, USA
  - Present address: USDA, Agricultural Research Service, U.S. Meat Animal Research Center, Clay Center, NE 68933, USA
- Mary S Mayes
  - Department of Animal Science, Iowa State University, 2255 Kildee Hall, Ames, IA 50011, USA
- Mary Drewnoski
  - Department of Animal Science, Iowa State University, 2255 Kildee Hall, Ames, IA 50011, USA
  - Department of Animal and Veterinary Science, University of Idaho, Moscow, ID 83844, USA
- Jon Schoonmaker
  - Department of Animal Science, Purdue University, West Lafayette, IN 47907, USA
- Clara Diaz
  - INIA, Depto. de Mejora Genética Animal, Ctra. de La Coruña Km 7.5, Madrid 28040, Spain
- Don C Beitz
  - Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
  - Department of Animal Science, Iowa State University, 2255 Kildee Hall, Ames, IA 50011, USA
- James M Reecy
  - Department of Animal Science, Iowa State University, 2255 Kildee Hall, Ames, IA 50011, USA
|
13
|
Cong P, Li D, Wang Z, Tang S, Li T. SPSSM8: an accurate approach for predicting eight-state secondary structures of proteins. Biochimie 2013; 95:2460-4. [PMID: 24056076 DOI: 10.1016/j.biochi.2013.09.007] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/08/2013] [Accepted: 09/09/2013] [Indexed: 11/15/2022]
Abstract
Protein eight-state secondary structure prediction is challenging, but is necessary to determine protein structure and function. Here, we report the development of a novel approach, SPSSM8, to predict eight-state secondary structures of proteins accurately from sequences based on the structural position-specific scoring matrix (SPSSM). The SPSSM has been successfully utilized to predict three-state secondary structures. Now we employ an eight-state SPSSM as a feature that is obtained from sequence structure alignment against a large database of 9 million sequences with putative structural information. SPSSM8 uses a low sequence identity dataset (9062 entries) as a training set and a conditional random field as the classification algorithm. SPSSM8 achieved an average eight-state secondary structure accuracy (Q8) of 71.7% (Q3, 81.6%) on an independent testing set (463 entries), an accuracy improvement of 10.1% and 4.6% over SSPro8 and CNF, respectively, and a significant improvement in eight-state secondary structure prediction accuracy. For the CASP9 dataset (92 entries), SPSSM8 achieved a Q8 accuracy of 80.1% (Q3, 83.0%). SPSSM8 was confirmed as an outstanding predictor for eight-state secondary structures of proteins. SPSSM8 is freely available at http://cal.tongji.edu.cn/SPSSM8.
Affiliation(s)
- Peisheng Cong
  - Department of Chemistry, Tongji University, Shanghai, PR China
|
14
|
Motomura K, Nakamura M, Otaki JM. A frequency-based linguistic approach to protein decoding and design: Simple concepts, diverse applications, and the SCS Package. Comput Struct Biotechnol J 2013; 5:e201302010. [PMID: 24688703 PMCID: PMC3962227 DOI: 10.5936/csbj.201302010] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 10/24/2012] [Revised: 02/07/2013] [Accepted: 02/08/2013] [Indexed: 11/23/2022] Open
Abstract
Protein structure and function information is coded in amino acid sequences. However, the relationship between primary sequences and three-dimensional structures and functions remains enigmatic. Our approach to this fundamental biochemistry problem is based on the frequencies of short constituent sequences (SCSs) or words. A protein amino acid sequence is considered analogous to an English sentence, where SCSs are equivalent to words. Availability scores, which are defined as real SCS frequencies in the non-redundant amino acid database relative to their probabilistically expected frequencies, demonstrate the biological usage bias of SCSs. As a result, this frequency-based linguistic approach is expected to have diverse applications, such as secondary structure specifications by structure-specific SCSs and immunological adjuvants with rare or non-existent SCSs. Linguistic similarities (e.g., wide ranges of scale-free distributions) and dissimilarities (e.g., behaviors of low-rank samples) between proteins and the natural English language have been revealed in the rank-frequency relationships of SCSs or words. We have developed a web server, the SCS Package, which contains five applications for analyzing protein sequences based on the linguistic concept. These tools have the potential to assist researchers in deciphering structurally and functionally important protein sites, species-specific sequences, and functional relationships between SCSs. The SCS Package also provides researchers with a tool to construct amino acid sequences de novo based on the idiomatic usage of SCSs.
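The availability score described in this abstract is a ratio of the observed frequency of a short constituent sequence (SCS) to its probabilistically expected frequency. A minimal sketch of one way to compute such a ratio follows. It assumes the expected count comes from an independence model based on overall amino acid composition, which may differ from the exact definition used by the SCS Package; the function name `availability_scores`, the word length `k` and the toy sequences are all hypothetical.

```python
from collections import Counter
from math import prod


def availability_scores(sequences, k=3):
    """Observed / expected count for every k-residue word in `sequences`.

    The expected count assumes residues occur independently at their
    overall composition frequencies (an assumption of this sketch).
    """
    # Overall amino acid composition of the whole sequence set.
    aa_counts = Counter("".join(sequences))
    total_aa = sum(aa_counts.values())
    aa_freq = {aa: n / total_aa for aa, n in aa_counts.items()}

    # Observed counts of every overlapping k-residue window.
    word_counts = Counter(
        seq[i:i + k] for seq in sequences for i in range(len(seq) - k + 1)
    )
    total_windows = sum(word_counts.values())

    scores = {}
    for word, observed in word_counts.items():
        expected = total_windows * prod(aa_freq[aa] for aa in word)
        scores[word] = observed / expected
    return scores


# Toy example with two short hypothetical sequences.
for word, score in sorted(availability_scores(["MKVLAMKV", "MKVIAGKV"]).items()):
    print(word, round(score, 2))
```

Scores above 1 mark words used more often than composition alone would predict, and scores below 1 mark under-used words.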
Affiliation(s)
- Kenta Motomura
  - The BCPH Unit of Molecular Physiology, Department of Chemistry, Biology and Marine Science, University of the Ryukyus, Senbaru, Nishihara, Okinawa 903-0213, Japan
  - Department of Information Science, University of the Ryukyus, Senbaru, Nishihara, Okinawa 903-0213, Japan
- Morikazu Nakamura
  - Department of Information Science, University of the Ryukyus, Senbaru, Nishihara, Okinawa 903-0213, Japan
- Joji M Otaki
  - The BCPH Unit of Molecular Physiology, Department of Chemistry, Biology and Marine Science, University of the Ryukyus, Senbaru, Nishihara, Okinawa 903-0213, Japan
|
15
|
Motomura K, Fujita T, Tsutsumi M, Kikuzato S, Nakamura M, Otaki JM. Word decoding of protein amino acid sequences with availability analysis: a linguistic approach. PLoS One 2012; 7:e50039. [PMID: 23185527 PMCID: PMC3503725 DOI: 10.1371/journal.pone.0050039] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 07/02/2012] [Accepted: 10/15/2012] [Indexed: 11/19/2022] Open
Abstract
The amino acid sequences of proteins determine their three-dimensional structures and functions. However, how sequence information is related to structures and functions is still enigmatic. In this study, we show that at least a part of the sequence information can be extracted by treating amino acid sequences of proteins as a collection of English words, based on a working hypothesis that amino acid sequences of proteins are composed of short constituent amino acid sequences (SCSs) or "words". We first confirmed that the English language very likely follows Zipf's law, a special case of a power law. We found that the rank-frequency plot of SCSs in proteins exhibits a similar distribution when low-rank tails are excluded. In comparison with natural English and "compressed" English without spaces between words, amino acid sequences of proteins show larger linear ranges and smaller exponents with heavier low-rank tails, demonstrating that the SCS distribution in proteins is largely scale-free. The distribution pattern of SCSs in proteins is similar among species, but species-specific features are also present. Based on the availability scores of SCSs, we found that sequence motifs are enriched in high-availability sites (i.e., "key words") and vice versa. In fact, the highest availability peak within a given protein sequence often directly corresponds to a sequence motif. The amino acid composition of high-availability sites within motifs is different from that of entire motifs and all protein sequences, suggesting the possible functional importance of specific SCSs and their compositional amino acids within motifs. We anticipate that our availability-based word decoding approach is complementary to sequence alignment approaches in predicting functionally important sites of unknown proteins from their amino acid sequences.
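Zipf's law states that the frequency of an item is roughly proportional to 1/rank^s, so a rank-frequency plot is close to a straight line of slope -s on log-log axes. The sketch below estimates the exponent s by ordinary least squares on the log-transformed ranks and counts. It is only an illustration: the function names are hypothetical, and the fitting procedure used in the paper (including how low-rank tails are excluded) is not specified in the abstract.

```python
from collections import Counter
import math


def rank_frequency(tokens):
    """Return (rank, count) pairs, with rank 1 for the most frequent token."""
    counts = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(counts, start=1))


def zipf_exponent(rank_freq):
    """Least-squares estimate of s in count ~ C / rank**s (log-log fit)."""
    if len(rank_freq) < 2:
        raise ValueError("need at least two distinct tokens to fit a slope")
    xs = [math.log(rank) for rank, _ in rank_freq]
    ys = [math.log(count) for _, count in rank_freq]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is -s, so negate it to report the exponent.
    return -sxy / sxx


# Toy corpus whose counts (12, 6, 4, 3) follow count = 12 / rank exactly.
tokens = ["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3
print(round(zipf_exponent(rank_frequency(tokens)), 3))
```

The same two functions apply unchanged to English words or to SCSs extracted as fixed-length windows from protein sequences, since both are just lists of tokens.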
Affiliation(s)
- Kenta Motomura
  - The BCPH Unit of Molecular Physiology, Department of Chemistry, Biology and Marine Science, University of the Ryukyus, Nishihara, Okinawa, Japan
  - Department of Information Science, University of the Ryukyus, Nishihara, Okinawa, Japan
- Tomohiro Fujita
  - The BCPH Unit of Molecular Physiology, Department of Chemistry, Biology and Marine Science, University of the Ryukyus, Nishihara, Okinawa, Japan
- Motosuke Tsutsumi
  - The BCPH Unit of Molecular Physiology, Department of Chemistry, Biology and Marine Science, University of the Ryukyus, Nishihara, Okinawa, Japan
- Satsuki Kikuzato
  - The BCPH Unit of Molecular Physiology, Department of Chemistry, Biology and Marine Science, University of the Ryukyus, Nishihara, Okinawa, Japan
- Morikazu Nakamura
  - Department of Information Science, University of the Ryukyus, Nishihara, Okinawa, Japan
- Joji M. Otaki
  - The BCPH Unit of Molecular Physiology, Department of Chemistry, Biology and Marine Science, University of the Ryukyus, Nishihara, Okinawa, Japan
|
16
|
Song Q, Li T, Cong P, Sun J, Li D, Tang S. Predicting turns in proteins with a unified model. PLoS One 2012; 7:e48389. [PMID: 23144872 PMCID: PMC3492357 DOI: 10.1371/journal.pone.0048389] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 06/22/2012] [Accepted: 09/24/2012] [Indexed: 11/18/2022] Open
Abstract
MOTIVATION: Turns are a critical element of protein structure; they play a crucial role in loops, folds, and interactions. Current methods are well developed for predicting individual turn types, including α-turns, β-turns, and γ-turns. However, further protein structure and function prediction requires a uniform model that can accurately predict all types of turns simultaneously. RESULTS: In this study, we present a novel approach, TurnP, which can investigate all the turns in a protein with a unified model. The main characteristics of TurnP are: (i) use of newly exploited features of structural evolution information (secondary structure and shape string of the protein) based on structure homologies, (ii) consideration of all types of turns in a unified model, and (iii) the practical capability to accurately predict all turns simultaneously for a query. TurnP utilizes predicted secondary structures and predicted shape strings, both of which have greater accuracy, based on innovative technologies developed by our group. Sequence and structural evolution features (the sequence profile, the secondary-structure profile and the shape-string profile) are then generated by sequence and structure alignment. When TurnP was validated on a non-redundant dataset (4,107 entries) by five-fold cross-validation, we achieved an accuracy of 88.8% and a sensitivity of 71.8%, which exceeded most state-of-the-art predictors of specific turn types. Newly determined sequences, the EVA and CASP9 datasets were used as independent tests, and the results were outstanding for turn prediction and confirmed the good performance of TurnP in practical applications.
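The accuracy and sensitivity figures quoted in this abstract are per-residue classification metrics. A minimal sketch of how such figures are usually computed follows; the label alphabet (`"T"` for turn, `"-"` for non-turn) and the function name `turn_metrics` are assumptions for illustration, not TurnP's actual output format.

```python
def turn_metrics(true_labels, pred_labels, positive="T"):
    """Return (accuracy, sensitivity) for per-residue turn predictions.

    true_labels and pred_labels are equal-length strings; any character
    other than `positive` is treated as non-turn (an assumed encoding).
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label strings must have the same length")
    if not true_labels:
        raise ValueError("label strings must not be empty")

    tp = tn = fp = fn = 0
    for truth, pred in zip(true_labels, pred_labels):
        if truth == positive and pred == positive:
            tp += 1  # turn residue correctly predicted as turn
        elif truth == positive:
            fn += 1  # turn residue missed
        elif pred == positive:
            fp += 1  # non-turn residue wrongly called a turn
        else:
            tn += 1  # non-turn residue correctly rejected

    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Sensitivity is undefined when the reference contains no turns.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, sensitivity
```

Because turn residues are typically the minority class, accuracy alone can look high even when many turns are missed, which is why the abstract reports sensitivity alongside it.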
Affiliation(s)
- Qi Song
  - Department of Chemistry, Tongji University, Shanghai, China
- Tonghua Li
  - Department of Chemistry, Tongji University, Shanghai, China
  - * Corresponding author
- Peisheng Cong
  - Department of Chemistry, Tongji University, Shanghai, China
- Jiangming Sun
  - Department of Chemistry, Tongji University, Shanghai, China
- Dapeng Li
  - Department of Chemistry, Tongji University, Shanghai, China
- Shengnan Tang
  - Department of Chemistry, Tongji University, Shanghai, China
|
17
|
Bettella F, Rasinski D, Knapp EW. Protein Secondary Structure Prediction with SPARROW. J Chem Inf Model 2012; 52:545-56. [DOI: 10.1021/ci200321u] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022]
Affiliation(s)
- Francesco Bettella
  - Freie Universität Berlin, Institut für Chemie, Fabeckstr. 36a, D-14195 Berlin, Germany
  - deCODE genetics, Sturlugata 8, 101 Reykjavik, Iceland
- Dawid Rasinski
  - Freie Universität Berlin, Institut für Chemie, Fabeckstr. 36a, D-14195 Berlin, Germany
- Ernst Walter Knapp
  - Freie Universität Berlin, Institut für Chemie, Fabeckstr. 36a, D-14195 Berlin, Germany
|
18
|
Lin HN, Notredame C, Chang JM, Sung TY, Hsu WL. Improving the alignment quality of consistency based aligners with an evaluation function using synonymous protein words. PLoS One 2011; 6:e27872. [PMID: 22163274 PMCID: PMC3229492 DOI: 10.1371/journal.pone.0027872] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2011] [Accepted: 10/27/2011] [Indexed: 11/18/2022] Open
Abstract
Most sequence alignment tools can successfully align protein sequences with higher levels of sequence identity. The accuracy of the corresponding structure alignment, however, decreases rapidly when considering distantly related sequences (<20% identity). In this range of identity, alignments optimized so as to maximize sequence similarity are often inaccurate from a structural point of view. Over the last two decades, most multiple protein aligners have been optimized for their capacity to reproduce structure-based alignments while using sequence information. Methods currently available differ essentially in how they measure the similarity between aligned residues: using substitution matrices, Fourier transform, sophisticated profile-profile functions, or, more recently, consistency-based approaches. In this paper, we present a flexible similarity measure for residue pairs to improve the quality of protein sequence alignment. Our approach, called SymAlign, relies on the identification of conserved words found across a sizeable fraction of the considered dataset, and supported by evolutionary analysis. These words are then used to define a position-specific substitution matrix that better reflects the biological significance of local similarity. The experimental results show that the SymAlign scoring scheme can be incorporated within T-Coffee to improve sequence alignment accuracy. We also demonstrate that SymAlign is less sensitive to the presence of structurally non-similar proteins. In the analysis of the relationship between sequence identity and structure similarity, SymAlign can better differentiate structurally similar proteins from non-similar proteins. We show that protein sequence alignments can be significantly improved using a similarity estimation based on weighted n-grams. In our analysis of the alignments thus produced, sequence conservation becomes a better indicator of structural similarity.
SymAlign also provides alignment visualization that can display sub-optimal alignments on dot-matrices. The visualization makes it easy to identify well-supported alternative alignments that may not have been identified by dynamic programming. SymAlign is available at http://bio-cluster.iis.sinica.edu.tw/SymAlign/.
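The first step named in this abstract, finding words that recur across a sizeable fraction of a dataset, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word length `n = 3`, the threshold `min_fraction` and the function name are hypothetical, and the sketch does not model synonymous words, the evolutionary analysis, or the position-specific substitution matrix that SymAlign builds from them.

```python
from collections import Counter


def conserved_words(sequences, n=3, min_fraction=0.5):
    """Map each n-residue word to the fraction of sequences containing it.

    Only words present in at least `min_fraction` of the sequences are
    returned. Both n and min_fraction are assumed values for illustration.
    """
    total = len(sequences)
    if total == 0:
        return {}

    sequence_hits = Counter()
    for seq in sequences:
        # A set ensures each word is counted once per sequence.
        words = {seq[i:i + n] for i in range(len(seq) - n + 1)}
        sequence_hits.update(words)

    return {
        word: hits / total
        for word, hits in sequence_hits.items()
        if hits / total >= min_fraction
    }
```

The returned fractions could serve as simple per-word weights; how SymAlign actually weights its n-grams is not stated in the abstract and is not assumed here.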
Affiliation(s)
- Hsin-Nan Lin
  - Bioinformatics Lab, Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Jia-Ming Chang
  - Centre for Genomic Regulation (CRG), UPF, Barcelona, Spain
- Ting-Yi Sung
  - Bioinformatics Lab, Institute of Information Science, Academia Sinica, Taipei, Taiwan
  - * Corresponding author
- Wen-Lian Hsu
  - Bioinformatics Lab, Institute of Information Science, Academia Sinica, Taipei, Taiwan
  - * Corresponding author
|
19
|
Wei Y, Thompson J, Floudas CA. CONCORD: a consensus method for protein secondary structure prediction via mixed integer linear optimization. Proc Math Phys Eng Sci 2011. [DOI: 10.1098/rspa.2011.0514] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Most of the protein structure prediction methods use a multi-step process, which often includes secondary structure prediction, contact prediction, fragment generation, clustering, etc. For many years, secondary structure prediction has been the workhorse for numerous methods aimed at predicting protein structure and function. This paper presents a new mixed integer linear optimization (MILP)-based consensus method: a Consensus scheme based On a mixed integer liNear optimization method for seCOndary stRucture preDiction (CONCORD). Based on seven secondary structure prediction methods, SSpro, DSC, PROF, PROFphd, PSIPRED, Predator and GorIV, the MILP-based consensus method combines the strengths of different methods, maximizes the number of correctly predicted amino acids and achieves a better prediction accuracy. The method is shown to perform well compared with the seven individual methods when tested on the PDBselect25 training protein set using sixfold cross validation. It also performs well compared with another set of 10 online secondary structure prediction servers (including several recent ones) when tested on the CASP9 targets (http://predictioncenter.org/casp9/). The average Q3 prediction accuracy is 83.04 per cent for the sixfold cross validation of the PDBselect25 set and 82.3 per cent for the CASP9 targets. We have developed a MILP-based consensus method for protein secondary structure prediction. A web server, CONCORD, is available to the scientific community at http://helios.princeton.edu/CONCORD.
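To make the Q3 metric and the idea of a consensus over several predictors concrete, here is a minimal Python sketch. It deliberately swaps in a plain per-residue majority vote for the MILP formulation that CONCORD uses, so it illustrates the problem setting rather than the published method. The three-state alphabet `H`/`E`/`C` and both function names are assumptions.

```python
from collections import Counter


def q3(true_ss, pred_ss):
    """Q3 accuracy: percentage of residues whose three-state label matches."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("structure strings must have the same length")
    if not true_ss:
        raise ValueError("structure strings must not be empty")
    matches = sum(t == p for t, p in zip(true_ss, pred_ss))
    return 100.0 * matches / len(true_ss)


def majority_consensus(predictions):
    """Combine equal-length prediction strings by per-residue majority vote.

    This is a stand-in for illustration, not CONCORD's MILP model.
    Ties go to the label seen first, i.e. to the earlier predictor.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")

    consensus = []
    for column in zip(*predictions):
        # most_common keeps first-seen order among equal counts.
        label, _ = Counter(column).most_common(1)[0]
        consensus.append(label)
    return "".join(consensus)
```

A majority vote treats every predictor and every residue independently, whereas an optimization-based consensus such as the one the abstract describes can weight methods differently; the sketch is useful mainly as a baseline against which such weighting can be judged.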
Affiliation(s)
- Y. Wei
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
- J. Thompson
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
- C. A. Floudas
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
|
20
|
Ranganathan S, Schönbach C, Nakai K, Tan TW. Challenges of the next decade for the Asia Pacific region: 2010 International Conference in Bioinformatics (InCoB 2010). BMC Genomics 2010; 11 Suppl 4:S1. [PMID: 21143792 PMCID: PMC3005919 DOI: 10.1186/1471-2164-11-s4-s1] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022] Open
Abstract
The 2010 annual conference of the Asia Pacific Bioinformatics Network (APBioNet), Asia’s oldest bioinformatics organisation formed in 1998, was organized as the 9th International Conference on Bioinformatics (InCoB), Sept. 26-28, 2010 in Tokyo, Japan. Initially, APBioNet created InCoB as a forum to foster bioinformatics in the Asia Pacific region. Given the growing importance of interdisciplinary research, InCoB2010 included topics targeting scientists in the fields of genomic medicine, immunology and chemoinformatics, supporting translational research. Peer-reviewed manuscripts that were accepted for publication in this supplement represent key areas of research interest that have emerged in our region. We also highlight some of the current challenges bioinformatics is facing in the Asia Pacific region and conclude our report with the announcement of APBioNet’s 100 BioDatabases (BioDB100) initiative. BioDB100 will comply with the database criteria set out earlier in our proposal for Minimum Information about a Bioinformatics investigation (MIABi), setting the standards for biocuration and bioinformatics research, on which we will report at the next InCoB, Nov. 27 – Dec. 2, 2011 at Kuala Lumpur, Malaysia.
Affiliation(s)
- Shoba Ranganathan
  - Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW, Australia
|