1
|
Rahimzadeh F, Mohammad Khanli L, Salehpoor P, Golabi F, PourBahrami S. Unveiling the evolution of policies for enhancing protein structure predictions: A comprehensive analysis. Comput Biol Med 2024; 179:108815. [PMID: 38986287 DOI: 10.1016/j.compbiomed.2024.108815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2024] [Revised: 06/09/2024] [Accepted: 06/24/2024] [Indexed: 07/12/2024]
Abstract
Predicting protein structure is both fascinating and formidable, playing a crucial role in structure-based drug discovery and unraveling diseases with elusive origins. The Critical Assessment of Protein Structure Prediction (CASP) serves as a biannual battleground where global scientists converge to untangle the intricate relationships within amino acid chains. Two primary methods, Template-Based Modeling (TBM) and Template-Free (TF) strategies, dominate protein structure prediction. The trend has shifted towards Template-Free predictions due to their broader sequence coverage with fewer templates. The predictive process can be broadly classified into contact map, binned-distance, and real-valued distance predictions, each with distinctive strengths and limitations manifested through tailored loss functions. We have also introduced revolutionary end-to-end, and all-atom diffusion-based techniques that have transformed protein structure predictions. Recent advancements in deep learning techniques have significantly improved prediction accuracy, although the effectiveness is contingent upon the quality of input features derived from natural bio-physiochemical attributes and Multiple Sequence Alignments (MSA). Hence, the generation of high-quality MSA data holds paramount importance in harnessing informative input features for enhanced prediction outcomes. Remarkable successes have been achieved in protein structure prediction accuracy, however not enough for what structural knowledge was intended to, which implies need for development in some other aspects of the predictions. In this regard, scientists have opened other frontiers for protein structural prediction. The utilization of subsampling in multiple sequence alignment (MSA) and protein language modeling appears to be particularly promising in enhancing the accuracy and efficiency of predictions, ultimately aiding in drug discovery efforts. The exploration of predicting protein complex structure also opens up exciting opportunities to deepen our knowledge of molecular interactions and design therapeutics that are more effective. In this article, we have discussed the vicissitudes that the scientists have gone through to improve prediction accuracy, and examined the effective policies in predicting from different aspects, including the construction of high quality MSA, providing informative input features, and progresses in deep learning approaches. We have also briefly touched upon transitioning from predicting single-chain protein structures to predicting protein complex structures. Our findings point towards promoting open research environments to support the objectives of protein structure prediction.
Collapse
Affiliation(s)
- Faezeh Rahimzadeh
- Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
| | | | - Pedram Salehpoor
- Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
| | - Faegheh Golabi
- Department of Biomedical Engineering, Faculty of Advanced Medical Sciences, Tabriz University of Medical Sciences, Tabriz, Iran
| | - Shahin PourBahrami
- Department of Computer Engineering, Technical and Vocational University (TVU), Tehran, Iran
| |
Collapse
|
2
|
Schön JC. Energy landscapes-Past, present, and future: A perspective. J Chem Phys 2024; 161:050901. [PMID: 39101536 DOI: 10.1063/5.0212867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2024] [Accepted: 06/17/2024] [Indexed: 08/06/2024] Open
Abstract
Energy landscapes and the closely related cost function landscapes have been recognized in science, mathematics, and various other fields such as economics as being highly useful paradigms and tools for the description and analysis of the properties of many systems, ranging from glasses, proteins, and abstract global optimization problems to business models. A multitude of algorithms for the exploration and exploitation of such landscapes have been developed over the past five decades in the various fields of applications, where many re-inventions but also much cross-fertilization have occurred. Twenty-five years ago, trying to increase the fruitful interactions between workers in different fields led to the creation of workshops and small conferences dedicated to the study of energy landscapes in general instead of only focusing on specific applications. In this perspective, I will present some history of the development of energy landscape studies and try to provide an outlook on in what directions the field might evolve in the future and what larger challenges are going to lie ahead, both from a conceptual and a practical point of view, with the main focus on applications of energy landscapes in chemistry and physics.
Collapse
Affiliation(s)
- J C Schön
- Max-Planck-Institute for Solid State Research, Heisenbergstr. 1, D-70569 Stuttgart, Germany
| |
Collapse
|
3
|
Yue T, Wang Y, Zhang L, Gu C, Xue H, Wang W, Lyu Q, Dun Y. Deep Learning for Genomics: From Early Neural Nets to Modern Large Language Models. Int J Mol Sci 2023; 24:15858. [PMID: 37958843 PMCID: PMC10649223 DOI: 10.3390/ijms242115858] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2023] [Revised: 10/24/2023] [Accepted: 10/30/2023] [Indexed: 11/15/2023] Open
Abstract
The data explosion driven by advancements in genomic research, such as high-throughput sequencing techniques, is constantly challenging conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in various fields such as vision, speech, and text processing. Yet genomics entails unique challenges to deep learning, since we expect a superhuman intelligence that explores beyond our knowledge to interpret the genome from deep learning. A powerful deep learning model should rely on the insightful utilization of task-specific knowledge. In this paper, we briefly discuss the strengths of different deep learning models from a genomic perspective so as to fit each particular task with proper deep learning-based architecture, and we remark on practical considerations of developing deep learning architectures for genomics. We also provide a concise review of deep learning applications in various aspects of genomic research and point out current challenges and potential research directions for future genomics applications. We believe the collaborative use of ever-growing diverse data and the fast iteration of deep learning models will continue to contribute to the future of genomics.
Collapse
Affiliation(s)
- Tianwei Yue
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA; (Y.W.); (L.Z.); (W.W.)
| | - Yuanxin Wang
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA; (Y.W.); (L.Z.); (W.W.)
| | - Longxiang Zhang
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA; (Y.W.); (L.Z.); (W.W.)
| | - Chunming Gu
- Department of Biomedical Engineering, School of Medicine, Johns Hopkins University, Baltimore, MD 21218, USA;
| | - Haoru Xue
- The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA;
| | - Wenping Wang
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA; (Y.W.); (L.Z.); (W.W.)
| | - Qi Lyu
- Department of Computational Mathematics, Science, and Engineering, Michigan State University, East Lansing, MI 48824, USA;
| | - Yujie Dun
- School of Information and Communications Engineering, Xi’an Jiaotong University, Xi’an 710049, China;
| |
Collapse
|
4
|
Bartuzi D, Kaczor AA, Matosiuk D. Illuminating the "Twilight Zone": Advances in Difficult Protein Modeling. Methods Mol Biol 2023; 2627:25-40. [PMID: 36959440 DOI: 10.1007/978-1-0716-2974-1_2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/25/2023]
Abstract
Homology modeling was long considered a method of choice in tertiary protein structure prediction. However, it used to provide models of acceptable quality only when templates with appreciable sequence identity with a target could be found. The threshold value was long assumed to be around 20-30%. Below this level, obtained sequence identity was getting dangerously close to values that can be obtained by chance, after aligning any random, unrelated sequences. In these cases, other approaches, including ab initio folding simulations or fragment assembly, were usually employed. The most recent editions of the CASP and CAMEO community-wide modeling methods assessment have brought some surprising outcomes, proving that much more clues can be inferred from protein sequence analyses than previously thought. In this chapter, we focus on recent advances in the field of difficult protein modeling, pushing the threshold deep into the "twilight zone", with particular attention devoted to improvements in applications of machine learning and model evaluation.
Collapse
Affiliation(s)
- Damian Bartuzi
- Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland.
| | - Agnieszka A Kaczor
- Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland
- University of Eastern Finland, School of Pharmacy, Kuopio, Finland
| | - Dariusz Matosiuk
- Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland
| |
Collapse
|
5
|
Ismi DP, Pulungan R, Afiahayati. Deep learning for protein secondary structure prediction: Pre and post-AlphaFold. Comput Struct Biotechnol J 2022; 20:6271-6286. [PMID: 36420164 PMCID: PMC9678802 DOI: 10.1016/j.csbj.2022.11.012] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2022] [Revised: 11/05/2022] [Accepted: 11/05/2022] [Indexed: 11/13/2022] Open
Abstract
This paper aims to provide a comprehensive review of the trends and challenges of deep neural networks for protein secondary structure prediction (PSSP). In recent years, deep neural networks have become the primary method for protein secondary structure prediction. Previous studies showed that deep neural networks had uplifted the accuracy of three-state secondary structure prediction to more than 80%. Favored deep learning methods, such as convolutional neural networks, recurrent neural networks, inception networks, and graph neural networks, have been implemented in protein secondary structure prediction. Methods adapted from natural language processing (NLP) and computer vision are also employed, including attention mechanism, ResNet, and U-shape networks. In the post-AlphaFold era, PSSP studies focus on different objectives, such as enhancing the quality of evolutionary information and exploiting protein language models as the PSSP input. The recent trend to utilize pre-trained language models as input features for secondary structure prediction provides a new direction for PSSP studies. Moreover, the state-of-the-art accuracy achieved by previous PSSP models is still below its theoretical limit. There are still rooms for improvement to be made in the field.
Collapse
Affiliation(s)
- Dewi Pramudi Ismi
- Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Yogyakarta, Indonesia
- Department of Infomatics, Faculty of Industrial Technology, Universitas Ahmad Dahlan, Yogyakarta, Indonesia
| | - Reza Pulungan
- Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Yogyakarta, Indonesia
| | - Afiahayati
- Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Yogyakarta, Indonesia
| |
Collapse
|
6
|
Abstract
Artificial intelligence (AI) methods have been and are now being increasingly integrated in prediction software implemented in bioinformatics and its glycoscience branch known as glycoinformatics. AI techniques have evolved in the past decades, and their applications in glycoscience are not yet widespread. This limited use is partly explained by the peculiarities of glyco-data that are notoriously hard to produce and analyze. Nonetheless, as time goes, the accumulation of glycomics, glycoproteomics, and glycan-binding data has reached a point where even the most recent deep learning methods can provide predictors with good performance. We discuss the historical development of the application of various AI methods in the broader field of glycoinformatics. A particular focus is placed on shining a light on challenges in glyco-data handling, contextualized by lessons learnt from related disciplines. Ending on the discussion of state-of-the-art deep learning approaches in glycoinformatics, we also envision the future of glycoinformatics, including development that need to occur in order to truly unleash the capabilities of glycoscience in the systems biology era.
Collapse
Affiliation(s)
- Daniel Bojar
- Department
of Chemistry and Molecular Biology, University
of Gothenburg, Gothenburg 41390, Sweden
- Wallenberg
Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg 41390, Sweden
| | - Frederique Lisacek
- Proteome
Informatics Group, Swiss Institute of Bioinformatics, CH-1227 Geneva, Switzerland
- Computer
Science Department & Section of Biology, University of Geneva, route de Drize 7, CH-1227, Geneva, Switzerland
| |
Collapse
|
7
|
Bongirwar V, Mokhade AS. Different methods, techniques and their limitations in protein structure prediction: A review. PROGRESS IN BIOPHYSICS AND MOLECULAR BIOLOGY 2022; 173:72-82. [PMID: 35588858 DOI: 10.1016/j.pbiomolbio.2022.05.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/23/2021] [Revised: 04/16/2022] [Accepted: 05/11/2022] [Indexed: 11/17/2022]
Abstract
Because of the increase in different types of diseases in human habitats, demands for designing various types of drugs are also increasing. Protein and its structure play a very important role in drug design. Therefore researchers from different areas like mathematics, medicines, and computer science are teaming up for getting better solutions in the said field. In this paper, we have discussed different methods of secondary and tertiary protein structure prediction (PSP), along with the limitations of different approaches. Different types of datasets used in PSP are also discussed here. This paper also tells about different performance measures to evaluate the prediction accuracy of PSP methods. Different software's/servers are available for download, which are used to find the protein structures for the input protein sequence. These softwares will also help to compare the performance of any new algorithm with other available methods. Details of those softwares are also mentioned in this paper.
Collapse
Affiliation(s)
| | - A S Mokhade
- Visvesvaraya National Institute of Technology, Nagpur, India
| |
Collapse
|
8
|
Artificial Intelligence in Medicine: Biochemical 3D Modeling and Drug Discovery. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
|
9
|
Jindal S, Hsu PJ, Phan HT, Tsou PK, Kuo JL. Capturing the potential energy landscape of large size molecular clusters from atomic interactions up to a 4-body system using deep learning. Phys Chem Chem Phys 2022; 24:27263-27276. [DOI: 10.1039/d2cp04441b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
We propose a new method that utilizes the database of stable conformers and borrow the fragmentation concept of many-body-expansion (MBE) methods in ab initio methods to train a deep-learning machine learning (ML) model using SchNet.
Collapse
Affiliation(s)
- Shweta Jindal
- Institute of Atomic and Molecular Sciences, Academia Sinica, No. 1 Roosevelt Road, Section 4, Daan District, Taipei City 10617, Taiwan
| | - Po-Jen Hsu
- Institute of Atomic and Molecular Sciences, Academia Sinica, No. 1 Roosevelt Road, Section 4, Daan District, Taipei City 10617, Taiwan
| | - Huu Trong Phan
- Institute of Atomic and Molecular Sciences, Academia Sinica, No. 1 Roosevelt Road, Section 4, Daan District, Taipei City 10617, Taiwan
- Molecular Science and Technology Program, Taiwan International Graduate Program, Academia Sinica, Taipei, 11529, Taiwan
- Department of Chemistry, National Tsing Hua University, Hsinchu 30013, Taiwan
| | - Pei-Kang Tsou
- Institute of Atomic and Molecular Sciences, Academia Sinica, No. 1 Roosevelt Road, Section 4, Daan District, Taipei City 10617, Taiwan
| | - Jer-Lai Kuo
- Institute of Atomic and Molecular Sciences, Academia Sinica, No. 1 Roosevelt Road, Section 4, Daan District, Taipei City 10617, Taiwan
- Department of Chemistry, National Tsing Hua University, Hsinchu 30013, Taiwan
- Molecular Science and Technology, National Taiwan University, Section 4, Daan District, Taipei City 10617, Taiwan
| |
Collapse
|
10
|
de Oliveira GB, Pedrini H, Dias Z. Ensemble of Template-Free and Template-Based Classifiers for Protein Secondary Structure Prediction. Int J Mol Sci 2021; 22:11449. [PMID: 34768880 PMCID: PMC8583764 DOI: 10.3390/ijms222111449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2021] [Revised: 10/18/2021] [Accepted: 10/20/2021] [Indexed: 11/16/2022] Open
Abstract
Protein secondary structures are important in many biological processes and applications. Due to advances in sequencing methods, there are many proteins sequenced, but fewer proteins with secondary structures defined by laboratory methods. With the development of computer technology, computational methods have (started to) become the most important methodologies for predicting secondary structures. We evaluated two different approaches to this problem-driven by the recent results obtained by computational methods in this task-(i) template-free classifiers, based on machine learning techniques; and (ii) template-based classifiers, based on searching tools. Both approaches are formed by different sub-classifiers-six for template-free and two for template-based, each with a specific view of the protein. Our results show that these ensembles improve the results of each approach individually.
Collapse
|
11
|
Chong LC, Gandhi G, Lee JM, Yeo WWY, Choi SB. Drug Discovery of Spinal Muscular Atrophy (SMA) from the Computational Perspective: A Comprehensive Review. Int J Mol Sci 2021; 22:8962. [PMID: 34445667 PMCID: PMC8396480 DOI: 10.3390/ijms22168962] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2021] [Accepted: 01/27/2021] [Indexed: 01/02/2023] Open
Abstract
Spinal muscular atrophy (SMA), one of the leading inherited causes of child mortality, is a rare neuromuscular disease arising from loss-of-function mutations of the survival motor neuron 1 (SMN1) gene, which encodes the SMN protein. When lacking the SMN protein in neurons, patients suffer from muscle weakness and atrophy, and in the severe cases, respiratory failure and death. Several therapeutic approaches show promise with human testing and three medications have been approved by the U.S. Food and Drug Administration (FDA) to date. Despite the shown promise of these approved therapies, there are some crucial limitations, one of the most important being the cost. The FDA-approved drugs are high-priced and are shortlisted among the most expensive treatments in the world. The price is still far beyond affordable and may serve as a burden for patients. The blooming of the biomedical data and advancement of computational approaches have opened new possibilities for SMA therapeutic development. This article highlights the present status of computationally aided approaches, including in silico drug repurposing, network driven drug discovery as well as artificial intelligence (AI)-assisted drug discovery, and discusses the future prospects.
Collapse
Affiliation(s)
- Li Chuin Chong
- Centre for Bioinformatics, School of Data Sciences, Perdana University, Suite 9.2, 9th Floor, Wisma Chase Perdana, Changkat Semantan, Kuala Lumpur 50490, Malaysia; (L.C.C.); (J.M.L.)
| | - Gayatri Gandhi
- Perdana University Graduate School of Medicine, Perdana University, Suite 9.2, 9th Floor, Wisma Chase Perdana, Changkat Semantan, Kuala Lumpur 50490, Malaysia; (G.G.); (W.W.Y.Y.)
| | - Jian Ming Lee
- Centre for Bioinformatics, School of Data Sciences, Perdana University, Suite 9.2, 9th Floor, Wisma Chase Perdana, Changkat Semantan, Kuala Lumpur 50490, Malaysia; (L.C.C.); (J.M.L.)
| | - Wendy Wai Yeng Yeo
- Perdana University Graduate School of Medicine, Perdana University, Suite 9.2, 9th Floor, Wisma Chase Perdana, Changkat Semantan, Kuala Lumpur 50490, Malaysia; (G.G.); (W.W.Y.Y.)
| | - Sy-Bing Choi
- Centre for Bioinformatics, School of Data Sciences, Perdana University, Suite 9.2, 9th Floor, Wisma Chase Perdana, Changkat Semantan, Kuala Lumpur 50490, Malaysia; (L.C.C.); (J.M.L.)
| |
Collapse
|
12
|
Akbar S, Pardasani KR, Panda NR. PSO Based Neuro-fuzzy Model for Secondary Structure Prediction of Protein. Neural Process Lett 2021. [DOI: 10.1007/s11063-021-10615-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
|
13
|
Kulichenko M, Smith JS, Nebgen B, Li YW, Fedik N, Boldyrev AI, Lubbers N, Barros K, Tretiak S. The Rise of Neural Networks for Materials and Chemical Dynamics. J Phys Chem Lett 2021; 12:6227-6243. [PMID: 34196559 DOI: 10.1021/acs.jpclett.1c01357] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Machine learning (ML) is quickly becoming a premier tool for modeling chemical processes and materials. ML-based force fields, trained on large data sets of high-quality electron structure calculations, are particularly attractive due their unique combination of computational efficiency and physical accuracy. This Perspective summarizes some recent advances in the development of neural network-based interatomic potentials. Designing high-quality training data sets is crucial to overall model accuracy. One strategy is active learning, in which new data are automatically collected for atomic configurations that produce large ML uncertainties. Another strategy is to use the highest levels of quantum theory possible. Transfer learning allows training to a data set of mixed fidelity. A model initially trained to a large data set of density functional theory calculations can be significantly improved by retraining to a relatively small data set of expensive coupled cluster theory calculations. These advances are exemplified by applications to molecules and materials.
Collapse
Affiliation(s)
- Maksim Kulichenko
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Department of Chemistry and Biochemistry, Utah State University, Logan, Utah 84322, United States
| | - Justin S Smith
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Benjamin Nebgen
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Ying Wai Li
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Nikita Fedik
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Department of Chemistry and Biochemistry, Utah State University, Logan, Utah 84322, United States
| | - Alexander I Boldyrev
- Department of Chemistry and Biochemistry, Utah State University, Logan, Utah 84322, United States
| | - Nicholas Lubbers
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Kipton Barros
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| | - Sergei Tretiak
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Integrated Nanotechnologies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
| |
Collapse
|
14
|
Affiliation(s)
- Heather J. Kulik
- Department of Chemical Engineering Massachusetts Institute of Technology 77 Massachusetts Ave Rm 66–464 Cambridge MA 02139 USA
| |
Collapse
|
15
|
Uddin MR, Mahbub S, Rahman MS, Bayzid MS. SAINT: self-attention augmented inception-inside-inception network improves protein secondary structure prediction. Bioinformatics 2021; 36:4599-4608. [PMID: 32437517 DOI: 10.1093/bioinformatics/btaa531] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2019] [Revised: 05/10/2020] [Accepted: 05/16/2020] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Protein structures provide basic insight into how they can interact with other proteins, their functions and biological roles in an organism. Experimental methods (e.g. X-ray crystallography and nuclear magnetic resonance spectroscopy) for predicting the secondary structure (SS) of proteins are very expensive and time consuming. Therefore, developing efficient computational approaches for predicting the SS of protein is of utmost importance. Advances in developing highly accurate SS prediction methods have mostly been focused on 3-class (Q3) structure prediction. However, 8-class (Q8) resolution of SS contains more useful information and is much more challenging than the Q3 prediction. RESULTS We present SAINT, a highly accurate method for Q8 structure prediction, which incorporates self-attention mechanism (a concept from natural language processing) with the Deep Inception-Inside-Inception network in order to effectively capture both the short- and long-range interactions among the amino acid residues. SAINT offers a more interpretable framework than the typical black-box deep neural network methods. Through an extensive evaluation study, we report the performance of SAINT in comparison with the existing best methods on a collection of benchmark datasets, namely, TEST2016, TEST2018, CASP12 and CASP13. Our results suggest that self-attention mechanism improves the prediction accuracy and outperforms the existing best alternate methods. SAINT is the first of its kind and offers the best known Q8 accuracy. Thus, we believe SAINT represents a major step toward the accurate and reliable prediction of SSs of proteins. AVAILABILITY AND IMPLEMENTATION SAINT is freely available as an open-source project at https://github.com/SAINTProtein/SAINT.
Collapse
Affiliation(s)
- Mostofa Rafid Uddin
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh.,Department of Computer Science and Engineering, East West University, Dhaka 1212, Bangladesh
| | - Sazan Mahbub
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
| | - M Saifur Rahman
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
| | - Md Shamsuzzoha Bayzid
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka 1205, Bangladesh
| |
Collapse
|
16
|
Dybowski R. Artificial Intelligence in Medicine: Biochemical 3D Modeling and Drug Discovery. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-58080-3_318-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
17
|
Guo Z, Hou J, Cheng J. DNSS2: Improved ab initio protein secondary structure prediction using advanced deep learning architectures. Proteins 2020; 89:207-217. [PMID: 32893403 DOI: 10.1002/prot.26007] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2020] [Revised: 07/07/2020] [Accepted: 09/02/2020] [Indexed: 12/27/2022]
Abstract
Accurate prediction of protein secondary structure (alpha-helix, beta-strand and coil) is a crucial step for protein inter-residue contact prediction and ab initio tertiary structure prediction. In a previous study, we developed a deep belief network-based protein secondary structure method (DNSS1) and successfully advanced the prediction accuracy beyond 80%. In this work, we developed multiple advanced deep learning architectures (DNSS2) to further improve secondary structure prediction. The major improvements over the DNSS1 method include (a) designing and integrating six advanced one-dimensional deep convolutional/recurrent/residual/memory/fractal/inception networks to predict 3-state and 8-state secondary structure, and (b) using more sensitive profile features inferred from Hidden Markov model (HMM) and multiple sequence alignment (MSA). Most of the deep learning architectures are novel for protein secondary structure prediction. DNSS2 was systematically benchmarked on independent test data sets with eight state-of-art tools and consistently ranked as one of the best methods. Particularly, DNSS2 was tested on the protein targets of 2018 CASP13 experiment and achieved the Q3 score of 81.62%, SOV score of 72.19%, and Q8 score of 73.28%. DNSS2 is freely available at: https://github.com/multicom-toolbox/DNSS2.
Collapse
Affiliation(s)
- Zhiye Guo
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Jie Hou
- Department of Computer Science, Saint Louis University, St. Louis, Missouri, USA
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| |
Collapse
|
18
|
Shapovalov M, Dunbrack RL, Vucetic S. Multifaceted analysis of training and testing convolutional neural networks for protein secondary structure prediction. PLoS One 2020; 15:e0232528. [PMID: 32374785 PMCID: PMC7202669 DOI: 10.1371/journal.pone.0232528] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2019] [Accepted: 04/16/2020] [Indexed: 11/30/2022] Open
Abstract
Protein secondary structure prediction remains a vital topic with broad applications. Due to lack of a widely accepted standard in secondary structure predictor evaluation, a fair comparison of predictors is challenging. A detailed examination of factors that contribute to higher accuracy is also lacking. In this paper, we present: (1) new test sets, Test2018, Test2019, and Test2018-2019, consisting of proteins from structures released in 2018 and 2019 with less than 25% identity to any protein published before 2018; (2) a 4-layer convolutional neural network, SecNet, with an input window of ±14 amino acids which was trained on proteins ≤25% identical to proteins in Test2018 and the commonly used CB513 test set; (3) an additional test set that shares no homologous domains with the training set proteins, according to the Evolutionary Classification of Proteins (ECOD) database; (4) a detailed ablation study where we reverse one algorithmic choice at a time in SecNet and evaluate the effect on the prediction accuracy; (5) new 4- and 5-label prediction alphabets that may be more practical for tertiary structure prediction methods. The 3-label accuracy (helix, sheet, coil) of the leading predictors on both Test2018 and CB513 is 81-82%, while SecNet's accuracy is 84% for both sets. Accuracy on the non-homologous ECOD set is only 0.6 points (83.9%) lower than the results on the Test2018-2019 set (84.5%). The ablation study of features, neural network architecture, and training hyper-parameters suggests the best accuracy results are achieved with good choices for each of them while the neural network architecture is not as critical as long as it is not too simple. Protocols for generating and using unbiased test, validation, and training sets are provided. Our data sets, including input features and assigned labels, and SecNet software including third-party dependencies and databases, are downloadable from dunbrack.fccc.edu/ss and github.com/sh-maxim/ss.
Collapse
Affiliation(s)
- Maxim Shapovalov
- Fox Chase Cancer Center, Philadelphia, PA, United States of America
- Temple University, Philadelphia, PA, United States of America
| | | | | |
Collapse
|
19
|
Matsuo K, Gekko K. Circular-Dichroism and Synchrotron-Radiation Circular-Dichroism Spectroscopy as Tools to Monitor Protein Structure in a Lipid Environment. Methods Mol Biol 2020; 2003:253-279. [PMID: 31218622 DOI: 10.1007/978-1-4939-9512-7_12] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/31/2023]
Abstract
Circular-dichroism (CD) spectroscopy is a powerful tool for the secondary-structure analysis of proteins. The structural information obtained by CD does not have atomic-level resolution (unlike X-ray crystallography and NMR spectroscopy), but it has the great advantage of being applicable to both nonnative and native proteins in a wide range of solution conditions containing lipids and detergents. The development of synchrotron-radiation CD (SRCD) instruments has greatly expanded the utility of this method by extending the spectra to the vacuum-ultraviolet region below 190 nm and producing information that is unobtainable by conventional CD instruments. Combining SRCD data with bioinformatics provides new insight into the conformational changes of proteins in a membrane environment.
Collapse
Affiliation(s)
- Koichi Matsuo
- Hiroshima Synchrotron Radiation Center, Hiroshima University, Higashi-Hiroshima, Japan
| | - Kunihiko Gekko
- Hiroshima Synchrotron Radiation Center, Hiroshima University, Higashi-Hiroshima, Japan.
| |
Collapse
|
20
|
Smolarczyk T, Roterman-Konieczna I, Stapor K. Protein Secondary Structure Prediction: A Review of Progress and Directions. Curr Bioinform 2020. [DOI: 10.2174/1574893614666191017104639] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
Background:
Over the last few decades, a search for the theory of protein folding has
grown into a full-fledged research field at the intersection of biology, chemistry and informatics.
Despite enormous effort, there are still open questions and challenges, like understanding the rules
by which amino acid sequence determines protein secondary structure.
Objective:
In this review, we depict the progress of the prediction methods over the years and
identify sources of improvement.
Methods:
The protein secondary structure prediction problem is described followed by the discussion
on theoretical limitations, description of the commonly used data sets, features and a review
of three generations of methods with the focus on the most recent advances. Additionally, methods
with available online servers are assessed on the independent data set.
Results:
The state-of-the-art methods are currently reaching almost 88% for 3-class prediction and
76.5% for an 8-class prediction.
Conclusion:
This review summarizes recent advances and outlines further research directions.
Collapse
Affiliation(s)
- Tomasz Smolarczyk
- Institute of Informatics, Silesian University of Technology, Gliwice, Poland
| | - Irena Roterman-Konieczna
- Department of Bioinformatics and Telemedicine, Jagiellonian University Medical College, Krakow, Poland
| | - Katarzyna Stapor
- Institute of Informatics, Silesian University of Technology, Gliwice, Poland
| |
Collapse
|
21
|
The Order-Disorder Continuum: Linking Predictions of Protein Structure and Disorder through Molecular Simulation. Sci Rep 2020; 10:2068. [PMID: 32034199 PMCID: PMC7005769 DOI: 10.1038/s41598-020-58868-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2019] [Accepted: 10/16/2019] [Indexed: 12/11/2022] Open
Abstract
Intrinsically disordered proteins (IDPs) and intrinsically disordered regions within proteins (IDRs) serve an increasingly expansive list of biological functions, including regulation of transcription and translation, protein phosphorylation, cellular signal transduction, as well as mechanical roles. The strong link between protein function and disorder motivates a deeper fundamental characterization of IDPs and IDRs for discovering new functions and relevant mechanisms. We review recent advances in experimental techniques that have improved identification of disordered regions in proteins. Yet, experimentally curated disorder information still does not currently scale to the level of experimentally determined structural information in folded protein databases, and disorder predictors rely on several different binary definitions of disorder. To link secondary structure prediction algorithms developed for folded proteins and protein disorder predictors, we conduct molecular dynamics simulations on representative proteins from the Protein Data Bank, comparing secondary structure and disorder predictions with simulation results. We find that structure predictor performance from neural networks can be leveraged for the identification of highly dynamic regions within molecules, linked to disorder. Low accuracy structure predictions suggest a lack of static structure for regions that disorder predictors fail to identify. While disorder databases continue to expand, secondary structure predictors and molecular simulations can improve disorder predictor performance, which aids discovery of novel functions of IDPs and IDRs. These observations provide a platform for the development of new, integrated structural databases and fusion of prediction tools toward protein disorder characterization in health and disease.
Collapse
|
22
|
Torrisi M, Pollastri G, Le Q. Deep learning methods in protein structure prediction. Comput Struct Biotechnol J 2020; 18:1301-1310. [PMID: 32612753 PMCID: PMC7305407 DOI: 10.1016/j.csbj.2019.12.011] [Citation(s) in RCA: 116] [Impact Index Per Article: 29.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2019] [Revised: 12/19/2019] [Accepted: 12/20/2019] [Indexed: 01/01/2023] Open
Abstract
Protein Structure Prediction is a central topic in Structural Bioinformatics. Since the '60s statistical methods, followed by increasingly complex Machine Learning and recently Deep Learning methods, have been employed to predict protein structural information at various levels of detail. In this review, we briefly introduce the problem of protein structure prediction and essential elements of Deep Learning (such as Convolutional Neural Networks, Recurrent Neural Networks and basic feed-forward Neural Networks they are founded on), after which we discuss the evolution of predictive methods for one-dimensional and two-dimensional Protein Structure Annotations, from the simple statistical methods of the early days, to the computationally intensive highly-sophisticated Deep Learning algorithms of the last decade. In the process, we review the growth of the databases these algorithms are based on, and how this has impacted our ability to leverage knowledge about evolution and co-evolution to achieve improved predictions. We conclude this review outlining the current role of Deep Learning techniques within the wider pipelines to predict protein structures and trying to anticipate what challenges and opportunities may arise next.
Collapse
Affiliation(s)
- Mirko Torrisi
- School of Computer Science, University College Dublin, Ireland
| | | | - Quan Le
- Centre for Applied Data Analytics Research, University College Dublin, Ireland
| |
Collapse
|
23
|
Sample Reduction Strategies for Protein Secondary Structure Prediction. APPLIED SCIENCES-BASEL 2019. [DOI: 10.3390/app9204429] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Predicting the secondary structure from protein sequence plays a crucial role in estimating the 3D structure, which has applications in drug design and in understanding the function of proteins. As new genes and proteins are discovered, the large size of the protein databases and datasets that can be used for training prediction models grows considerably. A two-stage hybrid classifier, which employs dynamic Bayesian networks and a support vector machine (SVM) has been shown to provide state-of-the-art prediction accuracy for protein secondary structure prediction. However, SVM is not efficient for large datasets due to the quadratic optimization involved in model training. In this paper, two techniques are implemented on CB513 benchmark for reducing the number of samples in the train set of the SVM. The first method randomly selects a fraction of data samples from the train set using a stratified selection strategy. This approach can remove approximately 50% of the data samples from the train set and reduce the model training time by 73.38% on average without decreasing the prediction accuracy significantly. The second method clusters the data samples by a hierarchical clustering algorithm and replaces the train set samples with nearest neighbors of the cluster centers in order to improve the training time. To cluster the feature vectors, the hierarchical clustering method is implemented, for which the number of clusters and the number of nearest neighbors are optimized as hyper-parameters by computing the prediction accuracy on validation sets. It is found that clustering can reduce the size of the train set by 26% without reducing the prediction accuracy. Among the clustering techniques Ward’s method provided the best accuracy on test data.
Collapse
|
24
|
Torrisi M, Kaleel M, Pollastri G. Deeper Profiles and Cascaded Recurrent and Convolutional Neural Networks for state-of-the-art Protein Secondary Structure Prediction. Sci Rep 2019; 9:12374. [PMID: 31451723 PMCID: PMC6710256 DOI: 10.1038/s41598-019-48786-x] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2019] [Accepted: 08/12/2019] [Indexed: 01/10/2023] Open
Abstract
Protein Secondary Structure prediction has been a central topic of research in Bioinformatics for decades. In spite of this, even the most sophisticated ab initio SS predictors are not able to reach the theoretical limit of three-state prediction accuracy (88–90%), while only a few predict more than the 3 traditional Helix, Strand and Coil classes. In this study we present tests on different models trained both on single sequence and evolutionary profile-based inputs and develop a new state-of-the-art system with Porter 5. Porter 5 is composed of ensembles of cascaded Bidirectional Recurrent Neural Networks and Convolutional Neural Networks, incorporates new input encoding techniques and is trained on a large set of protein structures. Porter 5 achieves 84% accuracy (81% SOV) when tested on 3 classes and 73% accuracy (70% SOV) on 8 classes on a large independent set. In our tests Porter 5 is 2% more accurate than its previous version and outperforms or matches the most recent predictors of secondary structure we tested. When Porter 5 is retrained on SCOPe based sets that eliminate homology between training/testing samples we obtain similar results. Porter is available as a web server and standalone program at http://distilldeep.ucd.ie/porter/ alongside all the datasets and alignments.
Collapse
Affiliation(s)
- Mirko Torrisi
- School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland
| | - Manaz Kaleel
- School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland
| | - Gianluca Pollastri
- School of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland.
| |
Collapse
|
25
|
Toussi CA, Haddadnia J. Improving protein secondary structure prediction: the evolutionary optimized classification algorithms. Struct Chem 2019. [DOI: 10.1007/s11224-018-1271-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]
|
26
|
Wardah W, Khan M, Sharma A, Rashid MA. Protein secondary structure prediction using neural networks and deep learning: A review. Comput Biol Chem 2019; 81:1-8. [DOI: 10.1016/j.compbiolchem.2019.107093] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2018] [Revised: 12/28/2018] [Accepted: 07/10/2019] [Indexed: 02/02/2023]
|
27
|
Yang X, Wang Y, Byrne R, Schneider G, Yang S. Concepts of Artificial Intelligence for Computer-Assisted Drug Discovery. Chem Rev 2019; 119:10520-10594. [PMID: 31294972 DOI: 10.1021/acs.chemrev.8b00728] [Citation(s) in RCA: 351] [Impact Index Per Article: 70.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]
Abstract
Artificial intelligence (AI), and, in particular, deep learning as a subcategory of AI, provides opportunities for the discovery and development of innovative drugs. Various machine learning approaches have recently (re)emerged, some of which may be considered instances of domain-specific AI which have been successfully employed for drug discovery and design. This review provides a comprehensive portrayal of these machine learning techniques and of their applications in medicinal chemistry. After introducing the basic principles, alongside some application notes, of the various machine learning algorithms, the current state-of-the art of AI-assisted pharmaceutical discovery is discussed, including applications in structure- and ligand-based virtual screening, de novo drug design, physicochemical and pharmacokinetic property prediction, drug repurposing, and related aspects. Finally, several challenges and limitations of the current methods are summarized, with a view to potential future directions for AI-assisted drug discovery and design.
Collapse
Affiliation(s)
- Xin Yang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital , Sichuan University , Chengdu , Sichuan 610041 , China
| | - Yifei Wang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital , Sichuan University , Chengdu , Sichuan 610041 , China
| | - Ryan Byrne
- ETH Zurich , Department of Chemistry and Applied Biosciences , Vladimir-Prelog-Weg 4 , CH-8093 Zurich , Switzerland
| | - Gisbert Schneider
- ETH Zurich , Department of Chemistry and Applied Biosciences , Vladimir-Prelog-Weg 4 , CH-8093 Zurich , Switzerland
| | - Shengyong Yang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital , Sichuan University , Chengdu , Sichuan 610041 , China
| |
Collapse
|
28
|
Gandhi J, Antonelli AC, Afridi A, Vatsia S, Joshi G, Romanov V, Murray IVJ, Khan SA. Protein misfolding and aggregation in neurodegenerative diseases: a review of pathogeneses, novel detection strategies, and potential therapeutics. Rev Neurosci 2019; 30:339-358. [PMID: 30742586 DOI: 10.1515/revneuro-2016-0035] [Citation(s) in RCA: 68] [Impact Index Per Article: 13.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2018] [Accepted: 08/03/2018] [Indexed: 12/13/2022]
Abstract
Protein folding is a complex, multisystem process characterized by heavy molecular and cellular footprints. Chaperone machinery enables proper protein folding and stable conformation. Other pathways concomitant with the protein folding process include transcription, translation, post-translational modifications, degradation through the ubiquitin-proteasome system, and autophagy. As such, the folding process can go awry in several different ways. The pathogenic basis behind most neurodegenerative diseases is that the disruption of protein homeostasis (i.e. proteostasis) at any level will eventually lead to protein misfolding. Misfolded proteins often aggregate and accumulate to trigger neurotoxicity through cellular stress pathways and consequently cause neurodegenerative diseases. The manifestation of a disease is usually dependent on the specific brain region that the neurotoxicity affects. Neurodegenerative diseases are age-associated, and their incidence is expected to rise as humans continue to live longer and pursue a greater life expectancy. We presently review the sequelae of protein misfolding and aggregation, as well as the role of these phenomena in several neurodegenerative diseases including Alzheimer's disease, Huntington's disease, amyotrophic lateral sclerosis, Parkinson's disease, transmissible spongiform encephalopathies, and spinocerebellar ataxia. Strategies for treatment and therapy are also conferred with respect to impairing, inhibiting, or reversing protein misfolding.
Collapse
Affiliation(s)
- Jason Gandhi
- Department of Physiology and Biophysics, Stony Brook University School of Medicine, 101 Nicolls Road, Health Sciences Center, Stony Brook, NY 11794-8434, USA.,Medical Student Research Institute, St. George's University School of Medicine, Grenada, West Indies
| | - Anthony C Antonelli
- Department of Pathology, Stony Brook University School of Medicine, 101 Nicolls Road, Health Sciences Center, Stony Brook, NY 11794-8434, USA
| | - Adil Afridi
- Department of Physiology and Biophysics, Stony Brook University School of Medicine, 101 Nicolls Road, Health Sciences Center, Stony Brook, NY 11794-8434, USA
| | - Sohrab Vatsia
- Department of Cardiothoracic Surgery, Lenox Hill Hospital, 130 East 77th Street, New York, NY 10075, USA
| | - Gunjan Joshi
- Department of Internal Medicine, Stony Brook Southampton Hospital, 240 Meeting House Lane, Southampton, NY 11968, USA
| | - Victor Romanov
- Department of Urology, Health Sciences Center T9-040, Stony Brook University School of Medicine, 101 Nicolls Road, Stony Brook, NY 11794-8093, USA
| | - Ian V J Murray
- Department of Physiology and Neuroscience, St. George's University School of Medicine, Grenada, West Indies
| | - Sardar Ali Khan
- Department of Physiology and Biophysics, Stony Brook University School of Medicine, 101 Nicolls Road, Health Sciences Center, Stony Brook, NY 11794-8434, USA.,Department of Urology, Health Sciences Center T9-040, Stony Brook University School of Medicine, 101 Nicolls Road, Stony Brook, NY 11794-8093, USA
| |
Collapse
|
29
|
Helfrecht BA, Gasparotto P, Giberti F, Ceriotti M. Atomic Motif Recognition in (Bio)Polymers: Benchmarks From the Protein Data Bank. Front Mol Biosci 2019; 6:24. [PMID: 31058166 PMCID: PMC6482324 DOI: 10.3389/fmolb.2019.00024] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2018] [Accepted: 03/26/2019] [Indexed: 11/13/2022] Open
Abstract
Rationalizing the structure and structure-property relations for complex materials such as polymers or biomolecules relies heavily on the identification of local atomic motifs, e.g., hydrogen bonds and secondary structure patterns, that are seen as building blocks of more complex supramolecular and mesoscopic structures. Over the past few decades, several automated procedures have been developed to identify these motifs in proteins given the atomic structure. Being based on a very precise understanding of the specific interactions, these heuristic criteria formulate the question in a way that implies the answer, by defining a list of motifs based on those that are known to be naturally occurring. This makes them less likely to identify unexpected phenomena, such as the occurrence of recurrent motifs in disordered segments of proteins, and less suitable to be applied to different polymers whose structure is not driven by hydrogen bonds, or even to polypeptides when appearing in unusual, non-biological conditions. Here we discuss how unsupervised machine learning schemes can be used to recognize patterns based exclusively on the frequency with which different motifs occur, taking high-resolution structures from the Protein Data Bank as benchmarks. We first discuss the application of a density-based motif recognition scheme in combination with traditional representations of protein structure (namely, interatomic distances and backbone dihedrals). Then, we proceed one step further toward an entirely unbiased scheme by using as input a structural representation based on the atomic density and by employing supervised classification to objectively assess the role played by the representation in determining the nature of atomic-scale patterns.
Collapse
Affiliation(s)
- Benjamin A Helfrecht
- Laboratory of Computational Science and Modeling, Institute of Materials, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Piero Gasparotto
- Laboratory of Computational Science and Modeling, Institute of Materials, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Federico Giberti
- Laboratory of Computational Science and Modeling, Institute of Materials, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Michele Ceriotti
- Laboratory of Computational Science and Modeling, Institute of Materials, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| |
Collapse
|
30
|
Gekko K. Synchrotron-radiation vacuum-ultraviolet circular dichroism spectroscopy in structural biology: an overview. Biophys Physicobiol 2019; 16:41-58. [PMID: 30923662 PMCID: PMC6435020 DOI: 10.2142/biophysico.16.0_41] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2018] [Accepted: 01/13/2019] [Indexed: 12/01/2022] Open
Abstract
Circular dichroism spectroscopy is widely used for analyzing the structures of chiral molecules, including biomolecules. Vacuum-ultraviolet circular dichroism (VUVCD) spectroscopy using synchrotron radiation can extend the short-wavelength limit into the vacuum-ultraviolet region (down to ~160 nm) to provide detailed and new information about the structures of biomolecules in combination with theoretical analysis and bioinformatics. The VUVCD spectra of saccharides can detect the high-energy transitions of chromophores such as hydroxy and acetal groups, disclosing the contributions of inter- or intramolecular hydrogen bonds to the equilibrium configuration of monosaccharides in aqueous solution. The roles of hydration in the fluctuation of the dihedral angles of carboxyl and amino groups of amino acids can be clarified by comparing the observed VUVCD spectra with those calculated theoretically. The VUVCD spectra of proteins markedly improves the accuracy of predicting the contents and number of segments of the secondary structures, and their amino acid sequences when combined with bioinformatics, for not only native but also nonnative and membrane-bound proteins. The VUVCD spectra of nucleic acids confirm the contributions of the base composition and sequence to the conformation in comparative analyses of synthetic poly-nucleotides composed of selected bases. This review surveys these recent applications of synchrotron-radiation VUVCD spectroscopy in structural biology, covering saccharides, amino acids, proteins, and nucleic acids.
Collapse
Affiliation(s)
- Kunihiko Gekko
- Hiroshima Synchrotron Radiation Center, Hiroshima University, Higashi-Hiroshima, Hiroshima 739-0046, Japan
| |
Collapse
|
31
|
Delarue M, Koehl P. Combined approaches from physics, statistics, and computer science for ab initio protein structure prediction: ex unitate vires (unity is strength)? F1000Res 2018; 7. [PMID: 30079234 PMCID: PMC6058471 DOI: 10.12688/f1000research.14870.1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 07/19/2018] [Indexed: 11/20/2022] Open
Abstract
Connecting the dots among the amino acid sequence of a protein, its structure, and its function remains a central theme in molecular biology, as it would have many applications in the treatment of illnesses related to misfolding or protein instability. As a result of high-throughput sequencing methods, biologists currently live in a protein sequence-rich world. However, our knowledge of protein structure based on experimental data remains comparatively limited. As a consequence, protein structure prediction has established itself as a very active field of research to fill in this gap. This field, once thought to be reserved for theoretical biophysicists, is constantly reinventing itself, borrowing ideas informed by an ever-increasing assembly of scientific domains, from biology, chemistry, (statistical) physics, mathematics, computer science, statistics, bioinformatics, and more recently data sciences. We review the recent progress arising from this integration of knowledge, from the development of specific computer architecture to allow for longer timescales in physics-based simulations of protein folding to the recent advances in predicting contacts in proteins based on detection of coevolution using very large data sets of aligned protein sequences.
Collapse
Affiliation(s)
- Marc Delarue
- Unité Dynamique Structurale des Macromolécules, Institut Pasteur, and UMR 3528 du CNRS, Paris, France
| | - Patrice Koehl
- Department of Computer Science, Genome Center, University of California, Davis, Davis, California, USA
| |
Collapse
|
32
|
Abstract
Since the 1980s, deep learning and biomedical data have been coevolving and feeding each other. The breadth, complexity, and rapidly expanding size of biomedical data have stimulated the development of novel deep learning methods, and application of these methods to biomedical data have led to scientific discoveries and practical solutions. This overview provides technical and historical pointers to the field, and surveys current applications of deep learning to biomedical data organized around five subareas, roughly of increasing spatial scale: chemoinformatics, proteomics, genomics and transcriptomics, biomedical imaging, and health care. The black box problem of deep learning methods is also briefly discussed.
Collapse
Affiliation(s)
- Pierre Baldi
- Department of Computer Science, Institute for Genomics and Bioinformatics, and Center for Machine Learning and Intelligent Systems, University of California, Irvine, California 92697, USA
| |
Collapse
|
33
|
Protein Secondary Structure Prediction Based on Data Partition and Semi-Random Subspace Method. Sci Rep 2018; 8:9856. [PMID: 29959372 PMCID: PMC6026213 DOI: 10.1038/s41598-018-28084-8] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2018] [Accepted: 06/12/2018] [Indexed: 11/20/2022] Open
Abstract
Protein secondary structure prediction is one of the most important and challenging problems in bioinformatics. Machine learning techniques have been applied to solve the problem and have gained substantial success in this research area. However there is still room for improvement toward the theoretical limit. In this paper, we present a novel method for protein secondary structure prediction based on a data partition and semi-random subspace method (PSRSM). Data partitioning is an important strategy for our method. First, the protein training dataset was partitioned into several subsets based on the length of the protein sequence. Then we trained base classifiers on the subspace data generated by the semi-random subspace method, and combined base classifiers by majority vote rule into ensemble classifiers on each subset. Multiple classifiers were trained on different subsets. These different classifiers were used to predict the secondary structures of different proteins according to the protein sequence length. Experiments are performed on 25PDB, CB513, CASP10, CASP11, CASP12, and T100 datasets, and the good performance of 86.38%, 84.53%, 85.51%, 85.89%, 85.55%, and 85.09% is achieved respectively. Experimental results showed that our method outperforms other state-of-the-art methods.
Collapse
|
34
|
Yang Y, Gao J, Wang J, Heffernan R, Hanson J, Paliwal K, Zhou Y. Sixty-five years of the long march in protein secondary structure prediction: the final stretch? Brief Bioinform 2018; 19:482-494. [PMID: 28040746 PMCID: PMC5952956 DOI: 10.1093/bib/bbw129] [Citation(s) in RCA: 84] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2016] [Revised: 11/15/2016] [Indexed: 11/13/2022] Open
Abstract
Protein secondary structure prediction began in 1951 when Pauling and Corey predicted helical and sheet conformations for protein polypeptide backbone even before the first protein structure was determined. Sixty-five years later, powerful new methods breathe new life into this field. The highest three-state accuracy without relying on structure templates is now at 82-84%, a number unthinkable just a few years ago. These improvements came from increasingly larger databases of protein sequences and structures for training, the use of template secondary structure information and more powerful deep learning techniques. As we are approaching to the theoretical limit of three-state prediction (88-90%), alternative to secondary structure prediction (prediction of backbone torsion angles and Cα-atom-based angles and torsion angles) not only has more room for further improvement but also allows direct prediction of three-dimensional fragment structures with constantly improved accuracy. About 20% of all 40-residue fragments in a database of 1199 non-redundant proteins have <6 Å root-mean-squared distance from the native conformations by SPIDER2. More powerful deep learning methods with improved capability of capturing long-range interactions begin to emerge as the next generation of techniques for secondary structure prediction. The time has come to finish off the final stretch of the long march towards protein secondary structure prediction.
Collapse
Affiliation(s)
- Yuedong Yang
- Insitute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, QLD, Australia
| | - Jianzhao Gao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
| | - Jihua Wang
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
| | - Rhys Heffernan
- Signal Processing Laboratory, Griffith University, Brisbane, Australia
| | - Jack Hanson
- Signal Processing Laboratory, Griffith University, Brisbane, Australia
| | - Kuldip Paliwal
- Signal Processing Laboratory, Griffith University, Brisbane, Australia
| | - Yaoqi Zhou
- Insitute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, QLD, Australia
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
| |
Collapse
|
35
|
Setiawan D, Brender J, Zhang Y. Recent advances in automated protein design and its future challenges. Expert Opin Drug Discov 2018; 13:587-604. [PMID: 29695210 DOI: 10.1080/17460441.2018.1465922] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/07/2023]
Abstract
INTRODUCTION Protein function is determined by protein structure which is in turn determined by the corresponding protein sequence. If the rules that cause a protein to adopt a particular structure are understood, it should be possible to refine or even redefine the function of a protein by working backwards from the desired structure to the sequence. Automated protein design attempts to calculate the effects of mutations computationally with the goal of more radical or complex transformations than are accessible by experimental techniques. Areas covered: The authors give a brief overview of the recent methodological advances in computer-aided protein design, showing how methodological choices affect final design and how automated protein design can be used to address problems considered beyond traditional protein engineering, including the creation of novel protein scaffolds for drug development. Also, the authors address specifically the future challenges in the development of automated protein design. Expert opinion: Automated protein design holds potential as a protein engineering technique, particularly in cases where screening by combinatorial mutagenesis is problematic. Considering solubility and immunogenicity issues, automated protein design is initially more likely to make an impact as a research tool for exploring basic biology in drug discovery than in the design of protein biologics.
Collapse
Affiliation(s)
- Dani Setiawan
- a Department of Computational Medicine and Bioinformatics , University of Michigan , Ann Arbor , MI , USA
| | - Jeffrey Brender
- b Radiation Biology Branch , Center for Cancer Research, National Cancer Institute - NIH , Bethesda , MD , USA
| | - Yang Zhang
- a Department of Computational Medicine and Bioinformatics , University of Michigan , Ann Arbor , MI , USA.,c Department of Biological Chemistry , University of Michigan , Ann Arbor , MI , USA
| |
Collapse
|
36
|
Manikandan P, Ramyachitra D. PATSIM: Prediction and analysis of protein sequences using hybrid Knuth-Morris Pratt (KMP) and Boyer-Moore (BM) algorithm. Gene 2018; 657:50-59. [PMID: 29501620 DOI: 10.1016/j.gene.2018.02.069] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2017] [Revised: 02/26/2018] [Accepted: 02/27/2018] [Indexed: 10/17/2022]
Abstract
In phylogenomic profiling, the genomic context based methods are based on the observation that two or more proteins having the same pattern of presence or absence in many diverse genomes most likely have a functional link. In this research work, a tool (PATSIM) has been developed to predict the protein patterns based on the SOPM tool. In this tool, the secondary structure for CATH database protein sequences, predicted by the SOPM (Self Optimized Prediction Method) server is passed as input to fulfill objectives such as, (i) Predict the Amino Acid Pattern using the proposed Hybrid KMP and BM algorithm, (ii) Predict the physiochemical properties such as Hydrophobic Non-Polar ALKYL Amino Acid groups, Hydrophobic Non-Polar AROMATIC Amino Acid groups, Hydrophilic Polar Neutral Amino Acid groups, Hydrophilic Polar Acidic Amino Acid groups and Hydrophilic Polar Basic Amino Acid groups of protein sequence, (iii) Predict the secondary structure of protein where the structure of protein sequence is unknown, and (iv) Similarity analysis of protein sequence (structure unknown) with the CATH database. From the results, it is inferred that this tool effectively predicts the similarity between the sequences and also identifies the protein patterns for four secondary structural classes, namely Alpha Helix (h), Beta Sheet (e), Turn (t) and Coil (c). Based on the experimental results, it is inferred that this tool identifies the physiochemical properties of the protein sequence in an effective manner. The source code and its documentation for the PATSIM tool is freely available in the GitHub public repository (https://github.com/manimkn89/Protein-Sequence-Analysis).
Collapse
Affiliation(s)
- P Manikandan
- Department of Computer Science, Bharathiar University, Coimbatore 641 046, India
| | - D Ramyachitra
- Department of Computer Science, Bharathiar University, Coimbatore 641 046, India.
| |
Collapse
|
37
|
Behler J. First Principles Neural Network Potentials for Reactive Simulations of Large Molecular and Condensed Systems. Angew Chem Int Ed Engl 2017; 56:12828-12840. [PMID: 28520235 DOI: 10.1002/anie.201703114] [Citation(s) in RCA: 329] [Impact Index Per Article: 47.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2017] [Indexed: 11/06/2022]
Abstract
Modern simulation techniques have reached a level of maturity which allows a wide range of problems in chemistry and materials science to be addressed. Unfortunately, the application of first principles methods with predictive power is still limited to rather small systems, and despite the rapid evolution of computer hardware no fundamental change in this situation can be expected. Consequently, the development of more efficient but equally reliable atomistic potentials to reach an atomic level understanding of complex systems has received considerable attention in recent years. A promising new development has been the introduction of machine learning (ML) methods to describe the atomic interactions. Once trained with electronic structure data, ML potentials can accelerate computer simulations by several orders of magnitude, while preserving quantum mechanical accuracy. This Review considers the methodology of an important class of ML potentials that employs artificial neural networks.
Collapse
Affiliation(s)
- Jörg Behler
- Universität Göttingen, Institut für Physikalische Chemie, Theoretische Chemie, Tammannstrasse 6, 37077, Göttingen, Germany
| |
Collapse
|
38
|
Behler J. Hochdimensionale neuronale Netze für Potentialhyperflächen großer molekularer und kondensierter Systeme. Angew Chem Int Ed Engl 2017. [DOI: 10.1002/ange.201703114] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Affiliation(s)
- Jörg Behler
- Universität Göttingen; Institut für Physikalische Chemie, Theoretische Chemie; Tammannstraße 6 37077 Göttingen Deutschland
| |
Collapse
|
39
|
Faraggi E, Kloczkowski A. Accurate Prediction of One-Dimensional Protein Structure Features Using SPINE-X. Methods Mol Biol 2017; 1484:45-53. [PMID: 27787819 DOI: 10.1007/978-1-4939-6406-2_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Accurate prediction of protein secondary structure and other one-dimensional structure features is essential for accurate sequence alignment, three-dimensional structure modeling, and function prediction. SPINE-X is a software package to predict secondary structure as well as accessible surface area and dihedral angles ϕ and ψ. For secondary structure SPINE-X achieves an accuracy of between 81 and 84 % depending on the dataset and choice of tests. The Pearson correlation coefficient for accessible surface area prediction is 0.75 and the mean absolute error from the ϕ and ψ dihedral angles are 20∘ and 33∘, respectively. The source code and a Linux executables for SPINE-X are available from Research and Information Systems at http://mamiris.com .
Collapse
Affiliation(s)
- Eshel Faraggi
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN, 46032, USA
- Research and Information Systems, LLC, Indianapolis, IN, USA
| | - Andrzej Kloczkowski
- Battelle Center for Mathematical Medicine, Nationwide Children's Hospital, Columbus, OH, USA
- Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
| |
Collapse
|
40
|
Kandoi G, Leelananda SP, Jernigan RL, Sen TZ. Predicting Protein Secondary Structure Using Consensus Data Mining (CDM) Based on Empirical Statistics and Evolutionary Information. Methods Mol Biol 2017; 1484:35-44. [PMID: 27787818 DOI: 10.1007/978-1-4939-6406-2_4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Predicting the secondary structure of a protein from its sequence still remains a challenging problem. The prediction accuracies remain around 80 %, and for very diverse methods. Using evolutionary information and machine learning algorithms in particular has had the most impact. In this chapter, we will first define secondary structures, then we will review the Consensus Data Mining (CDM) technique based on the robust GOR algorithm and Fragment Database Mining (FDM) approach. GOR V is an empirical method utilizing a sliding window approach to model the secondary structural elements of a protein by making use of generalized evolutionary information. FDM uses data mining from experimental structure fragments, and is able to successfully predict the secondary structure of a protein by combining experimentally determined structural fragments based on sequence similarities of the fragments. The CDM method combines predictions from GOR V and FDM in a hierarchical manner to produce consensus predictions for secondary structure. In other words, if sequence fragment are not available, then it uses GOR V to make the secondary structure prediction. The online server of CDM is available at http://gor.bb.iastate.edu/cdm/ .
Collapse
Affiliation(s)
- Gaurav Kandoi
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, USA
- Department of Electrical and Computer Engineering, Iowa State University, Ames, IA, USA
| | - Sumudu P Leelananda
- Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children's Hospital, Columbus, OH, USA
| | - Robert L Jernigan
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, USA
- Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA, USA
| | - Taner Z Sen
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, USA.
- Department of Genetics, Development and Cell Biology, Iowa State University, 1025 Crop Genome Informatics Lab, Ames, IA, 50011, USA.
| |
Collapse
|
41
|
Reaching optimized parameter set: protein secondary structure prediction using neural network. Neural Comput Appl 2016. [DOI: 10.1007/s00521-015-2150-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
42
|
Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields. Sci Rep 2016; 6:18962. [PMID: 26752681 PMCID: PMC4707437 DOI: 10.1038/srep18962] [Citation(s) in RCA: 255] [Impact Index Per Article: 31.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2015] [Accepted: 11/26/2015] [Indexed: 12/29/2022] Open
Abstract
Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.
Collapse
|
43
|
Nguyen T, Khosravi A, Creighton D, Nahavandi S. Multi-Output Interval Type-2 Fuzzy Logic System for Protein Secondary Structure Prediction. INT J UNCERTAIN FUZZ 2015. [DOI: 10.1142/s0218488515500324] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
A new multi-output interval type-2 fuzzy logic system (MOIT2FLS) is introduced for protein secondary structure prediction in this paper. Three outputs of the MOIT2FLS correspond to three structure classes including helix, strand (sheet) and coil. Quantitative properties of amino acids are employed to characterize twenty amino acids rather than the widely used computationally expensive binary encoding scheme. Three clustering tasks are performed using the adaptive vector quantization method to construct an equal number of initial rules for each type of secondary structure. Genetic algorithm is applied to optimally adjust parameters of the MOIT2FLS. The genetic fitness function is designed based on the Q3 measure. Experimental results demonstrate the dominance of the proposed approach against the traditional methods that are Chou-Fasman method, Garnier-Osguthorpe-Robson method, and artificial neural network models.
Collapse
Affiliation(s)
- Thanh Nguyen
- Centre for Intelligent Systems Research (CISR), Deakin University, Waurn Ponds, Victoria, 3216, Australia
| | - Abbas Khosravi
- Centre for Intelligent Systems Research (CISR), Deakin University, Waurn Ponds, Victoria, 3216, Australia
| | - Douglas Creighton
- Centre for Intelligent Systems Research (CISR), Deakin University, Waurn Ponds, Victoria, 3216, Australia
| | - Saeid Nahavandi
- Centre for Intelligent Systems Research (CISR), Deakin University, Waurn Ponds, Victoria, 3216, Australia
| |
Collapse
|
44
|
Darsey JA, Griffin WO, Joginipelli S, Melapu VK. Architecture and biological applications of artificial neural networks: a tuberculosis perspective. Methods Mol Biol 2015; 1260:269-83. [PMID: 25502388 DOI: 10.1007/978-1-4939-2239-0_17] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/16/2023]
Abstract
Advancement of science and technology has prompted researchers to develop new intelligent systems that can solve a variety of problems such as pattern recognition, prediction, and optimization. The ability of the human brain to learn in a fashion that tolerates noise and error has attracted many researchers and provided the starting point for the development of artificial neural networks: the intelligent systems. Intelligent systems can acclimatize to the environment or data and can maximize the chances of success or improve the efficiency of a search. Due to massive parallelism with large numbers of interconnected processers and their ability to learn from the data, neural networks can solve a variety of challenging computational problems. Neural networks have the ability to derive meaning from complicated and imprecise data; they are used in detecting patterns, and trends that are too complex for humans, or other computer systems. Solutions to the toughest problems will not be found through one narrow specialization; therefore we need to combine interdisciplinary approaches to discover the solutions to a variety of problems. Many researchers in different disciplines such as medicine, bioinformatics, molecular biology, and pharmacology have successfully applied artificial neural networks. This chapter helps the reader in understanding the basics of artificial neural networks, their applications, and methodology; it also outlines the network learning process and architecture. We present a brief outline of the application of neural networks to medical diagnosis, drug discovery, gene identification, and protein structure prediction. We conclude with a summary of the results from our study on tuberculosis data using neural networks, in diagnosing active tuberculosis, and predicting chronic vs. infiltrative forms of tuberculosis.
Collapse
Affiliation(s)
- Jerry A Darsey
- Department of Chemistry, University of Arkansas at Little Rock, 2801 S University Ave, Little Rock, AR, 72204, USA
| | | | | | | |
Collapse
|
45
|
Mehra A, Jerath G, Ramakrishnan V, Trivedi V. Characterization of ICAM-1 biophore to design cytoadherence blocking peptides. J Mol Graph Model 2015; 57:27-35. [PMID: 25625914 DOI: 10.1016/j.jmgm.2015.01.004] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2014] [Revised: 11/22/2014] [Accepted: 01/06/2015] [Indexed: 01/13/2023]
Abstract
Peptides from natural sources are good starting material to design bioactive agents with desired therapeutic property. IB peptide derived from the ICAM-1 has been studied extensively as an agent to disrupt the non-specific binding of lymphocyte to the endothelial cells. ICAM-1: IB molecular model reveals that IB peptide binds in an extended conformation to the ICAM-1, masking LFA-1 and partially covering PfEMP-1 binding site. Considering the regioselective requirement of ICAM-1: PfEMP1 binding site, IB peptide charge and 3-D conformation are optimized through generation of combinatorial peptide library containing single, double, triple, tetra and quadra amino acid substitutions of IB peptide. Further, truncation of IB peptide followed by molecular modeling studies gave us the biophoric environment of the IB peptide required for its activity. Molecular modeling of these peptides into the binding site indicates that these complexes are fitting well into the site and making extensive interactions with the residues crucial for PfEMP-1 binding. Molecular dynamics simulations were performed for 10ns each under four different temperatures to estimate comparative stability of ICAM1: IB peptide complexes. The designed peptide ICAM1: IBT213 has comparable stability at ambient temperature, while ICAM1: IBT1 shows a greater degree of robustness at higher temperatures. Overall, the study has given useful insights into IB peptide binding site on ICAM1 and its potential in designing novel peptides to disrupt the cytoadherence complex involving ICAM1: PfEMP1.
Collapse
Affiliation(s)
- A Mehra
- Malaria Research Group, Department of Biotechnology, Indian Institute of Technology-Guwahati, Guwahati 781039, Assam, India
| | - Gaurav Jerath
- Molecular Informatics & Design Laboratory, Department of Biotechnology, Indian Institute of Technology-Guwahati, Guwahati 781039, Assam, India
| | - Vibin Ramakrishnan
- Molecular Informatics & Design Laboratory, Department of Biotechnology, Indian Institute of Technology-Guwahati, Guwahati 781039, Assam, India
| | - Vishal Trivedi
- Malaria Research Group, Department of Biotechnology, Indian Institute of Technology-Guwahati, Guwahati 781039, Assam, India.
| |
Collapse
|
46
|
Wang X, Yennawar N, Hankey PA. Autoinhibition of the Ron receptor tyrosine kinase by the juxtamembrane domain. Cell Commun Signal 2014; 12:28. [PMID: 24739671 PMCID: PMC4021555 DOI: 10.1186/1478-811x-12-28] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2013] [Accepted: 02/05/2014] [Indexed: 01/21/2023] Open
Abstract
Background The Ron receptor tyrosine kinase (RTK) has been implicated in the progression of a number of carcinomas, thus understanding the regulatory mechanisms governing its activity is of potential therapeutic significance. A critical role for the juxtamembrane domain in regulating RTK activity is emerging, however the mechanism by which this regulation occurs varies considerably from receptor to receptor. Results Unlike other RTKs described to date, tyrosines in the juxtamembrane domain of Ron are inconsequential for receptor activation. Rather, we have identified an acidic region in the juxtamembrane domain of Ron that plays a central role in promoting receptor autoinhibition. Furthermore, our studies demonstrate that phosphorylation of Y1198 in the kinase domain promotes Ron activation, likely by relieving the inhibitory constraints imposed by the juxtamembrane domain. Conclusions Taken together, our experimental data and molecular modeling provide a better understanding of the mechanisms governing Ron activation, which will lay the groundwork for the development of novel therapeutic approaches for targeting Ron in human malignancies.
Collapse
Affiliation(s)
| | | | - Pamela A Hankey
- Graduate Program in Cell and Developmental Biology, The Pennsylvania State University, University Park, PA 16802, USA.
| |
Collapse
|
47
|
An alternative approach to protein folding. BIOMED RESEARCH INTERNATIONAL 2013; 2013:583045. [PMID: 24078920 PMCID: PMC3775432 DOI: 10.1155/2013/583045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/05/2013] [Revised: 06/20/2013] [Accepted: 07/31/2013] [Indexed: 11/26/2022]
Abstract
A diffusion theory-based, all-physical ab initio protein folding simulation is described and applied. The model is based upon the drift and diffusion of protein substructures relative to one another in the multiple energy fields present. Without templates or statistical inputs, the simulations were run at physiologic and ambient temperatures (including pH). Around 100 protein secondary structures were surveyed, and twenty tertiary structures were determined. Greater than 70% of the secondary core structures with over 80% alpha helices were correctly identified on protein ranging from 30 to 200 amino-acid sequence. The drift-diffusion model predicted tertiary structures with RMSD values in the 3–5 Angstroms range for proteins ranging 30 to 150 amino acids. These predictions are among the best for an all ab initio protein simulation. Simulations could be run entirely on a desktop computer in minutes; however, more accurate tertiary structures were obtained using molecular dynamic energy relaxation. The drift-diffusion model generated realistic energy versus time traces. Rapid secondary structures followed by a slow compacting towards lower energy tertiary structures occurred after an initial incubation period in agreement with observations.
Collapse
|
48
|
Circular-dichroism and synchrotron-radiation circular-dichroism spectroscopy as tools to monitor protein structure in a lipid environment. Methods Mol Biol 2013; 974:151-76. [PMID: 23404276 DOI: 10.1007/978-1-62703-275-9_8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/27/2023]
Abstract
Circular-dichroism (CD) spectroscopy is a powerful tool for the secondary-structure analysis of proteins. The structural information obtained by CD does not have atomic-level resolution (unlike X-ray crystallography and NMR spectroscopy), but it has the great advantage of being applicable to both nonnative and native proteins in a wide range of solution conditions containing lipids and detergents. The development of synchrotron-radiation CD (SRCD) instruments has greatly expanded the utility of this method by extending the spectra to the vacuum-ultraviolet region below 190 nm and producing information that is unobtainable by conventional CD instruments. Combining SRCD data with bioinformatics provides new insight into the conformational changes of proteins in a membrane environment.
Collapse
|
49
|
On the Variability of Neural Network Classification Measures in the Protein Secondary Structure Prediction Problem. APPLIED COMPUTATIONAL INTELLIGENCE AND SOFT COMPUTING 2013. [DOI: 10.1155/2013/794350] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
Abstract
We revisit the protein secondary structure prediction problem using linear and backpropagation neural network architectures commonly applied in the literature. In this context, neural network mappings are constructed between protein training set sequences and their assigned structure classes in order to analyze the class membership of test data and associated measures of significance. We present numerical results demonstrating that classifier performance measures can vary significantly depending upon the classifier architecture and the structure class encoding technique. Furthermore, an analytic formulation is introduced in order to substantiate the observed numerical data. Finally, we analyze and discuss the ability of the neural network to accurately model fundamental attributes of protein secondary structure.
Collapse
|
50
|
Armano G, Ledda F. Exploiting intrastructure information for secondary structure prediction with multifaceted pipelines. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2012; 9:799-808. [PMID: 22201070 DOI: 10.1109/tcbb.2011.159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/31/2023]
Abstract
Predicting the secondary structure of proteins is still a typical step in several bioinformatic tasks, in particular, for tertiary structure prediction. Notwithstanding the impressive results obtained so far, mostly due to the advent of sequence encoding schemes based on multiple alignment, in our view the problem should be studied from a novel perspective, in which understanding how available information sources are dealt with plays a central role. After revisiting a well-known secondary structure predictor viewed from this perspective (with the goal of identifying which sources of information have been considered and which have not), we propose a generic software architecture designed to account for all relevant information sources. To demonstrate the validity of the approach, a predictor compliant with the proposed generic architecture has been implemented and compared with several state-of-the-art secondary structure predictors. Experiments have been carried out on standard data sets, and the corresponding results confirm the validity of the approach. The predictor is available at http://iasc.diee.unica.it/ssp2/ through the corresponding web application or as downloadable stand-alone portable unpack-and-run bundle.
Collapse
Affiliation(s)
- Giuliano Armano
- Department of Electrical and Electronic Engineering, University of Cagliari, Piazza d’Armi, Cagliari 09123, Italy.
| | | |
Collapse
|