1
Abbass J, Parisi C. Machine learning-based prediction of proteins' architecture using sequences of amino acids and structural alphabets. J Biomol Struct Dyn 2024:1-16. [PMID: 38505995 DOI: 10.1080/07391102.2024.2328736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 03/05/2024] [Indexed: 03/21/2024]
Abstract
In addition to the growth of protein structures generated through wet laboratory experiments and deposited in the PDB repository, AlphaFold predictions have significantly contributed to the creation of a much larger database of protein structures. Annotating such a vast number of structures has become an increasingly challenging task. CATH is widely recognized as one of the most common platforms for addressing this challenge, as it classifies proteins based on their structural and evolutionary relationships, offering the scientific community an invaluable resource for uncovering various properties, including functional annotations. Because CATH annotation involves - to some extent - human intervention, keeping up with the classification of the rapidly expanding repositories of protein structures has become exceedingly difficult. Therefore, there is a pressing need for a fully automated approach. In addition, the abundance of protein sequences stemming from next-generation sequencing technologies, which lack structural annotations, presents a further challenge to the scientific community. Consequently, 'pre-annotating' protein sequences with structural features, with a high level of precision, could prove highly advantageous. In this paper, after a thorough investigation, we introduce a novel machine-learning model capable of classifying any protein domain, whether it has a known structure or not, into one of the 40 main CATH Architectures. We achieve an F1 Score of 0.92 using only the amino acid sequence and a score of 0.94 using both the sequence of amino acids and the sequence of structural alphabets. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Jad Abbass
- School of Computer Science and Mathematics, Kingston University, London, UK
- Charles Parisi
- School of Computer Science and Mathematics, Kingston University, London, UK
- Telecom Physique Strasbourg, Strasbourg University, Strasbourg, France
2
Bale A, Dutta A, Mitra D. Combined charge and hydrophobicity-guided screening of antibacterial peptides: two-level approach to predict antibacterial activity and efficacy. Amino Acids 2023:10.1007/s00726-023-03274-5. [PMID: 37248437 DOI: 10.1007/s00726-023-03274-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Accepted: 05/02/2023] [Indexed: 05/31/2023]
Abstract
Antibacterial peptides can be a potential game changer in the fight against antibiotic resistance. In order for these peptides to become successful antibiotic alternatives, it is essential that they possess high efficacy in addition to just being antibacterial. In this study, we have developed a two-level SVM-based binary classification approach to predict the antibacterial activity of a given peptide (model 1) and thereafter classify its antibacterial efficacy as high/low (model 2) with respect to minimum inhibitory concentration (MIC) values against Staphylococcus aureus, one of the most common pathogens. Based on charge and hydrophobicity of amino acids, we developed a sequence-based combined charge and hydrophobicity-guided triad (CHT) as a new method for obtaining features of any peptide. Model 1 with a combination of CHT and amino acid composition (AAC) as the feature representation method resulted in the highest accuracy of 96.7%. Model 2 with CHT as the feature representation method yielded the highest accuracy of 70.9%. Thus, CHT is found to be a potential feature representation method for classifying antibacterial peptides based on both activity and efficacy. Furthermore, we have also used an explainable machine learning algorithm to extract various insights from these models. These insights are found to be in excellent agreement with experimental findings reported in the literature, thus enhancing the dependability of the proposed models.
Affiliation(s)
- Ashwin Bale
- Chemical Engineering Department, Birla Institute of Technology and Science (BITS) Pilani, Hyderabad Campus, Jawahar Nagar, Medchal District, Hyderabad, 500078, Telangana, India
- Arnab Dutta
- Chemical Engineering Department, Birla Institute of Technology and Science (BITS) Pilani, Hyderabad Campus, Jawahar Nagar, Medchal District, Hyderabad, 500078, Telangana, India
- Debirupa Mitra
- Chemical Engineering Department, Birla Institute of Technology and Science (BITS) Pilani, Hyderabad Campus, Jawahar Nagar, Medchal District, Hyderabad, 500078, Telangana, India
3
Broni E, Miller WA. Computational Analysis Predicts Correlations among Amino Acids in SARS-CoV-2 Proteomes. Biomedicines 2023; 11:512. [PMID: 36831052 PMCID: PMC9953644 DOI: 10.3390/biomedicines11020512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Revised: 02/03/2023] [Accepted: 02/08/2023] [Indexed: 02/12/2023] Open
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a serious global challenge requiring urgent and permanent therapeutic solutions. These solutions can only be engineered if the patterns and rate of mutations of the virus can be elucidated. Predicting mutations, and the structure of proteins based on these mutations, has become necessary for early drug and vaccine design in anticipation of future viral mutations. The amino acid composition (AAC) of proteomes and individual viral proteins provides avenues for exploitation, since AACs have previously been used to predict structure, shape and evolutionary rates. Herein, the frequency of amino acid residues found in 1637 complete proteomes belonging to 11 SARS-CoV-2 variants/lineages was analyzed. Leucine is the most abundant amino acid residue in SARS-CoV-2, with an average AAC of 9.658%, while tryptophan is the least abundant at 1.11%. The AAC and ranking of lysine and glycine varied across proteomes: in some variants glycine had a higher frequency and AAC than lysine, and in others the reverse. Tryptophan was also observed to be the most intolerant to mutation in the proteomes of the variants used. A correlogram revealed a very strong correlation of 0.999992 between the B.1.525 (Eta) and B.1.526 (Iota) variants. Furthermore, isoleucine and threonine were observed to have a very strong negative correlation of -0.912, while cysteine and isoleucine had a very strong positive correlation of 0.835 at p < 0.001. The Shapiro-Wilk normality test revealed that AAC values for all the amino acid residues except methionine showed no evidence of non-normality at p < 0.05. Thus, AACs of SARS-CoV-2 variants can be predicted using probability and z-scores. AACs may be beneficial in classifying viral strains, predicting viral disease types, members of protein families and protein interactions, and for diagnostic purposes. They may also be used as a feature, along with other crucial factors, in machine-learning-based algorithms to predict viral mutations. These mutation-predicting algorithms may help in developing effective therapeutics and vaccines for SARS-CoV-2.
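As a rough illustration of the quantities this abstract refers to, the sketch below computes a percentage amino acid composition (AAC) and a Pearson correlation between two AAC profiles. It is only a minimal sketch under stated assumptions: the function names `aac` and `pearson` and the two toy sequences are hypothetical, and the authors' actual pipeline (proteome selection, statistics software) is not described in the abstract.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(sequence):
    """Return the percentage of each standard residue in `sequence`.

    Non-standard symbols (e.g. X, *) are ignored; this is an assumption,
    not something stated in the abstract.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sd_x * sd_y)


# Toy sequences standing in for two variant proteomes (hypothetical data).
seq_a = "MLLLKGGW"
seq_b = "MLLLKKGW"
profile_a = aac(seq_a)
profile_b = aac(seq_b)

# Correlate the two 20-dimensional AAC profiles, residue by residue.
r = pearson([profile_a[aa] for aa in AMINO_ACIDS],
            [profile_b[aa] for aa in AMINO_ACIDS])
print(f"Leucine AAC in seq_a: {profile_a['L']:.3f}%")
print(f"Correlation between AAC profiles: {r:.4f}")
```

Keeping the residue order fixed via `AMINO_ACIDS` makes the profiles of different sequences directly comparable, which is what a variant-by-variant correlogram needs.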
Affiliation(s)
- Emmanuel Broni
- Department of Medicine, Loyola University Medical Center, Loyola University Chicago, Maywood, IL 60153, USA
- Whelton A. Miller
- Department of Medicine, Loyola University Medical Center, Loyola University Chicago, Maywood, IL 60153, USA
- Department of Molecular Pharmacology & Neuroscience, Loyola University Medical Center, Loyola University Chicago, Maywood, IL 60153, USA
4
Malarczyk M, Kaminski M, Szrek J. Metaheuristic Approach to Synthesis of Suspension System of Mobile Robot for Mining Infrastructure Inspection. Sensors (Basel) 2022; 22:8839. [PMID: 36433436 PMCID: PMC9695186 DOI: 10.3390/s22228839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2022] [Revised: 11/04/2022] [Accepted: 11/12/2022] [Indexed: 06/16/2023]
Abstract
The article describes the problem of geometric synthesis of the inspection robot suspension system, designed for operation in difficult conditions with the presence of scattered obstacles. The exemplary application of a mine infrastructure inspection robot is developed and supported by the ideas. The brief introduction presents current trends, requirements and known design approaches for platforms able to cross obstacles. The idea of a nature-inspired wheel-legged robot is given, and the general outline of its characteristics is provided. Then the general idea of kinematic system element selection is discussed. The main subject, the geometrical synthesis of the chosen four-bar mechanism, is described in detail. The mathematical model of the suspension and the connections between the parts of the structure are clarified. The well-known analytical approach of brute force search is analyzed and validated. Then a method inspired by the branch and bound algorithm is developed. Finally, a novel application of a nature-inspired algorithm (the Chameleon Swarm Algorithm, CSA) to synthesis is proposed. The obtained results are analyzed, and a brief comparison of methods is given. The successful implementation of the algorithm is presented. The obtained results are tested with simulations and experimental tests. The designed structure developed with the CSA is assembled and attached to the prototype of a 14-DOF wheel-legged robot. Furthermore, the principles of walking and the elements forming the control structure are also discussed. The paper concludes with a description of the developed wheel-legged robot LegVan 1v2.
Affiliation(s)
- Mateusz Malarczyk
- Department of Electrical Machines, Drives and Measurements, Faculty of Electrical Engineering, Wroclaw University of Science and Technology, Smoluchowskiego 19, 50-372 Wroclaw, Poland
- Marcin Kaminski
- Department of Electrical Machines, Drives and Measurements, Faculty of Electrical Engineering, Wroclaw University of Science and Technology, Smoluchowskiego 19, 50-372 Wroclaw, Poland
- Jaroslaw Szrek
- Department of Fundamentals of Machine Design and Mechatronic Systems, Faculty of Mechanical Engineering, Wroclaw University of Science and Technology, Lukasiewicza 7/9, 50-372 Wroclaw, Poland
5
Zhu L, Li W. Roles of Physicochemical and Structural Properties of RNA-Binding Proteins in Predicting the Activities of Trans-Acting Splicing Factors with Machine Learning. Int J Mol Sci 2022; 23:4426. [PMID: 35457243 PMCID: PMC9030803 DOI: 10.3390/ijms23084426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2022] [Revised: 04/13/2022] [Accepted: 04/14/2022] [Indexed: 02/06/2023] Open
Abstract
Trans-acting splicing factors play a pivotal role in modulating alternative splicing by specifically binding to cis-elements in pre-mRNAs. There are approximately 1500 RNA-binding proteins (RBPs) in the human genome, but the activities of these RBPs in alternative splicing are unknown. Since determining RBP activities through experimental methods is expensive and time consuming, the development of an efficient computational method for predicting the activities of RBPs in alternative splicing from their sequences is of great practical importance. Recently, a machine learning model for predicting the activities of splicing factors was built based on features of single and dual amino acid compositions. Here, we explored the role of physicochemical and structural properties in predicting their activities in alternative splicing using machine learning approaches and found that the prediction performance is significantly improved by including these properties. By combining the minimum redundancy–maximum relevance (mRMR) method and forward feature searching strategy, a promising feature subset with 24 features was obtained to predict the activities of RBPs. The feature subset consists of 16 dual amino acid compositions, 5 physicochemical features, and 3 structural features. The physicochemical and structural properties were as important as the sequence composition features for an accurate prediction of the activities of splicing factors. The hydrophobicity and distribution of coil are suggested to be the key physicochemical and structural features, respectively.
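To make the "dual amino acid composition" features mentioned in this abstract more concrete, the sketch below computes a 400-dimensional vector of adjacent residue-pair frequencies. This assumes that "dual" means contiguous dipeptides, which the abstract does not spell out; the function name `dipeptide_composition` is hypothetical, and the mRMR selection step is not reproduced here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# All 400 ordered residue pairs, in a fixed order so vectors are comparable.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the relative frequency of each adjacent residue pair.

    Pairs containing a non-standard symbol are skipped (an assumption).
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard residue pairs")
    return [counts[dp] / total for dp in DIPEPTIDES]


features = dipeptide_composition("ACAC")
print(len(features))                        # size of the feature vector
print(features[DIPEPTIDES.index("AC")])     # frequency of the pair "AC"
```

A feature-selection procedure such as the one described in the abstract would then choose a small subset of these 400 columns, together with physicochemical and structural descriptors.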
Affiliation(s)
- Wenjin Li
- Correspondence: ; Tel.: +86-0755-26942336
6
Amerifar S, Norouzi M, Ghandi M. A tool for feature extraction from biological sequences. Brief Bioinform 2022; 23:6563937. [PMID: 35383372 DOI: 10.1093/bib/bbac108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2021] [Revised: 03/01/2022] [Accepted: 03/03/2022] [Indexed: 11/12/2022] Open
Abstract
With the advances in sequencing technologies, a huge amount of biological data is generated nowadays. Analyzing this amount of data is beyond the ability of human beings, creating a splendid opportunity for machine learning methods to grow. These methods, however, are practical only when the sequences are converted into feature vectors. Many tools target this task, including iLearnPlus, a Python-based tool which supports a rich set of features. In this paper, we propose a holistic tool that extracts features from biological sequences (i.e. DNA, RNA and protein). These features are the inputs to machine learning models that predict properties, structures or functions of the input sequences. Our tool supports not only all features in iLearnPlus but also 30 additional features from the literature. Moreover, our tool is written in R, which gives bioinformaticians an alternative for transforming sequences into feature vectors. We have compared the conversion time of our tool with that of iLearnPlus and found that ours transforms sequences much faster: a median of 2.8X faster for small nucleotide sequences and a median of 6.3X faster for large ones. Finally, for amino acid sequences, our tool achieves a median speedup of 23.9X.
Affiliation(s)
- Sare Amerifar
- Bioinformatics, Tarbiat Modares University, Jalal Al Ahmad, 14115-111, Tehran, Iran
- Mahammad Norouzi
- Computer Science, Technical University of Darmstadt, Hochschulstr. 1, 64293, Hesse, Germany
- Mahmoud Ghandi
- Bioinformatics, Monte Rosa Therapeutics, Summer Street, 02210, Boston, United States
7
Saito Y, Oikawa M, Sato T, Nakazawa H, Ito T, Kameda T, Tsuda K, Umetsu M. Machine-Learning-Guided Library Design Cycle for Directed Evolution of Enzymes: The Effects of Training Data Composition on Sequence Space Exploration. ACS Catal 2021. [DOI: 10.1021/acscatal.1c03753] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/02/2023]
Affiliation(s)
- Yutaka Saito
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
- AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
- Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan
- Misaki Oikawa
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
- Takumi Sato
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
- Hikaru Nakazawa
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
- Tomoyuki Ito
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
- Tomoshi Kameda
- Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan
- Koji Tsuda
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8561, Japan
- Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan
- Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan
- Mitsuo Umetsu
- Department of Biomolecular Engineering, Graduate School of Engineering, Tohoku University, 6-6-11 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
- Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo 103-0027, Japan
8
Kim J, Park S, Min D, Kim W. Comprehensive Survey of Recent Drug Discovery Using Deep Learning. Int J Mol Sci 2021; 22:9983. [PMID: 34576146 PMCID: PMC8470987 DOI: 10.3390/ijms22189983] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 08/24/2021] [Revised: 09/09/2021] [Accepted: 09/10/2021] [Indexed: 02/07/2023] Open
Abstract
Drug discovery based on artificial intelligence has been in the spotlight recently as it significantly reduces the time and cost required for developing novel drugs. With the advancement of deep learning (DL) technology and the growth of drug-related data, numerous deep-learning-based methodologies are emerging at all steps of drug development processes. In particular, pharmaceutical chemists have faced significant issues with regard to selecting and designing potential drugs for a target of interest to enter preclinical testing. The two major challenges are prediction of interactions between drugs and druggable targets and generation of novel molecular structures suitable for a target of interest. Therefore, we reviewed recent deep-learning applications in drug-target interaction (DTI) prediction and de novo drug design. In addition, we introduce a comprehensive summary of a variety of drug and protein representations, DL models, and commonly used benchmark datasets or tools for model training and testing. Finally, we present the remaining challenges for the promising future of DL-based DTI prediction and de novo drug design.
Affiliation(s)
- Jintae Kim
- KaiPharm Co., Ltd., Seoul 03759, Korea
- Sera Park
- KaiPharm Co., Ltd., Seoul 03759, Korea
- Dongbo Min
- Computer Vision Lab, Department of Computer Science and Engineering, Ewha Womans University, Seoul 03760, Korea
- Wankyu Kim
- KaiPharm Co., Ltd., Seoul 03759, Korea
- System Pharmacology Lab, Department of Life Sciences, Ewha Womans University, Seoul 03760, Korea
9
Bayarri G, Hospital A, Orozco M. 3dRS, a Web-Based Tool to Share Interactive Representations of 3D Biomolecular Structures and Molecular Dynamics Trajectories. Front Mol Biosci 2021; 8:726232. [PMID: 34485386 PMCID: PMC8414788 DOI: 10.3389/fmolb.2021.726232] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/16/2021] [Accepted: 08/03/2021] [Indexed: 11/13/2022] Open
Abstract
3D Representation Sharing (3dRS) is a web-based tool designed to share biomolecular structure representations, including 4D ensembles derived from Molecular Dynamics (MD) trajectories. The server offers a team working in different locations a single URL to share and discuss structural data in an interactive fashion, with the possibility to use it as a live figure for scientific papers. The web tool allows an easy upload of structures and trajectories in different formats. The 3D representation, powered by NGL viewer, offers an interactive display with smooth visualization in modern web browsers. Multiple structures can be loaded and superposed in the same scene. 1D sequences from the loaded structures are presented and linked to the 3D representation. Multiple, pre-defined 3D molecular representations are available. The powerful NGL selection syntax allows the definition of molecular regions that can be then displayed using different representations. Important descriptors such as distances or interactions can be easily added into the representation. Trajectory frames can be explored using a common video player control panel. Trajectories are efficiently stored and transferred to the NGL viewer thanks to an MDsrv-based data streaming. The server design offers all functionalities in one single web page, with a curated user experience, involving a minimum learning curve. Extended documentation is available, including a gallery with a collection of scenes. The server requires no registration and is available at https://mmb.irbbarcelona.org/3dRS.
Affiliation(s)
- Genís Bayarri
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Adam Hospital
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Modesto Orozco
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Departament de Bioquímica i Biomedicina, Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain