1. Gillani M, Pollastri G. Protein subcellular localization prediction tools. Comput Struct Biotechnol J 2024; 23:1796-1807. [PMID: 38707539 PMCID: PMC11066471 DOI: 10.1016/j.csbj.2024.04.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 04/11/2024] [Accepted: 04/11/2024] [Indexed: 05/07/2024] Open
Abstract
Protein subcellular localization prediction is of great significance in bioinformatics and biological research. Because most proteins do not have experimentally determined localization information, computational prediction methods and tools have been an active research area for more than two decades. Knowledge of the subcellular location of a protein provides valuable information about its function, the functioning of the cell, and its possible interactions with other proteins. Fast, reliable, and accurate predictors provide platforms to harness the abundance of sequence data to predict subcellular locations. During the last decade, there has been a considerable amount of research effort aimed at developing subcellular localization predictors. This paper reviews recent subcellular localization prediction tools in the Eukaryotic, Prokaryotic, and Virus-based categories, followed by a detailed analysis. Each predictor is discussed based on its main features, strengths, weaknesses, algorithms used, prediction techniques, and analysis. This review is supported by taxonomies of prediction tools that highlight their relevant areas and examples for straightforward categorization and ease of understanding. These taxonomies help users find suitable tools according to their needs. Furthermore, recent research gaps and challenges are discussed to cover areas that need the utmost attention. This survey provides an in-depth analysis of the most recent prediction tools and can be considered a quick guide for researchers to identify and explore recent advances in the literature.
Affiliation(s)
- Maryam Gillani
  - School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
- Gianluca Pollastri
  - School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
2. Zhao L, Li J, Zhan W, Jiang X, Zhang B. Prediction of protein secondary structure by the improved TCN-BiLSTM-MHA model with knowledge distillation. Sci Rep 2024; 14:16488. [PMID: 39020005 PMCID: PMC11255250 DOI: 10.1038/s41598-024-67403-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Accepted: 07/10/2024] [Indexed: 07/19/2024] Open
Abstract
Secondary structure prediction is a key step in understanding protein function and biological properties and is highly important in the fields of new drug development, disease treatment, bioengineering, etc. Accurately predicting the secondary structure of proteins helps to reveal how proteins are folded and how they function in cells. The application of deep learning models in protein structure prediction is particularly important because of their ability to process complex sequence information and extract meaningful patterns and features, thus significantly improving the accuracy and efficiency of prediction. In this study, a combined model integrating an improved temporal convolutional network (TCN), bidirectional long short-term memory (BiLSTM), and a multi-head attention (MHA) mechanism is proposed to enhance the accuracy of protein secondary structure prediction in both eight-state and three-state classifications. One-hot encoding features and word vector representations of physicochemical properties are incorporated. A significant emphasis is placed on knowledge distillation techniques utilizing the ProtT5 pretrained model, leading to performance improvements. The improved TCN, achieved through multiscale fusion and bidirectional operations, allows for better extraction of amino acid sequence features than traditional TCN models. The model demonstrated excellent prediction performance on multiple datasets. Of the six datasets used in this paper, on the TS115, CB513 and PDB (2018-2020) datasets the eight-state prediction accuracy reached 88.2%, 84.9%, and 95.3%, respectively, and the three-state prediction accuracy reached 91.3%, 90.3%, and 96.8%, respectively. This study not only improves the accuracy of protein secondary structure prediction but also provides a valuable tool for understanding protein structure and function that is particularly applicable to resource-constrained contexts.
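
The eight-state and three-state accuracies quoted in this abstract are per-residue agreement scores (often called Q8 and Q3). A minimal sketch of how such scores can be computed follows. It assumes DSSP-style one-letter labels and one commonly used 8-to-3 reduction (H/G/I to helix, E/B to strand, everything else to coil); the function names and toy label strings are hypothetical and are not taken from the paper, which may use a different reduction.

```python
# One common DSSP 8-state -> 3-state reduction (an assumption; papers differ).
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",            # helical states -> H
    "E": "E", "B": "E",                      # strand / bridge -> E
    "T": "C", "S": "C", "L": "C", "-": "C",  # turns, bends, loops -> coil
    "C": "C",                                # already-reduced coil label
}


def reduce_to_three(labels: str) -> str:
    """Map a string of 8-state labels to 3-state labels (hypothetical helper)."""
    return "".join(EIGHT_TO_THREE[c] for c in labels)


def accuracy(pred: str, true: str) -> float:
    """Fraction of residues whose predicted label equals the reference label."""
    if len(pred) != len(true) or not true:
        raise ValueError("label strings must be non-empty and of equal length")
    matches = sum(p == t for p, t in zip(pred, true))
    return matches / len(true)


# Toy example (made-up labels, not data from the paper).
true8 = "LHHHHGGTEEEEL"
pred8 = "LHHHHHHTEEEBL"

q8 = accuracy(pred8, true8)
q3 = accuracy(reduce_to_three(pred8), reduce_to_three(true8))
print(f"Q8 = {q8:.3f}, Q3 = {q3:.3f}")
```

In this toy case the three mismatches (G predicted as H twice, E predicted as B once) disappear after reduction, which illustrates why three-state accuracy is typically higher than eight-state accuracy on the same predictions.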
Affiliation(s)
- Lufei Zhao
  - Agricultural Science and Engineering School, Liaocheng University, Liaocheng, 252059, China
- Jingyi Li
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, 430073, China
- Weiqiang Zhan
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, 430073, China
- Xuchu Jiang
  - School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, 430073, China
  - Emergency Management Research Center, Zhongnan University of Economics and Law, Wuhan, 430073, China
- Biao Zhang
  - School of Computer Science, Liaocheng University, Liaocheng, 252059, China
3. Saharkhiz S, Mostafavi M, Birashk A, Karimian S, Khalilollah S, Jaferian S, Yazdani Y, Alipourfard I, Huh YS, Farani MR, Akhavan-Sigari R. The State-of-the-Art Overview to Application of Deep Learning in Accurate Protein Design and Structure Prediction. Top Curr Chem (Cham) 2024; 382:23. [PMID: 38965117 PMCID: PMC11224075 DOI: 10.1007/s41061-024-00469-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Accepted: 06/09/2024] [Indexed: 07/06/2024]
Abstract
In recent years, there has been a notable increase in the scientific community's interest in rational protein design. The prospect of designing an amino acid sequence that can reliably fold into a desired three-dimensional structure and exhibit the intended function is captivating. However, a major challenge in this endeavor lies in accurately predicting the resulting protein structure. The exponential growth of protein databases has fueled the advancement of the field, while newly developed algorithms have pushed the boundaries of what was previously achievable in structure prediction. In particular, using deep learning methods instead of brute force approaches has emerged as a faster and more accurate strategy. These deep-learning techniques leverage the vast amount of data available in protein databases to extract meaningful patterns and predict protein structures with improved precision. In this article, we explore the recent developments in the field of protein structure prediction. We delve into the newly developed methods that leverage deep learning approaches, highlighting their significance and potential for advancing our understanding of protein design.
Affiliation(s)
- Saber Saharkhiz
  - Division of Neuroscience, Department of Cellular and Molecular Medicine, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada
- Mehrnaz Mostafavi
  - Faculty of Allied Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Amin Birashk
  - Department of Computer Science, The University of Texas at Dallas, Richardson, TX, USA
- Shiva Karimian
  - Electrical and Computer Research Center, Sanandaj Azad University, Sanandaj, Iran
- Shayan Khalilollah
  - Department of Neurosurgery, Faculty of Medicine, Tehran Medical Sciences, Islamic Azad University, Tehran, Iran
- Sohrab Jaferian
  - Goergen Institute for Data Science, University of Rochester, Rochester, NY, USA
- Yalda Yazdani
  - Immunology Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Iraj Alipourfard
  - Institute of Physical Chemistry, Polish Academy of Sciences, Marcina Kasprzaka 44/52, 01-224, Warsaw, Poland
- Yun Suk Huh
  - Department of Biological Engineering, Inha University, Incheon, Republic of Korea
4. Wang R, Huang S, Wang P, Shi X, Li S, Ye Y, Zhang W, Shi L, Zhou X, Tang X. Bibliometric analysis of the application of deep learning in cancer from 2015 to 2023. Cancer Imaging 2024; 24:85. [PMID: 38965599 PMCID: PMC11223420 DOI: 10.1186/s40644-024-00737-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Accepted: 06/27/2024] [Indexed: 07/06/2024] Open
Abstract
BACKGROUND Recently, the application of deep learning (DL) has made great progress in various fields, especially in cancer research. However, to date, bibliometric analyses of the application of DL in cancer are scarce. Therefore, this study aimed to explore the research status and hotspots of the application of DL in cancer. METHODS We retrieved all articles on the application of DL in cancer from the Web of Science Core Collection database. Biblioshiny, VOSviewer and CiteSpace were used to perform the bibliometric analysis by analyzing the numbers, citations, countries, institutions, authors, journals, references, and keywords. RESULTS We found 6,016 original articles on the application of DL in cancer. The number of annual publications and total citations showed a general upward trend. China published the greatest number of articles, the USA had the highest total citations, and Saudi Arabia had the highest centrality. The Chinese Academy of Sciences was the most productive institution. Tian, Jie published the greatest number of articles, while He Kaiming was the most co-cited author. IEEE Access was the most popular journal. The analysis of references and keywords showed that DL was mainly used for the prediction, detection, classification and diagnosis of breast cancer, lung cancer, and skin cancer. CONCLUSIONS Overall, the number of articles on the application of DL in cancer is gradually increasing. In the future, further expanding and improving the application scope and accuracy of DL applications, and integrating DL with protein prediction, genomics and cancer research may be the research trends.
Affiliation(s)
- Ruiyu Wang
  - Department of Gastroenterology, The Affiliated Hospital of Southwest Medical University, Street Taiping No.25, Region Jiangyang, Luzhou, Sichuan Province, 646099, China
  - Nuclear Medicine and Molecular Imaging Key Laboratory of Sichuan Province, Luzhou, China
- Shu Huang
  - Department of Gastroenterology, Lianshui County People's Hospital, Huaian, China
  - Department of Gastroenterology, Lianshui People's Hospital of Kangda College Affiliated to Nanjing Medical University, Huaian, China
- Ping Wang
  - Department of Gastroenterology, The Affiliated Hospital of Southwest Medical University, Street Taiping No.25, Region Jiangyang, Luzhou, Sichuan Province, 646099, China
  - Nuclear Medicine and Molecular Imaging Key Laboratory of Sichuan Province, Luzhou, China
- Xiaomin Shi
  - Department of Gastroenterology, The Affiliated Hospital of Southwest Medical University, Street Taiping No.25, Region Jiangyang, Luzhou, Sichuan Province, 646099, China
  - Nuclear Medicine and Molecular Imaging Key Laboratory of Sichuan Province, Luzhou, China
- Shiqi Li
  - Department of Gastroenterology, The Affiliated Hospital of Southwest Medical University, Street Taiping No.25, Region Jiangyang, Luzhou, Sichuan Province, 646099, China
  - Nuclear Medicine and Molecular Imaging Key Laboratory of Sichuan Province, Luzhou, China
- Yusong Ye
  - Department of Gastroenterology, The Affiliated Hospital of Southwest Medical University, Street Taiping No.25, Region Jiangyang, Luzhou, Sichuan Province, 646099, China
  - Nuclear Medicine and Molecular Imaging Key Laboratory of Sichuan Province, Luzhou, China
- Wei Zhang
  - Department of Gastroenterology, The Affiliated Hospital of Southwest Medical University, Street Taiping No.25, Region Jiangyang, Luzhou, Sichuan Province, 646099, China
  - Nuclear Medicine and Molecular Imaging Key Laboratory of Sichuan Province, Luzhou, China
- Lei Shi
  - Department of Gastroenterology, The Affiliated Hospital of Southwest Medical University, Street Taiping No.25, Region Jiangyang, Luzhou, Sichuan Province, 646099, China
  - Nuclear Medicine and Molecular Imaging Key Laboratory of Sichuan Province, Luzhou, China
- Xian Zhou
  - Department of Gastroenterology, The Affiliated Hospital of Southwest Medical University, Street Taiping No.25, Region Jiangyang, Luzhou, Sichuan Province, 646099, China
  - Nuclear Medicine and Molecular Imaging Key Laboratory of Sichuan Province, Luzhou, China
- Xiaowei Tang
  - Department of Gastroenterology, The Affiliated Hospital of Southwest Medical University, Street Taiping No.25, Region Jiangyang, Luzhou, Sichuan Province, 646099, China
  - Nuclear Medicine and Molecular Imaging Key Laboratory of Sichuan Province, Luzhou, China
5. Zhao N, Wu T, Wang W, Zhang L, Gong X. Review and Comparative Analysis of Methods and Advancements in Predicting Protein Complex Structure. Interdiscip Sci 2024:10.1007/s12539-024-00626-x. [PMID: 38955920 DOI: 10.1007/s12539-024-00626-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 02/29/2024] [Accepted: 03/01/2024] [Indexed: 07/04/2024]
Abstract
Protein complexes perform diverse biological functions, and obtaining their three-dimensional structure is critical to understanding their functions. In many cases, it is not just two proteins interacting to form a dimer; instead, multiple proteins interact to form a multimer. Experimentally resolving protein complex structures can be quite challenging. Recently, there have been efforts and methods that build upon prior predictions of dimer structures to attempt to predict multimer structures. However, in comparison to monomeric protein structure prediction, the accuracy of protein complex structure prediction remains relatively low. This paper provides an overview of recent advancements in efficient computational models for predicting protein complex structures. We introduce protein-protein docking methods in detail and summarize their main ideas, applicable modes, and related information. To enhance prediction accuracy, other critical protein-related information is also integrated, such as predicted interchain residue contacts, experimental data such as cryo-EM, and knowledge of protein interactions and non-interactions. In addition, we comprehensively review computational approaches for end-to-end prediction of protein complex structures based on artificial intelligence (AI) technology and describe commonly used datasets and representative evaluation metrics for protein complexes. Finally, we analyze the formidable challenges faced in current protein complex structure prediction tasks, including the structure prediction of heteromeric complexes, disordered regions in complexes, antibody-antigen complexes, and RNA-related complexes, as well as the evaluation metrics for complex assessment. We hope that this work will provide comprehensive knowledge of complex structure prediction and contribute to future advances.
Affiliation(s)
- Nan Zhao
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Tong Wu
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Wenda Wang
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Lunchuan Zhang
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Xinqi Gong
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
  - Beijing Academy of Artificial Intelligence, Beijing, 100084, China
6. Vila JA. Analysis of proteins in the light of mutations. EUROPEAN BIOPHYSICS JOURNAL : EBJ 2024:10.1007/s00249-024-01714-y. [PMID: 38955858 DOI: 10.1007/s00249-024-01714-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 05/23/2024] [Accepted: 06/18/2024] [Indexed: 07/04/2024]
Abstract
Proteins have evolved through mutations (amino acid substitutions) since life appeared on Earth, some 10^9 years ago. The study of these phenomena has been of particular significance because of their impact on protein stability, function, and structure. This study offers a new viewpoint on how the most recent findings in these areas can be used to explore the impact of mutations on protein sequence, stability, and evolvability. Preliminary results indicate that: (1) mutations can be viewed as sensitive probes to identify 'typos' in the amino acid sequence, and also to assess the resistance of naturally occurring proteins to unwanted sequence alterations; (2) the presence of 'typos' in the amino acid sequence, rather than being an evolutionary obstacle, could promote faster evolvability and, in turn, increase the likelihood of higher protein stability; (3) the mutation site is far more important than the substituted amino acid in terms of the marginal stability changes of the protein; and (4) the unpredictability of protein evolution at the molecular level, by mutations, exists even in the absence of epistasis effects. Finally, the Darwinian concept of evolution, "descent with modification", and experimental evidence endorse one of the results of this study, which suggests that some regions of any protein sequence are susceptible to mutations while others are not. This work contributes to our general understanding of protein responses to mutations and may spur significant progress in our efforts to develop methods to accurately forecast changes in protein stability, their propensity for metamorphism, and their ability to evolve.
Affiliation(s)
- Jorge A Vila
  - IMASL-CONICET, Universidad Nacional de San Luis, Ejército de los Andes 950, 5700, San Luis, Argentina
7. Liu S, Yang Q, Zhang L, Luo S. Accurate Protein pKa Prediction with Physical Organic Chemistry Guided 3D Protein Representation. J Chem Inf Model 2024; 64:4410-4418. [PMID: 38780156 DOI: 10.1021/acs.jcim.4c00354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024]
Abstract
Protein pKa is a fundamental physicochemical parameter that dictates protein structure and function. However, accurately determining protein site-pKa values remains a substantial challenge, both experimentally and theoretically. In this study, we introduce a physical organic approach, leveraging a protein structural and physical-organic-parameter-based representation (P-SPOC), to develop a rapid and intuitive model for protein pKa prediction. Our P-SPOC model achieves state-of-the-art predictive accuracy, with a mean absolute error (MAE) of 0.33 pKa units. Furthermore, we have incorporated advanced protein structure prediction models, like AlphaFold2, to approximate structures for proteins lacking three-dimensional representations, which enhances the applicability of our model in the context of structure-undetermined protein research. To promote broader accessibility within the research community, an online prediction interface was also established at isyn.luoszgroup.com.
Affiliation(s)
- Siyuan Liu
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing 100084, China
- Qi Yang
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing 100084, China
- Long Zhang
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing 100084, China
- Sanzhong Luo
  - Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing 100084, China
8. Giladi M, Montgomery AP, Kassiou M, Danon JJ. Structure-based drug design for TSPO: Challenges and opportunities. Biochimie 2024:S0300-9084(24)00120-2. [PMID: 38782353 DOI: 10.1016/j.biochi.2024.05.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 04/27/2024] [Accepted: 05/21/2024] [Indexed: 05/25/2024]
Abstract
The translocator protein 18 kDa (TSPO) is an evolutionarily conserved mitochondrial transmembrane protein implicated in various neuropathologies and inflammatory conditions, making it a longstanding diagnostic and therapeutic target of interest. Despite the development of various classes of TSPO ligand chemotypes, and the elucidation of bacterial and non-human mammalian experimental structures, many unknowns exist surrounding its differential structural and functional features in health and disease. There are several limitations associated with currently used computational methodologies for modelling the native structure and ligand-binding behaviour of this enigmatic protein. In this perspective, we provide a critical analysis of the developments in the uses of these methods, outlining their uses, inherent limitations, and continuing challenges. We offer suggestions of unexplored opportunities that exist in the use of computational methodologies which offer promise for enhancing our understanding of the TSPO.
Affiliation(s)
- Mia Giladi
  - School of Chemistry, The University of Sydney, 2050, Sydney, NSW, Australia
- Michael Kassiou
  - School of Chemistry, The University of Sydney, 2050, Sydney, NSW, Australia
- Jonathan J Danon
  - School of Chemistry, The University of Sydney, 2050, Sydney, NSW, Australia
9. Lee S, Kim G, Karin EL, Mirdita M, Park S, Chikhi R, Babaian A, Kryshtafovych A, Steinegger M. Petabase-Scale Homology Search for Structure Prediction. Cold Spring Harb Perspect Biol 2024; 16:a041465. [PMID: 38316555 PMCID: PMC11065157 DOI: 10.1101/cshperspect.a041465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2024]
Abstract
The recent CASP15 competition highlighted the critical role of multiple sequence alignments (MSAs) in protein structure prediction, as demonstrated by the success of the top AlphaFold2-based prediction methods. To push the boundaries of MSA utilization, we conducted a petabase-scale search of the Sequence Read Archive (SRA), resulting in gigabytes of aligned homologs for CASP15 targets. These were merged with default MSAs produced by ColabFold-search and provided to ColabFold-predict. By using SRA data, we achieved highly accurate predictions (GDT_TS > 70) for 66% of the non-easy targets, whereas using ColabFold-search default MSAs scored highly in only 52%. Next, we tested the effect of deep homology search and ColabFold's advanced features, such as more recycles, on prediction accuracy. While SRA homologs were most significant for improving ColabFold's CASP15 ranking from 11th to 3rd place, other strategies contributed too. We analyze these in the context of existing strategies to improve prediction.
Affiliation(s)
- Sewon Lee
  - School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea
- Gyuri Kim
  - School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea
- Milot Mirdita
  - School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea
- Sukhwan Park
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, South Korea
- Rayan Chikhi
  - Institut Pasteur, Université Paris Cité, G5 Sequence Bioinformatics, 75015 Paris, France
- Artem Babaian
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Martin Steinegger
  - School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, South Korea
  - Artificial Intelligence Institute, Seoul National University, Seoul 08826, South Korea
  - Institute of Molecular Biology and Genetics, Seoul National University, Seoul 08826, South Korea
10. Penunuri G, Wang P, Corbett-Detig R, Russell SL. A Structural Proteome Screen Identifies Protein Mimicry in Host-Microbe Systems. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.10.588793. [PMID: 38645127 PMCID: PMC11030372 DOI: 10.1101/2024.04.10.588793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
Host-microbe systems are evolutionary niches that produce coevolved biological interactions and are a key component of global health. However, these systems have historically been a difficult field of biological research due to their experimental intractability. Impactful advances in global health will be obtained by leveraging in silico screens to identify genes involved in mediating interspecific interactions. These predictions will progress our understanding of these systems and lay the groundwork for future in vitro and in vivo experiments and bioengineering projects. A driver of host-manipulation and intracellular survival utilized by host-associated microbes is molecular mimicry, a critical mechanism that can occur at any level from DNA to protein structures. We applied protein structure prediction and alignment tools to explore host-associated bacterial structural proteomes for examples of protein structure mimicry. By leveraging the Legionella pneumophila proteome and its many known structural mimics, we developed and validated a screen that can be applied to virtually any host-microbe system to uncover signals of protein mimicry. These mimics represent candidate proteins that mediate host interactions in microbial proteomes. We successfully applied this screen to other microbes with demonstrated effects on global health, Helicobacter pylori and Wolbachia, identifying protein mimic candidates in each proteome. We discuss the roles these candidates may play in important Wolbachia-induced phenotypes and show that Wolbachia infection can partially rescue the loss of one of these factors. This work demonstrates how a genome-wide screen for candidates of host-manipulation and intracellular survival offers an opportunity to identify functionally important genes in host-microbe systems.
11. Li C, Yao J, Wei W, Niu Z, Zeng X, Li J, Wang J. Geometry-Based Molecular Generation With Deep Constrained Variational Autoencoder. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:4852-4861. [PMID: 35171779 DOI: 10.1109/tnnls.2022.3147790] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 06/14/2023]
Abstract
Finding target molecules with specific chemical properties plays a decisive role in drug development. We proposed GEOM-CVAE, a constrained variational autoencoder based on geometric representation for generating molecules with specific properties in a protein-context-dependent manner. In terms of machine learning, it includes a continuous feature embedding encoder and a molecular generation decoder. Our key contribution is an efficient geometric embedding method, including spatial structure representations of the drug molecule (converting the 3-D coordinates into an image) and geometric graph representations of the protein target (modeling the protein surface as a mesh). The 3-D geometric information is vital to successful molecular generation, which distinguishes our approach from previous molecular generative methods based on 1-D or 2-D representations. Our model framework generates specific molecules in two phases: it first generates a special image carrying molecular 3-D information to learn latent representations and generates molecules under constrained conditions based on geometric graph convolution for a specific protein, and it then inputs the generated structural molecules into a parser network to obtain Simplified Molecular Input Line Entry System (SMILES) strings. Our model achieves competitive performance, which implies its potential effectiveness in enabling the exploration of the vast chemical space for drug discovery.
12. Monteiro da Silva G, Cui JY, Dalgarno DC, Lisi GP, Rubenstein BM. High-throughput prediction of protein conformational distributions with subsampled AlphaFold2. Nat Commun 2024; 15:2464. [PMID: 38538622 PMCID: PMC10973385 DOI: 10.1038/s41467-024-46715-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Accepted: 02/28/2024] [Indexed: 04/12/2024] Open
Abstract
This paper presents an innovative approach for predicting the relative populations of protein conformations using AlphaFold 2, an AI-powered method that has revolutionized biology by enabling the accurate prediction of protein structures. While AlphaFold 2 has shown exceptional accuracy and speed, it is designed to predict proteins' ground state conformations and is limited in its ability to predict conformational landscapes. Here, we demonstrate how AlphaFold 2 can directly predict the relative populations of different protein conformations by subsampling multiple sequence alignments. We tested our method against nuclear magnetic resonance experiments on two proteins with drastically different amounts of available sequence data, Abl1 kinase and the granulocyte-macrophage colony-stimulating factor, and predicted changes in their relative state populations with more than 80% accuracy. Our subsampling approach worked best when used to qualitatively predict the effects of mutations or evolution on the conformational landscape and well-populated states of proteins. It thus offers a fast and cost-effective way to predict the relative populations of protein conformations at even single-point mutation resolution, making it a useful tool for pharmacology, analysis of experimental results, and predicting evolution.
Affiliation(s)
- Jennifer Y Cui
  - Brown University Department of Molecular and Cell Biology and Biochemistry, Providence, RI, USA
- George P Lisi
  - Brown University Department of Molecular and Cell Biology and Biochemistry, Providence, RI, USA
  - Brown University Department of Chemistry, Providence, RI, USA
- Brenda M Rubenstein
  - Brown University Department of Molecular and Cell Biology and Biochemistry, Providence, RI, USA
  - Brown University Department of Chemistry, Providence, RI, USA
13
|
Szikszai M, Magnus M, Sanghi S, Kadyan S, Bouatta N, Rivas E. RNA3DB: A structurally-dissimilar dataset split for training and benchmarking deep learning models for RNA structure prediction. J Mol Biol 2024:168552. [PMID: 38552946 DOI: 10.1016/j.jmb.2024.168552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Revised: 03/19/2024] [Accepted: 03/22/2024] [Indexed: 04/09/2024]
Abstract
With advances in protein structure prediction thanks to deep learning models like AlphaFold, RNA structure prediction has recently received increased attention from deep learning researchers. RNAs introduce substantial challenges due to the sparser availability and lower structural diversity of the experimentally resolved RNA structures in comparison to protein structures. These challenges are often poorly addressed by the existing literature, much of which reports inflated performance due to using training and testing sets with significant structural overlap. Further, the most recent Critical Assessment of Structure Prediction (CASP15) has shown that deep learning models for RNA structure are currently outperformed by traditional methods. In this paper we present RNA3DB, a dataset of structured RNAs, derived from the Protein Data Bank (PDB), that is designed for training and benchmarking deep learning models. The RNA3DB method arranges the RNA 3D chains into distinct groups (Components) that are non-redundant both with regard to sequence and structure, providing a robust way of dividing training, validation, and testing sets. Any split of these structurally-dissimilar Components is guaranteed to produce test and validation sets that are distinct by sequence and structure from those in the training set. We provide the RNA3DB dataset, a particular train/test split of the RNA3DB Components (in an approximate 70/30 ratio) that will be updated periodically. We also provide the RNA3DB methodology along with the source code, with the goal of creating a reproducible and customizable tool for producing structurally-dissimilar dataset splits for structural RNAs.
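The key property described above is that whole Components, never individual chains, are assigned to a split. A minimal sketch of such a split is given here, assuming the Components have already been computed; the function name `split_components`, the greedy largest-first assignment, and the toy identifiers are illustrative assumptions and not the RNA3DB implementation.

```python
def split_components(components, train_fraction=0.7):
    """Assign whole components to train or test, targeting a chain-count ratio.

    components: dict mapping a component name to a list of chain identifiers.
    Assumption: a greedy largest-first fill; the real tool may differ.
    """
    total = sum(len(chains) for chains in components.values())
    target = round(train_fraction * total)  # integer chain budget for train
    train, test = {}, {}
    n_train = 0
    for name in sorted(components, key=lambda n: len(components[n]), reverse=True):
        chains = components[name]
        if n_train + len(chains) <= target:
            train[name] = chains
            n_train += len(chains)
        else:
            test[name] = chains
    return train, test


if __name__ == "__main__":
    # Hypothetical component and chain names, for illustration only.
    toy = {
        "component_1": ["a", "b", "c", "d"],
        "component_2": ["e", "f", "g"],
        "component_3": ["h", "i"],
        "component_4": ["j"],
    }
    train, test = split_components(toy)
    print("train:", sorted(train))
    print("test:", sorted(test))
```

Because no component is ever divided, the sequence and structure dissimilarity between components carries over to the resulting train and test sets, whatever assignment rule is used.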
Affiliation(s)
- Marcell Szikszai
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
- Marcin Magnus
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
- Siddhant Sanghi
- Department of Systems Biology, Columbia University, New York 10027, NY, USA; College of Biological Sciences, UC Davis, Davis 95616, CA, USA
- Sachin Kadyan
- Department of Systems Biology, Columbia University, New York 10027, NY, USA
- Nazim Bouatta
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston 02115, MA, USA
- Elena Rivas
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
14
Szikszai M, Magnus M, Sanghi S, Kadyan S, Bouatta N, Rivas E. RNA3DB: A structurally-dissimilar dataset split for training and benchmarking deep learning models for RNA structure prediction. bioRxiv 2024:2024.01.30.578025. [PMID: 38352531 PMCID: PMC10862857 DOI: 10.1101/2024.01.30.578025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
With advances in protein structure prediction thanks to deep learning models like AlphaFold, RNA structure prediction has recently received increased attention from deep learning researchers. RNAs introduce substantial challenges due to the sparser availability and lower structural diversity of the experimentally resolved RNA structures in comparison to protein structures. These challenges are often poorly addressed by the existing literature, much of which reports inflated performance due to using training and testing sets with significant structural overlap. Further, the most recent Critical Assessment of Structure Prediction (CASP15) has shown that deep learning models for RNA structure are currently outperformed by traditional methods. In this paper we present RNA3DB, a dataset of structured RNAs, derived from the Protein Data Bank (PDB), that is designed for training and benchmarking deep learning models. The RNA3DB method arranges the RNA 3D chains into distinct groups (Components) that are non-redundant both with regard to sequence and structure, providing a robust way of dividing training, validation, and testing sets. Any split of these structurally-dissimilar Components is guaranteed to produce test and validation sets that are distinct by sequence and structure from those in the training set. We provide the RNA3DB dataset, a particular train/test split of the RNA3DB Components (in an approximate 70/30 ratio) that will be updated periodically. We also provide the RNA3DB methodology along with the source code, with the goal of creating a reproducible and customizable tool for producing structurally-dissimilar dataset splits for structural RNAs.
Affiliation(s)
- Marcell Szikszai
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
- Marcin Magnus
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
- Siddhant Sanghi
- Department of Systems Biology, Columbia University, New York, 10027, NY, USA
- College of Biological Sciences, UC Davis, Davis, 95616, CA, USA
- Sachin Kadyan
- Department of Systems Biology, Columbia University, New York, 10027, NY, USA
- Nazim Bouatta
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, 02115, MA, USA
- Elena Rivas
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
15
Wenzel M, Grüner E, Strodthoff N. Insights into the inner workings of transformer models for protein function prediction. Bioinformatics 2024; 40:btae031. [PMID: 38244570 PMCID: PMC10950482 DOI: 10.1093/bioinformatics/btae031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 12/14/2023] [Accepted: 01/16/2024] [Indexed: 01/22/2024] Open
Abstract
MOTIVATION We explored how explainable artificial intelligence (XAI) can help to shed light into the inner workings of neural networks for protein function prediction, by extending the widely used XAI method of integrated gradients such that latent representations inside of transformer models, which were finetuned to Gene Ontology term and Enzyme Commission number prediction, can be inspected too. RESULTS The approach enabled us to identify amino acids in the sequences that the transformers pay particular attention to, and to show that these relevant sequence parts reflect expectations from biology and chemistry, both in the embedding layer and inside of the model, where we identified transformer heads with a statistically significant correspondence of attribution maps with ground truth sequence annotations (e.g. transmembrane regions, active sites) across many proteins. AVAILABILITY AND IMPLEMENTATION Source code can be accessed at https://github.com/markuswenzel/xai-proteins.
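For readers unfamiliar with the base method named in the abstract, the sketch below illustrates plain integrated gradients on a toy differentiable function with a hand-written gradient. It does not reproduce the authors' extension to latent transformer representations; the function names, the toy model, and the midpoint Riemann approximation with 50 steps are assumptions made for illustration.

```python
def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate integrated gradients with a midpoint Riemann sum.

    grad_fn: callable returning the gradient of the model output at a point.
    x, baseline: equal-length lists of floats.
    """
    n = len(x)
    grad_sum = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th interval on [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = grad_fn(point)
        for i in range(n):
            grad_sum[i] += grad[i]
    # Scale the path-averaged gradient by the input difference.
    return [(xi - b) * g / steps for xi, b, g in zip(x, baseline, grad_sum)]


# Toy model (an assumption): f(v) = sum_i w_i * v_i**2, with analytic gradient.
WEIGHTS = [1.0, -2.0, 0.5]


def toy_model(v):
    return sum(w * vi * vi for w, vi in zip(WEIGHTS, v))


def toy_gradient(v):
    return [2.0 * w * vi for w, vi in zip(WEIGHTS, v)]


if __name__ == "__main__":
    x = [1.0, 2.0, -3.0]
    baseline = [0.0, 0.0, 0.0]
    attributions = integrated_gradients(toy_gradient, x, baseline)
    print(attributions)
    # Completeness: attributions sum to f(x) - f(baseline), up to rounding.
    print(sum(attributions), toy_model(x) - toy_model(baseline))
```

The completeness check at the end is a useful sanity test in practice: if the attributions do not sum to the difference in model output, the number of integration steps is usually too small.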
Affiliation(s)
- Markus Wenzel
- Department of Artificial Intelligence, Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut, HHI, Einsteinufer 37, 10587 Berlin, Germany
- Erik Grüner
- Department of Artificial Intelligence, Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut, HHI, Einsteinufer 37, 10587 Berlin, Germany
- Nils Strodthoff
- School VI - Medicine and Health Services, Carl von Ossietzky University of Oldenburg, Ammerländer Heerstr. 114-118, 26129 Oldenburg, Germany
16
Morera H, Dave P, Kolinko Y, Alahmari S, Anderson A, Denham G, Davis C, Riano J, Goldgof D, Hall LO, Harry GJ, Mouton PR. A novel deep learning-based method for automatic stereology of microglia cells from low magnification images. Neurotoxicol Teratol 2024; 102:107336. [PMID: 38402997 DOI: 10.1016/j.ntt.2024.107336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 01/31/2024] [Accepted: 02/21/2024] [Indexed: 02/27/2024]
Abstract
Microglial cells mediate diverse homeostatic, inflammatory, and immune processes during normal development and in response to cytotoxic challenges. During these functional activities, microglial cells undergo distinct numerical and morphological changes in different tissue volumes in both rodent and human brains. However, it remains unclear how these cytostructural changes in microglia correlate with region-specific neurochemical functions. To better understand these relationships, neuroscientists need accurate, reproducible, and efficient methods for quantifying microglial cell number and morphologies in histological sections. To address this deficit, we developed a novel deep learning (DL)-based classification, stereology approach that links the appearance of Iba1 immunostained microglial cells at low magnification (20×) with the total number of cells in the same brain region based on unbiased stereology counts as ground truth. Once DL models are trained, total microglial cell numbers in specific regions of interest can be estimated and treatment groups predicted in a high-throughput manner (<1 min) using only low-power images from test cases, without the need for time- and labor-intensive stereology counts or morphology ratings in test cases. Results for this DL-based automatic stereology approach on two datasets (total 39 mouse brains) showed >90% accuracy, 100% repeatability (Test-Retest) and 60× greater efficiency than manual stereology (<1 min vs. ~60 min) using the same tissue sections. Ongoing and future work includes use of this DL-based approach to establish clear neurodegeneration profiles in age-related human neurological diseases and related animal models.
Affiliation(s)
- Hunter Morera
- Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620, USA.
- Palak Dave
- Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620, USA
- Yaroslav Kolinko
- Department of Histology and Embryology & Biomedical Center, Faculty of Medicine in Pilsen, Charles University, Pilsen, Czech Republic
- Saeed Alahmari
- Department of Computer Science, Najran University, Najran 66462, Saudi Arabia
- Dmitry Goldgof
- Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620, USA
- Lawrence O Hall
- Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620, USA
- G Jean Harry
- Mechanistic Toxicology Branch, Division of Translational Toxicology, NIEHS/NIH, Research Triangle Park, NC 27709, USA
- Peter R Mouton
- Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620, USA; SRC Biosciences, Tampa, FL 33606, USA.
17
Bu Y, Sun C, Guo J, Zhu W, Li J, Li X, Zhang Y. Identification novel salt-enhancing peptides from largemouth bass and exploration their action mechanism with transmembrane channel-like 4 (TMC4) by molecular simulation. Food Chem 2024; 435:137614. [PMID: 37820400 DOI: 10.1016/j.foodchem.2023.137614] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2023] [Revised: 09/21/2023] [Accepted: 09/27/2023] [Indexed: 10/13/2023]
Abstract
The purpose of this study was to screen and verify salt-enhancing peptides that can effectively reduce sodium consumption from largemouth bass myosin through virtual hydrolysis, molecular simulation, and sensory evaluation. The human transmembrane channel-like 4 (TMC4) was constructed using AlphaFold2, with 93.3% of amino acids falling within allowed regions. A total of 19 peptides were predicted through virtual hydrolysis and screening. DAF, QIF, RPAL, and IPVM significantly enhanced the saltiness perception, and QIF exhibited the most pronounced effect in enhancing saltiness (P < 0.05). The residues Ala258, Ser546, Ser603, Phe259, Cys265, Glu539, Lys278 and Ser585 were identified as key binding sites. The TMC4-DAF complex achieved stability after 20,000 ps, exhibiting an average RMSD value of 0.84 nm. DAF consistently displayed fluctuations at approximately 3.05 nm, and the number of hydrogen bonds varied between 3 and 5. These results suggested that AlphaFold2 modelling can be used for predicting salt-enhancing peptides.
Affiliation(s)
- Ying Bu
- College of Food Science and Engineering, Bohai University. National & Local Joint Engineering Research Center of Storage, Processing and Safety Control Technology for Fresh Agricultural and Aquatic Products, Jinzhou, Liaoning 121013, China; Engineering Research Centre of Fujian-Taiwan Special Marine Food Processing and Nutrition, Ministry of Education, Fuzhou 350002, China; College of Food Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China.
- Chaonan Sun
- College of Food Science and Engineering, Bohai University. National & Local Joint Engineering Research Center of Storage, Processing and Safety Control Technology for Fresh Agricultural and Aquatic Products, Jinzhou, Liaoning 121013, China
- Jiaqi Guo
- College of Food Science and Engineering, Bohai University. National & Local Joint Engineering Research Center of Storage, Processing and Safety Control Technology for Fresh Agricultural and Aquatic Products, Jinzhou, Liaoning 121013, China
- Wenhui Zhu
- College of Food Science and Engineering, Bohai University. National & Local Joint Engineering Research Center of Storage, Processing and Safety Control Technology for Fresh Agricultural and Aquatic Products, Jinzhou, Liaoning 121013, China
- Jianrong Li
- College of Food Science and Engineering, Bohai University. National & Local Joint Engineering Research Center of Storage, Processing and Safety Control Technology for Fresh Agricultural and Aquatic Products, Jinzhou, Liaoning 121013, China
- Xuepeng Li
- College of Food Science and Engineering, Bohai University. National & Local Joint Engineering Research Center of Storage, Processing and Safety Control Technology for Fresh Agricultural and Aquatic Products, Jinzhou, Liaoning 121013, China
- Yi Zhang
- Engineering Research Centre of Fujian-Taiwan Special Marine Food Processing and Nutrition, Ministry of Education, Fuzhou 350002, China; College of Food Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China.
18
Krokidis MG, Dimitrakopoulos GN, Vrahatis AG, Exarchos TP, Vlamos P. Challenges and limitations in computational prediction of protein misfolding in neurodegenerative diseases. Front Comput Neurosci 2024; 17:1323182. [PMID: 38250244 PMCID: PMC10796696 DOI: 10.3389/fncom.2023.1323182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 12/19/2023] [Indexed: 01/23/2024] Open
Affiliation(s)
- Panagiotis Vlamos
- Bioinformatics and Human Electrophysiology Laboratory, Department of Informatics, Ionian University, Corfu, Greece
19
da Silva GM, Cui JY, Dalgarno DC, Lisi GP, Rubenstein BM. Predicting Relative Populations of Protein Conformations without a Physics Engine Using AlphaFold 2. bioRxiv 2023:2023.07.25.550545. [PMID: 37546747 PMCID: PMC10402055 DOI: 10.1101/2023.07.25.550545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
This paper presents a novel approach for predicting the relative populations of protein conformations using AlphaFold 2, an AI-powered method that has revolutionized biology by enabling the accurate prediction of protein structures. While AlphaFold 2 has shown exceptional accuracy and speed, it is designed to predict proteins' ground state conformations and is limited in its ability to predict conformational landscapes. Here, we demonstrate how AlphaFold 2 can directly predict the relative populations of different protein conformations by subsampling multiple sequence alignments. We tested our method against NMR experiments on two proteins with drastically different amounts of available sequence data, Abl1 kinase and the granulocyte-macrophage colony-stimulating factor, and predicted changes in their relative state populations with more than 80% accuracy. Our subsampling approach worked best when used to qualitatively predict the effects of mutations or evolution on the conformational landscape and well-populated states of proteins. It thus offers a fast and cost-effective way to predict the relative populations of protein conformations at even single-point mutation resolution, making it a useful tool for pharmacology, NMR analysis, and evolution.
Affiliation(s)
- Gabriel Monteiro da Silva
- Brown University Department of Molecular Biology, Cell Biology, and Biochemistry, Providence, RI, USA
- Jennifer Y Cui
- Brown University Department of Molecular Biology, Cell Biology, and Biochemistry, Providence, RI, USA
- George P Lisi
- Brown University Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University Department of Chemistry, Providence, RI, USA
- Brenda M Rubenstein
- Brown University Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University Department of Chemistry, Providence, RI, USA
20
Sakhawat A, Khan MU, Rehman R, Khan S, Shan MA, Batool A, Javed MA, Ali Q. Natural compound targeting BDNF V66M variant: insights from in silico docking and molecular analysis. AMB Express 2023; 13:134. [PMID: 38015338 PMCID: PMC10684480 DOI: 10.1186/s13568-023-01640-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/18/2023] [Accepted: 11/09/2023] [Indexed: 11/29/2023] Open
Abstract
Brain-Derived Neurotrophic Factor (BDNF) is a member of the neurotrophin gene family that encodes proteins vital for the growth, maintenance, and survival of neurons in the nervous system. The study aimed to screen natural compounds against the BDNF variant (V66M), which affects memory, cognition, and mood regulation. The BDNF variant (V66M) was selected as the target structure, and Vitamin D, Curcumin, Vitamin C, and Quercetin were taken from the PubChem database as ligand structures. Multiple tools like AUTODOCK VINA, BIOVIA discovery studio, PyMOL, CB-dock, IMOD server, Swiss ADEMT, and Swiss predict ligands target were used to analyze binding energy, interaction, stability, and toxicity, and to visualize BDNF-ligand complexes. Compounds Vitamin D3, Curcumin, Vitamin C, and Quercetin, with binding energy values of -5.5, -6.1, -4.5, and -6.7 kJ/mol, respectively, were selected. The ligands bind to the active sites of the BDNF variant (V66M) via hydrophobic bonds, hydrogen bonds, and electrostatic interactions. Furthermore, ADMET analysis of the ligands revealed that they exhibited sound pharmacokinetic and toxicity profiles. In addition, an MD simulation study showed that the most active ligand bound favorably and dynamically to the target protein, and protein-ligand complex stability was determined. The findings of this research could provide an excellent platform for discovering and rationalizing novel drugs against stress related to BDNF (V66M). Docking, preclinical drug testing and MD simulation results suggest that Quercetin is a more potent BDNF variant (V66M) inhibitor and forms a more structurally stable complex.
Affiliation(s)
- Azra Sakhawat
- Institute of Molecular Biology and Biotechnology, The University of Lahore, Lahore, Pakistan
- Muhammad Umer Khan
- Institute of Molecular Biology and Biotechnology, The University of Lahore, Lahore, Pakistan.
- Raima Rehman
- Institute of Molecular Biology and Biotechnology, The University of Lahore, Lahore, Pakistan
- Samiullah Khan
- Institute of Molecular Biology and Biotechnology, The University of Lahore, Lahore, Pakistan
- Muhammad Adnan Shan
- Centre for Applied Molecular Biology, University of the Punjab, Lahore, Pakistan
- Alia Batool
- Department of Plant Breeding and Genetics, Faculty of Agricultural Sciences, University of the Punjab, Lahore, Pakistan
- Muhammad Arshad Javed
- Department of Plant Breeding and Genetics, Faculty of Agricultural Sciences, University of the Punjab, Lahore, Pakistan
- Qurban Ali
- Department of Plant Breeding and Genetics, Faculty of Agricultural Sciences, University of the Punjab, Lahore, Pakistan.
21
Braghetto A, Orlandini E, Baiesi M. Interpretable Machine Learning of Amino Acid Patterns in Proteins: A Statistical Ensemble Approach. J Chem Theory Comput 2023; 19:6011-6022. [PMID: 37552831 PMCID: PMC10500975 DOI: 10.1021/acs.jctc.3c00383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Indexed: 08/10/2023]
Abstract
Explainable and interpretable unsupervised machine learning helps one to understand the underlying structure of data. We introduce an ensemble analysis of machine learning models to consolidate their interpretation. Its application shows that restricted Boltzmann machines compress consistently into a few bits the information stored in a sequence of five amino acids at the start or end of α-helices or β-sheets. The weights learned by the machines reveal unexpected properties of the amino acids and the secondary structure of proteins: (i) His and Thr have a negligible contribution to the amphiphilic pattern of α-helices; (ii) there is a class of α-helices particularly rich in Ala at their end; (iii) Pro occupies most often slots otherwise occupied by polar or charged amino acids, and its presence at the start of helices is relevant; (iv) Glu and especially Asp on one side and Val, Leu, Iso, and Phe on the other display the strongest tendency to mark amphiphilic patterns, i.e., extreme values of an effective hydrophobicity, though they are not the most powerful (non)hydrophobic amino acids.
Affiliation(s)
- Anna Braghetto
- Department of Physics and Astronomy, University of Padova, Via Marzolo 8, 35131 Padua, Italy
- INFN, Sezione di Padova, Via Marzolo 8, 35131 Padua, Italy
- Enzo Orlandini
- Department of Physics and Astronomy, University of Padova, Via Marzolo 8, 35131 Padua, Italy
- INFN, Sezione di Padova, Via Marzolo 8, 35131 Padua, Italy
- Marco Baiesi
- Department of Physics and Astronomy, University of Padova, Via Marzolo 8, 35131 Padua, Italy
- INFN, Sezione di Padova, Via Marzolo 8, 35131 Padua, Italy
22
Vila JA. Protein structure prediction from the complementary science perspective. Biophys Rev 2023; 15:439-445. [PMID: 37681107 PMCID: PMC10480374 DOI: 10.1007/s12551-023-01107-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/10/2023] [Accepted: 07/25/2023] [Indexed: 09/09/2023] Open
Abstract
A comparative analysis between two apparently unrelated problems that were solved within a period of ~400 years, viz., the accurate prediction of both the planetary orbits and the protein structures, leads to inferred conjectures that go far beyond the existence of a common path in their resolution, i.e., observation → pattern recognition → modeling. The preliminary results from this analysis indicate that complementary science, together with a new perspective on protein folding, may help us discover common features that could contribute to a more in-depth understanding of still-unsolved problems such as protein folding.
Affiliation(s)
- Jorge A. Vila
- IMASL-CONICET, Universidad Nacional de San Luis, Ejército de Los Andes 950, 5700 San Luis, Argentina
23
Lee S, Kim G, Karin EL, Mirdita M, Park S, Chikhi R, Babaian A, Kryshtafovych A, Steinegger M. Petascale Homology Search for Structure Prediction. bioRxiv 2023:2023.07.10.548308. [PMID: 37503235 PMCID: PMC10369885 DOI: 10.1101/2023.07.10.548308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
The recent CASP15 competition highlighted the critical role of multiple sequence alignments (MSAs) in protein structure prediction, as demonstrated by the success of the top AlphaFold2-based prediction methods. To push the boundaries of MSA utilization, we conducted a petabase-scale search of the Sequence Read Archive (SRA), resulting in gigabytes of aligned homologs for CASP15 targets. These were merged with default MSAs produced by ColabFold-search and provided to ColabFold-predict. By using SRA data, we achieved highly accurate predictions (GDT_TS > 70) for 66% of the non-easy targets, whereas using ColabFold-search default MSAs scored highly in only 52%. Next, we tested the effect of deep homology search and ColabFold's advanced features, such as more recycles, on prediction accuracy. While SRA homologs were most significant for improving ColabFold's CASP15 ranking from 11th to 3rd place, other strategies contributed too. We analyze these in the context of existing strategies to improve prediction.
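The accuracy threshold quoted above (GDT_TS > 70) refers to the Global Distance Test total score. As a rough illustration of how that score is built, the sketch below averages the fraction of residues within 1, 2, 4 and 8 Å of their reference positions. It assumes the per-residue distances come from one fixed superposition, whereas the full metric searches for the best superposition at each cutoff; the function name and the example distances are made up for illustration.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS from per-residue C-alpha deviations in angstroms.

    Assumption: distances were measured after a single, fixed superposition.
    """
    if not distances:
        raise ValueError("at least one residue distance is required")
    n = len(distances)
    fractions = [sum(d <= cutoff for d in distances) / n for cutoff in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    # Hypothetical deviations for a four-residue toy model.
    example = [0.5, 1.5, 3.0, 10.0]
    score = gdt_ts(example)
    print(f"GDT_TS = {score:.2f}")
    print("highly accurate by the >70 criterion:", score > 70)
```

Because the score averages over four distance cutoffs, a model can exceed 70 while still having a minority of badly placed residues.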
Affiliation(s)
- Sewon Lee
- School of Biological Sciences, Seoul National University, Seoul 08826, South Korea
- Gyuri Kim
- School of Biological Sciences, Seoul National University, Seoul 08826, South Korea
- Milot Mirdita
- School of Biological Sciences, Seoul National University, Seoul 08826, South Korea
- Sukhwan Park
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, South Korea
- Rayan Chikhi
- Institut Pasteur, Université Paris Cité, G5 Sequence Bioinformatics, 75015 Paris, France
- Artem Babaian
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Martin Steinegger
- School of Biological Sciences, Seoul National University, Seoul 08826, South Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, South Korea
- Artificial Intelligence Institute, Seoul National University, Seoul 08826, South Korea
- Institute of Molecular Biology and Genetics, Seoul National University, Seoul 08826, South Korea
24
Schön JC. Structure prediction in low dimensions: concepts, issues and examples. Philos Trans A Math Phys Eng Sci 2023; 381:20220246. [PMID: 37211034 DOI: 10.1098/rsta.2022.0246] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/30/2022] [Accepted: 03/06/2023] [Indexed: 05/23/2023]
Abstract
Structure prediction of stable and metastable polymorphs of chemical systems in low dimensions has become an important field, since materials that are patterned on the nano-scale are of increasing importance in modern technological applications. While many techniques for the prediction of crystalline structures in three dimensions or of small clusters of atoms have been developed over the past three decades, dealing with low-dimensional systems (ideal one-dimensional and two-dimensional systems, quasi-one-dimensional and quasi-two-dimensional systems, as well as low-dimensional composite systems) poses its own challenges that need to be addressed when developing a systematic methodology for the determination of low-dimensional polymorphs that are suitable for practical applications. Quite generally, the search algorithms that had been developed for three-dimensional systems need to be adjusted when being applied to low-dimensional systems with their own specific constraints; in particular, the embedding of the (quasi-)one-dimensional/two-dimensional system in three dimensions and the influence of stabilizing substrates need to be taken into account, both on a technical and a conceptual level. This article is part of a discussion meeting issue 'Supercomputing simulations of advanced materials'.
Affiliation(s)
- J Christian Schön
- Department of Nanoscience, Max-Planck-Institute for Solid State Research, Heisenbergstr. 1, D-70569 Stuttgart, Germany
25
Liu M, Huang J, Ma S, Yu G, Liao A, Pan L, Hou Y. Allergenicity of wheat protein in diet: Mechanisms, modifications and challenges. Food Res Int 2023; 169:112913. [PMID: 37254349 DOI: 10.1016/j.foodres.2023.112913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Revised: 04/21/2023] [Accepted: 04/25/2023] [Indexed: 06/01/2023]
Abstract
Wheat is widely available in people's daily diets. However, some people are currently experiencing IgE-mediated allergic reactions to wheat-based foods, which seriously impact their quality of life. Thus, it is imperative to provide comprehensive knowledge and effective methods to reduce the risk of wheat allergy (WA) in food. In the present review, recent advances in WA symptoms, the major allergens, detection methods, opportunities and challenges in establishing animal models of WA are summarized and discussed. Furthermore, an updated overview of the different modification methods that are currently being applied to wheat-based foods is provided. This study concludes that future approaches to food allergen detection will focus on combining multiple tools to rapidly and accurately quantify individual allergens in complex food matrices. Besides, biological modification has many advantages over physical or chemical modification methods in the development of hypoallergenic wheat products, such as enzymatic hydrolysis and fermentation. It is worth noting that using biotechnology to edit wheat allergen genes to produce allergen-free food may be a promising method in the future which could improve the safety of wheat foods and the health of allergy sufferers.
Affiliation(s)
- Ming Liu
- Henan Provincial Key Laboratory of Biological Processing and Nutritional Function of Wheat, College of Biological Engineering, Henan University of Technology, Zhengzhou 450001, PR China; College of Food Science and Engineering, Henan University of Technology, Zhengzhou, 450001, PR China
- Jihong Huang
- Henan Provincial Key Laboratory of Biological Processing and Nutritional Function of Wheat, College of Biological Engineering, Henan University of Technology, Zhengzhou 450001, PR China; College of Food Science and Engineering, Henan University of Technology, Zhengzhou, 450001, PR China; State Key Laboratory of Crop Stress Adaptation and Improvement, College of Agriculture, Henan University, Kaifeng 475004, PR China; School of Food and Pharmacy, Xuchang University, Xuchang 461000, PR China.
- Sen Ma
- College of Food Science and Engineering, Henan University of Technology, Zhengzhou, 450001, PR China.
- Guanghai Yu
- Henan Provincial Key Laboratory of Biological Processing and Nutritional Function of Wheat, College of Biological Engineering, Henan University of Technology, Zhengzhou 450001, PR China
- Aimei Liao
- Henan Provincial Key Laboratory of Biological Processing and Nutritional Function of Wheat, College of Biological Engineering, Henan University of Technology, Zhengzhou 450001, PR China
- Long Pan
- Henan Provincial Key Laboratory of Biological Processing and Nutritional Function of Wheat, College of Biological Engineering, Henan University of Technology, Zhengzhou 450001, PR China
- Yinchen Hou
- College of Food and Biological Engineering, Henan University of Animal Husbandry and Economy, Zhengzhou 450044, PR China
26
Spiers AJ, Dorfmueller HC, Jerdan R, McGregor J, Nicoll A, Steel K, Cameron S. Bioinformatics characterization of BcsA-like orphan proteins suggest they form a novel family of pseudomonad cyclic-β-glucan synthases. PLoS One 2023; 18:e0286540. [PMID: 37267309 PMCID: PMC10237404 DOI: 10.1371/journal.pone.0286540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2022] [Accepted: 05/18/2023] [Indexed: 06/04/2023] Open
Abstract
Bacteria produce a variety of polysaccharides with functional roles in cell surface coating, surface and host interactions, and biofilms. We have identified an 'Orphan' bacterial cellulose synthase catalytic subunit (BcsA)-like protein found in four model pseudomonads, P. aeruginosa PA01, P. fluorescens SBW25, P. putida KT2440 and P. syringae pv. tomato DC3000. Pairwise alignments indicated that the Orphan and BcsA proteins shared less than 41% sequence identity, suggesting they may not have the same structural folds or function. We identified 112 Orphans among soil and plant-associated pseudomonads as well as in phytopathogenic and human opportunistic pathogenic strains. The wide distribution of these highly conserved proteins suggests they form a novel family of synthases producing a different polysaccharide. In silico analysis, including sequence comparisons, secondary structure and topology predictions, and protein structural modelling, revealed a two-domain transmembrane ovoid-like structure for the Orphan protein with a periplasmic glycosyl hydrolase family GH17 domain linked via a transmembrane region to a cytoplasmic glycosyltransferase family GT2 domain. We suggest the GT2 domain synthesises β-(1,3)-glucan that is transferred to the GH17 domain where it is cleaved and cyclised to produce cyclic-β-(1,3)-glucan (CβG). Our structural models are consistent with enzymatic characterisation and recent molecular simulations of the PaPA01 and PpKT2440 GH17 domains. They also provide a functional explanation linking PaPAK and PaPA14 Orphan (also known as NdvB) transposon mutants with CβG production and biofilm-associated antibiotic resistance. Importantly, cyclic glucans are also involved in osmoregulation, plant infection and induced systemic suppression, and our findings suggest this novel family of CβG synthases may provide a similar range of adaptive responses for pseudomonads.
Affiliation(s)
- Andrew J. Spiers
- School of Applied Sciences, Abertay University, Dundee, United Kingdom
- Helge C. Dorfmueller
- Division of Molecular Microbiology, School of Life Sciences, University of Dundee, Dundee, United Kingdom
- Robyn Jerdan
- School of Applied Sciences, Abertay University, Dundee, United Kingdom
- Jessica McGregor
- Nuffield Research Placement Students, School of Applied Sciences, Abertay University, Dundee, United Kingdom
- Abbie Nicoll
- Nuffield Research Placement Students, School of Applied Sciences, Abertay University, Dundee, United Kingdom
- Kenzie Steel
- Nuffield Research Placement Students, School of Applied Sciences, Abertay University, Dundee, United Kingdom
- Scott Cameron
- School of Applied Sciences, Abertay University, Dundee, United Kingdom
|
27
|
Vila JA. Rethinking the protein folding problem from a new perspective. EUROPEAN BIOPHYSICS JOURNAL : EBJ 2023:10.1007/s00249-023-01657-w. [PMID: 37165178 DOI: 10.1007/s00249-023-01657-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2022] [Revised: 04/16/2023] [Accepted: 04/30/2023] [Indexed: 05/12/2023]
Abstract
One of the main concerns of Anfinsen was to reveal the connection between the amino-acid sequence and its biologically active conformation. This search gave rise to two crucial questions in structural biology, namely, why proteins fold and how a sequence encodes its folding. As to the why, he proposed a plausible answer, namely, the thermodynamic hypothesis. As to the how, this remains an unsolved challenge. Consequently, the protein folding problem is examined here from a new perspective, namely, as an 'analytic whole'. Conceiving of protein folding in this way enabled us to (i) examine in detail why force-field-based approaches have failed, among other purposes, to predict the three-dimensional structure of a protein accurately; (ii) propose how to redefine them to prevent these shortcomings; and (iii) conjecture on the origin of the success of state-of-the-art numerical methods in predicting the three-dimensional structure of proteins accurately.
Affiliation(s)
- Jorge A Vila
- IMASL-CONICET, Universidad Nacional de San Luis, Ejército de Los Andes 950, 5700, San Luis, Argentina.
|
28
|
Wodak SJ, Vajda S, Lensink MF, Kozakov D, Bates PA. Critical Assessment of Methods for Predicting the 3D Structure of Proteins and Protein Complexes. Annu Rev Biophys 2023; 52:183-206. [PMID: 36626764 PMCID: PMC10885158 DOI: 10.1146/annurev-biophys-102622-084607] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 01/11/2023]
Abstract
Advances in a scientific discipline are often measured by small, incremental steps. In this review, we report on two intertwined disciplines in the protein structure prediction field, modeling of single chains and modeling of complexes, that have over decades emulated this pattern, as monitored by the community-wide blind prediction experiments CASP and CAPRI. However, over the past few years, dramatic advances were observed for the accurate prediction of single protein chains, driven by a surge of deep learning methodologies entering the prediction field. We review the main scientific developments that enabled these recent breakthroughs and feature the important role of blind prediction experiments in building up and nurturing the structure prediction field. We discuss how the new wave of artificial intelligence-based methods is impacting the fields of computational and experimental structural biology and highlight areas in which deep learning methods are likely to lead to future developments, provided that major challenges are overcome.
Affiliation(s)
- Shoshana J Wodak
- VIB-VUB Center for Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium;
- Sandor Vajda
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA;
- Department of Chemistry, Boston University, Boston, Massachusetts, USA
- Marc F Lensink
- Univ. Lille, CNRS, UMR 8576-UGSF-Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France;
- Dima Kozakov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA;
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
- Paul A Bates
- Biomolecular Modelling Laboratory, The Francis Crick Institute, London, United Kingdom;
|
29
|
Zhang O, Haghighatlari M, Li J, Liu ZH, Namini A, Teixeira JMC, Forman-Kay JD, Head-Gordon T. Learning to evolve structural ensembles of unfolded and disordered proteins using experimental solution data. J Chem Phys 2023; 158:174113. [PMID: 37144719 PMCID: PMC10163956 DOI: 10.1063/5.0141474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Accepted: 04/11/2023] [Indexed: 05/06/2023] Open
Abstract
The structural characterization of proteins with disorder requires a computational approach backed by experiments to model their diverse and dynamic structural ensembles. The selection of conformational ensembles consistent with solution experiments of disordered proteins highly depends on the initial pool of conformers, with currently available tools limited by conformational sampling. We have developed a Generative Recurrent Neural Network (GRNN) that uses supervised learning to bias the probability distributions of torsions to take advantage of experimental data types such as nuclear magnetic resonance J-couplings, nuclear Overhauser effects, and paramagnetic relaxation enhancements. We show that updating the generative model parameters according to reward feedback, based on the agreement between experimental data and probabilistic selection of torsions from learned distributions, provides an alternative to existing approaches that simply reweight conformers of a static structural pool for disordered proteins. Instead, the biased GRNN, DynamICE, learns to physically change the conformations of the underlying pool of the disordered protein to those that better agree with experiments.
Affiliation(s)
- Oufan Zhang
- Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, California 94720, USA
- Mojtaba Haghighatlari
- Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, California 94720, USA
- Jie Li
- Kenneth S. Pitzer Theory Center and Department of Chemistry, University of California, Berkeley, California 94720, USA
- Ashley Namini
- Molecular Medicine Program, Hospital for Sick Children, Toronto, Ontario M5S 1A8, Canada
|
30
|
Mardikoraem M, Woldring D. Protein Fitness Prediction Is Impacted by the Interplay of Language Models, Ensemble Learning, and Sampling Methods. Pharmaceutics 2023; 15:1337. [PMID: 37242577 PMCID: PMC10224321 DOI: 10.3390/pharmaceutics15051337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Revised: 04/19/2023] [Accepted: 04/21/2023] [Indexed: 05/28/2023] Open
Abstract
Advances in machine learning (ML) and the availability of protein sequences via high-throughput sequencing techniques have transformed the ability to design novel diagnostic and therapeutic proteins. ML allows protein engineers to capture complex trends hidden within protein sequences that would otherwise be difficult to identify in the context of the immense and rugged protein fitness landscape. Despite this potential, there persists a need for guidance during the training and evaluation of ML methods over sequencing data. Two key challenges for training discriminative models and evaluating their performance include handling severely imbalanced datasets (e.g., few high-fitness proteins among an abundance of non-functional proteins) and selecting appropriate protein sequence representations (numerical encodings). Here, we present a framework for applying ML over assay-labeled datasets to elucidate the capacity of sampling techniques and protein encoding methods to improve binding affinity and thermal stability prediction tasks. For protein sequence representations, we incorporate two widely used methods (One-Hot encoding and physicochemical encoding) and two language-based methods (next-token prediction, UniRep; masked-token prediction, ESM). Elaboration on performance is provided over protein fitness, protein size, and sampling techniques. In addition, an ensemble of protein representation methods is generated to discover the contribution of distinct representations and improve the final prediction score. We then implement multiple criteria decision analysis (MCDA; TOPSIS with entropy weighting), using multiple metrics well-suited for imbalanced data, to ensure statistical rigor in ranking our methods. Within the context of these datasets, the synthetic minority oversampling technique (SMOTE) outperformed undersampling while encoding sequences with One-Hot, UniRep, and ESM representations. Moreover, ensemble learning increased the predictive performance on the affinity-based dataset by 4% compared to the best single-encoding candidate (F1-score = 97%), while ESM alone was rigorous enough in stability prediction (F1-score = 92%).
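The oversampling idea named in this abstract can be illustrated with a minimal sketch. It assumes the textbook form of SMOTE (interpolating between a minority-class sample and one of its nearest minority-class neighbours); the function name `smote_sketch`, its parameters, and the toy data are hypothetical and are not taken from the cited paper, whose exact implementation is not described here.

```python
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by linear interpolation.

    minority: list of feature vectors (lists of floats), at least two samples.
    k: number of nearest minority neighbours to choose from (assumed value).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Toy minority class: four 2-D points (hypothetical data).
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(toy_minority, n_new=5)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by that class, which is the property that distinguishes this approach from simply duplicating samples.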
Affiliation(s)
- Mehrsa Mardikoraem
- Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI 48824, USA
- Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Daniel Woldring
- Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI 48824, USA
- Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
|
31
|
Sicard J, Barbe S, Boutrou R, Bouvier L, Delaplace G, Lashermes G, Théron L, Vitrac O, Tonda A. A primer on predictive techniques for food and bioresources transformation processes. J FOOD PROCESS ENG 2023. [DOI: 10.1111/jfpe.14325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2023]
Affiliation(s)
- Laurent Bouvier
- UMET Université de Lille, CNRS, Centrale Lille, INRAE Villeneuve‐D'Ascq France
- Guillaume Delaplace
- UMET Université de Lille, CNRS, Centrale Lille, INRAE Villeneuve‐D'Ascq France
- Olivier Vitrac
- SayFood, INRAE, AgroParisTech Université Paris Saclay Massy France
- Alberto Tonda
- MIA‐Paris, AgroParisTech, INRAE Université Paris Saclay Paris France
|
32
|
Yang Z, Zeng X, Zhao Y, Chen R. AlphaFold2 and its applications in the fields of biology and medicine. Signal Transduct Target Ther 2023; 8:115. [PMID: 36918529 PMCID: PMC10011802 DOI: 10.1038/s41392-023-01381-z] [Citation(s) in RCA: 60] [Impact Index Per Article: 60.0] [Received: 10/29/2022] [Revised: 12/27/2022] [Accepted: 02/16/2023] [Indexed: 03/16/2023] Open
Abstract
AlphaFold2 (AF2) is an artificial intelligence (AI) system developed by DeepMind that can predict three-dimensional (3D) structures of proteins from amino acid sequences with atomic-level accuracy. Protein structure prediction is one of the most challenging problems in computational biology and chemistry, and has puzzled scientists for 50 years. The advent of AF2 represents unprecedented progress in protein structure prediction and has attracted much attention. The subsequent release of structures of more than 200 million proteins predicted by AF2 further aroused great enthusiasm in the scientific community, especially in the fields of biology and medicine. AF2 is thought to have a significant impact on structural biology and research areas that need protein structure information, such as drug discovery, protein design, and prediction of protein function. Although AF2 was developed only recently, there are already quite a few application studies of AF2 in the fields of biology and medicine, many of which have provided preliminary evidence of its potential. To better understand AF2 and promote its applications, in this article we summarize the principle and system architecture of AF2 as well as the recipe for its success, and focus in particular on reviewing its applications in the fields of biology and medicine. Limitations of current AF2 prediction are also discussed.
Affiliation(s)
- Zhenyu Yang
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- Xiaoxi Zeng
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
- Yi Zhao
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
- Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China.
- Runsheng Chen
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China.
- Pingshan Translational Medicine Center, Shenzhen Bay Laboratory, Shenzhen, 518118, China.
|
33
|
Szwabowski GL, Baker DL, Parrill AL. Application of computational methods for class A GPCR Ligand discovery. J Mol Graph Model 2023; 121:108434. [PMID: 36841204 DOI: 10.1016/j.jmgm.2023.108434] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/03/2022] [Revised: 02/11/2023] [Accepted: 02/13/2023] [Indexed: 02/22/2023]
Abstract
G protein-coupled receptors (GPCR) are integral membrane proteins of considerable interest as targets for drug development due to their role in transmitting cellular signals in a multitude of biological processes. Of the six classes categorizing GPCR (A, B, C, D, E, and F), class A contains the largest number of therapeutically relevant GPCR. Despite their importance as drug targets, many challenges exist for the discovery of novel class A GPCR ligands serving as drug precursors. Though knowledge of the structural and functional characteristics of GPCR has grown significantly over the past 20 years, a large portion of GPCR lack reported, experimentally determined structures. Furthermore, many GPCR have no known endogenous and/or synthetic ligands, limiting further exploration of their biochemical, cellular, and physiological roles. While many successes in GPCR ligand discovery have resulted from experimental high-throughput screening, computational methods have played an increasingly important role in GPCR ligand identification in the past decade. Here we discuss computational techniques applied to GPCR ligand discovery. This review summarizes class A GPCR structure/function and provides an overview of many obstacles currently faced in GPCR ligand discovery. Furthermore, we discuss applications and recent successes of computational techniques used to predict GPCR structure as well as present a summary of ligand- and structure-based methods used to identify potential GPCR ligands. Finally, we discuss computational hit list generation and refinement and provide comprehensive workflows for GPCR ligand identification.
Affiliation(s)
- Daniel L Baker
- Department of Chemistry, The University of Memphis, Memphis, TN, 38152, USA
- Abby L Parrill
- Department of Chemistry, The University of Memphis, Memphis, TN, 38152, USA.
|
34
|
Non-synonymous variation and protein structure of candidate genes associated with selection in farm and wild populations of turbot (Scophthalmus maximus). Sci Rep 2023; 13:3019. [PMID: 36810752 PMCID: PMC9944912 DOI: 10.1038/s41598-023-29826-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2022] [Accepted: 02/10/2023] [Indexed: 02/24/2023] Open
Abstract
Non-synonymous variation (NSV) of protein coding genes represents raw material for selection to improve adaptation to the diverse environmental scenarios in wild and livestock populations. Many aquatic species face variations in temperature, salinity and biological factors throughout their distribution range, which is reflected by the presence of allelic clines or local adaptation. The turbot (Scophthalmus maximus) is a flatfish of great commercial value with a flourishing aquaculture, which has promoted the development of genomic resources. In this study, we developed the first atlas of NSVs in the turbot genome by resequencing 10 individuals from the Northeast Atlantic Ocean. More than 50,000 NSVs were detected in the ~21,500 coding genes of the turbot genome, and we selected 18 NSVs to be genotyped using a single MassARRAY multiplex on 13 wild populations and three turbot farms. We detected signals of divergent selection on several genes related to growth, circadian rhythms, osmoregulation and oxygen binding in the different scenarios evaluated. Furthermore, we explored the impact of the NSVs identified on the 3D structure and functional relationships of the corresponding proteins. In summary, our study provides a strategy to identify NSVs in species with consistently annotated and assembled genomes to ascertain their role in adaptation.
|
35
|
Manavi F, Sharma A, Sharma R, Tsunoda T, Shatabda S, Dehzangi I. CNN-Pred: Prediction of single-stranded and double-stranded DNA-binding protein using convolutional neural networks. Gene X 2023; 853:147045. [PMID: 36503892 DOI: 10.1016/j.gene.2022.147045] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 07/21/2022] [Revised: 10/10/2022] [Accepted: 11/08/2022] [Indexed: 11/27/2022] Open
Abstract
DNA-binding proteins play a vital role in biological activity including DNA replication, DNA packing, and DNA repair. DNA-binding proteins can be classified into single-stranded DNA-binding proteins (SSBs) or double-stranded DNA-binding proteins (DSBs). Determining whether a protein is DSB or SSB helps determine the protein's function. Therefore, many studies have been conducted in recent years to accurately identify DSB and SSB. Despite all the efforts made so far, DSB and SSB prediction performance remains limited. In this study, we propose a new method called CNN-Pred to accurately predict DSB and SSB. To build CNN-Pred, we first extract evolutionary-based features in the form of mono-gram and bi-gram profiles using the position-specific scoring matrix (PSSM). We then use a 1D convolutional neural network (CNN) as the classifier for our extracted features. Our results demonstrate that CNN-Pred can enhance the DSB and SSB prediction accuracies by more than 4% on the independent test set compared to previous studies found in the literature. CNN-Pred as a standalone tool and all its source code are publicly available at: https://github.com/MLBC-lab/CNN-Pred.
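The mono-gram and bi-gram PSSM profiles mentioned in this abstract can be sketched as follows. The abstract gives no formulas, so this assumes the definitions commonly used in the literature: the mono-gram feature is the column-wise mean of an L x 20 PSSM, and the bi-gram feature B[m][n] is the sum over consecutive positions i of P[i][m] * P[i+1][n]. The function names and the tiny 3 x 2 matrix are hypothetical, and any scaling or normalisation used by CNN-Pred itself may differ.

```python
def monogram_features(pssm):
    """Column-wise mean of an L x C matrix (C = 20 for a real PSSM)."""
    length = len(pssm)
    n_cols = len(pssm[0])
    return [sum(row[j] for row in pssm) / length for j in range(n_cols)]


def bigram_features(pssm):
    """Flattened C x C matrix with B[m][n] = sum_i P[i][m] * P[i+1][n]."""
    n_cols = len(pssm[0])
    features = []
    for m in range(n_cols):
        for n in range(n_cols):
            total = sum(pssm[i][m] * pssm[i + 1][n] for i in range(len(pssm) - 1))
            features.append(total)
    return features


# Toy "PSSM" with 3 positions and 2 columns instead of 20 (hypothetical values).
toy_pssm = [[1, 2], [3, 4], [5, 6]]
mono = monogram_features(toy_pssm)
bi = bigram_features(toy_pssm)
```

For a real PSSM this yields 20 mono-gram and 400 bi-gram values regardless of sequence length, which is what makes the profiles usable as fixed-size classifier input.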
Affiliation(s)
- Farnoush Manavi
- Computer Science and Engineering and Information Technology Department, Shiraz University, Shiraz, Iran
- Alok Sharma
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan; Institute for Integrated and Intelligent Systems, Griffith University, Nathan, Brisbane, QLD 4111, Australia
- Ronesh Sharma
- School of Electrical and Electronics Engineering, Fiji National University, Suva, Fiji
- Tatsuhiko Tsunoda
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan; Laboratory for Medical Science Mathematics, Department of Biological Sciences, School of Science, The University of Tokyo, Tokyo 113-0033, Japan; Laboratory for Medical Science Mathematics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo 113-0033, Japan
- Swakkhar Shatabda
- Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh
- Iman Dehzangi
- Department of Computer Science, Rutgers University, Camden, NJ, USA; Center for Computational and Integrative Biology, Rutgers University, Camden, USA
|
36
|
Gogoi CR, Rahman A, Saikia B, Baruah A. Protein Dihedral Angle Prediction: The State of the Art. ChemistrySelect 2023. [DOI: 10.1002/slct.202203427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Affiliation(s)
- Aziza Rahman
- Department of Chemistry Dibrugarh University Dibrugarh Assam India
- Bondeepa Saikia
- Department of Chemistry Dibrugarh University Dibrugarh Assam India
- Anupaul Baruah
- Department of Chemistry Dibrugarh University Dibrugarh Assam India
|
37
|
Ahmad W, Tayara H, Chong KT. Attention-Based Graph Neural Network for Molecular Solubility Prediction. ACS OMEGA 2023; 8:3236-3244. [PMID: 36713733 PMCID: PMC9878542 DOI: 10.1021/acsomega.2c06702] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/18/2022] [Accepted: 12/23/2022] [Indexed: 06/18/2023]
Abstract
Drug discovery (DD) research is aimed at the discovery of new medications. Solubility is an important physicochemical property in drug development. Active pharmaceutical ingredients (APIs) are essential substances for high drug efficacy. During DD research, aqueous solubility (AS) is a key physicochemical attribute required for API characterization. High-precision in silico solubility prediction reduces the experimental cost and time of drug development. Several artificial intelligence tools have been employed for solubility prediction using machine learning and deep learning techniques. This study aims to create different deep learning models that can predict the solubility of a wide range of molecules using the largest currently available solubility data set. Simplified molecular-input line-entry system (SMILES) strings were used as the molecular representation, and models were developed using simple graph convolution, graph isomorphism network, graph attention network, and AttentiveFP network architectures. Based on the performance of the models, the AttentiveFP-based network model was finally selected. The model was trained and tested on 9943 compounds. The model outperformed on 62 anticancer compounds, with Pearson correlation R² and root-mean-square error values of 0.52 and 0.61, respectively. AS prediction can be improved by improving the graph algorithm or adding more molecular properties.
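The two evaluation metrics quoted in this abstract can be made concrete with a short sketch. It assumes the standard definitions of root-mean-square error and of the squared Pearson correlation coefficient; the function names and the sample values are hypothetical and are not data from the cited study.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(y_true))


def pearson_r2(y_true, y_pred):
    """Squared Pearson correlation coefficient between two value lists."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov ** 2 / (var_t * var_p)


# Hypothetical measured and predicted solubility values.
measured = [1.0, 2.0, 3.0]
predicted = [2.0, 4.0, 6.0]
error = rmse(measured, predicted)
r2 = pearson_r2(measured, predicted)
```

The toy values show why both numbers are usually reported together: the predictions are perfectly correlated with the measurements (R² of 1) yet still carry a large error, because correlation ignores systematic scaling and offset.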
Affiliation(s)
- Waqar Ahmad
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea
- Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, South Korea
|
38
|
Liu Y, Zhang R, Li T, Jiang J, Ma J, Wang P. MolRoPE-BERT: An enhanced molecular representation with Rotary Position Embedding for molecular property prediction. J Mol Graph Model 2023; 118:108344. [PMID: 36242862 DOI: 10.1016/j.jmgm.2022.108344] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 05/15/2022] [Revised: 09/21/2022] [Accepted: 09/21/2022] [Indexed: 11/28/2022]
Abstract
Molecular property prediction is a significant task in drug discovery. Most deep learning-based computational methods either develop a unique chemical representation or combine complex models. However, researchers have paid less attention to the possible advantages of the enormous quantities of unlabeled molecular data. Because the amount of labeled data available is clearly limited, this task becomes more difficult. In some senses, the SMILES of a drug molecule may be regarded as a language for chemistry, taking inspiration from natural language processing research and current advances in pretrained models. In this paper, we incorporate Rotary Position Embedding (RoPE) to efficiently encode the position information of SMILES sequences, ultimately enhancing the capability of the BERT pretrained model to extract potential molecular substructure information for molecular property prediction. We propose the MolRoPE-BERT framework, a new end-to-end deep learning framework that integrates an efficient position coding approach for capturing sequence position information with a pretrained BERT model for molecular property prediction. To generate useful molecular substructure embeddings, we first train MolRoPE-BERT exclusively on four million unlabeled drug SMILES (i.e., ZINC 15 and ChEMBL 27). Then, we conduct a series of experiments to evaluate the performance of the proposed MolRoPE-BERT on four well-studied datasets. Compared with conventional and state-of-the-art baselines, our experiments demonstrated comparable or superior performance.
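The position-encoding idea named in this abstract can be illustrated with a minimal sketch of Rotary Position Embedding. It assumes the usual formulation, in which each consecutive pair of vector components is rotated by an angle proportional to the token position, with per-pair frequencies base^(-2k/d) and a base of 10000; the function name `rope` and the toy vectors are hypothetical, and the abstract does not say how MolRoPE-BERT configures these details.

```python
import math


def rope(vec, pos, base=10000.0):
    """Rotate consecutive component pairs of vec by position-dependent angles.

    vec: list of floats with even length d.
    pos: integer token position in the sequence.
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # i = 2k, so this is pos * base^(-2k/d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Toy query and key vectors (hypothetical values).
q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.5, -0.75, 1.0]
score_near_start = dot(rope(q, 5), rope(k, 3))
score_shifted = dot(rope(q, 12), rope(k, 10))
```

Both scores use the same position offset of 2, so they agree up to floating-point error. This is the property that lets an attention score depend on the relative rather than absolute positions of two tokens in a SMILES string.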
Affiliation(s)
- Yunwu Liu
- School of Information Science and Engineering, Lanzhou University, Tianshui Road, Lanzhou city, 730000, Lanzhou, China.
- Ruisheng Zhang
- School of Information Science and Engineering, Lanzhou University, Tianshui Road, Lanzhou city, 730000, Lanzhou, China.
- Tongfeng Li
- School of Information Science and Engineering, Lanzhou University, Tianshui Road, Lanzhou city, 730000, Lanzhou, China.
- Jing Jiang
- School of Information Science and Engineering, Lanzhou University, Tianshui Road, Lanzhou city, 730000, Lanzhou, China.
- Jun Ma
- School of Information Science and Engineering, Lanzhou University, Tianshui Road, Lanzhou city, 730000, Lanzhou, China.
- Ping Wang
- School of Information Science and Engineering, Lanzhou University, Tianshui Road, Lanzhou city, 730000, Lanzhou, China.
|
39
|
Shea A, Bartz J, Zhang L, Dong X. Predicting mutational function using machine learning. MUTATION RESEARCH. REVIEWS IN MUTATION RESEARCH 2023; 791:108457. [PMID: 36965820 PMCID: PMC10239318 DOI: 10.1016/j.mrrev.2023.108457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Revised: 03/11/2023] [Accepted: 03/20/2023] [Indexed: 03/27/2023]
Abstract
Genetic variations are one of the major causes of phenotypic variations between human individuals. Although beneficial as being the substrate of evolution, germline mutations may cause diseases, including Mendelian diseases and complex diseases such as diabetes and heart diseases. Mutations occurring in somatic cells are a main cause of cancer and likely cause age-related phenotypes and other age-related diseases. Because of the high abundance of genetic variations in the human genome, i.e., millions of germline variations per human subject and thousands of additional somatic mutations per cell, it is technically challenging to experimentally verify the function of every possible mutation and their interactions. Significant progress has been made to solve this problem using computational approaches, especially machine learning (ML). Here, we review the progress and achievements made in recent years in this field of research. We classify the computational models in two ways: one according to their prediction goals including protein structural alterations, gene expression changes, and disease risks, and the other according to their methodologies, including non-machine learning methods, classical machine learning methods, and deep neural network methods. For models in each category, we discuss their architecture, prediction accuracy, and potential limitations. This review provides new insights into the applications and future directions of computational approaches in understanding the role of mutations in aging and disease.
Affiliation(s)
- Anthony Shea
- Institute on the Biology of Aging and Metabolism, University of Minnesota, Minneapolis, MN 55455, USA; Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, MN 55455, USA
| | - Josh Bartz
- Institute on the Biology of Aging and Metabolism, University of Minnesota, Minneapolis, MN 55455, USA; Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, MN 55455, USA; Bioinformatics and Computational Biology Program, University of Minnesota, Minneapolis, MN 55455, USA
| | - Lei Zhang
- Institute on the Biology of Aging and Metabolism, University of Minnesota, Minneapolis, MN 55455, USA; Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
| | - Xiao Dong
- Institute on the Biology of Aging and Metabolism, University of Minnesota, Minneapolis, MN 55455, USA; Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, MN 55455, USA.
| |
|
40
|
Nallasamy V, Seshiah M. Energy Profile Bayes and Thompson Optimized Convolutional Neural Network protein structure prediction. Neural Comput Appl 2023; 35:1983-2006. [PMID: 36245797 PMCID: PMC9542649 DOI: 10.1007/s00521-022-07868-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2021] [Accepted: 09/21/2022] [Indexed: 01/12/2023]
Abstract
In living organisms, proteins are considered the executants of biological functions. Because protein structure plays a pivotal role in protein folding patterns, comprehending it is a challenging issue. Moreover, owing to the large number of protein sequences in protein data banks and the complexity of protein structures, experimental methods are found to be inadequate for protein structural class prediction. Hence, it is advantageous to design a reliable computational method to predict protein structural classes from protein sequences. In recent years there has been elevated interest in using deep learning to assist protein structure prediction, as protein structure prediction models can be utilized to screen a large number of novel sequences. In this regard, we propose a model employing the Energy Profile for atom pairs in conjunction with the Legion-Class Bayes function, called the Energy Profile Legion-Class Bayes Protein Structure Identification model. Following this, we use a Thompson Optimized convolutional neural network to extract features between amino acids, and then the Thompson Optimized SoftMax function is employed to extract associations between protein sequences for predicting secondary protein structure. The proposed Energy Profile Bayes and Thompson Optimized Convolutional Neural Network (EPB-OCNN) method was tested on distinct unique protein data and compared to the state-of-the-art methods: the Template-Based Modeling, Protein Design using Deep Graph Neural Networks, a deep learning-based S-glutathionylation sites prediction tool called a Computational Framework, the Deep Learning and a distance-based protein structure prediction using deep learning. The results obtained with the Biopython tool are measured with respect to protein structure prediction time, protein structure prediction accuracy, specificity, recall, F-measure, and precision. The proposed EPB-OCNN method outperformed the state-of-the-art methods, thereby corroborating the objective.
Affiliation(s)
- Varanavasi Nallasamy
- Cognizant Technology Solutions Pvt. Ltd, CHIL SEZ IT Park, Keeranatham, Saravanam Patti, Coimbatore, Tamil Nadu 641035 India
| | - Malarvizhi Seshiah
- Department of Computer Science, Thiruvalluvar Government Arts College, Rasipuram, Namakkal, Tamil Nadu India
| |
|
41
|
Choi YN, Cho N, Lee K, Gwon DA, Lee JW, Lee J. Programmable Synthesis of Biobased Materials Using Cell-Free Systems. ADVANCED MATERIALS (DEERFIELD BEACH, FLA.) 2023; 35:e2203433. [PMID: 36108274 DOI: 10.1002/adma.202203433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2022] [Revised: 08/26/2022] [Indexed: 06/15/2023]
Abstract
Motivated by the intricate mechanisms underlying biomolecule syntheses in cells that chemistry is currently unable to mimic, researchers have harnessed biological systems for manufacturing novel materials. Cell-free systems (CFSs) utilizing the bioactivity of transcriptional and translational machineries in vitro are excellent tools that allow supplementation of exogenous materials for production of innovative materials beyond the capability of natural biological systems. Herein, recent studies that have advanced the ability to expand the scope of biobased materials using CFS are summarized and approaches enabling the production of high-value materials, prototyping of genetic parts and modules, and biofunctionalization are discussed. By extending the reach of chemical and enzymatic reactions complementary to cellular materials, CFSs provide new opportunities at the interface of materials science and synthetic biology.
Affiliation(s)
- Yun-Nam Choi
- Department of Chemical Engineering, Pohang University of Science and Technology (POSTECH), Pohang, 37673, Republic of Korea
| | - Namjin Cho
- Department of Chemical Engineering, Pohang University of Science and Technology (POSTECH), Pohang, 37673, Republic of Korea
| | - Kanghun Lee
- School of Interdisciplinary Bioscience and Bioengineering (I-Bio), Pohang University of Science and Technology (POSTECH), Pohang, 37673, Republic of Korea
| | - Da-Ae Gwon
- Department of Chemical Engineering, Pohang University of Science and Technology (POSTECH), Pohang, 37673, Republic of Korea
| | - Jeong Wook Lee
- Department of Chemical Engineering, Pohang University of Science and Technology (POSTECH), Pohang, 37673, Republic of Korea
- School of Interdisciplinary Bioscience and Bioengineering (I-Bio), Pohang University of Science and Technology (POSTECH), Pohang, 37673, Republic of Korea
| | - Joongoo Lee
- Department of Chemical Engineering, Pohang University of Science and Technology (POSTECH), Pohang, 37673, Republic of Korea
- School of Interdisciplinary Bioscience and Bioengineering (I-Bio), Pohang University of Science and Technology (POSTECH), Pohang, 37673, Republic of Korea
| |
|
42
|
Bartuzi D, Kaczor AA, Matosiuk D. Illuminating the "Twilight Zone": Advances in Difficult Protein Modeling. Methods Mol Biol 2023; 2627:25-40. [PMID: 36959440 DOI: 10.1007/978-1-0716-2974-1_2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 03/25/2023]
Abstract
Homology modeling was long considered a method of choice in tertiary protein structure prediction. However, it used to provide models of acceptable quality only when templates with appreciable sequence identity with a target could be found. The threshold value was long assumed to be around 20-30%. Below this level, the sequence identity obtained came dangerously close to values that can arise by chance when aligning any random, unrelated sequences. In these cases, other approaches, including ab initio folding simulations or fragment assembly, were usually employed. The most recent editions of the CASP and CAMEO community-wide assessments of modeling methods have brought some surprising outcomes, proving that many more clues can be inferred from protein sequence analyses than previously thought. In this chapter, we focus on recent advances in the field of difficult protein modeling, pushing the threshold deep into the "twilight zone", with particular attention devoted to improvements in applications of machine learning and model evaluation.
Affiliation(s)
- Damian Bartuzi
- Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland.
| | - Agnieszka A Kaczor
- Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland
- University of Eastern Finland, School of Pharmacy, Kuopio, Finland
| | - Dariusz Matosiuk
- Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland
| |
|
43
|
Mufassirin MMM, Newton MAH, Sattar A. Artificial intelligence for template-free protein structure prediction: a comprehensive review. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10350-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
|
44
|
Kaushik R, Zhang KY. An Integrated Protein Structure Fitness Scoring Approach for Identifying Native-Like Model Structures. Comput Struct Biotechnol J 2022; 20:6467-6472. [DOI: 10.1016/j.csbj.2022.11.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Revised: 11/14/2022] [Accepted: 11/14/2022] [Indexed: 11/18/2022] Open
|
45
|
Markandan K, Tiong YW, Sankaran R, Subramanian S, Markandan UD, Chaudhary V, Numan A, Khalid M, Walvekar R. Emergence of infectious diseases and role of advanced nanomaterials in point-of-care diagnostics: a review. Biotechnol Genet Eng Rev 2022:1-89. [PMID: 36243900 DOI: 10.1080/02648725.2022.2127070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2022] [Accepted: 09/12/2022] [Indexed: 11/09/2022]
Abstract
Infectious outbreaks are the foremost global public health concern, challenging the current healthcare system, which claims millions of lives annually. The most crucial way to control an infectious outbreak is by early detection through point-of-care (POC) diagnostics. POC diagnostics are highly advantageous owing to the prompt diagnosis, which is economical, simple and highly efficient with remote access capabilities. In particular, utilization of nanomaterials to architect POC devices has enabled highly integrated and portable (compact) devices with enhanced efficiency. As such, this review will detail the factors influencing the emergence of infectious diseases and methods for fast and accurate detection, thus elucidating the underlying factors of these infections. Furthermore, it comprehensively highlights the importance of different nanomaterials in POCs to detect nucleic acid, whole pathogens, proteins and antibody detection systems. Finally, we summarize findings reported on nanomaterials based on advanced POCs such as lab-on-chip, lab-on-disc-devices, point-of-action and hospital-on-chip. To this end, we discuss the challenges, potential solutions, prospects of integrating internet-of-things, artificial intelligence, 5G communications and data clouding to achieve intelligent POCs.
Affiliation(s)
- Kalaimani Markandan
- Temasek Laboratories, Nanyang Technological University, Nanyang Drive, Singapore
- Faculty of Engineering, Technology and Built Environment, UCSI University, Kuala Lumpur, Malaysia
| | - Yong Wei Tiong
- NUS Environmental Research Institute, National University of Singapore, Engineering Drive, Singapore
| | - Revathy Sankaran
- Graduate School, University of Nottingham Malaysia Campus, Semenyih, Selangor, Malaysia
| | - Sakthinathan Subramanian
- Department of Materials & Mineral Resources Engineering, National Taipei University of Technology (NTUT), Taipei, Taiwan
| | | | - Vishal Chaudhary
- Research Cell & Department of Physics, Bhagini Nivedita College, University of Delhi, New Delhi, India
| | - Arshid Numan
- Graphene & Advanced 2D Materials Research Group (GAMRG), School of Engineering and Technology, Sunway University, Petaling Jaya, Selangor, Malaysia
- Sunway Materials Smart Science & Engineering (SMS2E) Research Cluster School of Engineering and Technology, Sunway University, Selangor, Malaysia
| | - Mohammad Khalid
- Graphene & Advanced 2D Materials Research Group (GAMRG), School of Engineering and Technology, Sunway University, Petaling Jaya, Selangor, Malaysia
- Sunway Materials Smart Science & Engineering (SMS2E) Research Cluster School of Engineering and Technology, Sunway University, Selangor, Malaysia
| | - Rashmi Walvekar
- Department of Chemical Engineering, School of Energy and Chemical Engineering, Xiamen University Malaysia, Sepang, Selangor, Malaysia
| |
|
46
|
Protein Function Analysis through Machine Learning. Biomolecules 2022; 12:biom12091246. [PMID: 36139085 PMCID: PMC9496392 DOI: 10.3390/biom12091246] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/16/2022] [Revised: 08/22/2022] [Accepted: 08/31/2022] [Indexed: 11/16/2022] Open
Abstract
Machine learning (ML) has been an important arsenal in computational biology used to elucidate protein function for decades. With the recent burgeoning of novel ML methods and applications, new ML approaches have been incorporated into many areas of computational biology dealing with protein function. We examine how ML has been integrated into a wide range of computational models to improve prediction accuracy and gain a better understanding of protein function. The applications discussed are protein structure prediction, protein engineering using sequence modifications to achieve stability and druggability characteristics, molecular docking in terms of protein–ligand binding, including allosteric effects, protein–protein interactions and protein-centric drug discovery. To quantify the mechanisms underlying protein function, a holistic approach that takes structure, flexibility, stability, and dynamics into account is required, as these aspects become inseparable through their interdependence. Another key component of protein function is conformational dynamics, which often manifest as protein kinetics. Computational methods that use ML to generate representative conformational ensembles and quantify differences in conformational ensembles important for function are included in this review. Future opportunities are highlighted for each of these topics.
|
47
|
Bongirwar V, Mokhade AS. Different methods, techniques and their limitations in protein structure prediction: A review. PROGRESS IN BIOPHYSICS AND MOLECULAR BIOLOGY 2022; 173:72-82. [PMID: 35588858 DOI: 10.1016/j.pbiomolbio.2022.05.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 08/23/2021] [Revised: 04/16/2022] [Accepted: 05/11/2022] [Indexed: 11/17/2022]
Abstract
Because of the increase in different types of diseases in human habitats, the demand for designing various types of drugs is also increasing. Proteins and their structures play a very important role in drug design. Therefore, researchers from different areas such as mathematics, medicine, and computer science are teaming up to find better solutions in this field. In this paper, we discuss different methods of secondary and tertiary protein structure prediction (PSP), along with the limitations of different approaches. Different types of datasets used in PSP are also discussed. This paper also describes different performance measures for evaluating the prediction accuracy of PSP methods. Various software tools and servers are available for download, which are used to find the protein structure for an input protein sequence. These tools also help to compare the performance of any new algorithm with other available methods. Details of these tools are also given in this paper.
Affiliation(s)
| | - A S Mokhade
- Visvesvaraya National Institute of Technology, Nagpur, India
| |
|
48
|
Kekenes-Huskey PM, Burgess DE, Sun B, Bartos DC, Rozmus ER, Anderson CL, January CT, Eckhardt LL, Delisle BP. Mutation-Specific Differences in Kv7.1 (KCNQ1) and Kv11.1 (KCNH2) Channel Dysfunction and Long QT Syndrome Phenotypes. Int J Mol Sci 2022; 23:7389. [PMID: 35806392 PMCID: PMC9266926 DOI: 10.3390/ijms23137389] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/28/2022] [Revised: 06/22/2022] [Accepted: 06/24/2022] [Indexed: 11/16/2022] Open
Abstract
The electrocardiogram (ECG) empowered clinician scientists to measure the electrical activity of the heart noninvasively to identify arrhythmias and heart disease. Shortly after the standardization of the 12-lead ECG for the diagnosis of heart disease, several families with autosomal recessive (Jervell and Lange-Nielsen Syndrome) and dominant (Romano-Ward Syndrome) forms of long QT syndrome (LQTS) were identified. An abnormally long heart rate-corrected QT-interval was established as a biomarker for the risk of sudden cardiac death. Since then, the International LQTS Registry was established; a phenotypic scoring system to identify LQTS patients was developed; the major genes that associate with typical forms of LQTS were identified; and guidelines for the successful management of patients advanced. In this review, we discuss the molecular and cellular mechanisms for LQTS associated with missense variants in KCNQ1 (LQT1) and KCNH2 (LQT2). We move beyond the "benign" to a "pathogenic" binary classification scheme for different KCNQ1 and KCNH2 missense variants and discuss gene- and mutation-specific differences in K+ channel dysfunction, which can predispose people to distinct clinical phenotypes (e.g., concealed, pleiotropic, severe, etc.). We conclude by discussing the emerging computational structural modeling strategies that will distinguish between dysfunctional subtypes of KCNQ1 and KCNH2 variants, with the goal of realizing a layered precision medicine approach focused on individuals.
Affiliation(s)
- Peter M. Kekenes-Huskey
- Department of Cell and Molecular Physiology, Stritch School of Medicine, Loyola University Chicago, Maywood, IL 60153, USA
| | - Don E. Burgess
- Department of Physiology, College of Medicine, University of Kentucky, Lexington, KY 40536, USA
| | - Bin Sun
- Department of Pharmacology, Harbin Medical University, Harbin 150081, China
| | | | - Ezekiel R. Rozmus
- Department of Physiology, College of Medicine, University of Kentucky, Lexington, KY 40536, USA
| | - Corey L. Anderson
- Cellular and Molecular Arrythmias Program, Division of Cardiovascular Medicine, Department of Medicine, University of Wisconsin-Madison, Madison, WI 53705, USA
| | - Craig T. January
- Cellular and Molecular Arrythmias Program, Division of Cardiovascular Medicine, Department of Medicine, University of Wisconsin-Madison, Madison, WI 53705, USA
| | - Lee L. Eckhardt
- Cellular and Molecular Arrythmias Program, Division of Cardiovascular Medicine, Department of Medicine, University of Wisconsin-Madison, Madison, WI 53705, USA
| | - Brian P. Delisle
- Department of Physiology, College of Medicine, University of Kentucky, Lexington, KY 40536, USA
| |
|
49
|
|
50
|
Lewis‐Atwell T, Townsend PA, Grayson MN. Machine learning activation energies of chemical reactions. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2022. [DOI: 10.1002/wcms.1593] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 12/11/2022]
Affiliation(s)
- Toby Lewis‐Atwell
- Department of Computer Science, Faculty of Science, University of Bath, Bath, UK
| | - Piers A. Townsend
- Department of Chemistry, Faculty of Science, University of Bath, Bath, UK
| | | |
|