1
Wuyun Q, Chen Y, Shen Y, Cao Y, Hu G, Cui W, Gao J, Zheng W. Recent Progress of Protein Tertiary Structure Prediction. Molecules 2024; 29:832. [PMID: 38398585 PMCID: PMC10893003 DOI: 10.3390/molecules29040832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Revised: 02/06/2024] [Accepted: 02/08/2024] [Indexed: 02/25/2024] Open access
Abstract
The prediction of three-dimensional (3D) protein structure from amino acid sequences has stood as a significant challenge in computational and structural bioinformatics for decades. Recently, the widespread integration of artificial intelligence (AI) algorithms has substantially accelerated advances in protein structure prediction, yielding numerous significant milestones. In particular, the end-to-end deep learning method AlphaFold2 raised structure prediction performance to new heights, producing models regularly competitive with experimental structures in the 14th Critical Assessment of Protein Structure Prediction (CASP14). To provide researchers with a comprehensive understanding of the field and to guide future research, this review describes various methodologies, assessments, and databases in protein structure prediction, including traditional methods, such as template-based modeling (TBM) and template-free modeling (FM) approaches; recently developed deep learning-based methods, such as contact/distance-guided methods, end-to-end folding methods, and protein language model (PLM)-based methods; multi-domain protein structure prediction methods; the CASP experiments and related assessments; and the recently released AlphaFold Protein Structure Database (AlphaFold DB). We discuss their advantages, disadvantages, and application scopes, aiming to give researchers insight into the limitations and contexts of protein structure prediction methods and how to select among them effectively in protein-related fields.
Affiliation(s)
- Qiqige Wuyun
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Yihan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Yifeng Shen
- Faculty of Environment and Information Studies, Keio University, Fujisawa 252-0882, Kanagawa, Japan
- Yang Cao
- College of Life Sciences, Sichuan University, Chengdu 610065, China
- Gang Hu
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China
- Wei Cui
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Jianzhao Gao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
2
Begum MN, Mahtarin R, Islam MT, Ahmed S, Konika TK, Mannoor K, Akhteruzzaman S, Qadri F. Molecular investigation of TSHR gene in Bangladeshi congenital hypothyroid patients. PLoS One 2023; 18:e0282553. [PMID: 37561783 PMCID: PMC10414570 DOI: 10.1371/journal.pone.0282553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 07/11/2023] [Indexed: 08/12/2023] Open access
Abstract
The disorder of thyroid gland development, or thyroid dysgenesis, accounts for 80-85% of congenital hypothyroidism (CH) cases. Mutations in the TSHR gene are mostly associated with thyroid dysgenesis and prevent or disrupt normal development of the gland. Limited data are available on the genetic spectrum of congenital hypothyroid children in Bangladesh; an understanding of the molecular aetiology of thyroid dysgenesis in this population is therefore needed. The aim of the study was to investigate the effect of mutations in the TSHR gene on the small-molecule thyrogenic drug-binding site of the protein. By sequencing-based analysis, we identified two nonsynonymous mutations (p.Ser508Leu, p.Glu727Asp) in exon 10 of the TSHR gene in 21 patients with dysgenesis. The TSHR368-764 protein was then modeled by the I-TASSER server for wild-type and mutant structures. The modeled proteins were targeted with the thyrogenic drugs MS437 and MS438 to assess the effect of the mutations. The damaging effect of the mutations on the drug-protein complexes was explored by molecular docking and molecular dynamics simulations. The binding affinity of the wild-type protein was much higher than that of the mutants for both drug ligands (MS437 and MS438). Molecular dynamics simulations captured the dynamic behavior of the wild-type and mutant complexes: MS437-TSHR368-764MT2 and MS438-TSHR368-764MT1 showed stable conformations in biological environments. Finally, Principal Component Analysis revealed discrepancies in the structural and energy profiles. TSHR368-764MT1 exhibited much more variation than TSHR368-764WT and TSHR368-764MT2, indicating a more damaging pattern in TSHR368-764MT1. This genetic study may help to explore the mutational impact on the drug-binding sites of the TSHR protein, which is important for future drug design and selection for the treatment of congenital hypothyroid children with dysgenesis.
Affiliation(s)
- Mst. Noorjahan Begum
- Institute for Developing Science and Health Initiatives (ideSHi), ECB Chattar, Mirpur, Dhaka, Bangladesh
- Department of Genetic Engineering & Biotechnology, University of Dhaka, Dhaka, Bangladesh
- Virology Laboratory, Infectious Diseases Division, International Centre for Diarrhoeal Disease Research, Bangladesh, Mohakhali, Dhaka, Bangladesh
- Rumana Mahtarin
- Institute for Developing Science and Health Initiatives (ideSHi), ECB Chattar, Mirpur, Dhaka, Bangladesh
- Department of Biochemistry and Molecular Biology, Shahjalal University of Science and Technology, Sylhet, Bangladesh
- Md. Tarikul Islam
- Institute for Developing Science and Health Initiatives (ideSHi), ECB Chattar, Mirpur, Dhaka, Bangladesh
- Sinthyia Ahmed
- Division of Computer Aided Drug Design, The Red-Green Research Centre, BICCB, Tejgaon, Dhaka, Bangladesh
- Tasnia Kawsar Konika
- Nuclear Medicine and Allied Sciences, Bangabandhu Sheikh Mujib Medical University (BSMMU), Shahbag, Dhaka, Bangladesh
- Kaiissar Mannoor
- Institute for Developing Science and Health Initiatives (ideSHi), ECB Chattar, Mirpur, Dhaka, Bangladesh
- Sharif Akhteruzzaman
- Department of Genetic Engineering & Biotechnology, University of Dhaka, Dhaka, Bangladesh
- Firdausi Qadri
- Institute for Developing Science and Health Initiatives (ideSHi), ECB Chattar, Mirpur, Dhaka, Bangladesh
- Mucosal Immunology and Vaccinology, Infectious Diseases Division, International Centre for Diarrhoeal Disease Research, Bangladesh, Mohakhali, Dhaka, Bangladesh
3
Lee M. Recent Advances in Deep Learning for Protein-Protein Interaction Analysis: A Comprehensive Review. Molecules 2023; 28:5169. [PMID: 37446831 DOI: 10.3390/molecules28135169] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 05/30/2023] [Revised: 06/30/2023] [Accepted: 06/30/2023] [Indexed: 07/15/2023] Open access
Abstract
Deep learning, a potent branch of artificial intelligence, is steadily leaving its transformative imprint across multiple disciplines. Within computational biology, it is expediting progress in the understanding of Protein-Protein Interactions (PPIs), key components governing a wide array of biological functionalities. Hence, an in-depth exploration of PPIs is crucial for decoding the intricate biological system dynamics and unveiling potential avenues for therapeutic interventions. As the deployment of deep learning techniques in PPI analysis proliferates at an accelerated pace, there exists an immediate demand for an exhaustive review that encapsulates and critically assesses these novel developments. Addressing this requirement, this review offers a detailed analysis of the literature from 2021 to 2023, highlighting the cutting-edge deep learning methodologies harnessed for PPI analysis. Thus, this review stands as a crucial reference for researchers in the discipline, presenting an overview of the recent studies in the field. This consolidation helps elucidate the dynamic paradigm of PPI analysis, the evolution of deep learning techniques, and their interdependent dynamics. This scrutiny is expected to serve as a vital aid for researchers, both well-established and newcomers, assisting them in maneuvering the rapidly shifting terrain of deep learning applications in PPI analysis.
Affiliation(s)
- Minhyeok Lee
- School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
4
Guerler A, Baker D, van den Beek M, Gruening B, Bouvier D, Coraor N, Shank SD, Zehr JD, Schatz MC, Nekrutenko A. Fast and accurate genome-wide predictions and structural modeling of protein-protein interactions using Galaxy. BMC Bioinformatics 2023; 24:263. [PMID: 37353753 PMCID: PMC10288729 DOI: 10.1186/s12859-023-05389-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2022] [Accepted: 06/15/2023] [Indexed: 06/25/2023] Open access
Abstract
BACKGROUND: Protein-protein interactions play a crucial role in almost all cellular processes. Identifying interacting proteins provides insight into living organisms and yields novel drug targets for disease treatment. Here, we present a publicly available, automated pipeline to predict genome-wide protein-protein interactions and produce high-quality multimeric structural models. RESULTS: Application of our method to the human and yeast genomes yields protein-protein interaction networks similar in quality to those from common experimental methods. We identified and modeled human proteins likely to interact with the papain-like protease of SARS-CoV-2's non-structural protein 3. We also produced models of the SARS-CoV-2 spike protein (S) interacting with the myelin-oligodendrocyte glycoprotein receptor and dipeptidyl peptidase-4. CONCLUSIONS: The presented method can confidently identify interactions while providing high-quality multimeric structural models for experimental validation. The interactome modeling pipeline is available at usegalaxy.org and usegalaxy.eu.
Affiliation(s)
- Aysam Guerler
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Dannon Baker
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Marius van den Beek
- Department of Biochemistry and Molecular Biology, Penn State University, College Park, PA, USA
- Bjoern Gruening
- Department of Bioinformatics, Freiburg University, Freiburg, Germany
- Dave Bouvier
- Department of Biochemistry and Molecular Biology, Penn State University, College Park, PA, USA
- Nate Coraor
- Department of Biochemistry and Molecular Biology, Penn State University, College Park, PA, USA
- Stephen D Shank
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
- Jordan D Zehr
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, USA
- Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Anton Nekrutenko
- Department of Biochemistry and Molecular Biology, Penn State University, College Park, PA, USA
5
Pelletier JF, Glass JI, Strychalski EA. Cellular mechanics during division of a genomically minimal cell. Trends Cell Biol 2022; 32:900-907. [PMID: 35907702 DOI: 10.1016/j.tcb.2022.06.009] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/03/2022] [Revised: 06/17/2022] [Accepted: 06/20/2022] [Indexed: 01/21/2023]
Abstract
Genomically minimal cells, such as JCVI-syn3.0 and JCVI-syn3A, offer an empowering framework to study relationships between genotype and phenotype. With a polygenic basis, the fundamental physiological process of cell division depends on multiple genes of known and unknown function in JCVI-syn3A. A physical description of cellular mechanics can further understanding of the contributions of genes to cell division in this genomically minimal context. We review current knowledge on genes in JCVI-syn3A contributing to two physical parameters relevant to cell division, namely, the surface-area-to-volume ratio and membrane curvature. This physical view of JCVI-syn3A may inform the attribution of gene functions and conserved processes in bacterial physiology, as well as whole-cell models and the engineering of synthetic cells.
Affiliation(s)
- James F Pelletier
- Centro Nacional de Biotecnología, 28049 Madrid, Spain; Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
- John I Glass
- J. Craig Venter Institute, La Jolla, CA 92037, USA
6
Soleymani F, Paquet E, Viktor H, Michalowski W, Spinello D. Protein-protein interaction prediction with deep learning: A comprehensive review. Comput Struct Biotechnol J 2022; 20:5316-5341. [PMID: 36212542 PMCID: PMC9520216 DOI: 10.1016/j.csbj.2022.08.070] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 06/22/2022] [Revised: 08/29/2022] [Accepted: 08/30/2022] [Indexed: 11/15/2022] Open access
Abstract
Most proteins perform their biological function by interacting with themselves or other molecules. Thus, one may obtain biological insights into protein functions, disease prevalence, and therapy development by identifying protein-protein interactions (PPI). However, finding interacting and non-interacting protein pairs through experimental approaches is labour-intensive and time-consuming, owing to the large variety of proteins. Hence, protein-protein interaction and protein-ligand binding problems have drawn attention in the fields of bioinformatics and computer-aided drug discovery. Deep learning methods have paved the way for scientists to predict the 3-D structure of proteins from genomes, predict the functions and attributes of a protein, and modify and design new proteins to provide desired functions. This review focuses on recent deep learning methods applied to problems including the prediction of protein functions, protein-protein interactions and their sites, protein-ligand binding, and protein design.
Affiliation(s)
- Farzan Soleymani
- Department of Mechanical Engineering, University of Ottawa, Ottawa, ON, Canada
- Eric Paquet
- National Research Council, 1200 Montreal Road, Ottawa, ON K1A 0R6, Canada
- Herna Viktor
- School of Electrical Engineering and Computer Science, University of Ottawa, ON, Canada
- Davide Spinello
- Department of Mechanical Engineering, University of Ottawa, Ottawa, ON, Canada
7
Bianchi D, Pelletier JF, Hutchison CA, Glass JI, Luthey-Schulten Z. Toward the Complete Functional Characterization of a Minimal Bacterial Proteome. J Phys Chem B 2022; 126:6820-6834. [PMID: 36048731 PMCID: PMC9483919 DOI: 10.1021/acs.jpcb.2c04188] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/16/2022] [Revised: 08/10/2022] [Indexed: 11/29/2022]
Abstract
Recently, we presented a whole-cell kinetic model of the genetically minimal bacterium JCVI-syn3A that described the coupled metabolic and genetic information processes and predicted behaviors emerging from the interactions among these networks. JCVI-syn3A is a genetically reduced bacterial cell that has the fewest genes and the smallest fraction of genes of unclear function, with approximately 90 of its 452 protein-coding genes (less than 20%) unannotated. Further characterization of the JCVI-syn3A genes of unclear function strengthens the robustness and predictive power of cell modeling efforts and can lead to a deeper understanding of biophysical processes and pathways at the cell scale. Here, we apply computational analyses to elucidate the functions of the products of several essential but previously uncharacterized genes involved in integral cellular processes, particularly those directly affecting cell growth, division, and morphology. We also suggest directed wet-lab experiments, informed by our analyses, to further understand these "missing puzzle pieces" that are an essential part of the mosaic of biological interactions present in JCVI-syn3A. Our workflow leverages evolutionary sequence analysis, protein structure prediction, interactomics, and genome architecture to determine improved annotations. Additionally, we apply the structure prediction component of our work to all 452 protein-coding genes in JCVI-syn3A to expedite future functional annotation studies, as well as the inverse mapping of the cell state to more physical models requiring all-atom or coarse-grained representations of all JCVI-syn3A proteins.
Affiliation(s)
- David M. Bianchi
- Department of Chemistry, University of Illinois Urbana-Champaign, 600 S Mathews Ave, Urbana, Illinois 61801, United States
- James F. Pelletier
- Centro Nacional de Biotecnologia, Calle Darwin no. 3, 28049 Madrid, Spain
- Clyde A. Hutchison
- J. Craig Venter Institute, 4120 Capricorn Ln., La Jolla, California 92037, United States
- John I. Glass
- J. Craig Venter Institute, 4120 Capricorn Ln., La Jolla, California 92037, United States
- Zaida Luthey-Schulten
- Department of Chemistry, University of Illinois Urbana-Champaign, 600 S Mathews Ave, Urbana, Illinois 61801, United States
8
LeBlanc N, Charles TC. Bacterial genome reductions: Tools, applications, and challenges. Front Genome Ed 2022; 4:957289. [PMID: 36120530 PMCID: PMC9473318 DOI: 10.3389/fgeed.2022.957289] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 05/30/2022] [Accepted: 07/29/2022] [Indexed: 11/16/2022] Open access
Abstract
Bacterial cells are widely used to produce value-added products due to their versatility, ease of manipulation, and the abundance of genome engineering tools. However, the efficiency of producing these desired biomolecules is often hindered by the cells' own metabolism, genetic instability, and the toxicity of the product. To overcome these challenges, genome reductions have been performed, making strains with the potential of serving as chassis for downstream applications. Here we review the current technologies that enable the design and construction of such reduced-genome bacteria as well as the challenges that limit their assembly and applicability. While genomic reductions have improved many cellular characteristics, a major challenge remains in constructing these cells efficiently and rapidly. Computational tools have been created in an attempt to minimize the time needed to design these organisms, but gaps remain in modelling these reductions in silico. Genomic reductions are a promising avenue for improving the production of value-added products, constructing chassis cells, and uncovering cellular function, but they are currently limited by time-consuming construction methods. With improved and novel genome editing tools and in silico models, these approaches could be combined to expedite this process and create more streamlined and efficient cell factories.
Affiliation(s)
- Nicole LeBlanc
- Department of Biology, University of Waterloo, Waterloo, ON, Canada
- Correspondence: Nicole LeBlanc
- Trevor C. Charles
- Department of Biology, University of Waterloo, Waterloo, ON, Canada
- Metagenom Bio Life Science Inc., Waterloo, ON, Canada
9
Kondratyeva L, Alekseenko I, Chernov I, Sverdlov E. Data Incompleteness May form a Hard-to-Overcome Barrier to Decoding Life's Mechanism. Biology 2022; 11:1208. [PMID: 36009835 PMCID: PMC9404739 DOI: 10.3390/biology11081208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2022] [Revised: 08/03/2022] [Accepted: 08/10/2022] [Indexed: 11/23/2022]
Abstract
In this brief review, we attempt to demonstrate that the incompleteness of data, as well as the intrinsic heterogeneity of biological systems, may form very strong and possibly insurmountable barriers for researchers trying to decipher how living systems function. We illustrate this challenge using the two most studied organisms: E. coli, with 34.6% of genes lacking experimental evidence of function, and C. elegans, with identified proteins for approximately 50% of its genes. Another striking example is an artificial unicellular entity named JCVI-syn3.0, with a minimal set of genes: 31.5% of the genes of JCVI-syn3.0 cannot be ascribed a specific biological function. The human interactome mapping project identified only 5-10% of all protein interactions in humans. In addition, most of the available data are static snapshots, making it barely possible to generate realistic models of the dynamic processes within cells. Moreover, the existing interactomes reflect the de facto interaction but not its functional result, which is an unpredictable emergent property. Perhaps the completeness of molecular data on any living organism is beyond our reach and represents an unsolvable problem in biology.
Affiliation(s)
- Liya Kondratyeva
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russia
- Irina Alekseenko
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russia
- Institute of Molecular Genetics of National Research Centre “Kurchatov Institute”, Moscow 123182, Russia
- Igor Chernov
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Moscow 117997, Russia
- Eugene Sverdlov
- Institute of Molecular Genetics of National Research Centre “Kurchatov Institute”, Moscow 123182, Russia
- Kurchatov Center for Genome Research, National Research Center “Kurchatov Institute”, Moscow 123182, Russia
10
I-TASSER-MTD: a deep-learning-based platform for multi-domain protein structure and function prediction. Nat Protoc 2022; 17:2326-2353. [PMID: 35931779 DOI: 10.1038/s41596-022-00728-0] [Citation(s) in RCA: 139] [Impact Index Per Article: 69.5] [Received: 02/04/2022] [Accepted: 05/24/2022] [Indexed: 01/17/2023]
Abstract
Most proteins in cells are composed of multiple folding units (or domains) to perform complex functions in a cooperative manner. Relative to the rapid progress in single-domain structure prediction, there are few effective tools available for multi-domain protein structure assembly, mainly due to the complexity of modeling multi-domain proteins, which involves higher degrees of freedom in domain-orientation space and various levels of continuous and discontinuous domain assembly and linker refinement. To meet the challenge and the high demand of the community, we developed I-TASSER-MTD to model the structures and functions of multi-domain proteins through a progressive protocol that combines sequence-based domain parsing, single-domain structure folding, inter-domain structure assembly and structure-based function annotation in a fully automated pipeline. Advanced deep-learning models have been incorporated into each of the steps to enhance both the domain modeling and inter-domain assembly accuracy. The protocol allows for the incorporation of experimental cross-linking data and cryo-electron microscopy density maps to guide the multi-domain structure assembly simulations. I-TASSER-MTD is built on I-TASSER but substantially extends its ability and accuracy in modeling large multi-domain protein structures and provides meaningful functional insights for the targets at both the domain- and full-chain levels from the amino acid sequence alone.
11
Landon S, Chalkley O, Breese G, Grierson C, Marucci L. Understanding Metabolic Flux Behaviour in Whole-Cell Model Output. Front Mol Biosci 2021; 8:732079. [PMID: 34977150 PMCID: PMC8718694 DOI: 10.3389/fmolb.2021.732079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2021] [Accepted: 10/28/2021] [Indexed: 11/30/2022] Open access
Abstract
Whole-cell modelling is a new and expanding field that has many applications in lab experiment design and predictive drug testing. Although whole-cell model output contains a wealth of information, it is complex and high-dimensional and thus hard to interpret. Here, we present an analysis pipeline that combines machine learning, dimensionality reduction, and network analysis to interpret and visualise metabolic reaction fluxes from a set of single-gene knockouts simulated in the Mycoplasma genitalium whole-cell model. We found that the reaction behaviours show trends that correlate with phenotypic classes of the simulation output, highlighting particular cellular subsystems that malfunction after gene knockouts. From a graphical representation of the metabolic network, we identified a set of reactions that can be used as markers of a phenotypic class, showing their importance within the network. Our analysis pipeline can support understanding of the complexity of in silico cells without detailed knowledge of their constituent parts, which can help to clarify the effects of gene knockouts and, as whole-cell models become more widely built and used, aid genome design.
Affiliation(s)
- Sophie Landon
- BrisSynBio, University of Bristol, Bristol, United Kingdom
- Department of Engineering Mathematics, University of Bristol, Bristol, United Kingdom
- Oliver Chalkley
- BrisSynBio, University of Bristol, Bristol, United Kingdom
- Department of Engineering Mathematics, University of Bristol, Bristol, United Kingdom
- Bristol Centre for Complexity Science, Department of Engineering Mathematics, University of Bristol, Bristol, United Kingdom
- Gus Breese
- Department of Engineering Mathematics, University of Bristol, Bristol, United Kingdom
- Claire Grierson
- BrisSynBio, University of Bristol, Bristol, United Kingdom
- School of Biological Sciences, University of Bristol, Bristol, United Kingdom
- Lucia Marucci
- BrisSynBio, University of Bristol, Bristol, United Kingdom
- Department of Engineering Mathematics, University of Bristol, Bristol, United Kingdom
- School of Cellular and Molecular Medicine, University of Bristol, Bristol, United Kingdom
12
Pedreira T, Elfmann C, Singh N, Stülke J. SynWiki: Functional annotation of the first artificial organism Mycoplasma mycoides JCVI-syn3A. Protein Sci 2021; 31:54-62. [PMID: 34515387 PMCID: PMC8740822 DOI: 10.1002/pro.4179] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/27/2021] [Revised: 09/02/2021] [Accepted: 09/07/2021] [Indexed: 12/20/2022]
Abstract
The new field of synthetic biology aims at the creation of artificially designed organisms. A major breakthrough in the field was the generation of the artificial synthetic organism Mycoplasma mycoides JCVI-syn3A. This bacterium possesses only 452 protein-coding genes, the smallest number for any organism that is viable independent of a host cell. However, about one third of the proteins have no known function, indicating major gaps in our understanding of simple living cells. To facilitate the investigation of the components of this minimal bacterium, we have generated the database SynWiki (http://synwiki.uni-goettingen.de/). SynWiki is based on a relational database and gives access to published information about the genes and proteins of M. mycoides JCVI-syn3A. To gain a better understanding of the functions of the genes and proteins of the artificial bacterium, protein-protein interactions that may provide clues to protein functions are included in an interactive manner. SynWiki is an important tool for the synthetic biology community that will support the comprehensive understanding of a minimal cell as well as the functional annotation of so far uncharacterized proteins.
Affiliation(s)
- Tiago Pedreira
- Department of General Microbiology, Göttingen Center for Molecular Biosciences, Georg-August University Göttingen, Göttingen, Germany
- Christoph Elfmann
- Department of General Microbiology, Göttingen Center for Molecular Biosciences, Georg-August University Göttingen, Göttingen, Germany
- Neil Singh
- Department of General Microbiology, Göttingen Center for Molecular Biosciences, Georg-August University Göttingen, Göttingen, Germany
- Jörg Stülke
- Department of General Microbiology, Göttingen Center for Molecular Biosciences, Georg-August University Göttingen, Göttingen, Germany
13
Vafaee R, Tavirani MR, Tavirani SR, Razzaghi M. Assessment of cancer prevention effect of exercise. Hum Antibodies 2021; 30:31-36. [PMID: 34459390 DOI: 10.3233/hab-210454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Many studies document the benefits of exercise for human health. Although evidence indicates that exercise has a positive effect on disease prevention, many aspects of the underlying mechanism require further investigation. This study aimed to determine critical genes that affect human health. GSE156249, comprising 12 gene expression profiles of vastus lateralis muscle biopsies from healthy individuals taken before and after a 12-week combined exercise training intervention, was retrieved from the Gene Expression Omnibus (GEO) database. The significant differentially expressed genes (DEGs) were included in an interactome using Cytoscape software and the STRING database. The network was analyzed to find the central nodes and subnetwork clusters. The nodes of the prominent cluster were assessed via gene ontology analysis using ClueGO. Eight significant DEGs and 100 first neighbors were analyzed via network analysis. The network includes two clusters, and COL3A1, BGN, and LOX were identified as central DEGs. The critical DEGs were involved in the cancer prevention process.
Affiliation(s)
- Reza Vafaee
- Proteomics Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran.,Laser Application in Medical Sciences Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Mostafa Rezaei Tavirani
- Proteomics Research Center, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Sina Rezaei Tavirani
- Proteomics Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mohammadreza Razzaghi
- Laser Application in Medical Sciences Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
14
Aljindan RY, Al-Subaie AM, Al-Ohali AI, Kumar D T, Doss C GP, Kamaraj B. Investigation of nonsynonymous mutations in the spike protein of SARS-CoV-2 and its interaction with the ACE2 receptor by molecular docking and MM/GBSA approach. Comput Biol Med 2021; 135:104654. [PMID: 34346317 PMCID: PMC8282961 DOI: 10.1016/j.compbiomed.2021.104654] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 01/28/2021] [Revised: 07/12/2021] [Accepted: 07/13/2021] [Indexed: 12/22/2022]
Abstract
COVID-19 is an infectious and pathogenic viral disease caused by SARS-CoV-2 that leads to septic shock, coagulation dysfunction, and acute respiratory distress syndrome. SARS-CoV-2 spreads faster than MERS-CoV and SARS-CoV. The receptor-binding domain (RBD) of the spike protein (S-protein) interacts with human cells through the host angiotensin-converting enzyme 2 (ACE2) receptor. However, the molecular mechanism of pathological mutations of the S-protein is still unclear. In this perspective, we investigated the impact of mutations in the S-protein and their effect on its interaction with the ACE2 receptor in SARS-CoV-2 viral infection. We examined the stability of pathological nonsynonymous mutations in the S-protein, and the binding behavior of the ACE2 receptor with the mutant S-proteins, using molecular docking and MM/GBSA approaches. Using an extensive bioinformatics pipeline, we screened the destabilizing (L8V, L8W, L18F, Y145H, M153T, F157S, G476S, L611F, A879S, C1247F, and C1254F) and stabilizing (H49Y, S50L, N501Y, D614G, A845V, and P1143L) nonsynonymous mutations in the S-protein. The docking and binding free energy (ddG) scores revealed that the stabilizing nonsynonymous mutations increase the interaction between the S-protein and the ACE2 receptor compared with the native and destabilizing S-proteins, and that they may be responsible for the high level of virulence. Further, molecular dynamics simulation (MDS) revealed structural transitions in the mutant (N501Y and D614G) S-proteins. These insights might help researchers understand the pathological mechanisms of the S-protein, provide clues regarding the role of mutations in viral infection and disease propagation, and support the development of efficient treatments against the SARS-CoV-2 pandemic.
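The abstract names MM/GBSA but not its formulation. For orientation, the standard textbook MM/GBSA decomposition is written out below, with ACE2 as the receptor and the S-protein as the ligand. The last line, defining the mutation effect as mutant minus wild type (so that negative values mean tighter binding), is an assumed convention; the abstract does not state which sign convention its ddG scores follow.

```latex
\begin{align}
\Delta G_{\text{bind}} &= G_{\text{complex}} - G_{\text{receptor}} - G_{\text{ligand}} \\
\Delta G_{\text{bind}} &\approx \Delta E_{\text{MM}} + \Delta G_{\text{GB}} + \Delta G_{\text{SA}} - T\Delta S \\
\Delta E_{\text{MM}} &= \Delta E_{\text{int}} + \Delta E_{\text{ele}} + \Delta E_{\text{vdW}} \\
\Delta\Delta G_{\text{bind}} &= \Delta G_{\text{bind}}^{\text{mut}} - \Delta G_{\text{bind}}^{\text{wt}}
\end{align}
```

Here $\Delta E_{\text{MM}}$ is the gas-phase molecular mechanics energy (internal, electrostatic, and van der Waals terms), $\Delta G_{\text{GB}}$ is the polar solvation energy from the generalized Born model, and $\Delta G_{\text{SA}}$ is the nonpolar solvation term estimated from the solvent-accessible surface area. The entropy term $-T\Delta S$ is often omitted in practice because it is expensive to estimate.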
Affiliation(s)
- Reem Y Aljindan
- Department of Microbiology, College of Medicine, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia.
- Abeer M Al-Subaie
- Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia.
- Ahoud I Al-Ohali
- Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia.
- Thirumal Kumar D
- Meenakshi Academy of Higher Education and Research, Chennai, Tamil Nadu, 600078, India.
- George Priya Doss C
- School of Biosciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, 632014, India.
- Balu Kamaraj
- Department of Neuroscience Technology, College of Applied Medical Sciences in Jubail, Imam Abdulrahman Bin Faisal University, Jubail, Saudi Arabia.
15
Pearce R, Zhang Y. Toward the solution of the protein structure prediction problem. J Biol Chem 2021; 297:100870. [PMID: 34119522 PMCID: PMC8254035 DOI: 10.1016/j.jbc.2021.100870] [Citation(s) in RCA: 61] [Impact Index Per Article: 20.3] [Received: 04/20/2021] [Revised: 06/07/2021] [Accepted: 06/09/2021] [Indexed: 11/20/2022] Open
Abstract
Since Anfinsen demonstrated in 1973 that the information encoded in a protein's amino acid sequence determines its structure, solving the protein structure prediction problem has been the Holy Grail of structural biology. The goal of protein structure prediction approaches is to use computational modeling to determine the spatial location of every atom in a protein molecule starting from only its amino acid sequence. Depending on whether homologous structures can be found in the Protein Data Bank (PDB), structure prediction methods have historically been categorized as template-based modeling (TBM) or template-free modeling (FM) approaches. Until recently, TBM had been the most reliable approach to predicting protein structures, and in the absence of reliable templates, modeling accuracy declined sharply. Nevertheless, the results of the most recent community-wide Critical Assessment of protein Structure Prediction experiment (CASP14) have demonstrated that the protein structure prediction problem can be largely solved through end-to-end deep machine learning techniques, which built correct folds for nearly all single-domain proteins without using PDB templates. Critically, model quality showed little correlation with either the quality of the available template structures or the number of sequence homologs detected for a given target protein. Thus, deep-learning techniques have essentially broken through the 50-year-old modeling border between TBM and FM approaches and have made the success of high-resolution structure prediction significantly less dependent on template availability in the PDB library.
Affiliation(s)
- Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, USA.
16
Gong W, Guerler A, Zhang C, Warner E, Li C, Zhang Y. Integrating Multimeric Threading With High-throughput Experiments for Structural Interactome of Escherichia coli. J Mol Biol 2021; 433:166944. [PMID: 33741411 DOI: 10.1016/j.jmb.2021.166944] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/20/2020] [Revised: 03/06/2021] [Accepted: 03/09/2021] [Indexed: 10/21/2022]
Abstract
Genome-wide protein-protein interaction (PPI) determination remains a significant unsolved problem in structural biology. The difficulty is twofold: high-throughput experiments (HTEs) often have a relatively high false-positive rate in assigning PPIs, and PPI quaternary structures are more difficult to solve than tertiary structures using traditional structural biology techniques. We propose a uniform pipeline, Threpp, to address both problems. Starting from a pair of monomer sequences, Threpp first threads both sequences through a complex structure library; the alignment score is then combined with HTE data using a naïve Bayesian classifier model to predict the likelihood that the two chains interact. Next, quaternary complex structures of the identified PPIs are constructed by reassembling monomeric alignments with dimeric threading frameworks through interface-specific structural alignments. Applied to the Escherichia coli genome, the pipeline predicted 35,125 confident PPIs, 4.5-fold more than HTE alone. Graphic analyses of the PPI networks show a scale-free cluster size distribution, consistent with previous studies, a property found to be critical to the robustness of genome evolution and to the centrality of functionally important proteins essential to E. coli survival. Furthermore, complex structure models were constructed for all predicted E. coli PPIs from the quaternary threading alignments; 6771 of them have a high confidence score corresponding to a correct fold of the complex (TM-score > 0.5), and 39 closely match later-released experimental structures, with an average TM-score of 0.73. These results demonstrate the usefulness of threading-based homologous modeling both for genome-wide PPI network detection and for complex structure construction.
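The abstract does not give Threpp's feature bins or probability estimates. The Python sketch below only illustrates the generic naïve Bayesian idea of combining a threading alignment score with HTE evidence: each feature adds a log-likelihood ratio to the prior log-odds, under the assumption that features are conditionally independent given the class. The bins, the `LIKELIHOODS` table, the prior, and the function name `posterior_interaction` are all hypothetical values and names chosen for illustration.

```python
import math

# Hypothetical P(feature bin | class) tables; NOT Threpp's actual values.
# Each column (interact / non_interact) sums to 1 within a feature.
LIKELIHOODS = {
    "threading_score": {  # binned threading alignment score
        "low":  {"interact": 0.2, "non_interact": 0.7},
        "mid":  {"interact": 0.3, "non_interact": 0.2},
        "high": {"interact": 0.5, "non_interact": 0.1},
    },
    "hte_support": {  # whether any high-throughput experiment reports the pair
        "yes": {"interact": 0.6, "non_interact": 0.1},
        "no":  {"interact": 0.4, "non_interact": 0.9},
    },
}


def posterior_interaction(features, prior=0.01):
    """Naive Bayes posterior P(interact | features), assuming feature independence."""
    log_odds = math.log(prior / (1.0 - prior))  # prior log-odds
    for name, value in features.items():
        table = LIKELIHOODS[name][value]
        # Each feature contributes its log-likelihood ratio.
        log_odds += math.log(table["interact"] / table["non_interact"])
    return 1.0 / (1.0 + math.exp(-log_odds))  # convert log-odds back to a probability


p = posterior_interaction({"threading_score": "high", "hte_support": "yes"})
print(round(p, 3))  # 0.233
```

Working in log-odds keeps the update additive and avoids underflow when many features are combined; with no features, the function simply returns the prior.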
Affiliation(s)
- Weikang Gong
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China
- Aysam Guerler
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Elisa Warner
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Chunhua Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Faculty of Environmental and Life Sciences, Beijing University of Technology, Beijing 100124, China.
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, 48109, USA.