1
|
Yin J, Waman VP, Sen N, Firdaus-Raih M, Lam SD, Orengo C. Understanding the structural and functional diversity of ATP-PPases using protein domains and functional families in the CATH database. Structure 2025:S0969-2126(24)00551-3. [PMID: 39826548 DOI: 10.1016/j.str.2024.12.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 04/18/2024] [Accepted: 12/19/2024] [Indexed: 01/22/2025]
Abstract
ATP-pyrophosphatases (ATP-PPases) are the most primordial lineage of the large and diverse HUP (high-motif proteins, universal stress proteins, ATP-pyrophosphatase) superfamily. There are four different ATP-PPase substrate-specificity groups (SSGs), and members of each group show considerable sequence variation across the domains of life despite sharing the same catalytic function. Owing to the expansion in the number of ATP-PPase domain structures from advances in protein structure prediction by AlphaFold2 (AF2), we have characterized the two most populated ATP-PPase SSGs, the nicotinamide adenine dinucleotide synthases (NADSs) and guanosine monophosphate synthases (GMPSs). Local structural and sequence comparisons of NADS and GMPS identified taxonomic-group-specific functional motifs. As GMPS and NADS are potential drug targets of pathogenic microorganisms including Mycobacterium tuberculosis, the bacterial GMPS- and NADS-specific functional motifs reported in this study may contribute to antibacterial drug development.
Affiliation(s)
- Jialin Yin
- Department of Structural and Molecular Biology, University College London, London, UK
| | - Vaishali P Waman
- Department of Structural and Molecular Biology, University College London, London, UK
| | - Neeladri Sen
- Department of Structural and Molecular Biology, University College London, London, UK
| | - Mohd Firdaus-Raih
- Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Malaysia
| | - Su Datt Lam
- Department of Structural and Molecular Biology, University College London, London, UK; Department of Applied Physics, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Malaysia
| | - Christine Orengo
- Department of Structural and Molecular Biology, University College London, London, UK
| |
|
2
|
Soleymani F, Paquet E, Viktor HL, Michalowski W. Structure-based protein and small molecule generation using EGNN and diffusion models: A comprehensive review. Comput Struct Biotechnol J 2024; 23:2779-2797. [PMID: 39050782 PMCID: PMC11268121 DOI: 10.1016/j.csbj.2024.06.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Revised: 06/13/2024] [Accepted: 06/18/2024] [Indexed: 07/27/2024] Open
Abstract
Recent breakthroughs in deep learning have revolutionized protein sequence and structure prediction. These advancements are built on decades of protein design efforts, and are overcoming traditional time and cost limitations. Diffusion models, at the forefront of these innovations, significantly enhance design efficiency by automating knowledge acquisition. In the field of de novo protein design, the goal is to create entirely novel proteins with predetermined structures. Given the arbitrary positions of proteins in 3-D space, graph representations and their properties are widely used in protein generation studies. A critical requirement in protein modelling is maintaining spatial relationships under transformations (rotations, translations, and reflections). This property, known as equivariance, ensures that predicted protein characteristics adapt seamlessly to changes in orientation or position. Equivariant graph neural networks offer a solution to this challenge. By incorporating equivariant graph neural networks to learn the score of the probability density function in diffusion models, one can generate proteins with robust 3-D structural representations. This review examines the latest deep learning advancements, specifically focusing on frameworks that combine diffusion models with equivariant graph neural networks for protein generation.
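The equivariance property described in this abstract can be illustrated with a small, self-contained sketch. It is not taken from the review: the helper names (`rotate_z`, `reflect_x`, `transform`) and the toy coordinates are assumptions made purely for illustration. The sketch applies a rotation, a reflection, and a translation to a set of 3-D points and checks two things: pairwise distances are unchanged (invariance), and the centroid moves exactly as the points do (equivariance).

```python
import math

def rotate_z(p, theta):
    """Rotate a 3-D point about the z-axis by theta radians."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y, z)

def reflect_x(p):
    """Mirror a 3-D point through the yz-plane."""
    return (-p[0], p[1], p[2])

def translate(p, t):
    """Shift a 3-D point by the vector t."""
    return tuple(a + b for a, b in zip(p, t))

def transform(p):
    """Hypothetical rigid motion: rotation, then reflection, then translation."""
    return translate(reflect_x(rotate_z(p, 0.7)), (1.0, -2.0, 3.5))

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def pairwise_distances(points):
    return [math.dist(a, b) for i, a in enumerate(points) for b in points[i + 1:]]

# Toy coordinates standing in for backbone atoms (illustrative values only).
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0), (3.4, 1.6, 0.9)]
moved = [transform(p) for p in coords]

# Invariance: a scalar feature (distance) does not change under the motion.
invariant = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(pairwise_distances(coords), pairwise_distances(moved))
)

# Equivariance: a vector-valued output (centroid) transforms with the input.
equivariant = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(centroid(moved), transform(centroid(coords)))
)
```

An equivariant network is built so that its coordinate outputs satisfy the second check by construction, rather than having to learn it from augmented data.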
Affiliation(s)
- Farzan Soleymani
- Telfer School of Management, University of Ottawa, ON, K1N 6N5, Canada
| | - Eric Paquet
- National Research Council, 1200 Montreal Road, Ottawa, ON, K1A 0R6, Canada
- School of Electrical Engineering and Computer Science, University of Ottawa, ON, K1N 6N5, Canada
| | - Herna Lydia Viktor
- School of Electrical Engineering and Computer Science, University of Ottawa, ON, K1N 6N5, Canada
| |
|
3
|
Pappalardo M, Sipala FM, Nicolosi MC, Guccione S, Ronsisvalle S. Recent Applications of In Silico Approaches for Studying Receptor Mutations Associated with Human Pathologies. Molecules 2024; 29:5349. [PMID: 39598735 PMCID: PMC11596970 DOI: 10.3390/molecules29225349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2024] [Revised: 11/05/2024] [Accepted: 11/08/2024] [Indexed: 11/29/2024] Open
Abstract
In recent years, the advent of computational techniques to predict the potential activity of a drug interacting with a receptor or to predict the structure of unidentified proteins with aberrant characteristics has significantly impacted the field of drug design. We provide a comprehensive review of the current state of in silico approaches and software for investigating the effects of receptor mutations associated with human diseases, focusing on both frequent and rare mutations. The reported techniques include virtual screening, homology modeling, threading, docking, and molecular dynamics. This review clearly shows that it is common for successful studies to integrate different techniques in drug design, with docking and molecular dynamics being the most frequently used techniques. This trend reflects the current emphasis on developing novel therapies for diseases resulting from receptor mutations with the recently discovered AlphaFold algorithm as the driving force.
Affiliation(s)
- Matteo Pappalardo
- Department of Drug and Health Sciences, University of Catania, Viale A. Doria 6, 95125 Catania, Italy; (M.P.); (F.M.S.); (M.C.N.); (S.R.)
| | - Federica Maria Sipala
- Department of Drug and Health Sciences, University of Catania, Viale A. Doria 6, 95125 Catania, Italy; (M.P.); (F.M.S.); (M.C.N.); (S.R.)
- Department of Chemical Science, University of Catania, Viale A. Doria 6, 95125 Catania, Italy
| | - Milena Cristina Nicolosi
- Department of Drug and Health Sciences, University of Catania, Viale A. Doria 6, 95125 Catania, Italy; (M.P.); (F.M.S.); (M.C.N.); (S.R.)
- Department of Chemical Science, University of Catania, Viale A. Doria 6, 95125 Catania, Italy
| | - Salvatore Guccione
- Department of Drug and Health Sciences, University of Catania, Viale A. Doria 6, 95125 Catania, Italy; (M.P.); (F.M.S.); (M.C.N.); (S.R.)
| | - Simone Ronsisvalle
- Department of Drug and Health Sciences, University of Catania, Viale A. Doria 6, 95125 Catania, Italy; (M.P.); (F.M.S.); (M.C.N.); (S.R.)
| |
|
4
|
Min X, Liao Y, Chen X, Yang Q, Ying J, Zou J, Yang C, Zhang J, Ge S, Xia N. PB-GPT: An innovative GPT-based model for protein backbone generation. Structure 2024; 32:1820-1833.e5. [PMID: 39173620 DOI: 10.1016/j.str.2024.07.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Revised: 06/02/2024] [Accepted: 07/28/2024] [Indexed: 08/24/2024]
Abstract
With advanced computational methods, it is now feasible to modify or design proteins for specific functions, a process with significant implications for disease treatment and other medical applications. Protein structures and functions are intrinsically linked to their backbones, making the design of these backbones a pivotal aspect of protein engineering. In this study, we focus on the task of unconditionally generating protein backbones. By means of codebook quantization and compression dictionaries, we convert protein backbone structures into a distinctive coded language and propose a GPT-based protein backbone generation model, PB-GPT. To validate the generalization performance of the model, we trained and evaluated the model on both public datasets and small protein datasets. The results demonstrate that our model has the capability to unconditionally generate elaborate, highly realistic protein backbones with structural patterns resembling those of natural proteins, thus showcasing the significant potential of large language models in protein structure design.
Affiliation(s)
- Xiaoping Min
- School of Informatics, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China; National Institute of Diagnostics and Vaccine Development in Infectious Diseases, Xiamen University, State Key, No. 422 Siming South Rd, Xiamen 361005, China; State Key Laboratory of Vaccines for Infectious Diseases, Xiang An Biomedicine Laboratory, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
| | - Yiyang Liao
- School of Informatics, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China; National Institute of Diagnostics and Vaccine Development in Infectious Diseases, Xiamen University, State Key, No. 422 Siming South Rd, Xiamen 361005, China; State Key Laboratory of Vaccines for Infectious Diseases, Xiang An Biomedicine Laboratory, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
| | - Xiao Chen
- School of Informatics, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
| | - Qianli Yang
- Institute of Artificial Intelligence, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
| | - Junjie Ying
- Institute of Artificial Intelligence, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
| | - Jiajun Zou
- School of Informatics, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
| | - Chongzhou Yang
- National Institute of Diagnostics and Vaccine Development in Infectious Diseases, Xiamen University, State Key, No. 422 Siming South Rd, Xiamen 361005, China; Institute of Artificial Intelligence, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
| | - Jun Zhang
- National Institute of Diagnostics and Vaccine Development in Infectious Diseases, Xiamen University, State Key, No. 422 Siming South Rd, Xiamen 361005, China; School of Public Health, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China; State Key Laboratory of Vaccines for Infectious Diseases, Xiang An Biomedicine Laboratory, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
| | - Shengxiang Ge
- National Institute of Diagnostics and Vaccine Development in Infectious Diseases, Xiamen University, State Key, No. 422 Siming South Rd, Xiamen 361005, China; School of Public Health, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China; State Key Laboratory of Vaccines for Infectious Diseases, Xiang An Biomedicine Laboratory, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
| | - Ningshao Xia
- National Institute of Diagnostics and Vaccine Development in Infectious Diseases, Xiamen University, State Key, No. 422 Siming South Rd, Xiamen 361005, China; School of Public Health, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China; State Key Laboratory of Vaccines for Infectious Diseases, Xiang An Biomedicine Laboratory, Xiamen University, No. 422 Siming South Rd, Xiamen 361005, China
| |
|
5
|
Yan H, Wang S, Liu H, Mamitsuka H, Zhu S. GORetriever: reranking protein-description-based GO candidates by literature-driven deep information retrieval for protein function annotation. Bioinformatics 2024; 40:ii53-ii61. [PMID: 39230707 PMCID: PMC11520413 DOI: 10.1093/bioinformatics/btae401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024] Open
Abstract
Summary: The vast majority of proteins still lack experimentally validated functional annotations, which highlights the importance of developing high-performance automated protein function prediction/annotation (AFP) methods. While existing approaches focus on protein sequences, networks, and structural data, textual information related to proteins has been overlooked. However, roughly 82% of SwissProt proteins already possess literature information that experts have annotated. To efficiently and effectively use literature information, we present GORetriever, a two-stage deep information retrieval-based method for AFP. Given a target protein, in the first stage, candidate Gene Ontology (GO) terms are retrieved by using annotated proteins with similar descriptions. In the second stage, the GO terms are reranked based on semantic matching between the GO definitions and textual information (literature and protein description) of the target protein. Extensive experiments over benchmark datasets demonstrate the remarkable effectiveness of GORetriever in enhancing the AFP performance. Note that GORetriever is the key component of GOCurator, which has achieved first place in the latest critical assessment of protein function annotation (CAFA5: over 1600 teams participated), held in 2023-2024. Availability and implementation: GORetriever is publicly available at https://github.com/ZhuLab-Fudan/GORetriever.
Affiliation(s)
- Huiying Yan
- Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China
| | - Shaojun Wang
- Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China
| | - Hancheng Liu
- Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China
| | - Hiroshi Mamitsuka
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto Prefecture 611-0011, Japan
- Department of Computer Science, Aalto University, Espoo 00076, Finland
| | - Shanfeng Zhu
- Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China
- Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, 200433, China
- Shanghai Key Lab of Intelligent Information Processing and Shanghai Institute of Artificial Intelligence Algorithm, Fudan University, Shanghai, 200433, China
- Zhangjiang Fudan International Innovation Center, Shanghai, 200433, China
| |
|
6
|
Dan J, Wei W, Ou W, Gao G, Song W, Ye L, Liang H, Guo X, Tan L, Jiang J. Excavation of Biomarker Candidates for the Diagnosis of Talaromyces marneffei Infection via Genome-Wide Prediction and Functional Annotation of Secreted Proteins. ACS OMEGA 2024; 9:27093-27103. [PMID: 38947822 PMCID: PMC11209904 DOI: 10.1021/acsomega.4c00571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Revised: 05/28/2024] [Accepted: 06/05/2024] [Indexed: 07/02/2024]
Abstract
Talaromyces marneffei is the third most common infectious pathogen in AIDS patients and leads to the highest death rate in Guangxi, China. The lack of reliable biomarkers is one of the major obstacles in current clinical diagnosis, which largely contributes to this high mortality. Here, we present a study that aimed at identifying diagnostic biomarker candidates through genome-wide prediction and functional annotation of Talaromyces marneffei secreted proteins. A total of 584 secreted proteins then emerged, including 382 classical and 202 nonclassical ones. Among them, there were 87 newly obtained functional annotations in this study. The annotated proteins were further evaluated by combining RNA profiling and a homology comparison. Three proteins were ultimately highlighted as biomarker candidates with robust expression and remarkable specificity. The predicted phosphoinositide phospholipase C and the galactomannoprotein were suggested to play an interactive immune game through metabolism of arachidonic acid. Therefore, they hold promise in developing new tools for clinical diagnosis of Talaromyces marneffei and also possibly serve as molecular targets for future therapy.
Affiliation(s)
- Jing Dan
- Collaborative Innovation Centre of Regenerative Medicine and Medical BioResource Development and Application Co-constructed by the Province and Ministry, Guangxi Medical University, Nanning, Guangxi 530021, China
- Guangxi Key Laboratory of AIDS Prevention and Treatment & Biosafety III Laboratory, Guangxi Medical University, Nanning, Guangxi 530021, China
- Center for Energy Metabolism and Reproduction, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Wudi Wei
- Guangxi Key Laboratory of AIDS Prevention and Treatment & Biosafety III Laboratory, Guangxi Medical University, Nanning, Guangxi 530021, China
| | - Weijie Ou
- Center for Energy Metabolism and Reproduction, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Guangshi Gao
- Geekgene Technology Co. Ltd., Beijing 100091, China
| | - Wanjun Song
- Geekgene Technology Co. Ltd., Beijing 100091, China
| | - Li Ye
- Guangxi Key Laboratory of AIDS Prevention and Treatment & Biosafety III Laboratory, Guangxi Medical University, Nanning, Guangxi 530021, China
| | - Hao Liang
- Collaborative Innovation Centre of Regenerative Medicine and Medical BioResource Development and Application Co-constructed by the Province and Ministry, Guangxi Medical University, Nanning, Guangxi 530021, China
- Guangxi Key Laboratory of AIDS Prevention and Treatment & Biosafety III Laboratory, Guangxi Medical University, Nanning, Guangxi 530021, China
| | - Xuzhen Guo
- Center for Energy Metabolism and Reproduction, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
| | - Lei Tan
- Center for Energy Metabolism and Reproduction, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Department of Cardiology, Shenzhen Guangming District People’s Hospital, Shenzhen 518055, China
| | - Junjun Jiang
- Collaborative Innovation Centre of Regenerative Medicine and Medical BioResource Development and Application Co-constructed by the Province and Ministry, Guangxi Medical University, Nanning, Guangxi 530021, China
- Guangxi Key Laboratory of AIDS Prevention and Treatment & Biosafety III Laboratory, Guangxi Medical University, Nanning, Guangxi 530021, China
| |
|
7
|
Jebastin T, Syed Abuthakir M, Santhoshi I, Gnanaraj M, Gatasheh MK, Ahamed A, Sharmila V. Unveiling the mysteries: Functional insights into hypothetical proteins from Bacteroides fragilis 638R. Heliyon 2024; 10:e31713. [PMID: 38832264 PMCID: PMC11145332 DOI: 10.1016/j.heliyon.2024.e31713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Revised: 05/21/2024] [Accepted: 05/21/2024] [Indexed: 06/05/2024] Open
Abstract
Humans benefit from a vast community of microorganisms in their gastrointestinal tract, known as the gut microbiota, numbering in the tens of trillions. An imbalance in the gut microbiota, known as dysbiosis, can lead to changes in the metabolite profile, elevating the levels of toxins like Bacteroides fragilis toxin (BFT), colibactin, and cytolethal distending toxin. These toxins are implicated in the process of oncogenesis. However, a significant portion of the Bacteroides fragilis genome consists of functionally uncharacterized and hypothetical proteins. This study delves into the functional characterization of hypothetical proteins (HPs) encoded by the Bacteroides fragilis genome, employing a systematic in silico approach. A total of 379 HPs were subjected to a BlastP homology search against the NCBI non-redundant protein sequence database, resulting in 162 HPs devoid of identity to known proteins. CDD-Blast identified 106 HPs with functional domains, which were then annotated using Pfam, InterPro, SUPERFAMILY, SCANPROSITE, SMART, and CATH. Physicochemical properties, such as molecular weight, isoelectric point, and stability indices, were assessed for 60 HPs whose functional domains were identified by at least three of the aforementioned bioinformatic tools. Subsequently, subcellular localization was analyzed, and gene ontology analysis revealed diverse biological processes, cellular components, and molecular functions. Remarkably, E1WPR3 was identified as a virulent and essential gene among the HPs. This study presents a comprehensive exploration of B. fragilis HPs, shedding light on their potential roles and contributing to a deeper understanding of this organism's functional landscape.
Affiliation(s)
- Thomas Jebastin
- Computer Aided Drug Designing Lab, Department of Bioinformatics, Bishop Heber College (Autonomous), Tiruchirappalli, 620017, Tamil Nadu, India
| | - M.H. Syed Abuthakir
- Department of Bioinformatics, Bharathiar University, Coimbatore, 641046, Tamil Nadu, India
- Institute of Systems Biology, Universiti Kebangsaan Malaysia, 43600, UKM Bangi, Selangor, Malaysia
| | - Ilangovan Santhoshi
- Computer Aided Drug Designing Lab, Department of Bioinformatics, Bishop Heber College (Autonomous), Tiruchirappalli, 620017, Tamil Nadu, India
| | - Muniraj Gnanaraj
- Department of Biotechnology, School of Life Sciences, St Joseph's University, 36 Lalbagh Road, Bengaluru, 560027, Karnataka, India
| | - Mansour K. Gatasheh
- Department of Biochemistry, College of Science, King Saud University, P.O. Box 2455, Riyadh, 11451, Saudi Arabia
| | - Anis Ahamed
- Department of Botany and Microbiology, College of Science, King Saud University, Saudi Arabia
| | - Velusamy Sharmila
- Department of Biotechnology, Nehru Arts and Science College (NASC), Thirumalayampalayam, Coimbatore, 641 105, Tamil Nadu, India
| |
|
8
|
Ulusoy E, Doğan T. Mutual annotation-based prediction of protein domain functions with Domain2GO. Protein Sci 2024; 33:e4988. [PMID: 38757367 PMCID: PMC11099699 DOI: 10.1002/pro.4988] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 02/25/2024] [Accepted: 03/30/2024] [Indexed: 05/18/2024]
Abstract
Identifying unknown functional properties of proteins is essential for understanding their roles in both health and disease states. The domain composition of a protein can reveal critical information in this context, as domains are structural and functional units that dictate how the protein should act at the molecular level. The expensive and time-consuming nature of wet-lab experimental approaches prompted researchers to develop computational strategies for predicting the functions of proteins. In this study, we proposed a new method called Domain2GO that infers associations between protein domains and function-defining gene ontology (GO) terms, thus redefining the problem as domain function prediction. Domain2GO uses documented protein-level GO annotations together with proteins' domain annotations. Co-annotation patterns of domains and GO terms in the same proteins are examined using statistical resampling to obtain reliable associations. As a use-case study, we evaluated the biological relevance of examples selected from the Domain2GO-generated domain-GO term mappings via literature review. Then, we applied Domain2GO to predict unknown protein functions by propagating domain-associated GO terms to proteins annotated with these domains. For function prediction performance evaluation and comparison against other methods, we employed Critical Assessment of Function Annotation 3 (CAFA3) challenge datasets. The results demonstrated the high potential of Domain2GO, particularly for predicting molecular function and biological process terms, along with advantages such as producing interpretable results and having an exceptionally low computational cost. The approach presented here can be extended to other ontologies and biological entities to investigate unknown relationships in complex and large-scale biological data. The source code, datasets, results, and user instructions for Domain2GO are available at https://github.com/HUBioDataLab/Domain2GO. Additionally, we offer a user-friendly online tool at https://huggingface.co/spaces/HUBioDataLab/Domain2GO, which simplifies the prediction of functions of previously unannotated proteins solely using amino acid sequences.
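The co-annotation idea in this abstract can be made concrete with a toy sketch. This is not the Domain2GO implementation: the scoring rule (a simple conditional frequency), the threshold, and all identifiers (`D1`, `GO:A`, `co_annotation_scores`, `predict`) are assumptions chosen for illustration, and the statistical resampling step mentioned in the abstract is omitted.

```python
from collections import Counter
from itertools import product

# Hypothetical training data: protein -> (domain annotations, GO annotations).
proteins = {
    "P1": ({"D1", "D2"}, {"GO:A", "GO:B"}),
    "P2": ({"D1"}, {"GO:A"}),
    "P3": ({"D2"}, {"GO:B"}),
    "P4": ({"D1"}, {"GO:A", "GO:C"}),
}

def co_annotation_scores(proteins):
    """Score each (domain, GO term) pair by the fraction of proteins carrying
    the domain that are also annotated with the GO term."""
    domain_counts = Counter()
    pair_counts = Counter()
    for domains, go_terms in proteins.values():
        domain_counts.update(domains)
        pair_counts.update(product(domains, go_terms))
    return {pair: n / domain_counts[pair[0]] for pair, n in pair_counts.items()}

def predict(domains, scores, threshold=0.9):
    """Propagate GO terms to a protein from its domains, keeping only pairs
    whose score reaches the (assumed) threshold."""
    return {go for (dom, go), s in scores.items() if dom in domains and s >= threshold}

scores = co_annotation_scores(proteins)
predicted = predict({"D2"}, scores)
```

On this toy data, `D2` co-occurs with `GO:B` in every protein that carries it, so `GO:B` is the only term propagated to a new protein annotated with `D2` alone. A real pipeline would replace the raw frequency with a resampling-based significance estimate, as the abstract describes.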
Affiliation(s)
- Erva Ulusoy
- Biological Data Science Lab, Department of Computer Engineering, Hacettepe University, Ankara, Turkey
- Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
| | - Tunca Doğan
- Biological Data Science Lab, Department of Computer Engineering, Hacettepe University, Ankara, Turkey
- Department of Bioinformatics, Graduate School of Health Sciences, Hacettepe University, Ankara, Turkey
| |
|
9
|
Tang X, Dai H, Knight E, Wu F, Li Y, Li T, Gerstein M. A survey of generative AI for de novo drug design: new frontiers in molecule and protein generation. Brief Bioinform 2024; 25:bbae338. [PMID: 39007594 PMCID: PMC11247410 DOI: 10.1093/bib/bbae338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2024] [Revised: 05/21/2024] [Accepted: 06/27/2024] [Indexed: 07/16/2024] Open
Abstract
Artificial intelligence (AI)-driven methods can vastly improve the historically costly drug design process, with various generative models already in widespread use. Generative models for de novo drug design, in particular, focus on the creation of novel biological compounds entirely from scratch, representing a promising future direction. Rapid development in the field, combined with the inherent complexity of the drug design process, creates a difficult landscape for new researchers to enter. In this survey, we organize de novo drug design into two overarching themes: small molecule and protein generation. Within each theme, we identify a variety of subtasks and applications, highlighting important datasets, benchmarks, and model architectures and comparing the performance of top models. We take a broad approach to AI-driven drug design, allowing for both micro-level comparisons of various methods within each subtask and macro-level observations across different fields. We discuss parallel challenges and approaches between the two applications and highlight future directions for AI-driven de novo drug design as a whole. An organized repository of all covered sources is available at https://github.com/gersteinlab/GenAI4Drug.
Affiliation(s)
- Xiangru Tang
- Department of Computer Science, Yale University, New Haven, CT 06520, United States
| | - Howard Dai
- Department of Computer Science, Yale University, New Haven, CT 06520, United States
| | - Elizabeth Knight
- School of Medicine, Yale University, New Haven, CT 06520, United States
| | - Fang Wu
- Computer Science Department, Stanford University, CA 94305, United States
| | - Yunyang Li
- Department of Computer Science, Yale University, New Haven, CT 06520, United States
| | - Tianxiao Li
- Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT 06520, United States
| | - Mark Gerstein
- Department of Computer Science, Yale University, New Haven, CT 06520, United States
- Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT 06520, United States
- Department of Statistics & Data Science, Yale University, New Haven, CT 06520, United States
- Department of Biomedical Informatics & Data Science, Yale University, New Haven, CT 06520, United States
- Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, CT 06520, United States
| |
|
10
|
Wu KE, Yang KK, van den Berg R, Alamdari S, Zou JY, Lu AX, Amini AP. Protein structure generation via folding diffusion. Nat Commun 2024; 15:1059. [PMID: 38316764 PMCID: PMC10844308 DOI: 10.1038/s41467-024-45051-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Received: 07/18/2023] [Accepted: 01/12/2024] [Indexed: 02/07/2024] Open
Abstract
The ability to computationally generate novel yet physically foldable protein structures could lead to new biological discoveries and new treatments targeting yet incurable diseases. Despite recent advances in protein structure prediction, directly generating diverse, novel protein structures from neural networks remains difficult. In this work, we present a diffusion-based generative model that generates protein backbone structures via a procedure inspired by the natural folding process. We describe a protein backbone structure as a sequence of angles capturing the relative orientation of the constituent backbone atoms, and generate structures by denoising from a random, unfolded state towards a stable folded structure. Not only does this mirror how proteins natively twist into energetically favorable conformations, the inherent shift and rotational invariance of this representation crucially alleviates the need for more complex equivariant networks. We train a denoising diffusion probabilistic model with a simple transformer backbone and demonstrate that our resulting model unconditionally generates highly realistic protein structures with complexity and structural patterns akin to those of naturally-occurring proteins. As a useful resource, we release an open-source codebase and trained models for protein structure diffusion.
Affiliation(s)
- Kevin E Wu
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
| | - James Y Zou
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
| | - Alex X Lu
- Microsoft Research, Cambridge, MA, USA
| |
|
11
|
Kouba P, Kohout P, Haddadi F, Bushuiev A, Samusevich R, Sedlar J, Damborsky J, Pluskal T, Sivic J, Mazurenko S. Machine Learning-Guided Protein Engineering. ACS Catal 2023; 13:13863-13895. [PMID: 37942269 PMCID: PMC10629210 DOI: 10.1021/acscatal.3c02743] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 06/15/2023] [Revised: 09/20/2023] [Indexed: 11/10/2023]
Abstract
Recent progress in engineering highly promising biocatalysts has increasingly involved machine learning methods. These methods leverage existing experimental and simulation data to aid in the discovery and annotation of promising enzymes, as well as in suggesting beneficial mutations for improving known targets. The field of machine learning for protein engineering is gathering steam, driven by recent success stories and notable progress in other areas. It already encompasses ambitious tasks such as understanding and predicting protein structure and function, catalytic efficiency, enantioselectivity, protein dynamics, stability, solubility, aggregation, and more. Nonetheless, the field is still evolving, with many challenges to overcome and questions to address. In this Perspective, we provide an overview of ongoing trends in this domain, highlight recent case studies, and examine the current limitations of machine learning-based methods. We emphasize the crucial importance of thorough experimental validation of emerging models before their use for rational protein design. We present our opinions on the fundamental problems and outline the potential directions for future research.
Affiliation(s)
- Petr Kouba
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Faculty of Electrical Engineering, Czech Technical University in Prague, Technicka 2, 166 27 Prague 6, Czech Republic
| | - Pavel Kohout
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
| | - Faraneh Haddadi
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
| | - Anton Bushuiev
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
| | - Raman Samusevich
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
| | - Jiri Sedlar
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
| | - Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
| | - Tomas Pluskal
- Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Flemingovo nám. 2, 160 00 Prague 6, Czech Republic
| | - Josef Sivic
- Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague, Jugoslavskych partyzanu 1580/3, 160 00 Prague 6, Czech Republic
| | - Stanislav Mazurenko
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5, 625 00 Brno, Czech Republic
- International Clinical Research Center, St. Anne’s University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
| |
|
12
|
Cicconardi F, Milanetti E, Pinheiro de Castro EC, Mazo-Vargas A, Van Belleghem SM, Ruggieri AA, Rastas P, Hanly J, Evans E, Jiggins CD, Owen McMillan W, Papa R, Di Marino D, Martin A, Montgomery SH. Evolutionary dynamics of genome size and content during the adaptive radiation of Heliconiini butterflies. Nat Commun 2023; 14:5620. [PMID: 37699868 PMCID: PMC10497600 DOI: 10.1038/s41467-023-41412-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 10/25/2022] [Accepted: 08/30/2023] [Indexed: 09/14/2023] [Open access]
Abstract
Heliconius butterflies, a speciose genus of Müllerian mimics, represent a classic example of an adaptive radiation that includes a range of derived dietary, life history, physiological and neural traits. However, key lineages within the genus, and across the broader Heliconiini tribe, lack genomic resources, limiting our understanding of how adaptive and neutral processes shaped genome evolution during their radiation. Here, we generate highly contiguous genome assemblies for nine Heliconiini, 29 additional reference-assembled genomes, and improve 10 existing assemblies. Altogether, we provide a dataset of annotated genomes for a total of 63 species, including 58 species within the Heliconiini tribe. We use this extensive dataset to generate a robust and dated heliconiine phylogeny, describe major patterns of introgression, explore the evolution of genome architecture, and the genomic basis of key innovations in this enigmatic group, including an assessment of the evolution of putative regulatory regions at the Heliconius stem. Our work illustrates how the increased resolution provided by such dense genomic sampling improves our power to generate and test gene-phenotype hypotheses, and precisely characterize how genomes evolve.
Affiliation(s)
- Francesco Cicconardi
- School of Biological Sciences, Bristol University, Bristol, United Kingdom.
- Department of Zoology, University of Cambridge, Cambridge, United Kingdom.
| | - Edoardo Milanetti
- Department of Physics, Sapienza University, Piazzale Aldo Moro 5, 00185, Rome, Italy
- Center for Life Nano- & Neuro-Science, Italian Institute of Technology, Viale Regina Elena 291, 00161, Rome, Italy
| | | | - Anyi Mazo-Vargas
- Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, 14853, USA
| | - Steven M Van Belleghem
- Department of Biology, University of Puerto Rico, Rio Piedras, PR, Puerto Rico
- Ecology, Evolution and Conservation Biology, Biology Department, KU Leuven, Leuven, Belgium
| | | | - Pasi Rastas
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
| | - Joseph Hanly
- Department of Biological Sciences, The George Washington University, Washington DC, WA, 20052, USA
- Smithsonian Tropical Research Institute, Panama City, Panama
| | - Elizabeth Evans
- Department of Biology, University of Puerto Rico, Rio Piedras, PR, Puerto Rico
| | - Chris D Jiggins
- Department of Zoology, University of Cambridge, Cambridge, United Kingdom
| | - W Owen McMillan
- Smithsonian Tropical Research Institute, Panama City, Panama
| | - Riccardo Papa
- Department of Biology, University of Puerto Rico, Rio Piedras, PR, Puerto Rico
- Molecular Sciences and Research Center, University of Puerto Rico, San Juan, PR, Puerto Rico
- Comprehensive Cancer Center, University of Puerto Rico, San Juan, PR, Puerto Rico
| | - Daniele Di Marino
- Department of Life and Environmental Sciences, New York-Marche Structural Biology Center (NY-MaSBiC), Polytechnic University of Marche, Via Brecce Bianche, 60131, Ancona, Italy
- Neuronal Death and Neuroprotection Unit, Department of Neuroscience, Mario Negri Institute for Pharmacological Research-IRCCS, Via Mario Negri 2, 20156, Milano, Italy
- National Biodiversity Future Center (NBFC), Palermo, Italy
| | - Arnaud Martin
- Department of Biological Sciences, The George Washington University, Washington DC, WA, 20052, USA
| | - Stephen H Montgomery
- School of Biological Sciences, Bristol University, Bristol, United Kingdom.
- Smithsonian Tropical Research Institute, Panama City, Panama.
| |
|
13
|
Sisodia R, Mazumdar PA, Madhurantakam C. In silico identification and analysis of potential inhibitors for acid phosphatase, HppA from Helicobacter pylori. J Mol Recognit 2023; 36:e3049. [PMID: 37553866 DOI: 10.1002/jmr.3049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Revised: 07/20/2023] [Accepted: 07/21/2023] [Indexed: 08/10/2023]
Abstract
Helicobacter pylori is the most common cause of gastric ulcers and is associated with gastric cancer. The enzyme HppA of class C nonspecific acid phosphohydrolases (NSAPs) of H. pylori plays a crucial role in the electron transport chain. Herein, we report an in silico homology model of HppA consisting of a monomeric α + β model. A high throughput structure-based virtual screening approach yielded potential inhibitors against HppA with higher binding energies. Further analyses of molecular interaction maps and protein-ligand fingerprints, followed by molecular mechanics-generalized Born surface area (MM-GBSA) end point binding energy calculations of docked complexes, resulted in the detection of top binders/ligands. Our investigations identified potential substrate-competitive small molecule inhibitors of HppA, with admissible pharmacokinetic properties. These molecules may provide a starting point for developing novel therapeutic agents against H. pylori.
Affiliation(s)
- Rinki Sisodia
- Structural and Molecular Biology Laboratory (SMBL), Department of Biotechnology, TERI School of Advanced Studies (TERI SAS), New Delhi, India
| | | | - Chaithanya Madhurantakam
- Structural and Molecular Biology Laboratory (SMBL), Department of Biotechnology, TERI School of Advanced Studies (TERI SAS), New Delhi, India
| |
|
14
|
Wang W, Meng X, Xiang J, Shuai Y, Bedru HD, Li M. CACO: A Core-Attachment Method With Cross-Species Functional Ortholog Information to Detect Human Protein Complexes. IEEE J Biomed Health Inform 2023; 27:4569-4578. [PMID: 37399160 DOI: 10.1109/jbhi.2023.3289490] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 07/05/2023]
Abstract
Protein complexes play an essential role in living cells. Detecting protein complexes is crucial to understand protein functions and treat complex diseases. Due to high time and resource consumption of experiment approaches, many computational approaches have been proposed to detect protein complexes. However, most of them are only based on protein-protein interaction (PPI) networks, which heavily suffer from the noise in PPI networks. Therefore, we propose a novel core-attachment method, named CACO, to detect human protein complexes, by integrating the functional information from other species via protein ortholog relations. First, CACO constructs a cross-species ortholog relation matrix and transfers GO terms from other species as a reference to evaluate the confidence of PPIs. Then, a PPI filter strategy is adopted to clean the PPI network and thus a weighted clean PPI network is constructed. Finally, a new effective core-attachment algorithm is proposed to detect protein complexes from the weighted PPI network. Compared to other thirteen state-of-the-art methods, CACO outperforms all of them in terms of F-measure and Composite Score, showing that integrating ortholog information and the proposed core-attachment algorithm are effective in detecting protein complexes.
|
15
|
Qureshi A, Connolly JB. Bioinformatic and literature assessment of toxicity and allergenicity of a CRISPR-Cas9 engineered gene drive to control Anopheles gambiae the mosquito vector of human malaria. Malar J 2023; 22:234. [PMID: 37580703 PMCID: PMC10426224 DOI: 10.1186/s12936-023-04665-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2022] [Accepted: 08/07/2023] [Indexed: 08/16/2023] [Open access]
Abstract
BACKGROUND: Population suppression gene drive is currently being evaluated, including via environmental risk assessment (ERA), for malaria vector control. One such gene drive involves the dsxFCRISPRh transgene encoding (i) hCas9 endonuclease, (ii) T1 guide RNA (gRNA) targeting the doublesex locus, and (iii) DsRed fluorescent marker protein, in genetically-modified mosquitoes (GMMs). Problem formulation, the first stage of ERA, for environmental releases of dsxFCRISPRh previously identified nine potential harms to the environment or health that could occur, should expressed products of the transgene cause allergenicity or toxicity.
METHODS: Amino acid sequences of hCas9 and DsRed were interrogated against those of toxins or allergens from NCBI, UniProt, COMPARE and AllergenOnline bioinformatic databases and the gRNA was compared with microRNAs from the miRBase database for potential impacts on gene expression associated with toxicity or allergenicity. PubMed was also searched for any evidence of toxicity or allergenicity of Cas9 or DsRed, or of the donor organisms from which these products were originally derived.
RESULTS: While Cas9 nuclease activity can be toxic to some cell types in vitro and hCas9 was found to share homology with the prokaryotic toxin VapC, there was no evidence from previous studies of a risk of toxicity to humans and other animals from hCas9. Although hCas9 did contain an 8-mer epitope found in the latex allergen Hev b 9, the full amino acid sequence of hCas9 was not homologous to any known allergens. Combined with a lack of evidence in the literature of Cas9 allergenicity, this indicated negligible risk to humans of allergenicity from hCas9. No matches were found between the gRNA and microRNAs from either Anopheles or humans. Moreover, potential exposure to dsxFCRISPRh transgenic proteins from environmental releases was assessed as negligible.
CONCLUSIONS: Bioinformatic and literature assessments found no convincing evidence to suggest that transgenic products expressed from dsxFCRISPRh were allergens or toxins, indicating that environmental releases of this population suppression gene drive for malaria vector control should not result in any increased allergenicity or toxicity in humans or animals. These results should also inform evaluations of other GMMs being developed for vector control and in vivo clinical applications of CRISPR-Cas9.
Affiliation(s)
- Alima Qureshi
- Department of Life Sciences, Imperial College London, Silwood Park, Sunninghill, Ascot, UK
| | - John B Connolly
- Department of Life Sciences, Imperial College London, Silwood Park, Sunninghill, Ascot, UK.
| |
|
16
|
Zheng Y, Young ND, Song J, Gasser RB. Genome-Wide Analysis of Haemonchus contortus Proteases and Protease Inhibitors Using Advanced Informatics Provides Insights into Parasite Biology and Host-Parasite Interactions. Int J Mol Sci 2023; 24:12320. [PMID: 37569696 PMCID: PMC10418638 DOI: 10.3390/ijms241512320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 07/24/2023] [Accepted: 07/24/2023] [Indexed: 08/13/2023] [Open access]
Abstract
Biodiversity within the animal kingdom is associated with extensive molecular diversity. The expansion of genomic, transcriptomic and proteomic data sets for invertebrate groups and species with unique biological traits necessitates reliable in silico tools for the accurate identification and annotation of molecules and molecular groups. However, conventional tools are inadequate for lesser-known organismal groups, such as eukaryotic pathogens (parasites), so that improved approaches are urgently needed. Here, we established a combined sequence- and structure-based workflow system to harness well-curated publicly available data sets and resources to identify, classify and annotate proteases and protease inhibitors of a highly pathogenic parasitic roundworm (nematode) of global relevance, called Haemonchus contortus (barber's pole worm). This workflow performed markedly better than conventional, sequence-based classification and annotation alone and allowed the first genome-wide characterisation of protease and protease inhibitor genes and gene products in this worm. In total, we identified 790 genes encoding 860 proteases and protease inhibitors representing 83 gene families. The proteins inferred included 280 metallo-, 145 cysteine, 142 serine, 121 aspartic and 81 "mixed" proteases as well as 91 protease inhibitors, all of which had marked physicochemical diversity and inferred involvements in >400 biological processes or pathways. A detailed investigation revealed a remarkable expansion of some protease or inhibitor gene families, which are likely linked to parasitism (e.g., host-parasite interactions, immunomodulation and blood-feeding) and exhibit stage- or sex-specific transcription profiles. This investigation provides a solid foundation for detailed explorations of the structures and functions of proteases and protease inhibitors of H. contortus and related nematodes, and it could assist in the discovery of new drug or vaccine targets against infections or diseases.
Affiliation(s)
- Yuanting Zheng
- Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, VIC 3010, Australia;
| | - Neil D. Young
- Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, VIC 3010, Australia;
| | - Jiangning Song
- Department of Data Science and AI, Faculty of IT, Monash University, Melbourne, VIC 3800, Australia;
- Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC 3800, Australia
- Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
| | - Robin B. Gasser
- Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, VIC 3010, Australia;
| |
|
17
|
Atif HB, Alvi H, Naveed H. Masked Language Modeling for Resource Constrained Biological Natural Language Processing. Annu Int Conf IEEE Eng Med Biol Soc 2023; 2023:1-5. [PMID: 38083556 DOI: 10.1109/embc40787.2023.10340499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/18/2023]
Abstract
Recent advances in Natural Language Processing (NLP) have produced state of the art results on several sequence to sequence (seq2seq) tasks. Enhancements in embedders and their training methodologies have shown significant improvement on downstream tasks. Word vector models like Word2Vec, FastText & Glove were widely used over one-hot encoded vectors for years until the advent of deep contextualized embedders. Protein sequences consist of 20 naturally occurring amino acids that can be treated as the language of nature. These amino acids in combinations with each other makeup the biological functions. The choice of vector representation and architecture design for a biological task is highly dependent upon the nature of the task. We utilize unlabelled protein sequences to train a Convolution and Gated Recurrent Network (CGRN) embedder using Masked Language Modeling (MLM) technique that shows significant performance boost under resource constraint setting on two downstream tasks i.e., F1-score(Q8) of 73.1% on Secondary Structure Prediction (SSP) & F1-score of 84% on Intrinsically Disordered Region Prediction (IDRP). We also compare different architectures on downstream tasks to show the impact of the nature of biological task on the performance of the model.
|
18
|
Kolhe JA, Babu NL, Freeman BC. The Hsp90 molecular chaperone governs client proteins by targeting intrinsically disordered regions. Mol Cell 2023; 83:2035-2044.e7. [PMID: 37295430 PMCID: PMC10297700 DOI: 10.1016/j.molcel.2023.05.021] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 12/13/2022] [Revised: 04/10/2023] [Accepted: 05/15/2023] [Indexed: 06/12/2023]
Abstract
Molecular chaperones govern proteome health to support cell homeostasis. An essential eukaryotic component of the chaperone system is Hsp90. Using a chemical-biology approach, we characterized the features driving the Hsp90 physical interactome. We found that Hsp90 associated with ∼20% of the yeast proteome using its three domains to preferentially target intrinsically disordered regions (IDRs) of client proteins. Hsp90 selectively utilized an IDR to regulate client activity as well as maintained IDR-protein health by preventing the transition to stress granules or P-bodies at physiological temperatures. We also discovered that Hsp90 controls the fidelity of ribosome initiation that triggers a heat shock response when disrupted. Our study provides insights into how this abundant molecular chaperone supports a dynamic and healthy native protein landscape.
Affiliation(s)
- Janhavi A Kolhe
- Department of Cell and Developmental Biology, School of Molecular and Cellular Biology, University of Illinois-Urbana-Champaign, Urbana, IL, USA
| | - Neethu L Babu
- Department of Cell and Developmental Biology, School of Molecular and Cellular Biology, University of Illinois-Urbana-Champaign, Urbana, IL, USA
| | - Brian C Freeman
- Department of Cell and Developmental Biology, School of Molecular and Cellular Biology, University of Illinois-Urbana-Champaign, Urbana, IL, USA.
| |
|
19
|
Li J, He X, Gao S, Liang Y, Qi Z, Xi Q, Zuo Y, Xing Y. The Metal-binding Protein Atlas (MbPA): an integrated database for curating metalloproteins in all aspects. J Mol Biol 2023:168117. [PMID: 37086947 DOI: 10.1016/j.jmb.2023.168117] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/29/2022] [Revised: 04/14/2023] [Accepted: 04/17/2023] [Indexed: 04/24/2023]
Abstract
Metal-binding proteins are essential for the vital activities and engage in their roles by acting in concert with metal cations. MbPA (The Metal-binding Protein Atlas) is the most comprehensive resource up to now dedicated to curating metal-binding proteins. Currently, it contains 106373 entries and 440187 sites related to 54 metals and 8169 species. Users can view all metal-binding proteins and species-specific proteins in MbPA. There are also metal-proteomics data that quantitatively describes protein expression in different tissues and organs. By analyzing the data of the amino acid residues at the metal-binding site, it is found that about 80% of the metal ions tend to bind to cysteine, aspartic acid, glutamic acid, and histidine. Moreover, we use Diversity Measure to confirm that the diversity of metal-binding is specific in different area of periodic table, and further elucidate the binding modes of 19 transition metals on 20 amino acids. In addition, MbPA also embraces 6855 potential pathogenic mutations related to metalloprotein. The resource is freely available at http://bioinfor.imu.edu.cn/mbpa.
Affiliation(s)
- Jinzhao Li
- The Key Laboratory of Mammalian Reproductive Biology and Biotechnology of the Ministry of Education, College of life sciences, Inner Mongolia University, Hohhot, 010021, China
| | - Xiang He
- The Key Laboratory of Mammalian Reproductive Biology and Biotechnology of the Ministry of Education, College of life sciences, Inner Mongolia University, Hohhot, 010021, China
| | - Shuang Gao
- The Key Laboratory of Mammalian Reproductive Biology and Biotechnology of the Ministry of Education, College of life sciences, Inner Mongolia University, Hohhot, 010021, China
| | - Yuchao Liang
- The Key Laboratory of Mammalian Reproductive Biology and Biotechnology of the Ministry of Education, College of life sciences, Inner Mongolia University, Hohhot, 010021, China
| | - Zhi Qi
- The Key Laboratory of Mammalian Reproductive Biology and Biotechnology of the Ministry of Education, College of life sciences, Inner Mongolia University, Hohhot, 010021, China; Key Laboratory of Forage and Endemic Crop Biotechnology, Ministry of Education, School of Life Sciences, Inner Mongolia University, Hohhot, 010021, China
| | - Qilemuge Xi
- The Key Laboratory of Mammalian Reproductive Biology and Biotechnology of the Ministry of Education, College of life sciences, Inner Mongolia University, Hohhot, 010021, China
| | - Yongchun Zuo
- The Key Laboratory of Mammalian Reproductive Biology and Biotechnology of the Ministry of Education, College of life sciences, Inner Mongolia University, Hohhot, 010021, China.
| | - Yongqiang Xing
- The Inner Mongolia Key Laboratory of Functional Genome Bioinformatics, School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou 014010, China.
| |
|
20
|
Rai KK, Singh S, Rai R, Rai LC. Functional characterization of two WD40 family proteins, Alr0671 and All2352, from Anabaena PCC 7120 and deciphering their role in abiotic stress management. Plant Mol Biol 2022; 110:545-563. [PMID: 35997919 DOI: 10.1007/s11103-022-01306-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/17/2022] [Accepted: 08/01/2022] [Indexed: 06/15/2023]
Abstract
WD40 domain-containing proteins are one of the eukaryotes' most ancient and ubiquitous protein families. Little is known about the presence and function of these proteins in cyanobacteria in general and Anabaena in particular. In silico analysis confirmed the presence of WD40 repeats. Gene expression analysis indicated that the transcript levels of both the target proteins were up-regulated up to 4 fold in Cd and drought and 2-3 fold in heat, salt, and UV-B stress. Using a fluorescent oxidative stress indicator, we showed that the recombinant proteins were scavenging reactive oxygen species (ROS) (4-5 fold) more efficiently than empty vectors. Chromatin immunoprecipitation analysis (ChIP) and electrophoretic mobility shift assay (EMSA) revealed that the target proteins function as transcription factors after binding to the promoter sequences. The presence of kinase activity (2-4 fold) in the selected proteins indicated that these proteins could modulate the functions of other cellular proteins under stress conditions by inducing phosphorylation of specific amino acids. The chosen proteins also demonstrated interaction with Zn, Cd, and Cu (1.4-2.5 fold), which might stabilize the proteins' structure and biophysical functions under multiple abiotic stresses. The functionally characterized Alr0671 and All2352 proteins act as transcription factors and offer tolerance to agriculturally relevant abiotic stresses.
Affiliation(s)
- Krishna Kumar Rai
- Molecular Biology Section, Centre of Advanced Study in Botany, Institute of Science, Banaras Hindu University, 221005, Varanasi, India
| | - Shilpi Singh
- Molecular Biology Section, Centre of Advanced Study in Botany, Institute of Science, Banaras Hindu University, 221005, Varanasi, India
| | - Ruchi Rai
- Molecular Biology Section, Centre of Advanced Study in Botany, Institute of Science, Banaras Hindu University, 221005, Varanasi, India
| | - L C Rai
- Molecular Biology Section, Centre of Advanced Study in Botany, Institute of Science, Banaras Hindu University, 221005, Varanasi, India.
| |
|
21
|
Rai N, Rai KK, Singh MK, Singh J, Kaushik P. Investigating NAC Transcription Factor Role in Redox Homeostasis in Solanum lycopersicum L.: Bioinformatics, Physiological and Expression Analysis under Drought Stress. Plants (Basel) 2022; 11:2930. [PMID: 36365384 PMCID: PMC9654907 DOI: 10.3390/plants11212930] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2022] [Revised: 10/24/2022] [Accepted: 10/25/2022] [Indexed: 06/16/2023]
Abstract
NAC transcription factors regulate stress-defence pathways and developmental processes in crop plants. However, their detailed functional characterization in tomatoes needs to be investigated comprehensively. In the present study, tomato hybrids subjected to 60 and 80 days of drought stress conditions showed a significant increase in membrane damage and reduced relative water, chlorophyll and proline content. However, hybrids viz., VRTH-16-3 and VRTH-17-68 showed superior growth under drought stress, as they were marked with low electrolytic leakage, enhanced relative water content, proline content and an enhanced activity of enzymatic antioxidants, along with the upregulation of NAC and other stress-defence pathway genes. Candidate gene(s) exhibiting maximum expression in all the hybrids under drought stress were subjected to detailed in silico characterization to provide significant insight into its structural and functional classification. The homology modelling and superimposition analysis of predicted tomato NAC protein showed that similar amino acid residues were involved in forming the conserved WKAT domain. DNA docking discovered that the SlNAC1 protein becomes activated and exerts a stress-defence response after the possible interaction of conserved DNA elements using Pro72, Asn73, Trp81, Lys82, Ala83, Thr84, Gly85, Thr86 and Asp87 residues. A protein-protein interaction analysis identified ten functional partners involved in the induction of stress-defence tolerance.
Affiliation(s)
- Nagendra Rai
- Indian Institute of Vegetable Research (IIVR), Varanasi 221305, UP, India
| | - Krishna Kumar Rai
- Indian Institute of Vegetable Research (IIVR), Varanasi 221305, UP, India
- Department of Botany, Institute of Science, Banaras Hindu University, Varanasi 221005, UP, India
| | - Manish Kumar Singh
- Indian Institute of Vegetable Research (IIVR), Varanasi 221305, UP, India
| | - Jagdish Singh
- Indian Institute of Vegetable Research (IIVR), Varanasi 221305, UP, India
| | - Prashant Kaushik
- Instituto de Conservación y Mejora de la Agrodiversidad Valenciana, Universitat Politècnica de València, 46022 Valencia, Spain
| |
|
22
|
Rahman MA, Heme UH, Parvez MAK. In silico functional annotation of hypothetical proteins from the Bacillus paralicheniformis strain Bac84 reveals proteins with biotechnological potentials and adaptational functions to extreme environments. PLoS One 2022; 17:e0276085. [PMID: 36228026 PMCID: PMC9560612 DOI: 10.1371/journal.pone.0276085] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/02/2022] [Accepted: 09/28/2022] [Indexed: 11/26/2022] [Open access]
Abstract
Members of the Bacillus genus are industrial cell factories due to their capacity to secrete significant quantities of biomolecules with industrial applications. The Bacillus paralicheniformis strain Bac84 was isolated from the Red Sea and shares a close evolutionary relationship with Bacillus licheniformis. However, a significant number of proteins in its genome are annotated as functionally uncharacterized hypothetical proteins. Investigating these proteins' functions may help us better understand how bacteria survive extreme environmental conditions and find novel targets for biotechnological applications. Therefore, the purpose of our research was to functionally annotate the hypothetical proteins from the genome of B. paralicheniformis strain Bac84. We employed a structured in silico approach incorporating numerous bioinformatics tools and databases for functional annotation, physicochemical characterization, subcellular localization, protein-protein interactions, and three-dimensional structure determination. Sequences of 414 hypothetical proteins were evaluated, and we were able to attribute a function to 37 of them. Moreover, we performed receiver operating characteristic analysis to assess the performance of the various tools used in the present study. We identified 12 proteins with significant roles in adaptation to unfavorable environments, such as sporulation, biofilm formation, motility, and regulation of transcription. Additionally, 8 proteins were predicted to have biotechnological potential in areas such as coenzyme A biosynthesis, phenylalanine biosynthesis, rare-sugar biosynthesis, antibiotic biosynthesis, bioremediation, and others. Evaluation of the performance of the tools showed an accuracy of 98%, which supports the choice of tools used. This work shows that this annotation strategy can make the functional characterization of unknown proteins easier and can identify targets for further investigation.
Knowledge of the potential functions of these hypothetical proteins may help in identifying new biotechnological targets in B. paralicheniformis strain Bac84. In addition, the results may facilitate a better understanding of survival mechanisms in harsh environmental conditions.
Affiliation(s)
- Md. Atikur Rahman
- Institute of Microbiology, Friedrich Schiller University Jena, Thuringia, Germany
| | - Uzma Habiba Heme
- Faculty of Biological Sciences, Friedrich Schiller University Jena, Thuringia, Germany
| | | |
|
23
|
Crapitto AJ, Campbell A, Harris AJ, Goldman AD. A consensus view of the proteome of the last universal common ancestor. Ecol Evol 2022; 12:e8930. [PMID: 35784055 PMCID: PMC9165204 DOI: 10.1002/ece3.8930] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/29/2022] [Revised: 04/11/2022] [Accepted: 04/14/2022] [Indexed: 12/30/2022] Open
Abstract
The availability of genomic and proteomic data from across the tree of life has made it possible to infer features of the genome and proteome of the last universal common ancestor (LUCA). A number of studies have done so, each using a unique set of methods and bioinformatics databases. Here, we compare predictions across eight such studies and measure their agreement both with one another and with the consensus predictions among them. We find that some LUCA genome studies show strong agreement with the consensus predictions of the others, but that no individual study shares a high or even moderate degree of similarity with any other individual study. From these observations, we conclude that the consensus among studies provides a more accurate depiction of the core proteome of the LUCA and its functional repertoire. The set of consensus LUCA protein family predictions across all of these studies portrays a LUCA genome that, at minimum, encoded functions related to protein synthesis, amino acid metabolism, nucleotide metabolism, and the use of common, nucleotide-derived organic cofactors.
Affiliation(s)
| | - Amy Campbell
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - AJ Harris
- Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, China
| | - Aaron D. Goldman
- Department of Biology, Oberlin College, Oberlin, Ohio, USA
- Blue Marble Space Institute of Science, Seattle, Washington, USA
| |
|
24
|
Chakravarty N, Sharma M, Kumar P, Singh RP. Biochemical and molecular insights on the bioactivity and binding interactions of Bacillus australimaris NJB19 L-asparaginase. Int J Biol Macromol 2022; 215:1-11. [PMID: 35718140 DOI: 10.1016/j.ijbiomac.2022.06.110] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/27/2021] [Revised: 06/08/2022] [Accepted: 06/14/2022] [Indexed: 11/05/2022]
Abstract
L-asparaginase, an antileukemic enzyme, is indispensable to the treatment of Acute Lymphoblastic Leukemia (ALL). However, its intrinsic glutaminase activity causes various side effects in patients; thus, an improved version of the enzyme lacking glutaminase activity would be a requisite for effective treatment management of ALL. The present study highlights the biochemical and molecular characteristics of the recombinant glutaminase-free L-asparaginase from Bacillus australimaris NJB19 (BaAsp). Investigation of the active site architecture of the protein revealed the binding interactions of BaAsp with its substrate. Comparative analysis of L-asparaginase sequences revealed a few substitutions of key amino acids in BaAsp that could explain its substrate selectivity and specificity. The purified heterologously expressed protein (42 kDa) displayed maximum L-asparaginase activity at 35-40 °C and pH 8.5-9, with no observed L-glutaminase activity. The kinetic parameters, Km and Vmax, were determined as 45.6 μM and 0.16 μmol min⁻¹, respectively. Furthermore, in silico analysis revealed a conserved zinc-binding site in the protein, which is generally implicated in inhibiting L-asparaginase activity. However, BaAsp was not inhibited by zinc at 1 mM concentration. Therefore, the findings provide insights into the biochemical and molecular details of BaAsp, which could be valuable in formulating it for alternative antileukemic drug therapy.
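The reported kinetic parameters can be put in context with the Michaelis-Menten rate law, v = Vmax·[S] / (Km + [S]). The short Python sketch below is illustrative only: it assumes BaAsp follows simple single-substrate Michaelis-Menten kinetics (the abstract reports Km and Vmax but does not state the fitted model), and the function name and the substrate concentrations in the loop are hypothetical choices, not values from the study.

```python
def michaelis_menten_rate(substrate_um: float,
                          km_um: float = 45.6,
                          vmax: float = 0.16) -> float:
    """Initial reaction rate (in the units of vmax, here umol/min)
    at a given substrate concentration (uM).

    Defaults are the Km and Vmax quoted in the abstract; the simple
    Michaelis-Menten form itself is an assumption.
    """
    if substrate_um < 0:
        raise ValueError("substrate concentration must be non-negative")
    return vmax * substrate_um / (km_um + substrate_um)


if __name__ == "__main__":
    # Hypothetical substrate concentrations chosen for illustration.
    for s in (10.0, 45.6, 500.0):
        print(f"[S] = {s:6.1f} uM -> v = {michaelis_menten_rate(s):.4f} umol/min")
```

Under this model the rate equals half of Vmax when the substrate concentration equals Km, which is the usual way to read a Km of 45.6 μM.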
Affiliation(s)
- Namrata Chakravarty
- Department of Biosciences and Bioengineering, Indian Institute of Technology Roorkee, Roorkee 247667, India
| | - Monica Sharma
- Department of Biosciences and Bioengineering, Indian Institute of Technology Roorkee, Roorkee 247667, India
| | - Pravindra Kumar
- Department of Biosciences and Bioengineering, Indian Institute of Technology Roorkee, Roorkee 247667, India
| | - R P Singh
- Department of Biosciences and Bioengineering, Indian Institute of Technology Roorkee, Roorkee 247667, India.
| |
|
25
|
Barbera N, Granados ST, Vanoye CG, Abramova TV, Kulbak D, Ahn SJ, George AL, Akpa BS, Levitan I. Cholesterol-induced suppression of Kir2 channels is mediated by decoupling at the inter-subunit interfaces. iScience 2022; 25:104329. [PMID: 35602957 PMCID: PMC9120057 DOI: 10.1016/j.isci.2022.104329] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/21/2021] [Revised: 03/29/2022] [Accepted: 04/26/2022] [Indexed: 12/29/2022] Open
Abstract
Cholesterol is a major regulator of multiple types of ion channels. Although there is increasing information about cholesterol binding sites, the molecular mechanisms through which cholesterol binding alters channel function are virtually unknown. In this study, we used a combination of Martini coarse-grained simulations, a network theory-based analysis, and electrophysiology to determine the effect of cholesterol on the dynamic structure of the Kir2.2 channel. We found that increasing membrane cholesterol reduced the likelihood of contact between specific regions of the cytoplasmic and transmembrane domains of the channel, most prominently at the subunit-subunit interfaces of the cytosolic domains. This decrease in contact was mediated by pairwise interactions of specific residues and correlated with the stoichiometry of cholesterol binding events. The predictions of the model were tested by site-directed mutagenesis of two identified residues, V265 and H222, and by high-throughput electrophysiology.
Affiliation(s)
- Nicolas Barbera
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, University of Illinois at Chicago, Chicago, IL 60611, USA
| | - Sara T. Granados
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, University of Illinois at Chicago, Chicago, IL 60611, USA
| | - Carlos Guillermo Vanoye
- Department of Pharmacology; Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
| | - Tatiana V. Abramova
- Department of Pharmacology; Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
| | - Danielle Kulbak
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, University of Illinois at Chicago, Chicago, IL 60611, USA
| | - Sang Joon Ahn
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, University of Illinois at Chicago, Chicago, IL 60611, USA
| | - Alfred L. George
- Department of Pharmacology; Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
| | - Belinda S. Akpa
- Division of Biosciences, Oak Ridge National Laboratory, Oak Ridge, TN 37830, USA
- Department of Chemical & Biomolecular Engineering, University of Tennessee, Knoxville, TN 37996, USA
- Molecular Biomedical Sciences, North Carolina State University, Raleigh, NC 27695, USA
| | - Irena Levitan
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, University of Illinois at Chicago, Chicago, IL 60611, USA
| |
|
26
|
Micsonai A, Moussong É, Wien F, Boros E, Vadászi H, Murvai N, Lee YH, Molnár T, Réfrégiers M, Goto Y, Tantos Á, Kardos J. BeStSel: webserver for secondary structure and fold prediction for protein CD spectroscopy. Nucleic Acids Res 2022; 50:W90-W98. [PMID: 35544232 PMCID: PMC9252784 DOI: 10.1093/nar/gkac345] [Citation(s) in RCA: 151] [Impact Index Per Article: 50.3] [Received: 03/28/2022] [Revised: 04/18/2022] [Accepted: 05/09/2022] [Indexed: 12/15/2022] Open
Abstract
Circular dichroism (CD) spectroscopy is widely used to characterize the secondary structure composition of proteins. To derive accurate and detailed structural information from the CD spectra, we have developed the Beta Structure Selection (BeStSel) method (PNAS, 112, E3095), which can handle the spectral diversity of β-structured proteins. The BeStSel webserver makes this method, together with useful accessories, available to the community, with the main goal of analyzing single or multiple protein CD spectra. Uniquely, BeStSel provides information on eight secondary structure components including parallel β-structure and antiparallel β-sheets with three different groups of twist. It outperforms any available method in accuracy and information content; moreover, it is capable of predicting the protein fold down to the topology/homology level of the CATH classification. A new module of the webserver helps to distinguish intrinsically disordered proteins by their CD spectrum. Secondary structure calculation for uploaded PDB files will help the experimental verification of protein MD and in silico modelling using CD spectroscopy. The server also calculates extinction coefficients from the primary sequence so that CD users can determine accurate protein concentrations, which is a prerequisite for reliable secondary structure determination. The BeStSel server can be freely accessed at https://bestsel.elte.hu.
Affiliation(s)
- András Micsonai
- ELTE NAP Neuroimmunology Research Group, Department of Biochemistry, Institute of Biology, ELTE Eötvös Loránd University, Budapest H-1117, Hungary
| | - Éva Moussong
- ELTE NAP Neuroimmunology Research Group, Department of Biochemistry, Institute of Biology, ELTE Eötvös Loránd University, Budapest H-1117, Hungary
| | - Frank Wien
- Synchrotron SOLEIL, Gif-sur-Yvette 91192, France
| | - Eszter Boros
- Department of Biochemistry, Institute of Biology, ELTE Eötvös Loránd University, Budapest H-1117, Hungary
| | - Henrietta Vadászi
- ELTE NAP Neuroimmunology Research Group, Department of Biochemistry, Institute of Biology, ELTE Eötvös Loránd University, Budapest H-1117, Hungary
| | - Nikoletta Murvai
- Department of Biochemistry, Institute of Biology, ELTE Eötvös Loránd University, Budapest H-1117, Hungary; Institute of Enzymology, Research Centre for Natural Sciences, Budapest H-1117, Hungary
| | - Young-Ho Lee
- Research Center of Bioconvergence Analysis, Korea Basic Science Institute (KBSI), Ochang 28119, Republic of Korea; Bio-Analytical Science, University of Science and Technology (UST), Daejeon 34113, Republic of Korea; Graduate School of Analytical Science and Technology (GRAST), Chungnam National University (CNU), Daejeon 34134, Republic of Korea
| | - Tamás Molnár
- ELTE NAP Neuroimmunology Research Group, Department of Biochemistry, Institute of Biology, ELTE Eötvös Loránd University, Budapest H-1117, Hungary
| | - Matthieu Réfrégiers
- Synchrotron SOLEIL, Gif-sur-Yvette 91192, France; Centre de Biophysique Moléculaire, CNRS UPR4301, Orléans, France
| | - Yuji Goto
- Global Center for Medical Engineering and Informatics, Osaka University, Osaka 565-0871, Japan
| | - Ágnes Tantos
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest H-1117, Hungary
| | - József Kardos
- ELTE NAP Neuroimmunology Research Group, Department of Biochemistry, Institute of Biology, ELTE Eötvös Loránd University, Budapest H-1117, Hungary
| |
|
27
|
Ijaq J, Chandra D, Ray MK, Jagannadham MV. Investigating the Functional Role of Hypothetical Proteins From an Antarctic Bacterium Pseudomonas sp. Lz4W: Emphasis on Identifying Proteins Involved in Cold Adaptation. Front Genet 2022; 13:825269. [PMID: 35360867 PMCID: PMC8963723 DOI: 10.3389/fgene.2022.825269] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/30/2021] [Accepted: 02/07/2022] [Indexed: 11/28/2022] Open
Abstract
Exploring the molecular mechanisms behind bacterial adaptation to extreme temperatures has potential biotechnological applications. In the present study, Pseudomonas sp. Lz4W, a Gram-negative psychrophilic bacterium adapted to survive in Antarctica, was selected to decipher the molecular mechanism underlying cold adaptation. Proteome analysis of the isolates grown at 4°C was performed to identify the proteins and pathways that are responsible for the adaptation. However, many proteins from the expressed proteome were found to be hypothetical proteins (HPs), whose function is unknown. Investigating the functional roles of these proteins may provide additional information for the biological understanding of bacterial cold adaptation. Thus, our study aimed to assign functions to these HPs and understand their role at the molecular level. We used a structured in silico workflow combining different bioinformatics tools and databases for functional annotation. The Pseudomonas sp. Lz4W genome (CP017432, version 1) contains 4493 genes and 4412 coding sequences (CDS), of which 743 CDS were annotated as HPs. Of these, 61 HPs were found by proteome analysis to be expressed consistently at the protein level. The amino acid sequences of these 61 HPs were submitted to our workflow, and we could successfully assign a function to 18 HPs. Most of these proteins were predicted to be involved in biological mechanisms of cold adaptation such as peptidoglycan metabolism, cell wall organization, ATP hydrolysis, outer membrane fluidity, catalysis, and others. This study provided a better understanding of the functional significance of HPs in cold adaptation of Pseudomonas sp. Lz4W.
Our approach emphasizes the importance of addressing the “hypothetical protein problem” for a thorough understanding of mechanisms at the cellular level, and provides an assessment of integrating proteomics methods with various annotation and curation approaches to characterize hypothetical or uncharacterized protein data. The MS proteomics data generated in this study have been deposited in the ProteomeXchange via PRIDE with the dataset identifier PXD029741.
Affiliation(s)
- Johny Ijaq
- Metabolomics Facility, School of Life Sciences, University of Hyderabad, Hyderabad, India
| | - Deepika Chandra
- CSIR-Centre for Cellular and Molecular Biology, Hyderabad, India
| | - Malay Kumar Ray
- CSIR-Centre for Cellular and Molecular Biology, Hyderabad, India
| | - M. V. Jagannadham
- Metabolomics Facility, School of Life Sciences, University of Hyderabad, Hyderabad, India
- *Correspondence: M. V. Jagannadham,
| |
|
28
|
Erdogan F, Qadree AK, Radu TB, Orlova A, de Araujo ED, Israelian J, Valent P, Mustjoki SM, Herling M, Moriggl R, Gunning PT. Structural and mutational analysis of member-specific STAT functions. Biochim Biophys Acta Gen Subj 2022; 1866:130058. [PMID: 34774983 DOI: 10.1016/j.bbagen.2021.130058] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/16/2021] [Revised: 10/29/2021] [Accepted: 11/05/2021] [Indexed: 12/21/2022]
Abstract
BACKGROUND: The STAT family of transcription factors controls gene expression in response to signals from various stimuli. They display functions in diseases ranging from autoimmunity and chronic inflammatory disease to cancer and infectious disease. SCOPE OF REVIEW: This work uses an approach informed by structural data to explore how domain-specific structural variations, post-translational modifications, and the cancer genome mutational landscape dictate STAT member-specific activities. MAJOR CONCLUSIONS: We illustrated the structure-function relationship of STAT proteins and highlighted their effect on member-specific activity. We correlated disease-linked STAT mutations with the structure and cancer genome mutational landscape and proposed rational drug targeting approaches for oncogenic STAT pathway addiction. GENERAL SIGNIFICANCE: Hyper-activated STATs and their variants are associated with multiple diseases and are considered high-value oncology targets. A full understanding of the molecular basis of member-specific STAT-mediated signaling, and of the strategies to selectively target these proteins, requires examination of the differences in their structures and sequences.
Affiliation(s)
- Fettah Erdogan
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, 3359 Mississauga Rd N., Mississauga, Canada; Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, Canada
| | - Abdul K Qadree
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, 3359 Mississauga Rd N., Mississauga, Canada; Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, Canada
| | - Tudor B Radu
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, 3359 Mississauga Rd N., Mississauga, Canada; Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, Canada
| | - Anna Orlova
- Institute of Animal Breeding and Genetics, University of Veterinary Medicine, A-1210 Vienna, Austria
| | - Elvin D de Araujo
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, 3359 Mississauga Rd N., Mississauga, Canada
| | - Johan Israelian
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, 3359 Mississauga Rd N., Mississauga, Canada; Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, Canada
| | - Peter Valent
- Department of Internal Medicine I, Division of Hematology and Hemostaseology, Medical University of Vienna, Vienna, Austria; Ludwig Boltzmann Institute for Hematology and Oncology, Medical University of Vienna, Vienna, Austria
| | - Satu M Mustjoki
- Hematology Research Unit, Helsinki University Hospital Comprehensive Cancer Center, Helsinki, Finland; Translational Immunology Research Program and Department of Clinical Chemistry and Hematology, University of Helsinki, Helsinki, Finland; iCAN Digital Precision Cancer Medicine Flagship, Helsinki, Finland
| | - Marco Herling
- Department of Hematology, Cellular Therapy, and Hemostaseology, University of Leipzig, Leipzig, Germany
| | - Richard Moriggl
- Institute of Animal Breeding and Genetics, University of Veterinary Medicine, A-1210 Vienna, Austria
| | - Patrick T Gunning
- Department of Chemical and Physical Sciences, University of Toronto Mississauga, 3359 Mississauga Rd N., Mississauga, Canada; Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, Canada.
| |
|
29
|
Tseng YY, Sanders MA, Zhang H, Zhou L, Chou CY, Granneman JG. Structural and functional insights into ABHD5, a ligand-regulated lipase co-activator. Sci Rep 2022; 12:2565. [PMID: 35173175 PMCID: PMC8850477 DOI: 10.1038/s41598-021-04179-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 09/13/2021] [Accepted: 12/09/2021] [Indexed: 02/06/2023] Open
Abstract
Alpha/beta hydrolase domain-containing protein 5 (ABHD5) is a highly conserved protein that regulates various lipid metabolic pathways via interactions with members of the perilipin (PLIN) and Patatin-like phospholipase domain-containing protein (PNPLA) protein families. Loss-of-function mutations in ABHD5 result in Chanarin-Dorfman Syndrome (CDS), characterized by ectopic lipid accumulation in numerous cell types and severe ichthyosis. Recent data demonstrate that ABHD5 is the target of synthetic and endogenous ligands that might be therapeutically beneficial for treating metabolic diseases and cancers. However, the structural basis of ABHD5 functional activities, such as protein-protein interactions and ligand binding, is presently unknown. To address this gap, we constructed theoretical structural models of ABHD5 by comparative modeling and topological shape analysis to assess the spatial patterns of ABHD5 conformations computed in protein dynamics. We identified functionally important residues on the ABHD5 surface for lipolysis activation by PNPLA2, lipid droplet targeting and PLIN-binding. We validated the computational model by examining the effects of mutating key residues in ABHD5 on an array of functional assays. Our integrated computational and experimental findings provide new insights into the structural basis of the diverse functions of ABHD5 as well as pathological mutations that result in CDS.
Affiliation(s)
- Yan Yuan Tseng
- Center for Molecular Medicine and Genetics, Wayne State University School of Medicine, Detroit, MI, 48201, USA.
- Karmanos Cancer Institute, Wayne State University School of Medicine, 4100 John R, Detroit, MI, 48201, USA.
| | - Matthew A Sanders
- Center for Molecular Medicine and Genetics, Wayne State University School of Medicine, Detroit, MI, 48201, USA
- Center for Integrative Metabolic and Endocrine Research, Wayne State University School of Medicine, Detroit, MI, 48201, USA
| | - Huamei Zhang
- Center for Molecular Medicine and Genetics, Wayne State University School of Medicine, Detroit, MI, 48201, USA
- Center for Integrative Metabolic and Endocrine Research, Wayne State University School of Medicine, Detroit, MI, 48201, USA
| | - Li Zhou
- Center for Molecular Medicine and Genetics, Wayne State University School of Medicine, Detroit, MI, 48201, USA
- Center for Integrative Metabolic and Endocrine Research, Wayne State University School of Medicine, Detroit, MI, 48201, USA
| | - Chia-Yi Chou
- Center for Molecular Medicine and Genetics, Wayne State University School of Medicine, Detroit, MI, 48201, USA
| | - James G Granneman
- Center for Molecular Medicine and Genetics, Wayne State University School of Medicine, Detroit, MI, 48201, USA.
- Center for Integrative Metabolic and Endocrine Research, Wayne State University School of Medicine, Detroit, MI, 48201, USA.
| |
|
30
|
Hou R, Liu X, Yang H, Deng S, Cheng C, Liu J, Li Y, Zhang Y, Jiang J, Zhu Z, Su Y, Wu L, Xie Y, Li X, Li W, Liu Z, Fang W. Chemically synthesized cinobufagin suppresses nasopharyngeal carcinoma metastasis by inducing ENKUR to stabilize p53 expression. Cancer Lett 2022; 531:57-70. [PMID: 35114328 DOI: 10.1016/j.canlet.2022.01.025] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 05/23/2021] [Revised: 01/01/2022] [Accepted: 01/19/2022] [Indexed: 02/09/2023]
Abstract
Clinically, the metastasis of tumor cells is the key cause of death in patients with cancer. In this study, we used a model of metastatic nasopharyngeal carcinoma (NPC) to explore the effects of a new chemical, cinobufagin (CB), combined with cisplatin (DDP). We observed that chemically synthesized CB strongly decreased the metastasis of NPC. Furthermore, a better therapeutic effect was shown when CB was combined with DDP. Molecular analysis revealed that CB induced ENKUR expression by deregulating the PI3K/AKT pathway and suppressing c-Jun, an oncogenic transcription factor that binds to the ENKUR promoter and negatively modulates its expression in NPC. ENKUR as a tumor suppressor binds to MYH9 and decreases its expression by recruiting β-catenin via its enkurin domain to prevent its nuclear accumulation, which therefore suppresses c-Jun-induced MYH9 expression. Subsequently, downregulated MYH9 reduces the recruitment of E3 ligase UBE3A and thus decreases the UBE3A-mediated ubiquitination degradation of p53, a key tumor suppressor that decreases epithelial-mesenchymal transition (EMT). Clinical sample analysis demonstrated that the ENKUR expression level was significantly reduced in NPC tissues. Its decreased expression substantially promoted clinical progression and reflected poor prognosis for patients with NPC. This study demonstrated that CB induced ENKUR to repress the β-catenin/c-Jun/MYH9 signal and thus decreased UBE3A-mediated p53 ubiquitination degradation. As a result, the EMT signal was inactivated to suppress NPC metastasis.
Affiliation(s)
- Rentao Hou
- Cancer Center, Integrated Hospital of Traditional Chinese Medicine, Southern Medical University, Guangzhou, China
| | - Xiong Liu
- Department of Otolaryngology-Head and Neck Surgery, Nanfang Hospital, Southern Medical University, Guangzhou, China.
| | - Huiling Yang
- School of Pharmacy, Guangdong Medical University, Dongguan, China
| | - Shuting Deng
- Cancer Center, Integrated Hospital of Traditional Chinese Medicine, Southern Medical University, Guangzhou, China
| | - Chao Cheng
- Otolaryngology Department, Shenzhen Hospital, Southern Medical University, Guangzhou, China
| | - Jiahao Liu
- Cancer Center, Integrated Hospital of Traditional Chinese Medicine, Southern Medical University, Guangzhou, China
| | - Yonghao Li
- Cancer Center, Integrated Hospital of Traditional Chinese Medicine, Southern Medical University, Guangzhou, China
| | - Yewei Zhang
- Hepatobiliary Surgery, Guizhou Medical University, Guiyang, Guizhou, China
| | - Jingwen Jiang
- Cancer Center, Integrated Hospital of Traditional Chinese Medicine, Southern Medical University, Guangzhou, China; Oncology Department, Traditional Chinese Medicine Hospital of Hainan Provincial, Haikou, China
| | - Zhibo Zhu
- Cancer Center, Integrated Hospital of Traditional Chinese Medicine, Southern Medical University, Guangzhou, China
| | - Yun Su
- Key Laboratory of Protein Modification and Degradation, School of Basic Medical Sciences, Affiliated Cancer Hospital and Institute of Guangzhou Medical University, Guangzhou, China
| | - Liyang Wu
- Key Laboratory of Protein Modification and Degradation, School of Basic Medical Sciences, Affiliated Cancer Hospital and Institute of Guangzhou Medical University, Guangzhou, China
| | - Yingying Xie
- Cancer Center, Integrated Hospital of Traditional Chinese Medicine, Southern Medical University, Guangzhou, China
| | - Xiaoning Li
- Cancer Center, Integrated Hospital of Traditional Chinese Medicine, Southern Medical University, Guangzhou, China
| | - Wenmin Li
- Cancer Center, Integrated Hospital of Traditional Chinese Medicine, Southern Medical University, Guangzhou, China
| | - Zhen Liu
- Cancer Center, Integrated Hospital of Traditional Chinese Medicine, Southern Medical University, Guangzhou, China; Affiliated Cancer Hospital & Institute of Guangzhou Medical University, China; Laboratory of Protein Modification and Degradation, State Key Laboratory of Respiratory Disease, Guangzhou Medical University, Guangzhou, China.
| | - Weiyi Fang
- Cancer Center, Integrated Hospital of Traditional Chinese Medicine, Southern Medical University, Guangzhou, China.
| |
|
31
|
Rojano E, Jabato FM, Perkins JR, Córdoba-Caballero J, García-Criado F, Sillitoe I, Orengo C, Ranea JAG, Seoane-Zonjic P. Assigning protein function from domain-function associations using DomFun. BMC Bioinformatics 2022; 23:43. [PMID: 35033002 PMCID: PMC8761305 DOI: 10.1186/s12859-022-04565-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/05/2020] [Accepted: 01/05/2022] [Indexed: 12/03/2022] Open
Abstract
BACKGROUND: Protein function prediction remains a key challenge. Domain composition affects protein function. Here we present DomFun, a Ruby gem that uses associations between protein domains and functions, calculated using multiple indices based on tripartite network analysis. These domain-function associations are combined at the protein level to generate protein-function predictions. RESULTS: We analysed 16 tripartite networks connecting homologous superfamily and FunFam domains from CATH-Gene3D with functional annotations from the three Gene Ontology (GO) sub-ontologies, KEGG, and Reactome. We validated the results using the CAFA 3 benchmark platform for GO annotation, finding that, out of the multiple association metrics and domain datasets tested, the Simpson index for FunFam domain-function associations combined with Stouffer's method leads to the best performance in almost all scenarios. We also found that using FunFams led to better performance than superfamilies, and better results were found for GO molecular function compared to GO biological process terms. DomFun performed as well as the highest-performing method in certain CAFA 3 evaluation procedures in terms of [Formula: see text] and [Formula: see text]. We also implemented our own benchmark procedure, Pathway Prediction Performance (PPP), which can be used to validate function prediction for additional annotation sources, such as KEGG and Reactome. Using PPP, we found similar results to those found with CAFA 3 for GO; moreover, we found good performance for the other annotation sources. As with CAFA 3, the Simpson index with Stouffer's method led to the top performance in almost all scenarios. CONCLUSIONS: DomFun shows competitive performance with other methods evaluated in CAFA 3 when predicting protein function with GO, although results vary depending on the evaluation procedure. Through our own benchmark procedure, PPP, we have shown it can also make accurate predictions for KEGG and Reactome.
It performs best when using FunFams, combining Simpson index-derived domain-function associations using Stouffer's method. The tool has been implemented so that it can be easily adapted to incorporate other protein features, such as domain data from other sources, amino acid k-mers and motifs. The DomFun Ruby gem is available from https://rubygems.org/gems/DomFun. Code is maintained at https://github.com/ElenaRojano/DomFun. Validation procedure scripts can be found at https://github.com/ElenaRojano/DomFun_project.
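The two ingredients named in the abstract, a Simpson index for domain-function association and Stouffer's method for combining evidence at the protein level, can be sketched generically as follows. This is not DomFun's code (DomFun is a Ruby gem); it is a minimal Python illustration that assumes the Simpson index is the overlap coefficient over the sets of proteins linked to a domain and to a function, and that per-domain evidence is already expressed as z-scores. All function names and the toy protein identifiers are hypothetical.

```python
import math
from statistics import NormalDist


def simpson_index(proteins_with_domain: set, proteins_with_function: set) -> float:
    """Overlap (Szymkiewicz-Simpson) coefficient: |A & B| / min(|A|, |B|).

    Assumed here to measure domain-function association through the
    proteins shared by both sets in a domain-protein-function network.
    """
    if not proteins_with_domain or not proteins_with_function:
        return 0.0
    shared = len(proteins_with_domain & proteins_with_function)
    return shared / min(len(proteins_with_domain), len(proteins_with_function))


def stouffer_combine(z_scores: list) -> float:
    """Unweighted Stouffer's method: Z = sum(z_i) / sqrt(k)."""
    if not z_scores:
        raise ValueError("at least one z-score is required")
    return sum(z_scores) / math.sqrt(len(z_scores))


def combined_p_value(z_scores: list) -> float:
    """One-sided p-value of the combined Z under a standard normal."""
    return 1.0 - NormalDist().cdf(stouffer_combine(z_scores))


if __name__ == "__main__":
    # Toy data: hypothetical protein identifiers.
    domain_proteins = {"P1", "P2", "P3"}
    function_proteins = {"P2", "P3", "P4", "P5"}
    print(simpson_index(domain_proteins, function_proteins))

    # Hypothetical per-domain z-scores for one protein-function pair.
    print(stouffer_combine([1.0, 2.0, 3.0]))
    print(combined_p_value([1.0, 2.0, 3.0]))
```

How DomFun converts association values into the scores that Stouffer's method combines is not described in the abstract, so that step is left out of the sketch.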
Affiliation(s)
- Elena Rojano
- Department of Molecular Biology and Biochemistry, University of Malaga, Bulevar Louis Pasteur, 31, 29010 Malaga, Spain
- Institute of Biomedical Research in Malaga (IBIMA), Dr. Miguel Díaz Recio, 28, 29010 Malaga, Spain
| | - Fernando M. Jabato
- Department of Molecular Biology and Biochemistry, University of Malaga, Bulevar Louis Pasteur, 31, 29010 Malaga, Spain
- Institute of Biomedical Research in Malaga (IBIMA), Dr. Miguel Díaz Recio, 28, 29010 Malaga, Spain
| | - James R. Perkins
- Department of Molecular Biology and Biochemistry, University of Malaga, Bulevar Louis Pasteur, 31, 29010 Malaga, Spain
- CIBER of Rare Diseases, Av. Monforte de Lemos, 3-5. Pabellon 11. Planta 0, 28029 Madrid, Spain
- Institute of Biomedical Research in Malaga (IBIMA), Dr. Miguel Díaz Recio, 28, 29010 Malaga, Spain
| | - José Córdoba-Caballero
- Department of Molecular Biology and Biochemistry, University of Malaga, Bulevar Louis Pasteur, 31, 29010 Malaga, Spain
| | - Federico García-Criado
- Department of Molecular Biology and Biochemistry, University of Malaga, Bulevar Louis Pasteur, 31, 29010 Malaga, Spain
| | - Ian Sillitoe
- Department of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT UK
| | - Christine Orengo
- Department of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT UK
| | - Juan A. G. Ranea
- Department of Molecular Biology and Biochemistry, University of Malaga, Bulevar Louis Pasteur, 31, 29010 Malaga, Spain
- CIBER of Rare Diseases, Av. Monforte de Lemos, 3-5. Pabellon 11. Planta 0, 28029 Madrid, Spain
- Institute of Biomedical Research in Malaga (IBIMA), Dr. Miguel Díaz Recio, 28, 29010 Malaga, Spain
| | - Pedro Seoane-Zonjic
- Department of Molecular Biology and Biochemistry, University of Malaga, Bulevar Louis Pasteur, 31, 29010 Malaga, Spain
- CIBER of Rare Diseases, Av. Monforte de Lemos, 3-5. Pabellon 11. Planta 0, 28029 Madrid, Spain
- Institute of Biomedical Research in Malaga (IBIMA), Dr. Miguel Díaz Recio, 28, 29010 Malaga, Spain
|
32
|
Gomes Ramalli S, John Miles A, Janes RW, Wallace BA. The PCDDB (Protein Circular Dichroism Data Bank): A Bioinformatics Resource for Protein Characterisations and Methods Development. J Mol Biol 2022; 434:167441. [PMID: 34999124 DOI: 10.1016/j.jmb.2022.167441] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 11/21/2021] [Revised: 12/19/2021] [Accepted: 01/01/2022] [Indexed: 12/20/2022]
Abstract
The Protein Circular Dichroism Data Bank (PCDDB) [https://pcddb.cryst.bbk.ac.uk] is an established resource for the biological, biophysical, chemical, bioinformatics, and molecular biology communities. It is a freely-accessible repository of validated protein circular dichroism (CD) spectra and associated sample and other metadata, with entries having links to other bioinformatics resources including, amongst others, structure (PDB) and sequence (UniProt) databases, as well as to published papers which produced the data and cite the database entries. It includes primary (unprocessed) and final (processed) spectral data, which are available in both text and pictorial formats, as well as detailed sample and validation information produced for each of the entries. Recently the metadata content associated with each of the entries, as well as the number and structural breadth of the protein components included, have been expanded. The PCDDB includes data on both wild-type and mutant proteins, and because CD studies primarily examine proteins in solution, it also contains examples of the effects of different environments on their structures, plus thermal unfolding/folding series. Methods for both sequence and spectral comparisons are included. The data included in the PCDDB complement results from crystal, cryo-electron microscopy, NMR spectroscopy, bioinformatics characterisations and classifications, and other structural information available for the proteins via links to other databases. The entries in the PCDDB have been used for the development of new analytical methodologies, for interpreting spectral and other biophysical data, and for providing insight into structures and functions of individual soluble and membrane proteins and protein complexes.
Affiliation(s)
- Sergio Gomes Ramalli
- Institute of Structural and Molecular Biology, Birkbeck, University of London, Malet Street, London WC1E 7HX, UK
| | - Andrew John Miles
- Institute of Structural and Molecular Biology, Birkbeck, University of London, Malet Street, London WC1E 7HX, UK
| | - Robert W Janes
- School of Biological and Behavioural Sciences, Queen Mary University of London, London E1 4NS, UK.
| | - B A Wallace
- Institute of Structural and Molecular Biology, Birkbeck, University of London, Malet Street, London WC1E 7HX, UK.
|
33
|
Waman VP, Orengo C, Kleywegt GJ, Lesk AM. Three-dimensional Structure Databases of Biological Macromolecules. Methods Mol Biol 2022; 2449:43-91. [PMID: 35507259 DOI: 10.1007/978-1-0716-2095-3_3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/14/2023]
Abstract
Databases of three-dimensional structures of proteins (and their associated molecules) provide: (a) Curated repositories of coordinates of experimentally determined structures, including extensive metadata; for instance information about provenance, details about data collection and interpretation, and validation of results. (b) Information-retrieval tools to allow searching to identify entries of interest and provide access to them. (c) Links among databases, especially to databases of amino-acid and genetic sequences, and of protein function; and links to software for analysis of amino-acid sequence and protein structure, and for structure prediction. (d) Collections of predicted three-dimensional structures of proteins. These will become more and more important after the breakthrough in structure prediction achieved by AlphaFold2. The single global archive of experimentally determined biomacromolecular structures is the Protein Data Bank (PDB). It is managed by wwPDB, a consortium of five partner institutions: the Protein Data Bank in Europe (PDBe), the Research Collaboratory for Structural Bioinformatics (RCSB), the Protein Data Bank Japan (PDBj), the BioMagResBank (BMRB), and the Electron Microscopy Data Bank (EMDB). In addition to jointly managing the PDB repository, the individual wwPDB partners offer many tools for analysis of protein and nucleic acid structures and their complexes, including providing computer-graphic representations. Their collective and individual websites serve as hubs of the community of structural biologists, offering newsletters, reports from Task Forces, training courses, and "helpdesks," as well as links to external software. Many specialized projects are based on the information contained in the PDB. Especially important are SCOP, CATH, and ECOD, which present classifications of protein domains.
Affiliation(s)
- Vaishali P Waman
- Institute of Structural and Molecular Biology, University College London, London, UK
| | - Christine Orengo
- Institute of Structural and Molecular Biology, University College London, London, UK
| | - Gerard J Kleywegt
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
| | - Arthur M Lesk
- Department of Biochemistry and Molecular Biology and Center for Computational Biology and Bioinformatics, The Pennsylvania State University, University Park, PA, USA.
|
34
|
Chakrabarty B, Parekh N. DbStRiPs: Database of structural repeats in proteins. Protein Sci 2022; 31:23-36. [PMID: 33641184 PMCID: PMC8740836 DOI: 10.1002/pro.4052] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/25/2020] [Revised: 02/11/2021] [Accepted: 02/15/2021] [Indexed: 01/03/2023]
Abstract
Recent interest in repeat proteins has arisen due to the stable structural folds, high evolutionary conservation and repertoire of functions provided by these proteins. However, repeat proteins are poorly characterized because of high sequence variation between repeating units, so structure-based identification and classification of repeats are desirable. Using a robust network-based pipeline, manual curation and Kajava's structure-based classification schema, we have developed a database of tandem structural repeats, Database of Structural Repeats in Proteins (DbStRiPs). A unique feature of this database is that available knowledge on sequence repeat families is incorporated by mapping the Pfam classification scheme onto the structural classification. Integration of sequence and structure-based classifications helps in identifying different functional groups within the same structural subclass, leading to refinement in the annotation of repeat proteins. Analysis of the complete Protein Data Bank revealed 16,472 repeat annotations in 15,141 protein chains, one previously uncharacterized novel protein repeat family (PRF), named left-handed beta helix, and 33 protein repeat clusters (PRCs). Based on their unique structural motif, ~79% of these repeat proteins are classified in one of the 14 PRFs or 33 PRCs, and the remaining are grouped as unclassified repeat proteins. Each repeat protein is provided with a detailed annotation in DbStRiPs that includes start and end boundaries of repeating units, copy number, secondary and tertiary structure view, repeat class/subclass, disease association, MSA of repeating units and cross-references to various protein pattern databases, the Human Protein Atlas and interaction resources. DbStRiPs provides easy search and download options to high-quality annotations of structural repeat proteins (URL: http://bioinf.iiit.ac.in/dbstrips/).
Affiliation(s)
- Broto Chakrabarty
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
| | - Nita Parekh
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
|
35
|
Structural dynamics in the evolution of a bilobed protein scaffold. Proc Natl Acad Sci U S A 2021; 118:2026165118. [PMID: 34845009 PMCID: PMC8694067 DOI: 10.1073/pnas.2026165118] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Accepted: 10/20/2021] [Indexed: 11/18/2022] Open
Abstract
Proteins conduct numerous complex biological functions by use of tailored structural dynamics. The molecular details of how these emerged from ancestral peptides remain mysterious. How does nature utilize the same repertoire of folds to diversify function? To shed light on this, we analyzed bilobed proteins with a common structural core, which is spread throughout the tree of life and is involved in diverse biological functions such as transcription, enzymatic catalysis, membrane transport, and signaling. We show here that the structural dynamics of the structural core differentiate predominantly via terminal additions over long periods of evolution. This diversifies substrate specificity and, ultimately, biological function. Novel biophysical tools allow the structural dynamics of proteins and the regulation of such dynamics by binding partners to be explored in unprecedented detail. Although this has provided critical insights into protein function, the means by which structural dynamics direct protein evolution remain poorly understood. Here, we investigated how proteins with a bilobed structure, composed of two related domains from the periplasmic-binding protein–like II domain family, have undergone divergent evolution, leading to adaptation of their structural dynamics. We performed a structural analysis on ∼600 bilobed proteins with a common primordial structural core, which we complemented with biophysical studies to explore the structural dynamics of selected examples by single-molecule Förster resonance energy transfer and Hydrogen–Deuterium exchange mass spectrometry. We show that evolutionary modifications of the structural core, largely at its termini, enable distinct structural dynamics, allowing the diversification of these proteins into transcription factors, enzymes, and extracytoplasmic transport-related proteins. 
Structural embellishments of the core created interdomain interactions that stabilized structural states, reshaping the active site geometry, and ultimately altered substrate specificity. Our findings reveal an as-yet-unrecognized mechanism for the emergence of functional promiscuity during long periods of evolution and are applicable to a large number of domain architectures.
|
36
|
Gao M, Lund-Andersen P, Morehead A, Mahmud S, Chen C, Chen X, Giri N, Roy RS, Quadir F, Effler TC, Prout R, Abraham S, Elwasif W, Haas NQ, Skolnick J, Cheng J, Sedova A. High-Performance Deep Learning Toolbox for Genome-Scale Prediction of Protein Structure and Function. Workshop on Machine Learning in HPC Environments 2021; 2021:46-57. [PMID: 35112110 PMCID: PMC8802329 DOI: 10.1109/mlhpc54614.2021.00010] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/30/2022]
Abstract
Computational biology is one of many scientific disciplines ripe for innovation and acceleration with the advent of high-performance computing (HPC). In recent years, the field of machine learning has also seen significant benefits from adopting HPC practices. In this work, we present a novel HPC pipeline that incorporates various machine-learning approaches for structure-based functional annotation of proteins on the scale of whole genomes. Our pipeline makes extensive use of deep learning and provides computational insights into best practices for training advanced deep-learning models for high-throughput data such as proteomics data. We showcase methodologies our pipeline currently supports and detail future tasks for our pipeline to envelop, including large-scale sequence comparison using SAdLSA and prediction of protein tertiary structures using AlphaFold2.
Affiliation(s)
- Mu Gao
- Georgia Institute of Technology, Atlanta, GA
| | - Chen Chen
- University of Missouri, Columbia, MO
| | - Xiao Chen
- University of Missouri, Columbia, MO
| | - Ryan Prout
- Oak Ridge National Laboratory, Oak Ridge, TN
| | - Ada Sedova
- Oak Ridge National Laboratory, Oak Ridge, TN
|
37
|
Zhang F, Song H, Zeng M, Wu FX, Li Y, Pan Y, Li M. A Deep Learning Framework for Gene Ontology Annotations With Sequence- and Network-Based Information. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:2208-2217. [PMID: 31985440 DOI: 10.1109/tcbb.2020.2968882] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 06/10/2023]
Abstract
Knowledge of protein functions plays an important role in biology and medicine. With the rapid development of high-throughput technologies, a huge number of proteins have been discovered. However, a great number of proteins lack functional annotations. A protein usually has multiple functions, and some functions or biological processes require interactions between several proteins. Additionally, Gene Ontology provides a useful classification for protein functions and contains more than 40,000 terms. We propose a deep learning framework called DeepGOA to predict protein functions with protein sequences and protein-protein interaction (PPI) networks. For protein sequences, we extract two types of information: sequence semantic information and subsequence-based features. We use the word2vec technique to numerically represent protein sequences, and utilize a bidirectional long short-term memory network (Bi-LSTM) and a multi-scale convolutional neural network (multi-scale CNN) to obtain the global and local semantic features of protein sequences, respectively. Additionally, we use the InterPro tool to scan protein sequences and extract subsequence-based information, such as domains and motifs. This information is then fed into a neural network to generate high-quality features. For the PPI network, the DeepWalk algorithm is applied to generate network embeddings. The two types of features are then concatenated to predict protein functions. To evaluate the performance of DeepGOA, several different evaluation methods and metrics are utilized. The experimental results show that DeepGOA outperforms DeepGO and BLAST.
|
38
|
Lobb B, Tremblay BJM, Moreno-Hagelsieb G, Doxey AC. PathFams: statistical detection of pathogen-associated protein domains. BMC Genomics 2021; 22:663. [PMID: 34521345 PMCID: PMC8442362 DOI: 10.1186/s12864-021-07982-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2021] [Accepted: 09/01/2021] [Indexed: 11/10/2022] Open
Abstract
Background: A substantial fraction of genes identified within bacterial genomes encode proteins of unknown function. Identifying which of these proteins represent potential virulence factors, and mapping their key virulence determinants, is a challenging but important goal. Results: To facilitate virulence factor discovery, we performed a comprehensive analysis of 17,929 protein domain families within the Pfam database, and scored them based on their overrepresentation in pathogenic versus non-pathogenic species, taxonomic distribution, relative abundance in metagenomic datasets, and other factors. Conclusions: We identify pathogen-associated domain families, candidate virulence factors in the human gut, and eukaryotic-like mimicry domains with likely roles in virulence. Furthermore, we provide an interactive database called PathFams to allow users to explore pathogen-associated domains as well as identify pathogen-associated domains and domain architectures in user-uploaded sequences of interest. PathFams is freely available at https://pathfams.uwaterloo.ca. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-07982-8.
Affiliation(s)
- Briallen Lobb
- Department of Biology, University of Waterloo, Waterloo, Ontario, Canada
| | - Andrew C Doxey
- Department of Biology, University of Waterloo, Waterloo, Ontario, Canada.
|
39
|
Rauer C, Sen N, Waman VP, Abbasian M, Orengo CA. Computational approaches to predict protein functional families and functional sites. Curr Opin Struct Biol 2021; 70:108-122. [PMID: 34225010 DOI: 10.1016/j.sbi.2021.05.012] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/29/2021] [Revised: 05/13/2021] [Accepted: 05/25/2021] [Indexed: 01/06/2023]
Abstract
Understanding the mechanisms of protein function is indispensable for many biological applications, such as protein engineering and drug design. However, experimental annotations are sparse, and therefore, theoretical strategies are needed to fill the gap. Here, we present the latest developments in building functional subclassifications of protein superfamilies and using evolutionary conservation to detect functional determinants, for example, catalytic-, binding- and specificity-determining residues important for delineating the functional families. We also briefly review other features exploited for functional site detection and new machine learning strategies for combining multiple features.
Affiliation(s)
- Clemens Rauer
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
| | - Neeladri Sen
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
| | - Vaishali P Waman
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
| | - Mahnaz Abbasian
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
| | - Christine A Orengo
- Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK.
|
40
|
Kulmanov M, Zhapa-Camacho F, Hoehndorf R. DeepGOWeb: fast and accurate protein function prediction on the (Semantic) Web. Nucleic Acids Res 2021; 49:W140-W146. [PMID: 34019664 PMCID: PMC8262746 DOI: 10.1093/nar/gkab373] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 02/22/2021] [Revised: 04/18/2021] [Accepted: 04/26/2021] [Indexed: 11/24/2022] Open
Abstract
Understanding the functions of proteins is crucial to understand biological processes on a molecular level. Many more protein sequences are available than can be investigated experimentally. DeepGOPlus is a protein function prediction method based on deep learning and sequence similarity. DeepGOWeb makes the prediction model available through a website, an API, and through the SPARQL query language for interoperability with databases that rely on Semantic Web technologies. DeepGOWeb provides accurate and fast predictions and ensures that predicted functions are consistent with the Gene Ontology; it can provide predictions for any protein and any function in Gene Ontology. DeepGOWeb is freely available at https://deepgo.cbrc.kaust.edu.sa/.
Affiliation(s)
- Maxat Kulmanov
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
| | - Fernando Zhapa-Camacho
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
| | - Robert Hoehndorf
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences & Engineering Division, King Abdullah University of Science and Technology, 4700 King Abdullah University of Science and Technology, Thuwal 23955-6900, Saudi Arabia
|
41
|
Pandey N, Rai KK, Rai SK, Pandey-Rai S. Heterologous expression of cyanobacterial PCS confers augmented arsenic and cadmium stress tolerance and higher artemisinin in Artemisia annua hairy roots. Plant Biotechnology Reports 2021; 15:317-334. [PMID: 34122662 PMCID: PMC8180384 DOI: 10.1007/s11816-021-00682-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2021] [Revised: 05/01/2021] [Accepted: 05/22/2021] [Indexed: 06/12/2023]
Abstract
The present study provides the first report of heterologous expression of phytochelatin synthase from Anabaena PCC 7120 (anaPCS) in the hairy roots of Artemisia annua. Transformed hairy roots of A. annua expressing the anaPCS gene showed better tolerance to the heavy metals arsenic (As) and cadmium (Cd), owing to 143 and 191% more As and Cd accumulation, respectively, as compared to normal roots, with a bioconcentration factor (BCF) of 9.7 and 21.1 for As and Cd, respectively. Under As and Cd stresses, transformed hairy roots possessed significantly higher amounts of phytochelatins and thiols, probably due to the presence of both AaPCS (Artemisia annua PCS) and anaPCS. In addition, artemisinin synthesis was also induced in transformed hairy roots under heavy metal stresses. In silico analysis revealed the presence of conserved motifs in both AaPCS and anaPCS sequences, and structural modelling of the PCS functional domain was conducted. Analysis of the interaction of AaPCS and anaPCS proteins with CdCl2 and sodium arsenate, together with gene ontology analysis, gave insights into anaPCS functioning in transformed hairy roots of A. annua. The study provides transformed hairy roots of A. annua as an efficient tool for effective phytoremediation, with the added advantage of artemisinin extraction from hairy roots used for phytoremediation. Supplementary Information: The online version contains supplementary material available at 10.1007/s11816-021-00682-5.
Affiliation(s)
- Neha Pandey
- Department of Botany, CMP PG College (A Constituent PG College of University of Allahabad), Prayagraj, India
- Department of Botany, Institute of Science, Banaras Hindu University, Varanasi, India
| | - Krishna Kumar Rai
- Department of Botany, Institute of Science, Banaras Hindu University, Varanasi, India
| | - Sanjay Kumar Rai
- Department of Horticulture, Dr. Rajendra Prasad Agricultural University, Pusa, Samastipur, Bihar, India
| | - Shashi Pandey-Rai
- Department of Botany, Institute of Science, Banaras Hindu University, Varanasi, India
|
42
|
Moreira-Filho JT, Silva AC, Dantas RF, Gomes BF, Souza Neto LR, Brandao-Neto J, Owens RJ, Furnham N, Neves BJ, Silva-Junior FP, Andrade CH. Schistosomiasis Drug Discovery in the Era of Automation and Artificial Intelligence. Front Immunol 2021; 12:642383. [PMID: 34135888 PMCID: PMC8203334 DOI: 10.3389/fimmu.2021.642383] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/15/2020] [Accepted: 04/30/2021] [Indexed: 12/20/2022] Open
Abstract
Schistosomiasis is a parasitic disease caused by trematode worms of the genus Schistosoma and affects over 200 million people worldwide. The control and treatment of this neglected tropical disease is based on a single drug, praziquantel, which raises concerns about the development of drug resistance. This, and the lack of efficacy of praziquantel against juvenile worms, highlights the urgency for new antischistosomal therapies. In this review we focus on innovative approaches to the identification of antischistosomal drug candidates, including the use of automated assays, fragment-based screening, computer-aided and artificial intelligence-based computational methods. We highlight the current developments that may contribute to optimizing research outputs and lead to more effective drugs for this highly prevalent disease, in a more cost-effective drug discovery endeavor.
Affiliation(s)
- José T. Moreira-Filho
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
| | - Arthur C. Silva
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
| | - Rafael F. Dantas
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
| | - Barbara F. Gomes
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
| | - Lauro R. Souza Neto
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
| | - Jose Brandao-Neto
- Diamond Light Source Ltd., Didcot, United Kingdom
- Research Complex at Harwell, Didcot, United Kingdom
| | - Raymond J. Owens
- The Rosalind Franklin Institute, Harwell, United Kingdom
- Division of Structural Biology, The Wellcome Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
| | - Nicholas Furnham
- Department of Infection Biology, Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
| | - Bruno J. Neves
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
| | - Floriano P. Silva-Junior
- LaBECFar – Laboratório de Bioquímica Experimental e Computacional de Fármacos, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
| | - Carolina H. Andrade
- LabMol – Laboratory for Molecular Modeling and Drug Design, Faculdade de Farmácia, Universidade Federal de Goiás – UFG, Goiânia, Brazil
|
43
|
Pathogenic missense protein variants affect different functional pathways and proteomic features than healthy population variants. PLoS Biol 2021; 19:e3001207. [PMID: 33909605 PMCID: PMC8110273 DOI: 10.1371/journal.pbio.3001207] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/31/2020] [Revised: 05/10/2021] [Accepted: 03/26/2021] [Indexed: 12/27/2022] Open
Abstract
Missense variants are present amongst the healthy population, but some of them are causative of human diseases. A classification of variants associated with “healthy” or “diseased” states is therefore not always straightforward. A deeper understanding of the nature of missense variants in health and disease, the cellular processes they may affect, and the general molecular principles which underlie these differences is essential to offer mechanistic explanations of the true impact of pathogenic variants. Here, we have formalised a statistical framework which enables robust probabilistic quantification of variant enrichment across full-length proteins, their domains, and 3D structure-defined regions. Using this framework, we validate and extend previously reported trends of variant enrichment in different protein structural regions (surface/core/interface). By examining the association of variant enrichment with available functional pathways and transcriptomic and proteomic (protein half-life, thermal stability, abundance) data, we have mined a rich set of molecular features which distinguish between pathogenic and population variants: Pathogenic variants mainly affect proteins involved in cell proliferation and nucleotide processing and are enriched in more abundant proteins. Additionally, rare population variants display features closer to common than pathogenic variants. We validate the association between these molecular features and variant pathogenicity by comparing against existing in silico variant impact annotations. This study provides molecular details into how different proteins exhibit resilience and/or sensitivity towards missense variants and provides the rationale to prioritise variant-enriched proteins and protein domains for therapeutic targeting and development. The ZoomVar database, which we created for this study, is available at fraternalilab.kcl.ac.uk/ZoomVar. 
It allows users to programmatically annotate missense variants with protein structural information and to calculate variant enrichment in different protein structural regions. How can one improve the classification of genetic variants as harmful or harmless? This study uses a robust statistical analysis to exploit the interplay between protein structure, proteomic measurements and functional pathways to enable better discrimination between missense variants in health and disease.
|
44
|
Recombinant Production and Characterization of an Extracellular Subtilisin-Like Serine Protease from Acinetobacter baumannii of Fermented Food Origin. Protein J 2021; 40:419-435. [PMID: 33870461 PMCID: PMC8053418 DOI: 10.1007/s10930-021-09986-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Accepted: 04/05/2021] [Indexed: 12/20/2022]
Abstract
Acinetobacter baumannii is a ubiquitous bacterium that is increasingly becoming a formidable nosocomial pathogen. Due to its clinical relevance, studies on the bacterium's secretory molecules, especially extracellular proteases, are of interest, primarily in relation to the enzymes' role in virulence. In addition, favorable properties of extracellular proteases may be exploited for commercial use; thus, there is a need to investigate extracellular proteases from Acinetobacter baumannii to gain insights into their catalytic properties. In this study, an extracellular subtilisin-like serine protease, designated SPSFQ, from Acinetobacter baumannii isolated from fermented food was recombinantly expressed and characterized. The mature catalytically active form of SPSFQ shared a high sequence identity of 99% with extracellular proteases from clinical isolates of Acinetobacter baumannii and Klebsiella pneumoniae, as well as a moderately high percentage identity with other bacterial proteases with known keratinolytic and collagenolytic activity. The homology model of mature SPSFQ revealed that its structure is composed of 10 β-strands, 8 α-helices, and connecting loops, resembling the typical architecture of a subtilisin-like α/β motif. SPSFQ is catalytically active at an optimum temperature of 40 °C and pH 9. Its activity is stimulated in the presence of Ca2+ and severely inhibited in the presence of PMSF. SPSFQ also displayed the ability to degrade several tissue-associated protein substrates such as keratin, collagen, and fibrin. Accordingly, our study sheds light on the catalytic properties of a previously uncharacterized extracellular serine protease from Acinetobacter baumannii that warrants further investigation into its potential role as a virulence factor in pathogenicity and its commercial applications.
|
45
|
Cicconardi F, Krapf P, D'Annessa I, Gamisch A, Wagner HC, Nguyen AD, Economo EP, Mikheyev AS, Guénard B, Grabherr R, Andesner P, Wolfgang A, Di Marino D, Steiner FM, Schlick-Steiner BC. Genomic Signature of Shifts in Selection in a Subalpine Ant and Its Physiological Adaptations. Mol Biol Evol 2021; 37:2211-2227. [PMID: 32181804 PMCID: PMC7403626 DOI: 10.1093/molbev/msaa076] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/20/2022] Open
Abstract
Understanding how organisms adapt to extreme environments is fundamental and can provide insightful case studies for both evolutionary biology and climate-change biology. Here, we take advantage of the vast diversity of lifestyles in ants to identify genomic signatures of adaptation to extreme habitats such as high altitude. We hypothesized two parallel patterns would occur in a genome adapting to an extreme habitat: 1) strong positive selection on genes related to adaptation and 2) a relaxation of previous purifying selection. We tested this hypothesis by sequencing the high-elevation specialist Tetramorium alpestre and four other phylogenetically related species. In support of our hypothesis, we recorded a strong shift of selective forces in T. alpestre, in particular a stronger magnitude of diversifying and relaxed selection when compared with all other ants. We further disentangled candidate molecular adaptations in both gene expression and protein-coding sequence that were identified by our genome-wide analyses. In particular, we demonstrate that T. alpestre has 1) a higher level of expression for stv and other heat-shock proteins in chill-shock tests and 2) enzymatic enhancement of Hex-T1, a rate-limiting regulatory enzyme that controls the entry of glucose into the glycolytic pathway. Together, our analyses highlight the adaptive molecular changes that support colonization of high-altitude environments.
Affiliation(s)
- Patrick Krapf
- Department of Ecology, University of Innsbruck, Innsbruck, Austria
- Ilda D'Annessa
- Istituto di Scienze e Tecnologie Chimiche "Giulio Natta", CNR (SCITEC-CNR), Milan, Italy
- Alexander Gamisch
- Department of Ecology, University of Innsbruck, Innsbruck, Austria
- Department of Biosciences, University of Salzburg, Salzburg, Austria
- Herbert C Wagner
- Department of Ecology, University of Innsbruck, Innsbruck, Austria
- Andrew D Nguyen
- Department of Entomology and Nematology, University of Florida, Gainesville, FL
- Evan P Economo
- Biodiversity & Biocomplexity Unit, Okinawa Institute of Science & Technology, Onna, Japan
- Alexander S Mikheyev
- Ecology and Evolution Unit, Okinawa Institute of Science & Technology, Onna, Japan
- Benoit Guénard
- School of Biological Sciences, The University of Hong Kong, Hong Kong, China
- Reingard Grabherr
- Institute of Biotechnology, University of Natural Resources and Life Sciences, Vienna, Austria
- Philipp Andesner
- Department of Ecology, University of Innsbruck, Innsbruck, Austria
- Daniele Di Marino
- Department of Life and Environmental Sciences - New York-Marche Structural Biology Center (NY-MaSBiC), Polytechnic University of Marche, Ancona, Italy
|
46
|
Hutchings J, Stancheva VG, Brown NR, Cheung ACM, Miller EA, Zanetti G. Structure of the complete, membrane-assembled COPII coat reveals a complex interaction network. Nat Commun 2021; 12:2034. [PMID: 33795673 PMCID: PMC8016994 DOI: 10.1038/s41467-021-22110-6] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 11/12/2020] [Accepted: 02/12/2021] [Indexed: 01/02/2023] Open
Abstract
COPII mediates Endoplasmic Reticulum to Golgi trafficking of thousands of cargoes. Five essential proteins assemble into a two-layer architecture, with the inner layer thought to regulate coat assembly and cargo recruitment, and the outer coat forming cages assumed to scaffold membrane curvature. Here we visualise the complete, membrane-assembled COPII coat by cryo-electron tomography and subtomogram averaging, revealing the full network of interactions within and between coat layers. We demonstrate the physiological importance of these interactions using genetic and biochemical approaches. Mutagenesis reveals that the inner coat alone can provide membrane remodelling function, with organisational input from the outer coat. These functional roles for the inner and outer coats significantly move away from the current paradigm, which posits membrane curvature derives primarily from the outer coat. We suggest these interactions collectively contribute to coat organisation and membrane curvature, providing a structural framework to understand regulatory mechanisms of COPII trafficking and secretion.
Affiliation(s)
- Joshua Hutchings
- Institute of Structural and Molecular Biology, Birkbeck College, London, UK
- Division of Biological Sciences, University of California San Diego, La Jolla, CA, USA
- Nick R Brown
- Institute of Structural and Molecular Biology, Birkbeck College, London, UK
- The Francis Crick Institute, London, UK
- Alan C M Cheung
- Institute of Structural and Molecular Biology, Birkbeck College, London, UK
- School of Biochemistry, University of Bristol, Bristol, UK
- Giulia Zanetti
- Institute of Structural and Molecular Biology, Birkbeck College, London, UK
|
47
|
Zhao B, Katuwawala A, Uversky VN, Kurgan L. IDPology of the living cell: intrinsic disorder in the subcellular compartments of the human cell. Cell Mol Life Sci 2021; 78:2371-2385. [PMID: 32997198 PMCID: PMC11071772 DOI: 10.1007/s00018-020-03654-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 06/08/2020] [Revised: 09/09/2020] [Accepted: 09/22/2020] [Indexed: 12/11/2022]
Abstract
Intrinsic disorder can be found in all proteomes of all kingdoms of life and in viruses, being particularly prevalent in the eukaryotes. We conduct a comprehensive analysis of intrinsic disorder in human proteins while mapping them into 24 compartments of the human cell. In agreement with previous studies, we show that human proteins are significantly enriched in disorder relative to a generic protein set that represents the protein universe. In fact, the fraction of proteins with long disordered regions and the average protein-level disorder content in the human proteome are about 3 times higher than in the protein universe. Furthermore, levels of intrinsic disorder in the majority of human subcellular compartments significantly exceed the average disorder content in the protein universe. Relative to the overall amount of disorder in the human proteome, proteins localized in the nucleus and cytoskeleton have significantly increased amounts of disorder, measured by both high disorder content and presence of multiple long intrinsically disordered regions. We empirically demonstrate that, on average, human proteins are assigned to 2.3 subcellular compartments, with proteins localized to few subcellular compartments being more disordered than the proteins that are localized to many compartments. Functionally, the disordered proteins localized in the most disorder-enriched subcellular compartments are primarily responsible for interactions with nucleic acids and protein partners. This is the first time that disorder has been comprehensively mapped into the human cell. Our observations add a missing piece to the puzzle of functional disorder and its organization inside the cell.
Affiliation(s)
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Room E4225, Richmond, VA, 23284, USA
- Akila Katuwawala
- Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Room E4225, Richmond, VA, 23284, USA
- Vladimir N Uversky
- Department of Molecular Medicine, USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, 12901 Bruce B. Downs Blvd. MDC07, Tampa, FL, 33612, USA
- Laboratory of New Methods in Biology, Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center "Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences", Pushchino, Russia
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Room E4225, Richmond, VA, 23284, USA
|
48
|
Røgen P. Quantifying steric hindrance and topological obstruction to protein structure superposition. Algorithms Mol Biol 2021; 16:1. [PMID: 33639968 PMCID: PMC7913338 DOI: 10.1186/s13015-020-00180-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2018] [Accepted: 12/17/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: In computational structural biology, structure comparison is fundamental for our understanding of proteins. Structure comparison is, e.g., algorithmically the starting point for computational studies of structural evolution and it guides our efforts to predict protein structures from their amino acid sequences. Most methods for structural alignment of protein structures optimize the distances between aligned and superimposed residue pairs, i.e., the distances traveled by the aligned and superimposed residues during linear interpolation. Considering such a linear interpolation, these methods do not differentiate if there is room for the interpolation, if it causes steric clashes, or more severely, if it changes the topology of the compared protein backbone curves. RESULTS: To distinguish such cases, we analyze the linear interpolation between two aligned and superimposed backbones. We quantify the amount of steric clashes and find all self-intersections in a linear backbone interpolation. To determine if the self-intersections alter the protein's backbone curve significantly or not, we present a path-finding algorithm that checks if there exists a self-avoiding path in a neighborhood of the linear interpolation. A new path is constructed by altering the linear interpolation using a novel interpretation of Reidemeister moves from knot theory working on three-dimensional curves rather than on knot diagrams. Either the algorithm finds a self-avoiding path or it returns a smallest set of essential self-intersections. Each of these indicates a significant difference between the folds of the aligned protein structures. As expected, we find at least one essential self-intersection separating most unknotted structures from a knotted structure, and we find even larger motions in proteins connected by obstruction-free linear interpolations. We also find examples of homologous proteins that are differently threaded, and we find many distinct folds connected by longer but simple deformations. TM-align is one of the most restrictive alignment programs. With standard parameters, it only aligns residues superimposed within 5 Ångström distance. We find 42165 topological obstructions between aligned parts in 142068 TM-alignments. Thus, this restrictive alignment procedure still allows topological dissimilarity of the aligned parts. CONCLUSIONS: Based on the data we conclude that our program ProteinAlignmentObstruction provides significant additional information to alignment scores based solely on distances between aligned and superimposed residue pairs.
|
49
|
Kondra S, Sarkar T, Raghavan V, Xu W. Development of a TSR-Based Method for Protein 3-D Structural Comparison With Its Applications to Protein Classification and Motif Discovery. Front Chem 2021; 8:602291. [PMID: 33520934 PMCID: PMC7838567 DOI: 10.3389/fchem.2020.602291] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/16/2020] [Accepted: 12/14/2020] [Indexed: 11/24/2022] Open
Abstract
Development of protein 3-D structural comparison methods is important in understanding protein functions. At the same time, developing such a method is very challenging. In the last 40 years, ever since the development of the first automated structural method, ~200 papers were published using different representations of structures. The existing methods can be divided into five categories: sequence-, distance-, secondary structure-, geometry-, and network-based structural comparisons. Each has its uniqueness, but also limitations. We have developed a novel method where the 3-D structure of a protein is modeled using the concept of Triangular Spatial Relationship (TSR), where triangles are constructed with the Cα atoms of a protein as vertices. Every triangle is represented using an integer, which we denote as “key.” A key is computed using the length, angle, and vertex labels based on a rule-based formula, which ensures assignment of the same key to identical TSRs across proteins. A structure is thereby represented by a vector of integers. Our method is able to accurately quantify similarity of structure or substructure by matching numbers of identical keys between two proteins. The uniqueness of our method includes: (i) a unique way to represent structures to avoid performing structural superimposition; (ii) use of triangles to represent substructures as it is the simplest primitive to capture shape; (iii) complex structure comparison is achieved by matching integers corresponding to multiple TSRs. Every substructure of one protein is compared to every other substructure in a different protein. The method is used in the studies of proteases and kinases because they play essential roles in cell signaling, and a majority of these constitute drug targets. The new motifs or substructures we identified specifically for proteases and kinases provide a deeper insight into their structural relations. Furthermore, the method provides a unique way to study protein conformational changes. In addition, the results from CATH and SCOP data sets clearly demonstrate that our method can distinguish alpha helices from beta pleated sheets and vice versa. Our method has the potential to be developed into a powerful tool for efficient structure-BLAST search and comparison, just as BLAST is for sequence search and alignment.
Affiliation(s)
- Sarika Kondra
- The Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA, United States
- Titli Sarkar
- The Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA, United States
- Vijay Raghavan
- The Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA, United States
- Wu Xu
- Department of Chemistry, University of Louisiana at Lafayette, Lafayette, LA, United States
|
50
|
Chang A, Jeske L, Ulbrich S, Hofmann J, Koblitz J, Schomburg I, Neumann-Schaal M, Jahn D, Schomburg D. BRENDA, the ELIXIR core data resource in 2021: new developments and updates. Nucleic Acids Res 2021; 49:D498-D508. [PMID: 33211880 PMCID: PMC7779020 DOI: 10.1093/nar/gkaa1025] [Citation(s) in RCA: 341] [Impact Index Per Article: 85.3] [Received: 09/10/2020] [Revised: 10/14/2020] [Accepted: 10/26/2020] [Indexed: 12/31/2022] Open
Abstract
The BRENDA enzyme database (https://www.brenda-enzymes.org), established in 1987, has evolved into the main collection of functional enzyme and metabolism data. In 2018, BRENDA was selected as an ELIXIR Core Data Resource. BRENDA provides reliable data, continuous curation and updates of classified enzymes, and the integration of newly discovered enzymes. The main part contains >5 million data for ∼90 000 enzymes from ∼13 000 organisms, manually extracted from ∼157 000 primary literature references, combined with information from text and data mining, data integration, and prediction algorithms. Supplements comprise disease-related data, protein sequences, 3D structures, genome annotations, ligand information, taxonomic, bibliographic, and kinetic data. BRENDA offers easy access to enzyme information, from quick to advanced searches, text- and structure-based queries for enzyme-ligand interactions, word maps, and visualization of enzyme data. The BRENDA Pathway Maps are completely revised and updated for enhanced interactive and intuitive usability. The new design of the Enzyme Summary Page provides improved access to each individual enzyme. A new protein structure 3D viewer was integrated. The prediction of the intracellular localization of eukaryotic enzymes has been implemented. The new EnzymeDetector combines BRENDA enzyme annotations with protein and genome databases for the detection of eukaryotic and prokaryotic enzymes.
Affiliation(s)
- Antje Chang
- Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS), Rebenring 56, 38106 Braunschweig, Germany
- Lisa Jeske
- Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS), Rebenring 56, 38106 Braunschweig, Germany
- Sandra Ulbrich
- Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS), Rebenring 56, 38106 Braunschweig, Germany
- Julia Hofmann
- Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS), Rebenring 56, 38106 Braunschweig, Germany
- Julia Koblitz
- Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, Inhoffenstrasse 7 B, 38124 Braunschweig, Germany
- Ida Schomburg
- Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS), Rebenring 56, 38106 Braunschweig, Germany
- Meina Neumann-Schaal
- Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, Inhoffenstrasse 7 B, 38124 Braunschweig, Germany
- Dieter Jahn
- Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS), Rebenring 56, 38106 Braunschweig, Germany
- Dietmar Schomburg
- Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS), Rebenring 56, 38106 Braunschweig, Germany
|