1
Wu S, Guo JT. Improved prediction of DNA and RNA binding proteins with deep learning models. Brief Bioinform 2024; 25:bbae285. [PMID: 38856168 PMCID: PMC11163377 DOI: 10.1093/bib/bbae285] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Revised: 05/20/2024] [Accepted: 05/31/2024] [Indexed: 06/11/2024] Open
Abstract
Nucleic acid-binding proteins (NABPs), including DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs), play important roles in essential biological processes. To facilitate functional annotation and accurate prediction of different types of NABPs, many machine learning-based computational approaches have been developed. However, the datasets used for training and testing as well as the prediction scopes in these studies have limited their applications. In this paper, we developed new strategies to overcome these limitations by generating more accurate and robust datasets and developing deep learning-based methods including both hierarchical and multi-class approaches to predict the types of NABPs for any given protein. The deep learning models employ two layers of convolutional neural network and one layer of long short-term memory. Our approaches outperform existing DBP and RBP predictors with a balanced prediction between DBPs and RBPs, and are more practically useful in identifying novel NABPs. The multi-class approach greatly improves the prediction accuracy of DBPs and RBPs, especially for the DBPs with ~12% improvement. Moreover, we explored the prediction accuracy of single-stranded DNA binding proteins and their effect on the overall prediction accuracy of NABP predictions.
Affiliation(s)
- Siwen Wu
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, United States
- Jun-tao Guo
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, United States
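The abstract of this entry contrasts hierarchical and multi-class prediction of NABP types. The sketch below only illustrates the difference between the two decision schemes; it is not the authors' model. It assumes that the hierarchical scheme first separates NABPs from non-NABPs and then separates DBPs from RBPs, and that the multi-class scheme includes a non-NABP class. The function names, the 0.5 threshold, and the example probabilities are all hypothetical.

```python
from typing import Dict

# Assumed label set; the source abstract does not list the exact classes.
LABELS = ("DBP", "RBP", "non-NABP")


def multiclass_predict(probs: Dict[str, float]) -> str:
    """Single-step decision: return the class with the highest probability."""
    return max(LABELS, key=lambda label: probs[label])


def hierarchical_predict(p_nabp: float, p_dbp_given_nabp: float,
                         threshold: float = 0.5) -> str:
    """Two-step decision: NABP vs non-NABP first, then DBP vs RBP.

    The threshold is a hypothetical cut-off chosen for illustration.
    """
    if p_nabp < threshold:
        return "non-NABP"
    return "DBP" if p_dbp_given_nabp >= threshold else "RBP"


# Hypothetical model outputs for one protein.
example_multiclass = {"DBP": 0.55, "RBP": 0.35, "non-NABP": 0.10}
print(multiclass_predict(example_multiclass))
print(hierarchical_predict(p_nabp=0.9, p_dbp_given_nabp=0.3))
```

In the hierarchical scheme, an error at the first step cannot be corrected at the second, whereas the multi-class scheme weighs all classes in one decision.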
2
Harish A. Protein structures unravel the signatures and patterns of deep time evolution. QRB Discovery 2024; 5:e3. [PMID: 38616890 PMCID: PMC11016368 DOI: 10.1017/qrd.2024.4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Revised: 11/13/2023] [Accepted: 12/12/2023] [Indexed: 04/16/2024] Open
Abstract
The formulation and testing of hypotheses using 'big biology data' often lie at the interface of computational biology and structural biology. The Protein Data Bank (PDB), which was established about 50 years ago, catalogs three-dimensional (3D) shapes of organic macromolecules and showcases a structural view of biology. The comparative analysis of the structures of homologs, particularly of proteins, from different species has significantly improved the in-depth analyses of molecular and cell biological questions. In addition, computational tools that were developed to analyze the 'protein universe' are providing the means for efficient resolution of longstanding debates in cell and molecular evolution. In celebrating the golden jubilee of the PDB, much has been written about the transformative impact of the PDB on a broad range of fields of scientific inquiry and how structural biology transformed the study of the fundamental processes of life. Yet the transforming influence of the PDB on one field of inquiry of fundamental interest, the reconstruction of the distant biological past, has gone almost unnoticed. Here, I discuss the recent advances to highlight how insights and tools of structural biology are bearing on the data required for the empirical resolution of vigorously debated and apparently contradictory hypotheses in evolutionary biology. Specifically, I show that evolutionary characters defined by protein structure are superior to conventional sequence characters for reliable, data-driven resolution of competing hypotheses about the origins of the major clades of life and the evolutionary relationships among those clades. Since the better-quality data unequivocally support two primary domains of life, it is imperative that the primary classification of life be revised accordingly.
3
Kawamukai H, Takishita S, Shimizu K, Kohda D, Ishimori K, Saio T. Conformational Distribution of a Multidomain Protein Measured by Single-Pair Small-Angle X-ray Scattering. J Phys Chem Lett 2024; 15:744-750. [PMID: 38221741 PMCID: PMC10823528 DOI: 10.1021/acs.jpclett.3c02600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 12/20/2023] [Accepted: 12/20/2023] [Indexed: 01/16/2024]
Abstract
The difficulty in evaluating the conformational distribution of proteins in solution often hinders mechanistic insights. One possible strategy for visualizing conformational distribution is distance distribution measurement by single-pair small-angle X-ray scattering (SAXS), in which the scattering interference from only a specific pair of atoms in the target molecule is extracted. Despite this promising concept, with few applications in synthetic small molecules and DNA, technical difficulties have prevented its application in protein conformational studies. This study used a synthetic tag to fix the lanthanide ion at desired sites on the protein and used single-pair SAXS with contrast matching to evaluate the conformational distribution of the multidomain protein enzyme MurD. These data highlighted the broad conformational and ligand-driven distribution shifts of MurD in solution. This study proposes an important strategy in solution structural biology that targets dynamic proteins, including multidomain and intrinsically disordered proteins.
Affiliation(s)
- Honoka Kawamukai
  - Graduate School of Chemical Sciences and Engineering, Hokkaido University, Sapporo 060-8628, Japan
  - Graduate School of Medical Sciences, Tokushima University, Tokushima 770-8503, Japan
- Shumpei Takishita
  - Graduate School of Chemical Sciences and Engineering, Hokkaido University, Sapporo 060-8628, Japan
- Kazumi Shimizu
  - Faculty of Education and Integrated Arts and Sciences, Waseda University, Tokyo 169-8050, Japan
- Daisuke Kohda
  - Division of Structural Biology, Medical Institute of Bioregulation, Kyushu University, Fukuoka 812-8582, Japan
- Koichiro Ishimori
  - Graduate School of Chemical Sciences and Engineering, Hokkaido University, Sapporo 060-8628, Japan
  - Department of Chemistry, Faculty of Science, Hokkaido University, Sapporo 060-0810, Japan
- Tomohide Saio
  - Graduate School of Medical Sciences, Tokushima University, Tokushima 770-8503, Japan
  - Institute of Advanced Medical Sciences, Tokushima University, Tokushima 770-8503, Japan
  - Fujii Memorial Institute of Medical Sciences, Institute of Advanced Medical Sciences, Tokushima University, Tokushima 770-8503, Japan
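The abstract of this entry refers to the scattering interference from a single pair of atoms. As a rough illustration, the textbook Debye expression gives the orientationally averaged interference of two point scatterers at distance r as sin(qr)/(qr), and a distance distribution P(r) simply weights that term. The sketch below assumes this idealized point-scatterer form; it does not reproduce the authors' analysis, which also involves lanthanide tags and contrast matching. The function names and the example distances are hypothetical, with r in Å and q in Å⁻¹.

```python
import math
from typing import Sequence


def sinc(x: float) -> float:
    """Return sin(x)/x, using the limit value 1 at x = 0."""
    return 1.0 if abs(x) < 1e-12 else math.sin(x) / x


def pair_interference(q: float, distances: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Debye interference term averaged over a discrete distance distribution.

    distances: pair distances r_i (assumed to be in angstroms)
    weights:   relative populations P(r_i); they are normalized here
    """
    total = sum(weights)
    return sum(w * sinc(q * r) for r, w in zip(distances, weights)) / total


# Hypothetical two-state ensemble: a compact and an extended conformation.
distances = [30.0, 45.0]
weights = [0.7, 0.3]
for q in (0.0, 0.05, 0.10, 0.20):
    print(f"q = {q:.2f}  interference = {pair_interference(q, distances, weights):+.4f}")
```

A broader distance distribution damps the oscillations of this term at high q, which is why the measured interference carries information about conformational spread.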
4
Hsu STD. Folding and functions of knotted proteins. Curr Opin Struct Biol 2023; 83:102709. [PMID: 37778185 DOI: 10.1016/j.sbi.2023.102709] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/30/2023] [Revised: 09/02/2023] [Accepted: 09/05/2023] [Indexed: 10/03/2023]
Abstract
Topologically knotted proteins have entangled structural elements within their native structures that cannot be disentangled simply by pulling from the N- and C-termini. Systematic surveys have identified different types of knotted protein structures, constituting as much as 1% of the total entries within the Protein Data Bank. Many knotted proteins rely on their knotted structural elements to carry out evolutionarily conserved biological functions. Being knotted may also provide mechanical stability to withstand unfolding-coupled proteolysis. Reconfiguring a knotted protein topology by circular permutation or cyclization provides insights into the importance of being knotted in the context of folding and functions. With the explosion of predicted protein structures by artificial intelligence, we are now entering a new era of exploring the entangled protein universe.
Affiliation(s)
- Shang-Te Danny Hsu
  - Institute of Biological Chemistry, Academia Sinica, Taipei 11529, Taiwan
  - Institute of Biochemical Sciences, National Taiwan University, Taipei 10617, Taiwan
  - International Institute for Sustainability with Knotted Chiral Meta Matter (WPI-SKCM²), Hiroshima University, Higashi-Hiroshima, Hiroshima 739-8526, Japan
5
Bordin N, Lau AM, Orengo C. Large-scale clustering of AlphaFold2 3D models shines light on the structure and function of proteins. Mol Cell 2023; 83:3950-3952. [PMID: 37977115 DOI: 10.1016/j.molcel.2023.10.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 10/27/2023] [Accepted: 10/27/2023] [Indexed: 11/19/2023]
Abstract
Two recent studies exploited ultra-fast structural aligners and deep-learning approaches to cluster the protein structure space in the AlphaFold Database. Barrio-Hernandez et al. and Durairaj et al. uncovered fascinating new protein functions and structural features previously unknown.
Affiliation(s)
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Andy M Lau
  - Department of Computer Science, University College London, London WC1E 6BT, UK
- Christine Orengo
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
6
Durairaj J, Waterhouse AM, Mets T, Brodiazhenko T, Abdullah M, Studer G, Tauriello G, Akdel M, Andreeva A, Bateman A, Tenson T, Hauryliuk V, Schwede T, Pereira J. Uncovering new families and folds in the natural protein universe. Nature 2023; 622:646-653. [PMID: 37704037 PMCID: PMC10584680 DOI: 10.1038/s41586-023-06622-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 03/24/2023] [Accepted: 09/07/2023] [Indexed: 09/15/2023]
Abstract
We are now entering a new era in protein sequence and structure annotation, with hundreds of millions of predicted protein structures made available through the AlphaFold database. These models cover nearly all proteins that are known, including those challenging to annotate for function or putative biological role using standard homology-based approaches. In this study, we examine the extent to which the AlphaFold database has structurally illuminated this 'dark matter' of the natural protein universe at high predicted accuracy. We further describe the protein diversity that these models cover as an annotated interactive sequence similarity network, accessible at https://uniprot3d.org/atlas/AFDB90v4. By searching for novelties from sequence, structure and semantic perspectives, we uncovered the β-flower fold, added several protein families to the Pfam database and experimentally demonstrated that one of these belongs to a new superfamily of translation-targeting toxin-antitoxin systems, TumE-TumA. This work underscores the value of large-scale efforts in identifying, annotating and prioritizing new protein families. By leveraging the recent deep learning revolution in protein bioinformatics, we can now shed light into uncharted areas of the protein universe at an unprecedented scale, paving the way to innovations in life sciences and biotechnology.
Affiliation(s)
- Janani Durairaj
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Basel, Basel, Switzerland
- Andrew M Waterhouse
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Basel, Basel, Switzerland
- Toomas Mets
  - Institute of Technology, University of Tartu, Tartu, Estonia
  - Department of Experimental Medical Science, Lund University, Lund, Sweden
- Minhal Abdullah
  - Institute of Technology, University of Tartu, Tartu, Estonia
  - Department of Experimental Medical Science, Lund University, Lund, Sweden
- Gabriel Studer
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Basel, Basel, Switzerland
- Gerardo Tauriello
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Basel, Basel, Switzerland
- Antonina Andreeva
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Alex Bateman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Tanel Tenson
  - Institute of Technology, University of Tartu, Tartu, Estonia
- Vasili Hauryliuk
  - Institute of Technology, University of Tartu, Tartu, Estonia
  - Department of Experimental Medical Science, Lund University, Lund, Sweden
  - Science for Life Laboratory, Lund, Sweden
  - Virus Centre, Lund University, Lund, Sweden
- Torsten Schwede
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Basel, Basel, Switzerland
- Joana Pereira
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Basel, Basel, Switzerland
7
Gollapalli P, Rudrappa S, Kumar V, Santosh Kumar HS. Domain Architecture Based Methods for Comparative Functional Genomics Toward Therapeutic Drug Target Discovery. J Mol Evol 2023; 91:598-615. [PMID: 37626222 DOI: 10.1007/s00239-023-10129-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2022] [Accepted: 08/06/2023] [Indexed: 08/27/2023]
Abstract
Novel functions arise during evolution when genes duplicate, mutate, recombine, fuse, or undergo fission to produce new genes, or when genes form de novo. Researchers have tried to quantify the causes of these molecular diversification processes to understand how these genes increase molecular complexity over time, for instance in protein domain organization. In contrast to global sequence similarity, protein domain architectures can capture key structural and functional characteristics, making them better proxies for describing functional equivalence. In prokaryotes and eukaryotes, domain designs have been shown to be retained over significant evolutionary distances. Protein domain architectures are now being utilized to categorize and distinguish evolutionarily related proteins and to find homologs among species that are evolutionarily distant from one another. Additionally, structural information stored in domain structures has accelerated homology identification and sequence search methods. Tools for functional protein annotation have been developed to discover protein domain content, domain order, domain recurrence, and domain position, as all of these contribute to the accuracy of protein function prediction. In this review, an attempt is made to summarise facts and speculations regarding the use of protein domain architecture and modularity to identify possible therapeutic targets among cellular activities, based on an understanding of their linked biological processes.
Affiliation(s)
- Pavan Gollapalli
  - Center for Bioinformatics and Biostatistics, Nitte (Deemed to be University), Mangalore, Karnataka, 575018, India
- Sushmitha Rudrappa
  - Department of Biotechnology and Bioinformatics, Jnana Sahyadri Campus, Kuvempu University, Shankaraghatta, Shivamogga, Karnataka, 577451, India
- Vadlapudi Kumar
  - Department of Biochemistry, Davangere University, Shivagangothri, Davangere, Karnataka, 577007, India
- Hulikal Shivashankara Santosh Kumar
  - Department of Biotechnology and Bioinformatics, Jnana Sahyadri Campus, Kuvempu University, Shankaraghatta, Shivamogga, Karnataka, 577451, India
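The abstract of this entry names four architecture features: domain content, domain order, domain recurrence, and domain position. The sketch below shows one way these could be extracted from an ordered list of domain names. It is a minimal illustration and not a tool described in the review. The function names are hypothetical, the example architectures are made up, and the Jaccard index is an assumed choice of similarity measure for domain content.

```python
from collections import Counter
from typing import Dict, List, Sequence


def architecture_features(domains: Sequence[str]) -> Dict[str, object]:
    """Summarize a domain architecture given N-to-C ordered domain names."""
    return {
        "content": set(domains),                # which domains occur
        "order": tuple(domains),                # N-to-C terminal order
        "recurrence": dict(Counter(domains)),   # copies of each domain
        "position": {                           # 0-based indices per domain
            name: [i for i, d in enumerate(domains) if d == name]
            for name in set(domains)
        },
    }


def content_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard index of domain content (an assumed similarity measure)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


# Made-up architectures used purely for illustration.
protein_x: List[str] = ["SH3", "SH2", "SH3"]
protein_y: List[str] = ["SH2", "Kinase"]
print(architecture_features(protein_x))
print(content_similarity(protein_x, protein_y))
```

Content alone ignores order and copy number, so two proteins with the same Jaccard score can still differ in the `order` and `recurrence` features.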
8
Thermophilic Carboxylesterases from Hydrothermal Vents of the Volcanic Island of Ischia Active on Synthetic and Biobased Polymers and Mycotoxins. Appl Environ Microbiol 2023; 89:e0170422. [PMID: 36719236 PMCID: PMC9972953 DOI: 10.1128/aem.01704-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2023] Open
Abstract
Hydrothermal vents are geographically widespread and host microorganisms with robust enzymes useful in various industrial applications. We examined microbial communities and carboxylesterases of two terrestrial hydrothermal vents of the volcanic island of Ischia (Italy) predominantly composed of Firmicutes, Proteobacteria, and Bacteroidota. High-temperature enrichment cultures with the polyester plastics polyhydroxybutyrate and polylactic acid (PLA) resulted in an increase of Thermus and Geobacillus species and to some extent Fontimonas and Schleiferia species. The screening at 37 to 70°C of metagenomic fosmid libraries from the above enrichment cultures identified three hydrolases (IS10, IS11, and IS12), all derived from yet-uncultured Chloroflexota and showing low sequence identity (33 to 56%) to characterized enzymes. Enzymes expressed in Escherichia coli exhibited maximal esterase activity at 70 to 90°C, with IS11 showing the highest thermostability (90% activity after 20-min incubation at 80°C). IS10 and IS12 were highly substrate promiscuous and hydrolyzed all 51 monoester substrates tested. Enzymes were active with PLA, polyethylene terephthalate model substrate, and mycotoxin T-2 (IS12). IS10 and IS12 had a classical α/β-hydrolase core domain with a serine hydrolase catalytic triad (Ser155, His280, and Asp250) in their hydrophobic active sites. The crystal structure of IS11 resolved at 2.92 Å revealed the presence of an N-terminal β-lactamase-like domain and a C-terminal lipocalin domain. The catalytic cleft of IS11 included catalytic Ser68, Lys71, Tyr160, and Asn162, whereas the lipocalin domain enclosed the catalytic cleft like a lid and contributed to substrate binding. Our study identified novel thermotolerant carboxylesterases with a broad substrate range, including polyesters and mycotoxins, for potential applications in biotechnology.
IMPORTANCE: High-temperature-active microbial enzymes are important biocatalysts for many industrial applications, including recycling of synthetic and biobased polyesters increasingly used in textiles, fibers, coatings and adhesives. Here, we identified three novel thermotolerant carboxylesterases (IS10, IS11, and IS12) from high-temperature enrichment cultures from Ischia hydrothermal vents and incubated with biobased polymers. The identified metagenomic enzymes originated from uncultured Chloroflexota and showed low sequence similarity to known carboxylesterases. Active sites of IS10 and IS12 had the largest effective volumes among the characterized prokaryotic carboxylesterases and exhibited high substrate promiscuity, including hydrolysis of polyesters and mycotoxin T-2 (IS12). Though less promiscuous than IS10 and IS12, IS11 had a higher thermostability with a high temperature optimum (80 to 90°C) for activity and hydrolyzed polyesters, and its crystal structure revealed an unusual lipocalin domain likely involved in substrate binding. The polyesterase activity of these enzymes makes them attractive candidates for further optimization and potential application in plastics recycling.
9
Sykes J, Holland BR, Charleston MA. A review of visualisations of protein fold networks and their relationship with sequence and function. Biol Rev Camb Philos Soc 2023; 98:243-262. [PMID: 36210328 PMCID: PMC10092621 DOI: 10.1111/brv.12905] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/30/2021] [Revised: 09/08/2022] [Accepted: 09/09/2022] [Indexed: 01/12/2023]
Abstract
Proteins form arguably the most significant link between genotype and phenotype. Understanding the relationship between protein sequence and structure, and applying this knowledge to predict function, is difficult. One way to investigate these relationships is by considering the space of protein folds and how one might move from fold to fold through similarity, or potential evolutionary relationships. The many individual characterisations of fold space presented in the literature can tell us a lot about how well the current Protein Data Bank represents protein fold space, how convergence and divergence may affect protein evolution, how proteins affect the whole of which they are part, and how proteins themselves function. A synthesis of these different approaches and viewpoints seems the most likely way to further our knowledge of protein structure evolution and thus, facilitate improved protein structure design and prediction.
Affiliation(s)
- Janan Sykes
  - School of Natural Sciences, University of Tasmania, Private Bag 37, Hobart, Tasmania, 7001, Australia
- Barbara R Holland
  - School of Natural Sciences, University of Tasmania, Private Bag 37, Hobart, Tasmania, 7001, Australia
- Michael A Charleston
  - School of Natural Sciences, University of Tasmania, Private Bag 37, Hobart, Tasmania, 7001, Australia
10
Abstract
Mechanisms of emergence and divergence of protein folds pose central questions in biological sciences. Incremental mutation and stepwise adaptation explain relationships between topologically similar protein folds. However, the universe of folds is diverse and riotous, suggesting more potent and creative forces are at play. Sequence and structure similarity are observed between distinct folds, indicating that proteins with distinct folds may share common ancestry. We found evidence of common ancestry between three distinct β-barrel folds: Src kinase family homology (SH3), oligonucleotide/oligosaccharide-binding (OB), and cradle loop barrel (CLB). The data suggest a mechanism of fold evolution that interconverts SH3, OB, and CLB. This mechanism, which we call creative destruction, can be generalized to explain many examples of fold evolution including circular permutation. In creative destruction, an open reading frame duplicates or otherwise merges with another to produce a fused polypeptide. A merger forces two ancestral domains into a new sequence and spatial context. The fused polypeptide can explore folding landscapes that are inaccessible to either of the independent ancestral domains. However, the folding landscapes of the fused polypeptide are not fully independent of those of the ancestral domains. Creative destruction is thus partially conservative; a daughter fold inherits some motifs from ancestral folds. After merger and refolding, adaptive processes such as mutation and loss of extraneous segments optimize the new daughter fold. This model has application in disease states characterized by genetic instability. Fused proteins observed in cancer cells are likely to experience remodeled folding landscapes and realize altered folds, conferring new or altered functions.
11
Sicilia C, Corral-Lugo A, Smialowski P, McConnell MJ, Martín-Galiano AJ. Unsupervised Machine Learning Organization of the Functional Dark Proteome of Gram-Negative "Superbugs": Six Protein Clusters Amenable for Distinct Scientific Applications. ACS Omega 2022; 7:46131-46145. [PMID: 36570227 PMCID: PMC9774411 DOI: 10.1021/acsomega.2c04076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Accepted: 10/06/2022] [Indexed: 06/17/2023]
Abstract
Uncharacterized proteins have been underutilized as targets for the development of novel therapeutics for difficult-to-treat bacterial infections. To facilitate the exploration of these proteins, 2819 predicted, uncharacterized proteins (19.1% of the total) from reference strains of multidrug Acinetobacter baumannii, Klebsiella pneumoniae, and Pseudomonas aeruginosa species were organized using an unsupervised k-means machine learning algorithm. Classification using normalized values for protein length, pI, hydrophobicity, degree of conservation, structural disorder, and %AT of the coding gene rendered six natural clusters. Cluster proteins showed different trends regarding operon membership, expression, presence of unknown function domains, and interactomic relevance. Clusters 2, 4, and 5 were enriched with highly disordered proteins, nonworkable membrane proteins, and likely spurious proteins, respectively. Clusters 1, 3, and 6 showed closer distances to known antigens, antibiotic targets, and virulence factors. Up to 21.8% of proteins in these clusters were structurally covered by modeling, which allowed assessment of druggability and discontinuous B-cell epitopes. Five proteins (4 in Cluster 1) were potential druggable targets for antibiotherapy. Eighteen proteins (11 in Cluster 6) were strong B-cell and T-cell immunogen candidates for vaccine development. Conclusively, we provide a feature-based schema to fractionate the functional dark proteome of critical pathogens for fundamental and biomedical purposes.
Affiliation(s)
- Carlos Sicilia
  - Intrahospital Infections Laboratory, National Centre for Microbiology, Instituto de Salud Carlos III (ISCIII), Majadahonda, 28220 Madrid, Spain
- Andrés Corral-Lugo
  - Intrahospital Infections Laboratory, National Centre for Microbiology, Instituto de Salud Carlos III (ISCIII), Majadahonda, 28220 Madrid, Spain
- Pawel Smialowski
  - Core Facility Bioinformatics, Biomedical Center Munich, Faculty of Medicine, Ludwig Maximilians Universität München, Munich 80539, Germany
  - Institute of Stem Cell Research, Helmholtz Center Munich, Planegg-Martinsried 82152, Germany
- Michael J. McConnell
  - Intrahospital Infections Laboratory, National Centre for Microbiology, Instituto de Salud Carlos III (ISCIII), Majadahonda, 28220 Madrid, Spain
- Antonio J. Martín-Galiano
  - Intrahospital Infections Laboratory, National Centre for Microbiology, Instituto de Salud Carlos III (ISCIII), Majadahonda, 28220 Madrid, Spain
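The abstract of this entry describes k-means clustering on normalized protein features. The sketch below illustrates that general recipe with z-score normalization followed by a plain k-means loop. It is a toy version under stated assumptions and not the authors' pipeline: the study used six features and obtained six clusters, whereas this example uses two features (length and pI) and two clusters. All data values and function names are hypothetical, and z-scoring is an assumed choice of normalization.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def zscore_columns(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale each feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], k: int, iterations: int = 100,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means: assign to nearest centroid, then recompute centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: Optional[List[int]] = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels or [], centroids


# Hypothetical proteins described by (length, pI).
raw = [[100, 5.0], [110, 5.2], [105, 4.9], [900, 9.5], [950, 9.8], [920, 9.6]]
labels, _ = kmeans(zscore_columns(raw), k=2)
print(labels)
```

Normalizing first matters because raw protein length would otherwise dominate the Euclidean distance over features such as pI.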
12
The Modular Architecture of Metallothioneins Facilitates Domain Rearrangements and Contributes to Their Evolvability in Metal-Accumulating Mollusks. Int J Mol Sci 2022; 23:ijms232415824. [PMID: 36555472 PMCID: PMC9781358 DOI: 10.3390/ijms232415824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2022] [Revised: 12/05/2022] [Accepted: 12/10/2022] [Indexed: 12/15/2022] Open
Abstract
Protein domains are independent structural and functional modules that can rearrange to create new proteins. While the evolution of multidomain proteins through the shuffling of different preexisting domains has been well documented, the evolution of domain repeat proteins and the origin of new domains are less understood. Metallothioneins (MTs) provide a good case study considering that they consist of metal-binding domain repeats, some of them with a likely de novo origin. In mollusks, for instance, most MTs are bidomain proteins that arose by lineage-specific rearrangements between six putative domains: α, β1, β2, β3, γ and δ. Some domains have been characterized in bivalves and gastropods, but nothing is known about the MTs and their domains of other Mollusca classes. To fill this gap, we investigated the metal-binding features of NpoMT1 of Nautilus pompilius (Cephalopoda class) and FcaMT1 of Falcidens caudatus (Caudofoveata class). Interestingly, whereas NpoMT1 consists of α and β1 domains and has a prototypical Cd2+ preference, FcaMT1 has a singular preference for Zn2+ ions and a distinct domain composition, including a new Caudofoveata-specific δ domain. Overall, our results suggest that the modular architecture of MTs has contributed to MT evolution during mollusk diversification, and exemplify how modularity increases MT evolvability.
13
Mohanty P, Kapoor U, Sundaravadivelu Devarajan D, Phan TM, Rizuan A, Mittal J. Principles Governing the Phase Separation of Multidomain Proteins. Biochemistry 2022; 61:2443-2455. [PMID: 35802394 PMCID: PMC9669140 DOI: 10.1021/acs.biochem.2c00210] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Indexed: 02/06/2023]
Abstract
A variety of membraneless organelles, often termed "biological condensates", play an important role in the regulation of cellular processes such as gene transcription, translation, and protein quality control. On the basis of experimental and theoretical investigations, liquid-liquid phase separation (LLPS) has been proposed as a possible mechanism for the origin of biological condensates. LLPS requires multivalent macromolecules that template the formation of long-range, intermolecular interaction networks and results in the formation of condensates with defined composition and material properties. Multivalent interactions driving LLPS exhibit a wide range of modes from highly stereospecific to nonspecific and involve both folded and disordered regions. Multidomain proteins serve as suitable macromolecules for promoting phase separation and achieving disparate functions due to their potential for multivalent interactions and regulation. Here, we aim to highlight the influence of the domain architecture and interdomain interactions on the phase separation of multidomain protein condensates. First, the general principles underlying these interactions are illustrated on the basis of examples of multidomain proteins that are predominantly associated with nucleic acid binding and protein quality control and contain both folded and disordered regions. Next, the examples showcase how LLPS properties of folded and disordered regions can be leveraged to engineer multidomain constructs that form condensates with the desired assembly and functional properties. Finally, we highlight the need for improvements in coarse-grained computational models that can provide molecular-level insights into multidomain protein condensates in conjunction with experimental efforts.
Affiliation(s)
- Priyesh Mohanty
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX 77843
- Utkarsh Kapoor
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX 77843
- Tien Minh Phan
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX 77843
- Azamat Rizuan
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX 77843
- Jeetain Mittal
  - Artie McFerrin Department of Chemical Engineering, Texas A&M University, College Station, TX 77843
14
Guo JT, Malik F. Single-Stranded DNA Binding Proteins and Their Identification Using Machine Learning-Based Approaches. Biomolecules 2022; 12:biom12091187. [PMID: 36139026 PMCID: PMC9496475 DOI: 10.3390/biom12091187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2022] [Revised: 08/11/2022] [Accepted: 08/24/2022] [Indexed: 11/25/2022] Open
Abstract
Single-stranded DNA (ssDNA) binding proteins (SSBs) are critical in maintaining genome stability by protecting the transient existence of ssDNA from damage during essential biological processes, such as DNA replication and gene transcription. The single-stranded region of telomeres also requires protection by ssDNA binding proteins from being attacked in case it is wrongly recognized as an anomaly. In addition to their critical roles in genome stability and integrity, it has been demonstrated that ssDNA and SSB-ssDNA interactions play critical roles in transcriptional regulation in all three domains of life and viruses. In this review, we present our current knowledge of the structure and function of SSBs and the structural features for SSB binding specificity. We then discuss the machine learning-based approaches that have been developed for the prediction of SSBs from double-stranded DNA (dsDNA) binding proteins (DSBs).
|
15
|
Romei M, Sapriel G, Imbert P, Jamay T, Chomilier J, Lecointre G, Carpentier M. Protein folds as synapomorphies of the tree of life. Evolution 2022; 76:1706-1719. [PMID: 35765784 PMCID: PMC9541633 DOI: 10.1111/evo.14550] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/06/2021] [Revised: 05/17/2022] [Accepted: 05/31/2022] [Indexed: 01/22/2023]
Abstract
Several studies showed that the distribution of folds (topology of protein secondary structures) in proteomes may be a global proxy to build phylogeny. Then, some folds should be synapomorphies (derived characters exclusively shared among taxa). However, previous studies used methods that did not allow synapomorphy identification, which requires congruence analysis of folds as individual characters. Here, we map SCOP folds onto a sample of 210 species across the tree of life (TOL). Congruence is assessed using the retention index of each fold for the TOL, and principal component analysis for deeper branches. Using a bicluster mapping approach, we define synapomorphic blocks of folds (SBF) sharing similar presence/absence patterns. Among the 1232 folds, 20% are universally present in our TOL, whereas 54% are reliable synapomorphies. These results are similar with the CATH and ECOD databases. Eukaryotes are characterized by a large number of them, and several SBFs clearly support nested eukaryotic clades (divergence times from 1100 to 380 mya). Although clearly separated, the three superkingdoms reveal a strong mosaic pattern. This pattern is consistent with the dual origin of eukaryotes and witnesses secondary endosymbiosis in their photosynthetic clades. Our study unveils direct analysis of fold synapomorphies as key characters to unravel the evolutionary history of species.
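As an illustration of the congruence measure mentioned in this abstract, the following is a minimal sketch of a retention index for a single presence/absence character. It assumes the standard Farris definition, RI = (g - s) / (g - m), and counts observed steps with Fitch parsimony on a rooted binary tree. The tree, taxon names, and fold patterns are hypothetical and are not taken from the study, whose actual pipeline may differ.

```python
from typing import Dict, Optional, Tuple, Union

# A tree is either a leaf name or a pair of subtrees (rooted, binary).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_steps(tree: Tree, states: Dict[str, int]) -> int:
    """Minimum number of state changes for one character (Fitch parsimony)."""
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_steps = visit(node[0])
        right_set, right_steps = visit(node[1])
        common = left_set & right_set
        if common:
            return common, left_steps + right_steps
        # Disjoint state sets: one extra change is needed at this node.
        return left_set | right_set, left_steps + right_steps + 1

    return visit(tree)[1]


def retention_index(tree: Tree, states: Dict[str, int]) -> Optional[float]:
    """RI = (g - s) / (g - m) for a binary character; None if uninformative."""
    present = sum(states.values())
    absent = len(states) - present
    g = min(present, absent)  # maximum possible steps on any tree
    m = 1                     # minimum steps for a variable binary character
    if g <= m:
        return None           # constant or singleton character: RI undefined
    s = fitch_steps(tree, states)  # steps observed on this tree
    return (g - s) / (g - m)


# Hypothetical six-taxon tree and two hypothetical fold patterns.
toy_tree = ((("A", "B"), ("C", "D")), ("E", "F"))
clade_fold = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 0, "F": 0}
scattered_fold = {"A": 1, "B": 0, "C": 1, "D": 0, "E": 1, "F": 0}

print(retention_index(toy_tree, clade_fold))      # 1.0: fully congruent
print(retention_index(toy_tree, scattered_fold))  # 0.0: fully homoplastic
```

A fold confined to one clade needs a single change on the tree and scores 1, while a fold scattered across the tree needs as many changes as the worst case and scores 0.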
Affiliation(s)
- Martin Romei
- Institut Systématique Evolution Biodiversité (ISYEB UMR 7205), Sorbonne Université, MNHN, CNRS, EPHE, UA, Paris, France; IMPMC (UMR 7590), BiBiP, Sorbonne Université, CNRS, MNHN, Paris, France
| | - Guillaume Sapriel
- Institut Systématique Evolution Biodiversité (ISYEB UMR 7205), Sorbonne Université, MNHN, CNRS, EPHE, UA, Paris, France; UFR des sciences de la santé, Université Versailles‐St‐Quentin, Versailles, France
| | - Pierre Imbert
- Institut Systématique Evolution Biodiversité (ISYEB UMR 7205), Sorbonne Université, MNHN, CNRS, EPHE, UA, Paris, France
| | - Théo Jamay
- Institut Systématique Evolution Biodiversité (ISYEB UMR 7205), Sorbonne Université, MNHN, CNRS, EPHE, UA, Paris, France
| | | | - Guillaume Lecointre
- Institut Systématique Evolution Biodiversité (ISYEB UMR 7205), Sorbonne Université, MNHN, CNRS, EPHE, UA, Paris, France
| | - Mathilde Carpentier
- Institut Systématique Evolution Biodiversité (ISYEB UMR 7205), Sorbonne Université, MNHN, CNRS, EPHE, UA, Paris, France
|
16
|
Charles T, Moss DL, Bhat P, Moore PW, Kummer NA, Bhattacharya A, Landry SJ, Mettu RR. CD4+ T-Cell Epitope Prediction by Combined Analysis of Antigen Conformational Flexibility and Peptide-MHCII Binding Affinity. Biochemistry 2022; 61:1585-1599. [PMID: 35834502 PMCID: PMC9352311 DOI: 10.1021/acs.biochem.2c00237] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
Abstract
Antigen processing in the class II MHC pathway depends on conventional proteolytic enzymes, potentially acting on antigens in native-like conformational states. CD4+ epitope dominance arises from a competition among antigen folding, proteolysis, and MHCII binding. Protease-sensitive sites, linear antibody epitopes, and CD4+ T-cell epitopes were mapped in plague vaccine candidate F1-V to evaluate the various contributions to CD4+ epitope dominance. Using X-ray crystal structures, antigen processing likelihood (APL) predicts CD4+ epitopes with significant accuracy for F1-V without considering peptide-MHCII binding affinity. We also show that APL achieves excellent performance over two benchmark antigen sets. The profiles of conformational flexibility derived from the X-ray crystal structures of the F1-V proteins, Caf1 and LcrV, were similar to the biochemical profiles of linear antibody epitope reactivity and protease sensitivity, suggesting that the role of structure in proteolysis was captured by the analysis of the crystal structures. The patterns of CD4+ T-cell epitope dominance in C57BL/6, CBA, and BALB/c mice were compared to epitope predictions based on APL, MHCII binding, or both. For a sample of 13 diverse antigens, the accuracy of epitope prediction by the combination of APL and I-Ab-MHCII-peptide affinity reached 36%. When MHCII allele specificity was also diverse, such as in human immunity, prediction of dominant epitopes by APL alone reached 42% when using a stringent scoring threshold. Because dominant CD4+ epitopes tend to occur in conformationally stable antigen domains, crystal structures typically are available for analysis by APL, and thus, the requirement for a crystal structure is not a severe limitation.
Affiliation(s)
- Tysheena Charles
- Department of Biochemistry and Molecular Biology, Tulane University School of Medicine, New Orleans, Louisiana 70112, United States
| | - Daniel L Moss
- Department of Biochemistry and Molecular Biology, Tulane University School of Medicine, New Orleans, Louisiana 70112, United States
| | - Pawan Bhat
- Department of Biochemistry and Molecular Biology, Tulane University School of Medicine, New Orleans, Louisiana 70112, United States
| | - Peyton W Moore
- Department of Biochemistry and Molecular Biology, Tulane University School of Medicine, New Orleans, Louisiana 70112, United States
| | - Nicholas A Kummer
- Department of Biochemistry and Molecular Biology, Tulane University School of Medicine, New Orleans, Louisiana 70112, United States
| | - Avik Bhattacharya
- Department of Computer Science, Tulane University, New Orleans, Louisiana 70118, United States
| | - Samuel J Landry
- Department of Biochemistry and Molecular Biology, Tulane University School of Medicine, New Orleans, Louisiana 70112, United States
| | - Ramgopal R Mettu
- Department of Computer Science, Tulane University, New Orleans, Louisiana 70118, United States
|
17
|
Casado-Combreras MÁ, Rivero-Rodríguez F, Elena-Real CA, Molodenskiy D, Díaz-Quintana A, Martinho M, Gerbaud G, González-Arzola K, Velázquez-Campoy A, Svergun D, Belle V, De la Rosa MA, Díaz-Moreno I. PP2A is activated by cytochrome c upon formation of a diffuse encounter complex with SET/TAF-Iβ. Comput Struct Biotechnol J 2022; 20:3695-3707. [PMID: 35891793 PMCID: PMC9293736 DOI: 10.1016/j.csbj.2022.07.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2022] [Revised: 07/04/2022] [Accepted: 07/04/2022] [Indexed: 11/25/2022] Open
Abstract
Intrinsic protein flexibility is of overwhelming relevance for intermolecular recognition and adaptability of highly dynamic ensembles of complexes, and the phenomenon is essential for the understanding of numerous biological processes. These conformational ensembles (encounter complexes) lack a unique organization, which prevents the determination of well-defined high-resolution structures. This is the case for complexes involving the oncoprotein SET/template-activating factor-Iβ (SET/TAF-Iβ), a histone chaperone whose functions and interactions are significantly affected by its intrinsic structural plasticity. Besides its role in chromatin remodeling, SET/TAF-Iβ is an inhibitor of protein phosphatase 2A (PP2A), which is a key phosphatase counteracting transcription and signaling events controlling the activity of DNA damage response (DDR) mediators. During DDR, SET/TAF-Iβ is sequestered by cytochrome c (Cc) upon migration of the hemeprotein from mitochondria to the cell nucleus. Here, we report that the nuclear SET/TAF-Iβ:Cc polyconformational ensemble is able to activate PP2A. In particular, the N-end folded, globular region of SET/TAF-Iβ (a.k.a. SET/TAF-Iβ ΔC), which exhibits an unexpected, intrinsically highly dynamic behavior, is sufficient to be recognized by Cc in a diffuse encounter manner. Cc-mediated blocking of PP2A inhibition is deciphered using an integrated structural and computational approach, combining small-angle X-ray scattering, electron paramagnetic resonance, nuclear magnetic resonance, calorimetry and molecular dynamics simulations.
Key Words
- ANP32B, Acidic leucine-rich nuclear phosphoprotein family member B
- BTFA, 3-bromo-1,1,1-trifluoroacetone
- CD, Circular dichroism
- CDK9, Cyclin-dependent kinase 9
- CW, Continuous wave
- Cc, Cytochrome c
- Cytochrome c
- DDR, DNA damage response
- DEER, Double electron–electron resonance
- DLS, Dynamic light scattering
- DMEM, Dulbecco’s modified Eagle’s medium
- DNA, Deoxyribonucleic acid
- DTT, Dithiothreitol
- Dmax, Maximum dimension
- EDTA, Ethylenediamine tetraacetic acid
- EGTA, Ethyleneglycol tetraacetic acid
- EPR, Electron paramagnetic resonance
- Encounter complex
- FBS, Fetal bovine serum
- GUI, Graphical user interface
- HEK, Human embryonic kidney cells
- HRP, Horseradish peroxidase
- I2PP2A, Inhibitor 2 of the protein phosphatase 2A
- I3PP2A, Inhibitor 3 of the protein phosphatase 2A
- INTAC, Integrator-PP2A complex
- IPTG, Isopropyl-β-D-1-thiogalactopyranoside
- ITC, Isothermal titration calorimetry
- Ip/Id, Intensity ratio of NMR resonances between paramagnetic and diamagnetic samples
- LB, Luria-Bertani
- MD, Molecular dynamics
- MTS, (1-acetoxy-2,2,5,5-tetramethyl-δ-3-pyrroline-3-methyl) methanethiosulfonate
- MTSL, (1-oxyl-2,2,5,5-tetramethyl-δ-3-pyrroline-3-methyl) methanethiosulfonate
- MW, Molecular weight
- Molecular dynamics
- NAP1, Nucleosome assembly protein 1
- NAPL, Nucleosome assembly protein L
- NMA, Normal mode analysis
- NMR, Nuclear magnetic resonance
- NPT, Constant number, pressure and temperature
- NVT, Constant number, volume and temperature
- Nuclear magnetic resonance
- OD600, Optical density measured at 600 nm
- OPC, Optimal 3-charge, 4-point rigid water model
- PCR, Polymerase chain reaction
- PME, Particle mesh Ewald
- PMSF, Phenylmethylsulfonyl fluoride
- PP2A, Protein phosphatase 2A
- PRE, Paramagnetic relaxation enhancement
- PVDF, Polyvinylidene fluoride
- Protein phosphatase 2A
- RNA, Ribonucleic acid
- RNApol II, RNA polymerase II
- Rg, Radius of gyration
- SAXS, Small-angle X-ray scattering
- SC, Sample changer
- SDS-PAGE, Sodium dodecylsulfate-polyacrylamide gel electrophoresis
- SDSL, Site-directed spin labeling
- SEC, Size-exclusion chromatography
- SET/TAF-Iβ
- SET/TAF-Iβ ΔC, SET/template-activating factor-Iβ construct lacking its C-terminal domain
- SET/TAF-Iβ, SET/template-activating factor-Iβ
- SPRi, Surface plasmon resonance imaging
- TAF-Iα, Template-activating factor-Iα
- TPBS, Tween 20-phosphate buffered saline
- VPS75, Vacuolar protein sorting-associated protein 75
- WT, Wild type
- XRD, X-ray diffraction
Affiliation(s)
- Miguel Á. Casado-Combreras
- Institute for Chemical Research (IIQ), Scientific Research Centre “Isla de la Cartuja” (cicCartuja), University of Seville and CSIC, Avda. Américo Vespucio, 49, 41092 Seville, Spain
| | - Francisco Rivero-Rodríguez
- Institute for Chemical Research (IIQ), Scientific Research Centre “Isla de la Cartuja” (cicCartuja), University of Seville and CSIC, Avda. Américo Vespucio, 49, 41092 Seville, Spain
| | - Carlos A. Elena-Real
- Institute for Chemical Research (IIQ), Scientific Research Centre “Isla de la Cartuja” (cicCartuja), University of Seville and CSIC, Avda. Américo Vespucio, 49, 41092 Seville, Spain
- Centre de Biologie Structurale (CBS), INSERM, Centre National de la Recherche Scientifique (CNRS) and Université de Montpellier. 29 rue de Navacelles, 34090 Montpellier, France
| | - Dmitry Molodenskiy
- European Molecular Biology Laboratory, Hamburg Outstation, c/o Deutsches Elektronen-Synchrotron, Notkestr. 85, 22607 Hamburg, Germany
| | - Antonio Díaz-Quintana
- Institute for Chemical Research (IIQ), Scientific Research Centre “Isla de la Cartuja” (cicCartuja), University of Seville and CSIC, Avda. Américo Vespucio, 49, 41092 Seville, Spain
| | - Marlène Martinho
- Aix Marseille Univ. Centre National de la Recherche Scientifique (CNRS), BIP UMR7281, Bioénergétique et Ingénierie des protéines, 13402 Marseille, France
| | - Guillaume Gerbaud
- Aix Marseille Univ. Centre National de la Recherche Scientifique (CNRS), BIP UMR7281, Bioénergétique et Ingénierie des protéines, 13402 Marseille, France
| | - Katiuska González-Arzola
- Institute for Chemical Research (IIQ), Scientific Research Centre “Isla de la Cartuja” (cicCartuja), University of Seville and CSIC, Avda. Américo Vespucio, 49, 41092 Seville, Spain
| | - Adrián Velázquez-Campoy
- Institute of Biocomputation and Physics of Complex Systems (BIFI), Joint Unit GBsC-CSIC-BIFI, Universidad de Zaragoza. C. de Mariano Esquillor Gómez, Edificio I+D, 50018 Zaragoza, Spain
- Departamento de Bioquímica y Biología Molecular y Celular, Facultad de Ciencias, Universidad de Zaragoza, C. Pedro Cerbuna, 12, 50009 Zaragoza, Spain
- Instituto de Investigación Sanitaria de Aragón (IIS Aragon), Zaragoza, Spain
- Centro de Investigación Biomédica en Red en el Área Temática de Enfermedades Hepáticas y Digestivas (CIBERehd), C. de Melchor Fernández Almagro, 3, 28029 Madrid, Spain
| | - Dmitri Svergun
- European Molecular Biology Laboratory, Hamburg Outstation, c/o Deutsches Elektronen-Synchrotron, Notkestr. 85, 22607 Hamburg, Germany
| | - Valérie Belle
- Aix Marseille Univ. Centre National de la Recherche Scientifique (CNRS), BIP UMR7281, Bioénergétique et Ingénierie des protéines, 13402 Marseille, France
| | - Miguel A. De la Rosa
- Institute for Chemical Research (IIQ), Scientific Research Centre “Isla de la Cartuja” (cicCartuja), University of Seville and CSIC, Avda. Américo Vespucio, 49, 41092 Seville, Spain
| | - Irene Díaz-Moreno
- Institute for Chemical Research (IIQ), Scientific Research Centre “Isla de la Cartuja” (cicCartuja), University of Seville and CSIC, Avda. Américo Vespucio, 49, 41092 Seville, Spain
|
18
|
Quantitative In Silico Evaluation of Allergenic Proteins from Anacardium occidentale, Carya illinoinensis, Juglans regia and Pistacia vera and Their Epitopes as Precursors of Bioactive Peptides. Curr Issues Mol Biol 2022; 44:3100-3117. [PMID: 35877438 PMCID: PMC9317212 DOI: 10.3390/cimb44070214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2022] [Revised: 06/28/2022] [Accepted: 07/01/2022] [Indexed: 11/16/2022] Open
Abstract
The aim of the study presented here was to determine if there is a correlation between the presence of specific protein domains within tree nut allergens or tree nut allergen epitopes and the frequency of bioactive fragments and the predicted susceptibility to enzymatic digestion in allergenic proteins from tree nuts of cashew (Anacardium occidentale), pecan (Carya illinoinensis), English walnut (Juglans regia) and pistachio (Pistacia vera) plants. These bioactive peptides are distributed along the length of the protein and are not enriched in IgE epitope sequences. Classification of proteins as bioactive peptide precursors based on the presence of specific protein domains may be a promising approach. Proteins possessing a vicilin N-terminal family domain or a napin domain contain a relatively low occurrence of bioactive fragments. In contrast, proteins possessing the cupin 1 domain without the vicilin N-terminal family domain contain a relatively high total frequency of bioactive fragments and predicted release of bioactive fragments by the joint action of pepsin, trypsin, and chymotrypsin. This approach could be utilized in food science to simplify the selection of protein domains enriched for bioactive peptides.
|
19
|
Structural and Kinetic Views of Molecular Chaperones in Multidomain Protein Folding. Int J Mol Sci 2022; 23:ijms23052485. [PMID: 35269628 PMCID: PMC8910466 DOI: 10.3390/ijms23052485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2022] [Revised: 02/18/2022] [Accepted: 02/21/2022] [Indexed: 12/10/2022] Open
Abstract
Despite recent developments in protein structure prediction, the process of the structure formation, folding, remains poorly understood. Notably, folding of multidomain proteins, which involves multiple steps of segmental folding, is one of the biggest questions in protein science. Multidomain protein folding often requires the assistance of molecular chaperones. Molecular chaperones promote or delay the folding of the client protein, but the detailed mechanisms are still unclear. This review summarizes the findings of biophysical and structural studies on the mechanism of multidomain protein folding mediated by molecular chaperones and explains how molecular chaperones recognize the client proteins and alter their folding properties. Furthermore, we introduce several recent studies that describe the concept of kinetics-activity relationships to explain the mechanism of functional diversity of molecular chaperones.
|
20
|
Jadid N, Prasetyowati I, Rosidah NLA, Ermavitalini D, Nurhatika S, Nurhidayati T, Purnobasuki H. In Silico Analysis of Partial Fatty Acid Desaturase 2 cDNA From Reutealis trisperma (Blanco) Airy Shaw. Bioinform Biol Insights 2022; 15:11779322211005747. [PMID: 35173423 PMCID: PMC8842343 DOI: 10.1177/11779322211005747] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/09/2020] [Accepted: 03/08/2021] [Indexed: 11/16/2022] Open
Abstract
Reutealis trisperma oil is a new source for biodiesel production. The predominant fatty acids in this plant are stearic acid (9%), palmitic acid (10%), oleic acid (12%), linoleic acid (19%), and α-eleostearic acid (51%). The presence of polyunsaturated fatty acids (PUFAs), linoleic acid, and α-eleostearic acid decreases the oxidation stability of R. trisperma biodiesel. Although several studies have suggested that the fatty acid desaturase 2 (FAD2) enzyme is involved in the regulation of fatty acid desaturation, little is known about the genetic information of FAD2 in R. trisperma. The objectives of this study were to isolate, characterize, and determine the relationship between the R. trisperma FAD2 fragment and other Euphorbiaceae plants. cDNA fragments were isolated using reverse transcription polymerase chain reaction (PCR). The DNA sequence obtained by sequencing was used for further analysis. In silico analysis identified the fragment identity, subcellular localization, and phylogenetic construction of the R. trisperma FAD2 cDNA fragment and Euphorbiaceae. The results showed that a 923-bp partial sequence of R. trisperma FAD2 was successfully isolated. Based on in silico analysis, FAD2 was predicted to encode 260 amino acids, had a domain similarity with Omega-6 fatty acid desaturase, and was located in the endoplasmic reticulum membrane. The R. trisperma FAD2 fragment was more closely related to Vernicia fordii (HM755946.1).
Affiliation(s)
- Nurul Jadid
- Department of Biology, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
| | - Indah Prasetyowati
- Department of Biology, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
| | | | - Dini Ermavitalini
- Department of Biology, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
| | - Sri Nurhatika
- Department of Biology, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
| | - Tutik Nurhidayati
- Department of Biology, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
| | - Hery Purnobasuki
- Department of Biology, Universitas Airlangga, Surabaya, Indonesia
|
21
|
Bukhari SNH, Jain A, Haq E, Mehbodniya A, Webber J. Machine Learning Techniques for the Prediction of B-Cell and T-Cell Epitopes as Potential Vaccine Targets with a Specific Focus on SARS-CoV-2 Pathogen: A Review. Pathogens 2022; 11:pathogens11020146. [PMID: 35215090 PMCID: PMC8879824 DOI: 10.3390/pathogens11020146] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 12/13/2021] [Revised: 01/19/2022] [Accepted: 01/21/2022] [Indexed: 02/01/2023] Open
Abstract
Only the part of an antigen (a protein molecule found on the surface of a pathogen) that is composed of epitopes specific to T and B cells is recognized by the human immune system (HIS). Identification of epitopes is considered critical for designing an epitope-based peptide vaccine (EBPV). Although there are a number of vaccine types, EBPVs have received less attention thus far. It is important to mention that EBPVs have a great deal of untapped potential for boosting vaccination safety; they are also less expensive and take a short time to produce. Thus, in order to quickly contain global pandemics such as the ongoing outbreak of coronavirus disease 2019 (COVID-19) caused by the severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), as well as epidemics and endemics, EBPVs are considered promising vaccine types. The high mutation rate of SARS-CoV-2 has posed a great challenge to public health worldwide because either the composition of existing vaccines has to be changed or a new vaccine has to be developed to protect against its different variants. In such scenarios, time being the critical factor, EBPVs can be a promising alternative. To design an effective and viable EBPV against different strains of a pathogen, it is important to identify the putative T- and B-cell epitopes. Using the wet-lab experimental approach to identify these epitopes is time-consuming and costly because the experimental screening of a vast number of potential epitope candidates is required. Fortunately, various available machine learning (ML)-based prediction methods have reduced the burden related to the epitope mapping process by decreasing the potential epitope candidate list for experimental trials. Moreover, these methods are also cost-effective, scalable, and fast. This paper presents a systematic review of various state-of-the-art and relevant ML-based methods and tools for predicting T- and B-cell epitopes.
Special emphasis is placed on highlighting and analyzing various models for predicting epitopes of SARS-CoV-2, the causative agent of COVID-19. Based on the various methods and tools discussed, future research directions for epitope prediction are presented.
Affiliation(s)
- Syed Nisar Hussain Bukhari
- University Institute of Computing, Chandigarh University, NH-95, Chandigarh-Ludhiana Highway, Mohali 140413, India
| | - Amit Jain
- University Institute of Computing, Chandigarh University, NH-95, Chandigarh-Ludhiana Highway, Mohali 140413, India
| | - Ehtishamul Haq
- Department of Biotechnology, University of Kashmir, Srinagar 190006, India
| | - Abolfazl Mehbodniya
- Department of Electronics and Communication Engineering, Kuwait College of Science and Technology, Kuwait City 20185145, Kuwait
| | - Julian Webber
- Graduate School of Engineering Science, Osaka University, Osaka 560-8531, Japan
|
22
|
Ozger ZB, Cihan P. A novel ensemble fuzzy classification model in SARS-CoV-2 B-cell epitope identification for development of protein-based vaccine. Appl Soft Comput 2021; 116:108280. [PMID: 34931117 PMCID: PMC8673934 DOI: 10.1016/j.asoc.2021.108280] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/25/2021] [Revised: 11/25/2021] [Accepted: 12/03/2021] [Indexed: 12/23/2022]
Abstract
B-cell epitope prediction research has received growing interest since the development of the first method. B-cell epitope identification with the aid of an accurate prediction method is one of the most important steps in epitope-based vaccine development, immunodiagnostic testing, antibody production, disease diagnosis, and treatment. Nevertheless, using experimental methods in epitope mapping is very time-consuming, costly, and labor-intensive. Therefore, although successful predictions with in silico methods are very important in epitope prediction, there are limited studies in this area. The aim of this study is to propose a new approach for successfully predicting B-cell epitopes for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). In this study, the SARS-CoV B-cell epitope prediction performances of different fuzzy learning classification models, namely genetic cooperative competitive learning (GCCL), fuzzy genetics-based machine learning (GBML), Chi’s method (CHI), Ishibuchi’s method with weight factor (W), and structural learning algorithm on vague environment (SLAVE), and of the state-of-the-art ensemble fuzzy classification model were compared. The obtained results showed that the proposed ensemble approach has the lowest error in SARS-CoV B-cell epitope estimation compared to the base fuzzy learners (average error rates: ensemble fuzzy=8.33, GCCL=30.42, GBML=23.82, CHI=29.17, W=46.25, and SLAVE=20.42). SARS-CoV and SARS-CoV-2 have high genome similarities. Therefore, the most successful method determined for SARS-CoV B-cell epitope prediction was used in SARS-CoV-2 B-cell epitope prediction. Finally, the eventual B-cell epitope prediction results obtained for SARS-CoV-2 with the ensemble fuzzy classification model were compared with the epitope sequences predicted by the BepiPred server and immunoinformatics studies in the literature for the same protein sequences according to VaxiJen 2.0 scores.
We hope that the developed epitope prediction method will help design effective vaccines and drugs against future outbreaks of the coronavirus family, especially SARS-CoV-2 and its possible mutations.
Affiliation(s)
- Zeynep Banu Ozger
- Department of Computer Engineering, Sutcu Imam University, 46040, Kahramanmaras, Turkey
| | - Pınar Cihan
- Department of Computer Engineering, Tekirdag Namik Kemal University, 59860, Corlu, Tekirdag, Turkey
|
23
|
Coyote-Maestas W, Nedrud D, Suma A, He Y, Matreyek KA, Fowler DM, Carnevale V, Myers CL, Schmidt D. Probing ion channel functional architecture and domain recombination compatibility by massively parallel domain insertion profiling. Nat Commun 2021; 12:7114. [PMID: 34880224 PMCID: PMC8654947 DOI: 10.1038/s41467-021-27342-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 05/28/2021] [Accepted: 11/16/2021] [Indexed: 11/10/2022] Open
Abstract
Protein domains are the basic units of protein structure and function. Comparative analysis of genomes and proteomes showed that domain recombination is a main driver of multidomain protein functional diversification and some of the constraining genomic mechanisms are known. Much less is known about biophysical mechanisms that determine whether protein domains can be combined into viable protein folds. Here, we use massively parallel insertional mutagenesis to determine compatibility of over 300,000 domain recombination variants of the Inward Rectifier K+ channel Kir2.1 with channel surface expression. Our data suggest that genomic and biophysical mechanisms acted in concert to favor gain of large, structured domains at protein termini during ion channel evolution. We use machine learning to build a quantitative biophysical model of domain compatibility in Kir2.1 that allows us to derive rudimentary rules for designing domain insertion variants that fold and traffic to the cell surface. Positional Kir2.1 responses to motif insertion cluster into distinct groups that correspond to contiguous structural regions of the channel with distinct biophysical properties tuned towards providing either folding stability or gating transitions. This suggests that insertional profiling is a high-throughput method to annotate the function of ion channel structural regions.
Affiliation(s)
- Willow Coyote-Maestas
- Department of Biochemistry, Molecular Biology & Biophysics, University of Minnesota, Minneapolis, MN 55455 USA
| | - David Nedrud
- Department of Biochemistry, Molecular Biology & Biophysics, University of Minnesota, Minneapolis, MN 55455 USA
| | - Antonio Suma
- Department of Chemistry, Temple University, Philadelphia, PA 19122 USA
| | - Yungui He
- Department of Genetics, Cell Biology & Development, University of Minnesota, Minneapolis, MN 55455 USA
| | - Kenneth A. Matreyek
- Department of Pathology, Case Western Reserve University School of Medicine, Cleveland, OH 44106 USA
| | - Douglas M. Fowler
- Department of Genome Sciences, University of Washington, Seattle, WA 98115 USA; Department of Bioengineering, University of Washington, Seattle, WA 98115 USA
| | - Vincenzo Carnevale
- Department of Chemistry, Temple University, Philadelphia, PA 19122 USA
| | - Chad L. Myers
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455 USA
| | - Daniel Schmidt
- Department of Genetics, Cell Biology & Development, University of Minnesota, Minneapolis, MN 55455 USA
|
24
|
Hirn M, Little A. Wavelet invariants for statistically robust multi-reference alignment. Information and Inference: A Journal of the IMA 2021; 10:1287-1351. [PMID: 35070296 PMCID: PMC8782248 DOI: 10.1093/imaiai/iaaa016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
We propose a nonlinear, wavelet-based signal representation that is translation invariant and robust to both additive noise and random dilations. Motivated by the multi-reference alignment problem and generalizations thereof, we analyze the statistical properties of this representation given a large number of independent corruptions of a target signal. We prove the nonlinear wavelet-based representation uniquely defines the power spectrum but allows for an unbiasing procedure that cannot be directly applied to the power spectrum. After unbiasing the representation to remove the effects of the additive noise and random dilations, we recover an approximation of the power spectrum by solving a convex optimization problem, and thus reduce to a phase retrieval problem. Extensive numerical experiments demonstrate the statistical robustness of this approximation procedure.
Affiliation(s)
- Matthew Hirn
- Department of Computational Mathematics, Science and Engineering, Department of Mathematics and Center for Quantum Computing, Science and Engineering, Michigan State University, East Lansing, MI 48824
| | - Anna Little
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI 48824
| |
|
25
|
Gilchrist CLM, Chooi YH. Synthaser: a CD-Search enabled Python toolkit for analysing domain architecture of fungal secondary metabolite megasynth(et)ases. Fungal Biol Biotechnol 2021; 8:13. [PMID: 34763725 PMCID: PMC8582187 DOI: 10.1186/s40694-021-00120-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/30/2021] [Accepted: 10/29/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Fungi are prolific producers of secondary metabolites (SMs), which are bioactive small molecules with important applications in medicine, agriculture and other industries. The backbones of a large proportion of fungal SMs are generated through the action of large, multi-domain megasynth(et)ases such as polyketide synthases (PKSs) and nonribosomal peptide synthetases (NRPSs). The structure of these backbones is determined by the domain architecture of the corresponding megasynth(et)ase, and thus accurate annotation and classification of these architectures is an important step in linking SMs to their biosynthetic origins in the genome. RESULTS Here we report synthaser, a Python package leveraging the NCBI's conserved domain search tool for remote prediction and classification of fungal megasynth(et)ase domain architectures. Synthaser is capable of batch sequence analysis, and produces rich textual output and interactive visualisations which allow for quick assessment of the megasynth(et)ase diversity of a fungal genome. Synthaser uses a hierarchical rule-based classification system, which can be extensively customised by the user through a web application (http://gamcil.github.io/synthaser). We show that synthaser provides more accurate domain architecture predictions than comparable tools which rely on curated profile hidden Markov model (pHMM)-based approaches; the utilisation of the NCBI conserved domain database also allows for significantly greater flexibility compared to pHMM approaches. In addition, we demonstrate how synthaser can be applied to large scale genome mining pipelines through the construction of an Aspergillus PKS similarity network. CONCLUSIONS Synthaser is an easy to use tool that represents a significant upgrade to previous domain architecture analysis tools. It is freely available under an MIT license from PyPI (https://pypi.org/project/synthaser) and GitHub (https://github.com/gamcil/synthaser).
Affiliation(s)
- Cameron L M Gilchrist
- School of Molecular Sciences, The University of Western Australia, 35 Stirling Hwy, Crawley, 6009, Australia.
| | - Yit-Heng Chooi
- School of Molecular Sciences, The University of Western Australia, 35 Stirling Hwy, Crawley, 6009, Australia.
| |
|
26
|
Torres PHM, Rossi AD, Blundell TL. ProtCHOIR: a tool for proteome-scale generation of homo-oligomers. Brief Bioinform 2021; 22:bbab182. [PMID: 34015821 PMCID: PMC8574958 DOI: 10.1093/bib/bbab182] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/02/2021] [Revised: 04/04/2021] [Accepted: 04/20/2021] [Indexed: 01/10/2023] Open
Abstract
The rapid developments in gene sequencing technologies achieved in the recent decades, along with the expansion of knowledge on the three-dimensional structures of proteins, have enabled the construction of proteome-scale databases of protein models such as the Genome3D and ModBase. Nevertheless, although gene products are usually expressed as individual polypeptide chains, most biological processes are associated with either transient or stable oligomerisation. In the PDB databank, for example, ~40% of the deposited structures contain at least one homo-oligomeric interface. Unfortunately, databases of protein models are generally devoid of multimeric structures. To tackle this particular issue, we have developed ProtCHOIR, a tool that is able to generate homo-oligomeric structures in an automated fashion, providing detailed information for the input protein and output complex. ProtCHOIR requires input of either a sequence or a protomeric structure that is queried against a pre-constructed local database of homo-oligomeric structures, then extensively analyzed using well-established tools such as PSI-Blast, MAFFT, PISA and Molprobity. Finally, MODELLER is employed to achieve the construction of the homo-oligomers. The output complex is thoroughly analyzed taking into account its stereochemical quality, interfacial stabilities, hydrophobicity and conservation profile. All these data are then summarized in a user-friendly HTML report that can be saved or printed as a PDF file. The software is easily parallelizable and also outputs a comma-separated file with summary statistics that can straightforwardly be concatenated as a spreadsheet-like document for large-scale data analyses. As a proof-of-concept, we built oligomeric models for the Mabellini Mycobacterium abscessus structural proteome database. ProtCHOIR can be run as a web-service and the code can be obtained free-of-charge at http://lmdm.biof.ufrj.br/protchoir.
|
27
|
Moffat L, Jones DT. Increasing the accuracy of single sequence prediction methods using a deep semi-supervised learning framework. Bioinformatics 2021; 37:3744-3751. [PMID: 34213528 PMCID: PMC8570780 DOI: 10.1093/bioinformatics/btab491] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/30/2021] [Revised: 06/08/2021] [Accepted: 06/30/2021] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Over the past 50 years, our ability to model protein sequences with evolutionary information has progressed in leaps and bounds. However, even with the latest deep learning methods, the modelling of a critically important class of proteins, single orphan sequences, remains unsolved. RESULTS By taking a bioinformatics approach to semi-supervised machine learning, we develop Profile Augmentation of Single Sequences (PASS), a simple but powerful framework for building accurate single-sequence methods. To demonstrate the effectiveness of PASS we apply it to the mature field of secondary structure prediction. In doing so we develop S4PRED, the successor to the open-source PSIPRED-Single method, which achieves an unprecedented Q3 score of 75.3% on the standard CB513 test. PASS provides a blueprint for the development of a new generation of predictive methods, advancing our ability to model individual protein sequences. AVAILABILITY AND IMPLEMENTATION The S4PRED model is available as open source software on the PSIPRED GitHub repository (https://github.com/psipred/s4pred), along with documentation. It will also be provided as a part of the PSIPRED web service (http://bioinf.cs.ucl.ac.uk/psipred/). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
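The Q3 score quoted in this abstract is the standard three-state secondary structure accuracy: the percentage of residues whose predicted state (helix H, strand E, coil C) matches the reference assignment. A minimal sketch of that calculation follows; the function name `q3_score` and the toy strings are illustrative assumptions and are not taken from S4PRED or the CB513 benchmark.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Return the percentage of positions where the 3-state labels agree.

    Both arguments are strings over the alphabet H (helix), E (strand),
    C (coil), one character per residue. Hypothetical helper for illustration.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Toy example strings (made up, not data from the paper).
observed = "CCHHHHHCCEEEECC"
predicted = "CCHHHHCCCEEEHCC"
print(f"Q3 = {q3_score(predicted, observed):.1f}%")
```

Benchmark figures such as the one above are usually computed over all residues in the test set; whether scores are pooled per residue or averaged per protein is a reporting choice that this sketch does not settle.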
Affiliation(s)
- Lewis Moffat
- Department of Computer Science, University College London, London WC1E 6BT, UK
- Biomedical Data Science Laboratory, The Francis Crick Institute, London NW1 1AT, UK
| | - David T Jones
- Department of Computer Science, University College London, London WC1E 6BT, UK
- Biomedical Data Science Laboratory, The Francis Crick Institute, London NW1 1AT, UK
| |
|
28
|
Alvarez-Carreño C, Penev PI, Petrov AS, Williams LD. Fold Evolution before LUCA: Common Ancestry of SH3 Domains and OB Domains. Mol Biol Evol 2021; 38:5134-5143. [PMID: 34383917 PMCID: PMC8557408 DOI: 10.1093/molbev/msab240] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/13/2022] Open
Abstract
SH3 and OB are the simplest, oldest, and most common protein domains within the translation system. SH3 and OB domains are β-barrels that are structurally similar but are topologically distinct. To transform an OB domain to a SH3 domain, β-strands must be permuted in a multistep and evolutionarily implausible mechanism. Here, we explored relationships between SH3 and OB domains of ribosomal proteins, initiation, and elongation factors using a combined sequence- and structure-based approach. We detect a common core of SH3 and OB domains, as a region of significant structure and sequence similarity. The common core contains four β-strands and a loop, but omits the fifth β-strand, which is variable and is absent from some OB and SH3 domain proteins. The structure of the common core immediately suggests a simple permutation mechanism for interconversion between SH3 and OB domains, which appear to share an ancestor. The OB domain was formed by duplication and adaptation of the SH3 domain core, or vice versa, in a simple and probable transformation. By employing the folding algorithm AlphaFold2, we demonstrated that an ancestral reconstruction of a permuted SH3 sequence folds into an OB structure, and an ancestral reconstruction of a permuted OB sequence folds into a SH3 structure. The tandem SH3 and OB domains in the universal ribosomal protein uL2 share a common ancestor, suggesting that the divergence of these two domains occurred before the last universal common ancestor.
Affiliation(s)
- Claudia Alvarez-Carreño
- NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA, USA
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA
| | - Petar I Penev
- NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA, USA
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA
| | - Anton S Petrov
- NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA, USA
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA
| | - Loren Dean Williams
- NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA, USA
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA
| |
|
29
|
Sandaruwan PD, Wannige CT. An improved deep learning model for hierarchical classification of protein families. PLoS One 2021; 16:e0258625. [PMID: 34669708 PMCID: PMC8528337 DOI: 10.1371/journal.pone.0258625] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/12/2020] [Accepted: 10/01/2021] [Indexed: 12/28/2022] Open
Abstract
Although genes carry information, proteins are the main players in providing the functions of a living organism. A vast number of different proteins are involved in every function that occurs in a cell. These amino acid sequences can be hierarchically classified into a set of families and subfamilies depending on their evolutionary relatedness and similarities in their structure or function. Protein characterization to identify protein structure and function is done accurately using laboratory experiments. With the rapidly increasing number of novel protein sequences, these experiments have become difficult to carry out since they are expensive, time-consuming, and laborious. Therefore, many computational classification methods have been introduced to classify proteins and predict their functional properties. As computational techniques have improved, deep learning has come to play a key role in many areas. Novel deep learning models such as DeepFam and ProtCNN have recently been presented to classify proteins into their families. However, these deep learning models have been used to carry out non-hierarchical classification of proteins. In this research, we propose a deep learning neural network model named DeepHiFam with high accuracy to classify proteins hierarchically into different levels simultaneously. The model achieved an accuracy of 98.38% for protein family classification and more than 80% accuracy for the classification of protein subfamilies and sub-subfamilies. Further, DeepHiFam performed well in the non-hierarchical classification of protein families and achieved an accuracy of 98.62% and 96.14% for the popular Pfam dataset and COG dataset, respectively.
|
30
|
Bioinformatic prediction and identification of immunogenic epitopes of the antigenic 14-3-3 protein of Echinococcus multilocularis. Acta Trop 2021; 220:105955. [PMID: 33979643 DOI: 10.1016/j.actatropica.2021.105955] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/13/2020] [Revised: 04/30/2021] [Accepted: 05/04/2021] [Indexed: 02/06/2023]
Abstract
INTRODUCTION Alveolar echinococcosis is a high-risk parasitic disease caused by the larval stage of Echinococcus multilocularis. The study aimed to predict and identify the dominant Th1/Th2 and B cell epitopes of the antigen protein 14-3-3 beta:alpha from Echinococcus multilocularis. METHODS The four amino acid sequences of 14-3-3 beta:alpha, derived respectively from Echinococcus multilocularis, Rattus norvegicus, Canis lupus familiaris, and Homo sapiens, were compared using CLUSTALW to provide a basis for excluding similar epitopes. The amino acid sequence information was analyzed by SOPMA and the homology model was established by Swiss-Model. IEDB and SYFPEITHI were used to predict T cell epitopes. B cell epitopes were comprehensively predicted and analyzed using Bcepred and ABCpred. The dominant epitopes were validated by lymphocyte proliferation, ELISA, ELISpot, and flow cytometry. RESULTS Eight potential epitopes of 14-3-3 from Echinococcus multilocularis were screened according to the results of prediction and analysis (residue ranges in parentheses): 14-3-3(1-15), 14-3-3(6-21), 14-3-3(71-86), 14-3-3(144-157), 14-3-3(145-166), 14-3-3(146-160), 14-3-3(153-161), and 14-3-3(164-177). The 3D structure model of the protein was constructed and the locations of the potential epitopes were ascertained. The dominant B cell epitopes were validated as 14-3-3(145-166) and 14-3-3(164-177); the dominant Th1 epitopes were 14-3-3(6-21) and 14-3-3(145-166); and the dominant Th2 epitope was 14-3-3(145-166). CONCLUSION In this study, two dominant antigen epitopes of B cells, two Th1 dominant antigen epitopes, and one Th2 dominant antigen epitope were validated. Our work provides a basis for the subsequent development of efficient and safe vaccines targeting epitopes of Echinococcus multilocularis.
|
31
|
The Serological Cross-Detection of Bat-Borne Hantaviruses: A Valid Strategy or Taking Chances? Viruses 2021; 13:v13071188. [PMID: 34206220 PMCID: PMC8309984 DOI: 10.3390/v13071188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2021] [Revised: 05/21/2021] [Accepted: 05/24/2021] [Indexed: 11/17/2022] Open
Abstract
Bats are hosts of a range of viruses, and their great diversity and unique characteristics that distinguish them from all other mammals have been related to the maintenance, evolution, and dissemination of these pathogens. Recently, very divergent hantaviruses have been discovered in distinct species of bats worldwide, but their association with human disease remains unclear. Considering the low success rates of detecting hantavirus RNA in bat tissues and that to date no hantaviruses have been isolated from bat samples, immunodiagnostic tools could be very helpful to understand pathogenesis, epidemiology, and geographic range of bat-borne hantaviruses. In this sense, we aimed to identify in silico immunogenic B-cell epitopes present on bat-borne hantaviruses nucleoprotein (NP) and verify if they are conserved among them and other selected members of Mammantavirinae, using a combination of (the three most used) different prediction algorithms, ELLIPRO, Discotope 2.0, and PEPITO server. To support our data, we in silico modeled 3D structures of NPs from representative members of bat-borne hantaviruses, using comparative and ab initio methods due to the absence of crystallographic structures of studied proteins or similar models in the Protein Data Bank. Our analysis demonstrated the antigenic complexity of the bat-borne hantaviruses group, showing a low sequence conservation of epitopes among members of its own group and a minor conservation degree in comparison to Orthohantavirus, with a recognized importance to public health. Our data suggest that the use of recombinant rodent-borne hantavirus NPs to cross-detect antibodies against bat- or shrew-borne viruses could underestimate the real impact of this virus in nature.
|
32
|
Czibula G, Albu AI, Bocicor MI, Chira C. AutoPPI: An Ensemble of Deep Autoencoders for Protein-Protein Interaction Prediction. Entropy 2021; 23:e23060643. [PMID: 34064042 PMCID: PMC8223997 DOI: 10.3390/e23060643] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 04/15/2021] [Revised: 05/08/2021] [Accepted: 05/19/2021] [Indexed: 01/06/2023]
Abstract
Proteins are essential molecules that must correctly perform their roles for the good health of living organisms. The majority of proteins operate in complexes and the way they interact has a pivotal influence on the proper functioning of such organisms. In this study we address the problem of protein–protein interaction and we propose and investigate a method based on the use of an ensemble of autoencoders. Our approach, entitled AutoPPI, adopts a strategy based on two autoencoders, one for each type of interaction (positive and negative), and we advance three types of neural network architectures for the autoencoders. Experiments were performed on several data sets comprising proteins from four different species. The results indicate good performance of our proposed model, with accuracy and AUC values of over 0.97 in all cases. The best performing model relies on a Siamese architecture in both the encoder and the decoder, which advantageously captures common features in protein pairs. Comparisons with other machine learning techniques applied to the same problem show that AutoPPI outperforms most of its contenders for the considered data sets.
|
33
|
Structural genomics and the Protein Data Bank. J Biol Chem 2021; 296:100747. [PMID: 33957120 PMCID: PMC8166929 DOI: 10.1016/j.jbc.2021.100747] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/17/2021] [Revised: 04/16/2021] [Accepted: 04/30/2021] [Indexed: 12/14/2022] Open
Abstract
The field of Structural Genomics arose over the last 3 decades to address a large and rapidly growing divergence between microbial genomic, functional, and structural data. Several international programs took advantage of the vast genomic sequence information and evaluated the feasibility of structure determination for expanded and newly discovered protein families. As a consequence, structural genomics has developed structure-determination pipelines and applied them to a wide range of novel, uncharacterized proteins, often from “microbial dark matter,” and later to proteins from human pathogens. Advances were especially needed in protein production and rapid de novo structure solution. The experimental three-dimensional models were promptly made public, facilitating structure determination of other members of the family and helping to understand their molecular and biochemical functions. Improvements in experimental methods and databases resulted in fast progress in molecular and structural biology. The Protein Data Bank structure repository played a central role in the coordination of structural genomics efforts and the structural biology community as a whole. It facilitated development of standards and validation tools essential for maintaining high quality of deposited structural data.
|
34
|
On the Emergence of Orientational Order in Folded Proteins with Implications for Allostery. Symmetry (Basel) 2021. [DOI: 10.3390/sym13050770] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/18/2023] Open
Abstract
The beautiful structures of single- and multi-domain proteins are clearly ordered in some fashion but cannot be readily classified using group theory methods that are successfully used to describe periodic crystals. For this reason, protein structures are considered to be aperiodic, and may have evolved this way for functional purposes, especially in instances that require a combination of softness and rigidity within the same molecule. By analyzing the solved protein structures, we show that orientational symmetry is broken in the aperiodic arrangement of the secondary structure elements (SSEs), which we deduce by calculating the nematic order parameter, P2. We find that the folded structures are nematic droplets with a broad distribution of P2. We argue that a non-zero value of P2 leads to an arrangement of the SSEs that can resist external forces, which is a requirement for allosteric proteins. Such proteins, which resist mechanical forces in some regions while being flexible in others, transmit signals from one region of the protein to another (action at a distance) in response to binding of ligands (oxygen, ATP, or other small molecules).
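For orientation, the nematic order parameter is conventionally defined as the average of the second Legendre polynomial, P2 = <(3 cos^2(theta) - 1) / 2>, where theta is the angle between each element's axis and a reference director. The sketch below illustrates that textbook definition only; the abstract does not say how the SSE axes or the director are obtained, so here both are simply passed in as assumptions (in practice the director is often taken as the principal eigenvector of the ordering tensor). The function name `nematic_order_p2` is hypothetical.

```python
import math


def nematic_order_p2(vectors, director):
    """Average of (3*cos^2(theta) - 1)/2 over all vectors.

    `vectors` is a non-empty list of 3D axis vectors (e.g. one per secondary
    structure element) and `director` is the reference axis. Both are
    normalised internally. Illustrative helper, not the paper's code.
    """
    def unit(v):
        norm = math.sqrt(sum(c * c for c in v))
        if norm == 0.0:
            raise ValueError("zero-length vector")
        return tuple(c / norm for c in v)

    if not vectors:
        raise ValueError("at least one vector is required")
    d = unit(director)
    total = 0.0
    for v in vectors:
        u = unit(v)
        cos_theta = sum(a * b for a, b in zip(u, d))
        # cos^2 makes the result insensitive to the sign of each axis.
        total += 0.5 * (3.0 * cos_theta ** 2 - 1.0)
    return total / len(vectors)


# Perfectly aligned (or anti-aligned) axes give P2 = 1.
aligned = nematic_order_p2([(0, 0, 1), (0, 0, -2)], (0, 0, 1))
# Axes spread equally over x, y and z average out to P2 = 0.
mixed = nematic_order_p2([(1, 0, 0), (0, 1, 0), (0, 0, 1)], (0, 0, 1))
```

Under this definition P2 ranges from -0.5 (all axes perpendicular to the director) to 1 (all axes parallel), with 0 corresponding to no net orientational order.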
|
35
|
Wan X, Tan X. A protein structural study based on the centrality analysis of protein sequence feature networks. PLoS One 2021; 16:e0248861. [PMID: 33780482 PMCID: PMC8006989 DOI: 10.1371/journal.pone.0248861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2020] [Accepted: 03/05/2021] [Indexed: 11/19/2022] Open
Abstract
In this paper, we use network approaches to analyze the relations between protein sequence features for the top hierarchical classes of CATH and SCOP. We use fundamental connectivity measures such as correlation (CR), normalized mutual information rate (nMIR), and transfer entropy (TE) to analyze the pairwise relationships between the protein sequence features, and use centrality measures to analyze weighted networks constructed from the relationship matrices. In the centrality analysis, we find both commonalities and differences between the different protein 3D structural classes. Results show that all top hierarchical classes of CATH and SCOP present strong non-deterministic interactions for the composition and arrangement features of Cysteine (C), Methionine (M), and Tryptophan (W), and also for the arrangement features of Histidine (H). The different protein 3D structural classes present different preferences in terms of their centrality distributions and significant features.
Affiliation(s)
- Xiaogeng Wan
- College of Mathematics and Physics, Beijing University of Chemical Technology, Beijing, China
| | - Xinying Tan
- The Fourth Center of PLA General Hospital, Beijing, China
| |
|
36
|
Joshi T, Garg S, Estaña A, Cortés J, Bernadó P, Das S, Kammath AR, Sagar A, Rakshit S. Interdomain linkers tailor the stability of immunoglobulin repeats in polyproteins. Biochem Biophys Res Commun 2021; 550:43-48. [PMID: 33684619 DOI: 10.1016/j.bbrc.2021.02.114] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/17/2021] [Accepted: 02/24/2021] [Indexed: 11/28/2022]
Abstract
Linkers in polyproteins are considered as mere spacers between two adjacent domains. However, a series of studies using single-molecule force spectroscopy have recently reported distinct thermodynamic stability of I27 in polyproteins with varying linkers and indicated the vital role of linkers in domain stability. A flexible glycine rich linker (-(GGG)n, n ≥ 3) featured unfolding at lower forces than the regularly used arg-ser (RS) based linker. Interdomain interactions among I27 domains in Gly-rich linkers were suggested to lead to reduced domain stability. However, the negative impact of inter domain interactions on domain stability is thermodynamically counter-intuitive and demanded thorough investigations. Here, using an array of ensemble equilibrium experiments and in-silico measurements with I27 singlet and doublets with two aforementioned linkers, we delineate that the inter-domain interactions in fact raise the stability of the polyprotein with RS linker. More surprisingly, a highly flexible Gly-rich linker has no interference on the stability of polyprotein. Overall, we conclude that flexible linkers are preferred in a polyprotein for maintaining domain's independence.
Affiliation(s)
- Tanuja Joshi
- Department of Chemical Sciences, Indian Institute of Science Education and Research, Mohali, Punjab, India
| | - Surbhi Garg
- Department of Chemical Sciences, Indian Institute of Science Education and Research, Mohali, Punjab, India
| | - Alejandro Estaña
- Centre de Biochimie Structurale, INSERM, CNRS, Université de Montpellier, Montpellier, France; LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
| | - Juan Cortés
- LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
| | - Pau Bernadó
- Centre de Biochimie Structurale, INSERM, CNRS, Université de Montpellier, Montpellier, France
| | - Sayan Das
- Department of Chemical Sciences, Indian Institute of Science Education and Research, Mohali, Punjab, India
| | - Anjana R Kammath
- Department of Chemical Sciences, Indian Institute of Science Education and Research, Mohali, Punjab, India
| | - Amin Sagar
- Department of Chemical Sciences, Indian Institute of Science Education and Research, Mohali, Punjab, India; Centre de Biochimie Structurale, INSERM, CNRS, Université de Montpellier, Montpellier, France.
| | - Sabyasachi Rakshit
- Department of Chemical Sciences, Indian Institute of Science Education and Research, Mohali, Punjab, India; Centre for Protein Science Design and Engineering, Indian Institute of Science Education and Research, Mohali, Punjab, India.
| |
|
37
|
Zhao B, Katuwawala A, Uversky VN, Kurgan L. IDPology of the living cell: intrinsic disorder in the subcellular compartments of the human cell. Cell Mol Life Sci 2021; 78:2371-2385. [PMID: 32997198 PMCID: PMC11071772 DOI: 10.1007/s00018-020-03654-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 06/08/2020] [Revised: 09/09/2020] [Accepted: 09/22/2020] [Indexed: 12/11/2022]
Abstract
Intrinsic disorder can be found in all proteomes of all kingdoms of life and in viruses, being particularly prevalent in the eukaryotes. We conduct a comprehensive analysis of the intrinsic disorder in the human proteins while mapping them into 24 compartments of the human cell. In agreement with previous studies, we show that human proteins are significantly enriched in disorder relative to a generic protein set that represents the protein universe. In fact, the fraction of proteins with long disordered regions and the average protein-level disorder content in the human proteome are about 3 times higher than in the protein universe. Furthermore, levels of intrinsic disorder in the majority of human subcellular compartments significantly exceed the average disorder content in the protein universe. Relative to the overall amount of disorder in the human proteome, proteins localized in the nucleus and cytoskeleton have significantly increased amounts of disorder, measured by both high disorder content and presence of multiple long intrinsically disordered regions. We empirically demonstrate that, on average, human proteins are assigned to 2.3 subcellular compartments, with proteins localized to few subcellular compartments being more disordered than the proteins that are localized to many compartments. Functionally, the disordered proteins localized in the most disorder-enriched subcellular compartments are primarily responsible for interactions with nucleic acids and protein partners. This is the first time that disorder has been comprehensively mapped into the human cell. Our observations add a missing piece to the puzzle of functional disorder and its organization inside the cell.
Affiliation(s)
- Bi Zhao
- Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Room E4225, Richmond, VA, 23284, USA
| | - Akila Katuwawala
- Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Room E4225, Richmond, VA, 23284, USA
| | - Vladimir N Uversky
- Department of Molecular Medicine, USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, 12901 Bruce B. Downs Blvd. MDC07, Tampa, FL, 33612, USA.
- Laboratory of New Methods in Biology, Institute for Biological Instrumentation of the Russian Academy of Sciences, Federal Research Center "Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences", Pushchino, Russia.
| | - Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, 401 West Main Street, Room E4225, Richmond, VA, 23284, USA.
| |
|
38
|
Sun B, Liu Z, Fang X, Wang X, Lai C, Liu L, Xiao C, Jiang Y, Wang F. Improving the performance of proteomic analysis via VAILase cleavage and 193-nm ultraviolet photodissociation. Anal Chim Acta 2021; 1155:338340. [PMID: 33766312 DOI: 10.1016/j.aca.2021.338340] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/30/2020] [Revised: 01/27/2021] [Accepted: 02/16/2021] [Indexed: 10/22/2022]
Abstract
Further improving proteomic identification coverage and reliability remains challenging in mass spectrometry (MS)-based proteomics. Herein, we combine VAILase and trypsin digestion with 193-nm ultraviolet photodissociation (UVPD) and higher-energy collision dissociation (HCD) to improve the performance of bottom-up proteomics. As VAILase exhibits high complementarity to trypsin, the proteome sequence coverage is clearly improved with either HCD or 193-nm UVPD. The high diversity of fragment ion types produced by UVPD contributes to improved identification reliability for both trypsin- and VAILase-digested peptides, with an average XCorr score improvement of 10%.
Affiliation(s)
- Binwen Sun
- CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China; University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Zheyi Liu
- CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China
| | - Xiang Fang
- National Institute of Metrology, Beijing, 100013, China
| | - Xiaolei Wang
- CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China; State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Dalian, 116023, China
| | - Can Lai
- CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China; University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Lin Liu
- School of Life Sciences, Anhui University, 230601, Hefei, Anhui, China
| | - Chunlei Xiao
- State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Dalian, 116023, China.
| | - You Jiang
- National Institute of Metrology, Beijing, 100013, China.
| | - Fangjun Wang
- CAS Key Laboratory of Separation Sciences for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China; University of Chinese Academy of Sciences, Beijing, 100049, China.
|
39
|
Wan X, Tan X. A Simple Protein Evolutionary Classification Method Based on the Mutual Relations Between Protein Sequences. Curr Bioinform 2021. [DOI: 10.2174/1574893615666200305090055] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/22/2022]
Abstract
Background:
Proteins are important organic molecules in living organisms and vary in sequence, structure, and function. Protein evolutionary classification is a popular research topic in computational bioinformatics, and many studies have used protein sequence information to classify the evolutionary relationships of proteins. As the amount of protein sequence data increases, efficient computational tools are needed to produce accurate protein evolutionary classifications in the big data paradigm.
Methods:
In this study, we propose a simple and efficient computational approach that uses normalized mutual information rates to compute the relationship between protein sequences. We then use the “distances” defined on these relationships to perform evolutionary classifications of proteins. The new method is computationally efficient, model-free, and unsupervised, so it does not require training data when performing classifications.
Results:
Simulation studies on various examples demonstrate the efficiency of the new method. We use precision-recall curves to compare the efficiency of our new method with that of traditional methods; the results show that the new method outperforms the traditional methods in most cases when performing evolutionary classifications.
Conclusion:
The new method is simple and proved efficient in protein evolutionary classification, which makes it useful for future evolutionary analyses, particularly in the big data paradigm.
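The idea of turning mutual information between sequences into a "distance" can be illustrated with a minimal sketch. This is not the estimator used by Wan and Tan: the abstract refers to normalized mutual information rates, whereas the code below computes a plain position-wise normalized mutual information between two equal-length (for example, pre-aligned) sequences. The function names and the choice of normalizing by the larger entropy are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy (in bits) of the empirical symbol distribution."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def normalized_mutual_information(seq_a, seq_b):
    """Position-wise NMI of two equal-length sequences, in [0, 1].

    Assumption: normalization by max(H(A), H(B)); other conventions exist.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    h_a = entropy(seq_a)
    h_b = entropy(seq_b)
    h_ab = entropy(list(zip(seq_a, seq_b)))  # joint entropy over aligned pairs
    mutual_info = h_a + h_b - h_ab
    denom = max(h_a, h_b)
    if denom == 0:
        # Both sequences are constant, so each fully determines the other.
        return 1.0
    return mutual_info / denom


def nmi_distance(seq_a, seq_b):
    """Illustrative distance: 0 for fully dependent, 1 for independent sequences."""
    return 1.0 - normalized_mutual_information(seq_a, seq_b)


if __name__ == "__main__":
    toy = {"p1": "AAAACCCC", "p2": "AACCAACC", "p3": "CCCCAAAA"}
    for x in toy:
        for y in toy:
            print(x, y, round(nmi_distance(toy[x], toy[y]), 3))
```

A pairwise matrix of such distances could then be passed to any standard unsupervised clustering routine, which matches the training-free character described in the abstract.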
Affiliation(s)
- Xiaogeng Wan
- Department of Mathematics, College of Mathematics and Physics, Beijing University of Chemical Technology, Beijing, 100029, China
| | - Xinying Tan
- The Fourth Center of PLA General Hospital, Beijing, 100037, China
|
40
|
Youssef L, Miranda J, Blasco M, Paules C, Crovetto F, Palomo M, Torramade-Moix S, García-Calderó H, Tura-Ceide O, Dantas AP, Hernandez-Gea V, Herrero P, Canela N, Campistol JM, Garcia-Pagan JC, Diaz-Ricart M, Gratacos E, Crispi F. Complement and coagulation cascades activation is the main pathophysiological pathway in early-onset severe preeclampsia revealed by maternal proteomics. Sci Rep 2021; 11:3048. [PMID: 33542402 PMCID: PMC7862439 DOI: 10.1038/s41598-021-82733-z] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 02/14/2020] [Accepted: 12/02/2020] [Indexed: 12/16/2022] Open
Abstract
Preeclampsia is a pregnancy-specific multisystem disorder and a leading cause of maternal and perinatal morbidity and mortality. The exact pathogenesis of this multifactorial disease remains poorly defined. We applied proteomics analysis to maternal blood samples collected from 14 singleton pregnancies with early-onset severe preeclampsia and 6 uncomplicated pregnancies to investigate the pathophysiological pathways involved in this specific subgroup of preeclampsia. Maternal blood was drawn at diagnosis for cases and at matched gestational age for controls. LC-MS/MS proteomics analysis was conducted, and data were analyzed by multivariate and univariate statistical approaches, with differential pathways identified by exploring the global human protein-protein interaction network. The unsupervised multivariate analysis (principal component analysis) showed a clear difference between preeclamptic and uncomplicated pregnancies. The supervised multivariate analysis using orthogonal partial least squares discriminant analysis resulted in a model with high goodness of fit (R2X = 0.99, p < 0.001) and strong predictive ability (Q2Y = 0.8, p < 0.001). By univariate analysis, we found 17 proteins that differed significantly after 5% FDR correction (q-value < 0.05). Pathway enrichment analysis revealed 5 significantly enriched pathways, of which activation of the complement and coagulation cascades ranked first (p = 3.17e-07). To validate these results, we assessed the deposits of C5b-9 complement complex and Von Willebrand factor on endothelial cells that were exposed to activated plasma from an independent set of 4 cases of early-onset severe preeclampsia and 4 uncomplicated pregnancies. C5b-9 and Von Willebrand factor deposits were significantly higher in early-onset severe preeclampsia. Future studies are warranted to investigate potential therapeutic targets for early-onset severe preeclampsia within the complement and coagulation pathway.
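The "5% FDR correction (q-value < 0.05)" step can be made concrete with a short sketch. The abstract does not say which FDR procedure was used; the Benjamini-Hochberg step-up procedure below is an assumption, chosen because it is the most common way to turn a list of p-values into q-values. The function name and the toy p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


if __name__ == "__main__":
    toy_p = [0.01, 0.04, 0.03, 0.005, 0.20]  # hypothetical per-protein p-values
    toy_q = benjamini_hochberg(toy_p)
    significant = [i for i, q in enumerate(toy_q) if q < 0.05]
    print(toy_q, significant)
```

Proteins whose q-value falls below 0.05 would be the ones reported as differing at a 5% false discovery rate.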
Affiliation(s)
- Lina Youssef
- BCNatal | Fetal Medicine Research Center (Hospital Clínic and Hospital Sant Joan de Déu), Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), University of Barcelona, Barcelona, Spain
| | - Jezid Miranda
- BCNatal | Fetal Medicine Research Center (Hospital Clínic and Hospital Sant Joan de Déu), Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), University of Barcelona, Barcelona, Spain
| | - Miquel Blasco
- Nephrology and Renal Transplantation Department, Hospital Clínic, Centro de Referencia en Enfermedad Glomerular Compleja del Sistema Nacional de Salud (CSUR), University of Barcelona, Barcelona, Spain
| | - Cristina Paules
- BCNatal | Fetal Medicine Research Center (Hospital Clínic and Hospital Sant Joan de Déu), Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), University of Barcelona, Barcelona, Spain
| | - Francesca Crovetto
- BCNatal | Fetal Medicine Research Center (Hospital Clínic and Hospital Sant Joan de Déu), Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), University of Barcelona, Barcelona, Spain
| | - Marta Palomo
- Josep Carreras Leukaemia Research Institute, Hospital Clinic, University of Barcelona Campus, Barcelona, Spain
- Hematopathology, Centre Diagnòstic Biomèdic (CDB), Hospital Clinic, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), University of Barcelona, Barcelona, Spain
- Barcelona Endothelium Team (BET), Barcelona, Spain
| | - Sergi Torramade-Moix
- Hematopathology, Centre Diagnòstic Biomèdic (CDB), Hospital Clinic, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), University of Barcelona, Barcelona, Spain
| | - Héctor García-Calderó
- Barcelona Hepatic Hemodynamics Laboratory, Liver Unit, Hospital Clinic, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), University of Barcelona, Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), Health Care Provider of the European Reference Network on Rare Liver Disorders (ERN-Liver), Barcelona, Spain
| | - Olga Tura-Ceide
- Department of Pulmonary Medicine, Hospital Clínic, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), University of Barcelona, Barcelona, Spain
- Biomedical Research Networking Center on Respiratory Diseases (CIBERES), Madrid, Spain
- Girona Biomedical Research Institute - IDIBGI, Girona, Spain
| | - Ana Paula Dantas
- Cardiovascular Institute, Hospital Clinic, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), University of Barcelona, Barcelona, Spain
| | - Virginia Hernandez-Gea
- Barcelona Hepatic Hemodynamics Laboratory, Liver Unit, Hospital Clinic, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), University of Barcelona, Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), Health Care Provider of the European Reference Network on Rare Liver Disorders (ERN-Liver), Barcelona, Spain
| | - Pol Herrero
- Eurecat, Centre Tecnològic de Catalunya, Centre for Omic Sciences (COS), Joint Unit Universitat Rovira i Virgili-EURECAT, Unique Scientific and Technical Infrastructures (ICTS), 43204, Reus, Spain
| | - Nuria Canela
- Eurecat, Centre Tecnològic de Catalunya, Centre for Omic Sciences (COS), Joint Unit Universitat Rovira i Virgili-EURECAT, Unique Scientific and Technical Infrastructures (ICTS), 43204, Reus, Spain
| | - Josep Maria Campistol
- Nephrology and Renal Transplantation Department, Hospital Clínic, Centro de Referencia en Enfermedad Glomerular Compleja del Sistema Nacional de Salud (CSUR), University of Barcelona, Barcelona, Spain
- Centre for Biomedical Research on Rare Diseases (CIBER-ER), Madrid, Spain
| | - Joan Carles Garcia-Pagan
- Barcelona Hepatic Hemodynamics Laboratory, Liver Unit, Hospital Clinic, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), University of Barcelona, Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), Health Care Provider of the European Reference Network on Rare Liver Disorders (ERN-Liver), Barcelona, Spain
| | - Maribel Diaz-Ricart
- Hematopathology, Centre Diagnòstic Biomèdic (CDB), Hospital Clinic, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), University of Barcelona, Barcelona, Spain
- Barcelona Endothelium Team (BET), Barcelona, Spain
| | - Eduard Gratacos
- BCNatal | Fetal Medicine Research Center (Hospital Clínic and Hospital Sant Joan de Déu), Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), University of Barcelona, Barcelona, Spain.
- Centre for Biomedical Research on Rare Diseases (CIBER-ER), Madrid, Spain.
- Department of Maternal-Fetal Medicine (ICGON), Hospital Clínic, Sabino de Arana 1, 08028, Barcelona, Spain.
| | - Fatima Crispi
- BCNatal | Fetal Medicine Research Center (Hospital Clínic and Hospital Sant Joan de Déu), Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), University of Barcelona, Barcelona, Spain
- Centre for Biomedical Research on Rare Diseases (CIBER-ER), Madrid, Spain
|
41
|
Searching protein space for ancient sub-domain segments. Curr Opin Struct Biol 2021; 68:105-112. [PMID: 33476896 DOI: 10.1016/j.sbi.2020.11.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/11/2020] [Accepted: 11/29/2020] [Indexed: 01/08/2023]
Abstract
Evolutionary processes that formed the current protein universe left their traces, among them homologous segments that recur, or are 'reused,' in multiple proteins. These reused segments, called 'themes,' can be found at various scales, the best known of which is the domain. Yet, recent studies have begun to focus on the evolutionary insights that can be derived from sub-domain-scale themes, which are candidates for traces of more ancient events. Characterizing these may provide clues to the emergence of domains. Particularly interesting are themes that are reused across dissimilar contexts, that is, where the rest of the protein domain differs. We survey computational studies identifying reused themes within different contexts at the sub-domain level.
|
42
|
Feiler CG, Weiss MS, Blankenfeldt W. The hypothetical periplasmic protein PA1624 from Pseudomonas aeruginosa folds into a unique two-domain structure. Acta Crystallogr F Struct Biol Commun 2020; 76:609-615. [PMID: 33263573 PMCID: PMC7716261 DOI: 10.1107/s2053230x20014612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2020] [Accepted: 11/04/2020] [Indexed: 12/02/2022] Open
Abstract
The crystal structure of the 268-residue periplasmic protein PA1624 from the opportunistic pathogen Pseudomonas aeruginosa PAO1 was determined to high resolution using the Se-SAD method for initial phasing. The protein was found to be monomeric and the structure consists of two domains, domains 1 and 2, comprising residues 24-184 and 185-268, respectively. The fold of these domains could not be predicted even using state-of-the-art prediction methods, and similarity searches revealed only a very distant homology to known structures, namely to Mog1p/PsbP-like and OmpA-like proteins for the N- and C-terminal domains, respectively. Since PA1624 is only present in an important human pathogen, its unique structure and periplasmic location render it a potential drug target. Consequently, the results presented here may open new avenues for the discovery and design of antibacterial drugs.
Affiliation(s)
- Christian G. Feiler
- Macromolecular Crystallography (HZB-MX), Helmholtz-Zentrum Berlin, Albert-Einstein-Strasse 15, D-12489 Berlin, Germany
- Structure and Function of Proteins, Helmholtz Centre for Infection Research, Inhoffenstrasse 7, D-38124 Braunschweig, Germany
| | - Manfred S. Weiss
- Macromolecular Crystallography (HZB-MX), Helmholtz-Zentrum Berlin, Albert-Einstein-Strasse 15, D-12489 Berlin, Germany
| | - Wulf Blankenfeldt
- Structure and Function of Proteins, Helmholtz Centre for Infection Research, Inhoffenstrasse 7, D-38124 Braunschweig, Germany
- Institute for Biochemistry, Biotechnology and Bioinformatics, Technische Universität Braunschweig, Spielmannstrasse 7, D-38106 Braunschweig, Germany
|
43
|
We need to keep a reproducible trace of facts, predictions, and hypotheses from gene to function in the era of big data. PLoS Biol 2020; 18:e3000999. [PMID: 33253151 PMCID: PMC7728211 DOI: 10.1371/journal.pbio.3000999] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Revised: 12/10/2020] [Indexed: 01/18/2023] Open
Abstract
How do we scale biological science to the demand of next generation biology and medicine to keep track of the facts, predictions, and hypotheses? These days, enormous amounts of DNA sequence and other omics data are generated. Since these data contain the blueprint for life, it is imperative that we interpret them accurately. The abundance of DNA is only one part of the challenge. Artificial Intelligence (AI) and network methods routinely build on large screens, single cell technologies, proteomics, and other modalities to infer or predict biological functions and phenotypes associated with proteins, pathways, and organisms. As a first step, how do we systematically trace the provenance of knowledge from experimental ground truth to gene function predictions and annotations? Here, we review the main challenges in tracking the evolution of biological knowledge and propose several specific solutions to provenance and computational tracing of evidence in functional linkage networks.
|
44
|
Hedman AM, Lundholm C, Andolf E, Pershagen G, Fall T, Almqvist C. Longitudinal plasma inflammatory proteome profiling during pregnancy in the Born into Life study. Sci Rep 2020; 10:17819. [PMID: 33082373 PMCID: PMC7575597 DOI: 10.1038/s41598-020-74722-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 04/24/2020] [Accepted: 10/05/2020] [Indexed: 02/07/2023] Open
Abstract
The maternal immune system undergoes considerable changes during pregnancy. However, little is known about the determinants of the inflammatory proteome and its relation to pregnancy stages. Our aim was to investigate the plasma inflammatory proteome before, during and after pregnancy. In addition, we wanted to test whether maternal and child outcomes were associated with the proteome. In a cohort of 94 healthy women enrolled in a longitudinal study with assessments at up to five time points around pregnancy, ninety-two inflammatory proteins were analysed in plasma with a multiplex Proximity Extension Assay. First, principal components analysis was applied, followed by regression modelling with correction for multiple testing. We found profound shifts in the overall inflammatory proteome associated with pregnancy stage after correction for multiple testing (p < .001). Moreover, maternal body mass index (BMI) was associated with the inflammatory proteome, primarily driven by VEGFA, CCL3 and CSF-1 (p < .05). The levels of most inflammatory proteins changed substantially during pregnancy, and some of these were related to biological processes such as regulation of immune response. Maternal BMI was significantly associated with higher levels of three inflammatory proteins, calling for more research into the interplay between pregnancy, inflammation and BMI.
Affiliation(s)
- Anna M Hedman
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, PO Box 281, 171 77, Stockholm, Sweden.
| | - Cecilia Lundholm
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, PO Box 281, 171 77, Stockholm, Sweden
| | - Ellika Andolf
- Department of Clinical Sciences, Danderyd Hospital, Stockholm, Sweden
| | - Göran Pershagen
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
| | - Tove Fall
- Department of Medical Sciences, Molecular Epidemiology and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Catarina Almqvist
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, PO Box 281, 171 77, Stockholm, Sweden; Pediatric Allergy and Pulmonology Unit, Astrid Lindgren Children's Hospital, Karolinska University Hospital, Stockholm, Sweden
|
45
|
Decomposing Structural Response Due to Sequence Changes in Protein Domains with Machine Learning. J Mol Biol 2020; 432:4435-4446. [PMID: 32485208 DOI: 10.1016/j.jmb.2020.05.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2020] [Revised: 05/06/2020] [Accepted: 05/27/2020] [Indexed: 10/24/2022]
Abstract
How protein domain structure changes in response to mutations is not well understood. Some mutations change the structure drastically, while most only result in small changes. To gain an understanding of this, we decompose the relationship between changes in domain sequence and structure using machine learning. We select pairs of evolutionarily related domains with a broad range of evolutionary distances. In contrast to earlier studies, we do not find a strictly linear relationship between sequence and structural changes. We train a random forest regressor that predicts the structural similarity between pairs with an average accuracy of 0.029 lDDT (local Distance Difference Test) score and a correlation coefficient of 0.92. Decomposing the feature importance shows that the domain length or, analogously, size is the most important feature. Our model enables assessing deviations in relative structural response, and thus prediction of evolutionary trajectories, in protein domains across evolution.
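The two summary figures quoted for the regressor can be illustrated with a small evaluation sketch. The abstract does not define how the "average accuracy" of 0.029 lDDT was computed; reading it as a mean absolute error between predicted and observed lDDT scores is an assumption, as is the use of the Pearson coefficient for the reported correlation. The function names and toy scores below are hypothetical, and no regressor is trained here.

```python
from math import sqrt


def mean_absolute_error(predicted, observed):
    """Average absolute difference between predicted and observed scores."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    predicted_lddt = [0.50, 0.60, 0.70, 0.80]  # hypothetical model outputs
    observed_lddt = [0.52, 0.58, 0.73, 0.79]   # hypothetical reference values
    print(mean_absolute_error(predicted_lddt, observed_lddt))
    print(pearson_correlation(predicted_lddt, observed_lddt))
```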
|
46
|
Russo G, Reche P, Pennisi M, Pappalardo F. The combination of artificial intelligence and systems biology for intelligent vaccine design. Expert Opin Drug Discov 2020; 15:1267-1281. [PMID: 32662677 DOI: 10.1080/17460441.2020.1791076] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 12/14/2022]
Abstract
INTRODUCTION: A new body of evidence depicts the applications of artificial intelligence and systems biology in vaccine design and development. The combination of both approaches shall revolutionize healthcare, accelerating clinical trial processes and reducing the costs and time involved in drug research and development. AREAS COVERED: This review explores the basics of artificial intelligence and systems biology approaches in the vaccine development pipeline. The topics include a detailed description of epitope prediction tools for designing epitope-based vaccines and agent-based models for immune system response prediction, along with a focus on their potential to facilitate clinical trial phases. EXPERT OPINION: Artificial intelligence and systems biology offer the opportunity to avoid the inefficiencies and failures that arise in the classical vaccine development pipeline. One promising solution is the combination of both methodologies in a multiscale perspective through an accurate pipeline. We are entering an 'in silico era' in which scientific partnerships, including the growing creation of an 'ecosystem' of collaboration and multidisciplinary approaches, are relevant for addressing the long and risky road of vaccine discovery and development. In this context, regulatory guidance should be developed to qualify in silico trials as evidence for intelligent vaccine development.
Affiliation(s)
- Giulia Russo
- Department of Drug Sciences, University of Catania, Catania, Italy
| | - Pedro Reche
- Department of Immunology, Universidad Complutense De Madrid, Ciudad Universitaria, Madrid, Spain
| | - Marzio Pennisi
- Computer Science Institute, DiSIT, University of Eastern Piedmont, Italy
|
47
|
Structural Modeling and Ligand-Binding Prediction for Analysis of Structure-Unknown and Function-Unknown Proteins Using FORTE Alignment and PoSSuM Pocket Search. Methods Mol Biol 2020. [PMID: 32621216 DOI: 10.1007/978-1-0716-0708-4_1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/30/2023]
Abstract
Structural data of biomolecules, such as those of proteins and nucleic acids, provide much information for estimation of their functions. For structure-unknown proteins, structure information is obtainable by modeling their structures based on sequence similarity of proteins. Moreover, information related to ligands or ligand-binding sites is necessary to elucidate protein functions because the binding of ligands can engender not only the activation and inactivation of the proteins but also the modification of protein functions. This chapter presents methods using our profile-profile alignment server FORTE and the PoSSuM ligand-binding site database for prediction of the structure and potential ligand-binding sites of structure-unknown and function-unknown proteins, aimed at protein function prediction.
|
48
|
Correa Marrero M, Immink RGH, de Ridder D, van Dijk ADJ. Improved inference of intermolecular contacts through protein-protein interaction prediction using coevolutionary analysis. Bioinformatics 2020; 35:2036-2042. [PMID: 30398547 DOI: 10.1093/bioinformatics/bty924] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/04/2018] [Revised: 10/11/2018] [Accepted: 11/05/2018] [Indexed: 01/09/2023] Open
Abstract
MOTIVATION: Predicting residue-residue contacts between interacting proteins is an important problem in bioinformatics. The growing wealth of sequence data can be used to infer these contacts through correlated mutation analysis on multiple sequence alignments of interacting homologs of the proteins of interest. This requires correct identification of pairs of interacting proteins for many species, in order to avoid introducing noise (i.e. non-interacting sequences) in the analysis that will decrease predictive performance. RESULTS: We have designed Ouroboros, a novel algorithm to reduce such noise in intermolecular contact prediction. Our method iterates between weighting proteins according to how likely they are to interact based on the correlated mutations signal, and predicting correlated mutations based on the weighted sequence alignment. We show that this approach accurately discriminates between protein interaction versus non-interaction and simultaneously improves the prediction of intermolecular contact residues compared to a naive application of correlated mutation analysis. This requires no training labels concerning interactions or contacts. Furthermore, the method relaxes the assumption of one-to-one interaction of previous approaches, allowing for the study of many-to-many interactions. AVAILABILITY AND IMPLEMENTATION: Source code and test data are available at www.bif.wur.nl/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | - Richard G H Immink
- Laboratory of Molecular Biology, Department of Plant Sciences; Bioscience, Wageningen Plant Research
| | | | - Aalt D J van Dijk
- Bioinformatics Group, Department of Plant Sciences; Bioscience, Wageningen Plant Research; Biometris, Department of Plant Sciences, Wageningen University & Research, Wageningen PB, The Netherlands
|
49
|
Chandonia JM, Fox NK, Brenner SE. SCOPe: classification of large macromolecular structures in the structural classification of proteins-extended database. Nucleic Acids Res 2020; 47:D475-D481. [PMID: 30500919 PMCID: PMC6323910 DOI: 10.1093/nar/gky1134] [Citation(s) in RCA: 81] [Impact Index Per Article: 20.3] [Received: 09/15/2018] [Accepted: 11/27/2018] [Indexed: 11/12/2022] Open
Abstract
The SCOPe (Structural Classification of Proteins—extended, https://scop.berkeley.edu) database hierarchically classifies domains from the majority of proteins of known structure according to their structural and evolutionary relationships. SCOPe also incorporates and updates the ASTRAL compendium, which provides multiple databases and tools to aid in the analysis of the sequences and structures of proteins classified in SCOPe. Protein structures are classified using a combination of manual curation and highly precise automated methods. In the current release of SCOPe, 2.07, we have focused our manual curation efforts on larger protein structures, including the spliceosome, proteasome and RNA polymerase I, as well as many other Pfam families that had not previously been classified. Domains from these large protein complexes are distinctive in several ways: novel non-globular folds are more common, and domains from previously observed protein families often have N- or C-terminal extensions that were disordered or not present in previous structures. The current monthly release update, SCOPe 2.07–2018-10–18, classifies 90 992 PDB entries (about two thirds of PDB entries).
Affiliation(s)
- John-Marc Chandonia
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Naomi K Fox
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Steven E Brenner
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
|
50
|
Zaman AB, Kamranfar P, Domeniconi C, Shehu A. Reducing Ensembles of Protein Tertiary Structures Generated De Novo via Clustering. Molecules 2020; 25:E2228. [PMID: 32397410 PMCID: PMC7248879 DOI: 10.3390/molecules25092228] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/14/2020] [Revised: 04/21/2020] [Accepted: 04/28/2020] [Indexed: 11/16/2022] Open
Abstract
Controlling the quality of tertiary structures computed for a protein molecule remains a central challenge in de novo protein structure prediction. The rule of thumb is to generate as many structures as can be afforded, effectively acknowledging that having more structures increases the likelihood that some will reside near the sought biologically active structure. A major drawback of this approach is that computing a large number of structures imposes time and space costs. In this paper, we propose a novel clustering-based approach which we demonstrate to significantly reduce an ensemble of generated structures without sacrificing quality. Evaluations are reported on both benchmark and CASP target proteins. Structure ensembles subjected to the proposed approach and the source code of the proposed approach are publicly available at the links provided in Section 1.
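The general idea of shrinking a structure ensemble by clustering can be sketched as follows. This is not the algorithm proposed by Zaman et al., whose details are not given in the abstract; it is a simple greedy "leader" clustering used as a stand-in, in which each structure is represented by a hypothetical fixed-length feature vector and one representative is kept per cluster. The function name, the Euclidean distance, and the `radius` parameter are all assumptions.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def reduce_ensemble(structures, radius):
    """Greedy leader clustering over feature vectors.

    Returns (representatives, members): indices of the kept structures and,
    for each of them, the indices of all structures assigned to its cluster.
    """
    representatives = []
    members = []
    for idx, features in enumerate(structures):
        for k, rep in enumerate(representatives):
            # Join the first existing cluster whose leader is close enough.
            if dist(structures[rep], features) <= radius:
                members[k].append(idx)
                break
        else:
            # No leader within the radius: this structure starts a new cluster.
            representatives.append(idx)
            members.append([idx])
    return representatives, members


if __name__ == "__main__":
    # Hypothetical 2-D embeddings of five generated structures.
    ensemble = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
    reps, groups = reduce_ensemble(ensemble, radius=0.5)
    print(reps, groups)
```

In practice the representative of each cluster would more likely be chosen by a quality score (for example, lowest energy) rather than by arrival order, so that reduction does not discard the best structure in a cluster.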
Affiliation(s)
- Ahmed Bin Zaman
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
| | - Parastoo Kamranfar
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
| | - Carlotta Domeniconi
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
| | - Amarda Shehu
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
- Center for Advancing Human-Machine Partnerships, George Mason University, Fairfax, VA 22030, USA
- Department of Bioengineering, George Mason University, Fairfax, VA 22030, USA
- School of Systems Biology, George Mason University, Fairfax, VA 22030, USA
|