1
Yang P, Ning K. How much metagenome data is needed for protein structure prediction: The advantages of targeted approach from the ecological and evolutionary perspectives. IMETA 2022; 1:e9. [PMID: 38867727 PMCID: PMC10989767 DOI: 10.1002/imt2.9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2021] [Revised: 12/23/2021] [Accepted: 01/04/2022] [Indexed: 06/14/2024]
Abstract
It has been proven that three-dimensional protein structures could be modeled by supplementing homologous sequences with metagenome sequences. Even though a large volume of metagenome data is utilized for such purposes, a significant proportion of proteins remain unsolved. In this review, we focus on identifying ecological and evolutionary patterns in metagenome data, decoding the complicated relationships of these patterns with protein structures, and investigating how these patterns can be effectively used to improve protein structure prediction. First, we proposed the metagenome utilization efficiency and marginal effect model to quantify the divergent distribution of homologous sequences for the protein family. Second, we proposed that the targeted approach effectively identifies homologous sequences from specified biomes compared with the untargeted approach's blind search. Finally, we determined the lower bound for metagenome data required for predicting all the protein structures in the Pfam database and showed that the present metagenome data is insufficient for this purpose. In summary, we discovered ecological and evolutionary patterns in the metagenome data that may be used to predict protein structures effectively. The targeted approach is promising in terms of effectively extracting homologous sequences and predicting protein structures using these patterns.
Affiliation(s)
- Pengshuo Yang
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular‐Imaging, Department of Bioinformatics and Systems Biology, Center of AI Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China
- Kang Ning
  - Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular‐Imaging, Department of Bioinformatics and Systems Biology, Center of AI Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, China
2
Numan M, Bukhari SA, Rehman MU, Mustafa G, Sadia B. Phylogenetic analyses, protein modeling and active site prediction of two pathogenesis related (PR2 and PR3) genes from bread wheat. PLoS One 2021; 16:e0257392. [PMID: 34506613 PMCID: PMC8432781 DOI: 10.1371/journal.pone.0257392] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/26/2021] [Accepted: 08/31/2021] [Indexed: 11/18/2022] Open
Abstract
Wheat is a major staple food and is grown extensively around the globe. The sessile nature of plants exposes them to many biotic and abiotic stresses, including fungal pathogen attack. Puccinia graminis f.sp. tritici causes stem rust in the wheat crop and leads to a 70% decrease in its production. Pathogenesis-related (PR) proteins provide plants with defense against different fungal pathogens, as these proteins have antifungal activities. This study was designed to screen Pakistani wheat varieties for PR2 and PR3 proteins and to characterize them in silico. PR2 and PR3 genes were screened and isolated by PCR amplification from wheat varieties Chenab-70 and Frontana, respectively. The nucleotide sequences of PR2 and PR3 genes were deposited in GenBank with accession numbers MT303867 and MZ766118, respectively. Physicochemical properties, secondary and tertiary structure predictions, and molecular docking of protein sequences of PR2 and PR3 were performed using different bioinformatics tools and software. PR2 and PR3 genes were identified to encode β-1,3-glucanase and chitinase proteins, respectively. Molecular docking of both PR2 and PR3 proteins with beta-glucan and chitin (i.e. their respective ligands) showed crucial amino acid residues involved in molecular interactions. In conclusion, molecular docking analysis of β-1,3-glucanase and chitinase proteins revealed crucial amino acid residues involved in ligand binding and important interactions that might play an important role in plant defense against fungal pathogens. Moreover, the active residues in the active sites of these proteins can be identified through mutational studies, and the resulting information might help in understanding how these proteins are involved in plant defense mechanisms.
Affiliation(s)
- Muhammad Numan
  - Department of Biochemistry, Government College University, Faisalabad, Pakistan
- Shazia Anwer Bukhari
  - Department of Biochemistry, Government College University, Faisalabad, Pakistan
- Mahmood-ur-Rehman
  - Department of Bioinformatics and Biotechnology, Government College University, Faisalabad, Pakistan
- Ghulam Mustafa
  - Department of Biochemistry, Government College University, Faisalabad, Pakistan
- Bushra Sadia
  - Centre of Agricultural Biochemistry and Biotechnology (CABB), University of Agriculture, Faisalabad, Pakistan
3
Pasipoularides A. Implementing genome-driven personalized cardiology in clinical practice. J Mol Cell Cardiol 2018; 115:142-157. [PMID: 29343412 PMCID: PMC5820118 DOI: 10.1016/j.yjmcc.2018.01.008] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 11/16/2017] [Revised: 01/04/2018] [Accepted: 01/12/2018] [Indexed: 12/18/2022]
Abstract
Genomics designates the coordinated investigation of a large number of genes in the context of a biological process or disease. It may be long before we attain comprehensive understanding of the genomics of common complex cardiovascular diseases (CVDs) such as inherited cardiomyopathies, valvular diseases, primary arrhythmogenic conditions, congenital heart syndromes, hypercholesterolemia and atherosclerotic heart disease, hypertensive syndromes, and heart failure with preserved/reduced ejection fraction. Nonetheless, as genomics is evolving rapidly, it is constructive to survey now pertinent concepts and breakthroughs. Today, clinical multimodal electronic medical/health records (EMRs/EHRs) incorporating genomic information establish a continuously-learning, vast knowledge-network with seamless cycling between clinical application and research. It can inform insights into specific pathogenetic pathways, guide biomarker-assisted precise diagnoses and individualized treatments, and stratify prognoses. Complex CVDs blend multiple interacting genomic variants, epigenetics, and environmental risk-factors, engendering progressions of multifaceted disease-manifestations, including clinical symptoms and signs. There is no straight-line linkage between genetic cause(s) or causal gene-variant(s) and disease phenotype(s). Because of interactions involving modifier-gene influences, (micro)-environmental, and epigenetic effects, the same variant may actually produce dissimilar abnormalities in different individuals. Implementing genome-driven personalized cardiology in clinical practice reveals that the study of CVDs at the level of molecules and cells can yield crucial clinical benefits. 
Complementing evidence-based medicine guidelines from large ("one-size fits all") randomized controlled trials, genomics-based personalized or precision cardiology is a most-creditable paradigm: It provides customizable approaches to prevent, diagnose, and manage CVDs with treatments directly/precisely aimed at causal defects identified by high-throughput genomic technologies. They encompass stem cell and gene therapies exploiting CRISPR-Cas9-gene-editing, and metabolomic-pharmacogenomic therapeutic modalities, precisely fine-tuned for the individual patient. Following the Human Genome Project, many expected genomics technology to provide imminent solutions to intractable medical problems, including CVDs. This eagerness has reaped some disappointment that advances have not yet materialized to the degree anticipated. Undoubtedly, personalized genetic/genomics testing is an emergent technology that should not be applied without supplementary phenotypic/clinical information: Genotype≠Phenotype. However, forthcoming advances in genomics will naturally build on prior attainments and, combined with insights into relevant epigenetics and environmental factors, can plausibly eradicate intractable CVDs, improving human health and well-being.
Affiliation(s)
- Ares Pasipoularides
  - Consulting Professor of Surgery, Emeritus Faculty of Surgery and of Biomedical Engineering, Duke University School of Medicine and Graduate School, Durham, NC 27710, USA.
4
Goswami A, Liu X, Cai W, Wyche TP, Bugni TS, Meurillon M, Peyrottes S, Perigaud C, Nonaka K, Rohr J, Van Lanen SG. Evidence that oxidative dephosphorylation by the nonheme Fe(II), α-ketoglutarate:UMP oxygenase occurs by stereospecific hydroxylation. FEBS Lett 2017; 591:468-478. [PMID: 28074470 DOI: 10.1002/1873-3468.12554] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 12/12/2016] [Revised: 12/23/2016] [Accepted: 12/25/2016] [Indexed: 11/08/2022]
Abstract
LipL and Cpr19 are nonheme, mononuclear Fe(II)-dependent, α-ketoglutarate (αKG):UMP oxygenases that catalyze the formation of CO₂, succinate, phosphate, and uridine-5'-aldehyde, the last of which is a biosynthetic precursor for several nucleoside antibiotics that inhibit bacterial translocase I (MraY). To better understand the chemistry underlying this unusual oxidative dephosphorylation and establish a mechanistic framework for LipL and Cpr19, we report herein the synthesis of two biochemical probes, [1',3',4',5',5'-²H]UMP and the phosphonate derivative of UMP, and their activity with both enzymes. The results are consistent with a reaction coordinate that proceeds through the loss of one ²H atom of [1',3',4',5',5'-²H]UMP and stereospecific hydroxylation geminal to the phosphoester to form a cryptic intermediate, (5'R)-5'-hydroxy-UMP. Thus, these enzyme catalysts can additionally be assigned as UMP hydroxylase-phospholyases.
Affiliation(s)
- Anwesha Goswami
  - Department of Pharmaceutical Sciences, University of Kentucky, Lexington, KY, USA
- Xiaodong Liu
  - Department of Pharmaceutical Sciences, University of Kentucky, Lexington, KY, USA
- Wenlong Cai
  - Department of Pharmaceutical Sciences, University of Kentucky, Lexington, KY, USA
- Thomas P Wyche
  - Department of Pharmaceutical Sciences, University of Wisconsin-Madison, WI, USA
- Tim S Bugni
  - Department of Pharmaceutical Sciences, University of Wisconsin-Madison, WI, USA
- Maïa Meurillon
  - Nucleosides and Phosphorylated Effectors Team, IBMM, UMR5247 CNRS University Montpellier, France
- Suzanne Peyrottes
  - Nucleosides and Phosphorylated Effectors Team, IBMM, UMR5247 CNRS University Montpellier, France
- Christian Perigaud
  - Nucleosides and Phosphorylated Effectors Team, IBMM, UMR5247 CNRS University Montpellier, France
- Koichi Nonaka
  - Biologics Technology Research Laboratories, R&D Division, Daiichi Sankyo Co., Ltd., Gunma, Japan
- Jürgen Rohr
  - Department of Pharmaceutical Sciences, University of Kentucky, Lexington, KY, USA
- Steven G Van Lanen
  - Department of Pharmaceutical Sciences, University of Kentucky, Lexington, KY, USA
5
Deka H, Sarmah R, Sharma A, Biswas S. Modelling and Characterization of Glial Fibrillary Acidic Protein. Bioinformation 2015; 11:393-400. [PMID: 26420920 PMCID: PMC4574122 DOI: 10.6026/97320630011393] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 07/08/2015] [Revised: 07/30/2015] [Accepted: 08/01/2015] [Indexed: 12/20/2022] Open
Abstract
Glial Fibrillary Acidic Protein (GFAP) is an intermediate-filament (IF) protein that maintains the astrocytes of the human central nervous system. It is differentially expressed in serological studies of inflammatory conditions such as Rheumatoid Arthritis (RA). Because no crystal structure of GFAP (49.88 kDa) is available, it is of interest to gain molecular insight from a model of the protein. In the present study, a computational protein model was constructed using Modeller 9.11. The structural relevance of the model was verified using Gromacs 4.5, followed by validation with PROCHECK, Verify 3D, WHAT-IF, ERRAT and PROVE for reliability. The constructed three-dimensional (3D) model of GFAP was examined to reveal associated functions by identifying ligand binding sites and active sites. A molecular-level interaction study revealed five possible surface cavities as active sites. The model can be applied in further computational analysis toward drug discovery aimed at minimizing the effect of inflammation.
Affiliation(s)
- Hemchandra Deka
  - CSIR-Institute of Genomics and Integrative Biology, Mall Road, Delhi, India; Centre for Bioinformatics Studies, Dibrugarh University, Assam, India
- Rajeev Sarmah
  - Centre for Bioinformatics Studies, Dibrugarh University, Assam, India
- Ankita Sharma
  - CSIR-Institute of Genomics and Integrative Biology, Mall Road, Delhi, India
- Sagarika Biswas
  - CSIR-Institute of Genomics and Integrative Biology, Mall Road, Delhi, India
6
Kister A. Amino acid distribution rules predict protein fold: protein grammar for beta-strand sandwich-like structures. Biomolecules 2015; 5:41-59. [PMID: 25625198 PMCID: PMC4384110 DOI: 10.3390/biom5010041] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/13/2014] [Accepted: 12/31/2014] [Indexed: 11/16/2022] Open
Abstract
We present an alternative approach to protein 3D folding prediction based on determination of rules that specify the distribution of "favorable" residues, which are mainly responsible for a given fold formation, and "unfavorable" residues, which are incompatible with that fold, in polypeptide sequences. The process of determining favorable and unfavorable residues is iterative. The starting assumptions are based on the general principles of protein structure formation as well as structural features peculiar to the protein fold under investigation. The initial assumptions are tested one by one against the set of all known proteins with a given structure. An assumption is accepted as a "rule of amino acid distribution" for the protein fold if it holds true for all, or nearly all, structures. If the assumption is not accepted as a rule, it can be modified to better fit the data and then tested again in the next step of the iterative search algorithm, or rejected. We determined the set of amino acid distribution rules for a large group of beta sandwich-like proteins characterized by a specific arrangement of strands in two beta sheets. It was shown that this set of rules is highly sensitive (~90%) and very specific (~99%) for identifying sequences of proteins with the specified beta sandwich fold structure. The advantage of the proposed approach is that it does not require that query proteins have a high degree of homology to proteins with known structure. So long as the query protein satisfies the residue distribution rules, it can be confidently assigned to its respective protein fold. Another advantage of our approach is that it allows for a better understanding of which residues play an essential role in protein fold formation. It may, therefore, facilitate rational protein engineering design.
Affiliation(s)
- Alexander Kister
  - Department of Mathematics, Rutgers University, Piscataway, NJ 08854, USA.
7
Bushey DF, Bannon GA, Delaney BF, Graser G, Hefford M, Jiang X, Lee TC, Madduri KM, Pariza M, Privalle LS, Ranjan R, Saab-Rincon G, Schafer BW, Thelen JJ, Zhang JX, Harper MS. Characteristics and safety assessment of intractable proteins in genetically modified crops. Regul Toxicol Pharmacol 2014; 69:154-70. [DOI: 10.1016/j.yrtph.2014.03.003] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 10/07/2013] [Revised: 03/07/2014] [Accepted: 03/15/2014] [Indexed: 10/25/2022]
8
Abbasi E, Ghatee M, Shiri M. FRAN and RBF-PSO as two components of a hyper framework to recognize protein folds. Comput Biol Med 2013; 43:1182-91. [DOI: 10.1016/j.compbiomed.2013.05.017] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 04/10/2012] [Revised: 05/21/2013] [Accepted: 05/22/2013] [Indexed: 10/26/2022]
9
Dudkiewicz M, Szczepińska T, Grynberg M, Pawłowski K. A novel protein kinase-like domain in a selenoprotein, widespread in the tree of life. PLoS One 2012; 7:e32138. [PMID: 22359664 PMCID: PMC3281104 DOI: 10.1371/journal.pone.0032138] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.0] [Received: 06/13/2011] [Accepted: 01/24/2012] [Indexed: 12/21/2022] Open
Abstract
Selenoproteins serve important functions in many organisms, usually providing essential oxidoreductase enzymatic activity, often for defense against toxic xenobiotic substances. Most eukaryotic genomes possess a small number of these proteins, usually not more than 20. Selenoproteins belong to various structural classes, often related to oxidoreductase function, yet a few of them are completely uncharacterised. Here, the structural and functional prediction for the uncharacterised selenoprotein O (SELO) is presented. Using bioinformatics tools, we predict that SELO protein adopts a three-dimensional fold similar to protein kinases. Furthermore, we argue that despite the lack of conservation of the “classic” catalytic aspartate residue of the archetypical His-Arg-Asp motif, SELO kinases might have retained catalytic phosphotransferase activity, albeit with an atypical active site. Lastly, the role of the selenocysteine residue is considered and the possibility of an oxidoreductase-regulated kinase function for SELO is discussed. The novel kinase prediction is discussed in the context of functional data on SELO orthologues in model organisms, FMP40 a.k.a. YPL222W (yeast), and ydiU (bacteria). Expression data from bacteria and yeast suggest a role in oxidative stress response. Analysis of genomic neighbourhoods of SELO homologues in the three domains of life points toward a role in regulation of ABC transport, in oxidative stress response, or in basic metabolism regulation. Among bacteria possessing SELO homologues, there is a significant over-representation of aquatic organisms and of aerobic ones. The selenocysteine residue in SELO proteins occurs in only a few members of this protein family, including proteins from Metazoa and a few small eukaryotes (Ostreococcus, stramenopiles). It is also demonstrated that enterobacterial mchC proteins involved in maturation of bactericidal antibiotics, microcins, form a distant subfamily of the SELO proteins.
The new protein structural domain, with a putative kinase function assigned, expands the known kinome and deserves experimental determination of its biological role within the cell-signaling network.
Affiliation(s)
- Teresa Szczepińska
  - Nencki Institute of Experimental Biology, Polish Academy of Sciences, Warsaw, Poland
- Marcin Grynberg
  - Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
- Krzysztof Pawłowski
  - Nencki Institute of Experimental Biology, Polish Academy of Sciences, Warsaw, Poland
  - Warsaw University of Life Sciences, Warsaw, Poland
10
Abstract
Loop modeling is crucial for high-quality homology model construction outside conserved secondary structure elements. Dozens of loop modeling protocols involving a range of database and ab initio search algorithms and a variety of scoring functions have been proposed. Knowledge-based loop modeling methods are very fast and some can successfully and reliably predict loops up to about eight residues long. Several recent ab initio loop simulation methods can be used to construct accurate models of loops up to 12-13 residues long, albeit at a substantial computational cost. Major current challenges are the simulations of loops longer than 12-13 residues, the modeling of multiple interacting flexible loops, and the sensitivity of the loop predictions to the accuracy of the loop environment.
11
McLean LR, Zhang Y, Li H, Choi YM, Han Z, Vaz RJ, Li Y. Fragment screening of inhibitors for MIF tautomerase reveals a cryptic surface binding site. Bioorg Med Chem Lett 2010; 20:1821-4. [PMID: 20185308 DOI: 10.1016/j.bmcl.2010.02.009] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 12/23/2009] [Revised: 02/01/2010] [Accepted: 02/02/2010] [Indexed: 10/19/2022]
Abstract
In the course of a fragment screening campaign by in silico docking followed by X-ray crystallography, a novel binding site for migration inhibitory factor (MIF) inhibitors was demonstrated. The site is formed by rotation of the side-chain of Tyr-36 to reveal a surface binding site in MIF that is hydrophobic and surrounded by aromatic side-chain residues. The crystal structures of two small inhibitors that bind to this site and of a quinolinone inhibitor, that spans the canonical deep pocket near Pro-1 and the new surface binding site, have been solved. These results suggest new opportunities for structure-based design of MIF inhibitors.
Affiliation(s)
- Larry R McLean
  - Discovery Research, Sanofi-aventis, Bridgewater, NJ 08807, USA.