1
|
Kundu S. Fe(2)OG: an integrated HMM profile-based web server to predict and analyze putative non-haem iron(II)- and 2-oxoglutarate-dependent dioxygenase function in protein sequences. BMC Res Notes 2021; 14:80. [PMID: 33648553 PMCID: PMC7923460 DOI: 10.1186/s13104-021-05477-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/11/2020] [Accepted: 02/03/2021] [Indexed: 12/16/2022] Open
Abstract
Objective Non-haem iron(II)- and 2-oxoglutarate-dependent dioxygenases (i2OGdd) are a taxonomically and functionally diverse group of enzymes. The active site comprises ferrous iron in a hexa-coordinated distorted octahedron with the apoenzyme, 2-oxoglutarate and a displaceable water molecule. Current information on novel i2OGdd members is sparse and relies on computationally-derived annotation schema. The dissimilar amino acid composition and variable active site geometry thereof result in differing reaction chemistries amongst i2OGdd members. An additional need of researchers is a curated list of sequences with putative i2OGdd function which can be probed further for empirical data. Results This work reports the implementation of Fe(2)OG, a web server with dual functionality and an extension of previous work on i2OGdd enzymes (Fe(2)OG ≡ {H2OGpred, DB2OG}). Fe(2)OG in this form is completely revised and updated (URL, scripts, repository) and will strengthen the knowledge base of investigators on i2OGdd biochemistry and function. Fe(2)OG utilizes the superior predictive propensity of HMM profiles of laboratory-validated i2OGdd members to predict probable active site geometries in user-defined protein sequences. Fe(2)OG also provides researchers with a pre-compiled list of analyzed and searchable i2OGdd-like sequences, many of which may be clinically relevant. Fe(2)OG is freely available (http://204.152.217.16/Fe2OG.html) and supersedes all previous versions, i.e., H2OGpred and DB2OG.
Affiliation(s)
- Siddhartha Kundu
- Department of Biochemistry, All India Institute of Medical Sciences, Ansari Nagar, New Delhi, 110029, India.
|
2
|
Origin, evolution and functional characterization of the land plant glycoside hydrolase subfamily GH5_11. Mol Phylogenet Evol 2019; 138:205-218. [DOI: 10.1016/j.ympev.2019.05.031] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/03/2019] [Revised: 05/20/2019] [Accepted: 05/23/2019] [Indexed: 01/20/2023]
|
3
|
Kundu S. Insights into the mechanism(s) of digestion of crystalline cellulose by plant class C GH9 endoglucanases. J Mol Model 2019; 25:240. [PMID: 31338614 PMCID: PMC7385011 DOI: 10.1007/s00894-019-4133-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/25/2018] [Accepted: 07/11/2019] [Indexed: 02/03/2023]
Abstract
Biofuels such as γ-valerolactone, bioethanol, and biodiesel are derived from potentially fermentable cellulose and vegetable oils. Plant class C GH9 endoglucanases are CBM49-encompassing hydrolases that cleave the β (1 → 4) glycosidic linkage of contiguous D-glucopyranose residues of crystalline cellulose. Here, I analyse 3D-homology models of characterised and putative class C enzymes to glean insights into the contribution of the GH9, linker, and CBM49 to the mechanism(s) of crystalline cellulose digestion. Crystalline cellulose may be accommodated in a surface groove which is imperfectly bounded by the GH9_CBM49, GH9_linker, and linker_CBM49 surfaces and thence digested in a solvent accessible subsurface cavity. The physical dimensions of the groove, and distortions thereof, are mediated in part by the bulky side chains of the aromatic amino acids that comprise it and may also result in a strained geometry of the bound cellulose polymer. These data, together with an almost complete absence of measurable cavities, a poorly conserved, hydrophobic, and heterogeneous amino acid composition, increased atomic motion of the CBM49_linker junction, and docking experiments with ligands of lower degrees of polymerization, suggest a modulatory rather than direct role for CBM49 in catalysis. Crystalline cellulose is the de facto substrate for CBM-containing plant and non-plant GH9 enzymes, a finding supported by exceptional sequence- and structural-homology. However, despite the implied similarity in general acid-base catalysis of crystalline cellulose, this study also highlights qualitative differences in substrate binding and glycosidic bond cleavage amongst class C members. Results presented may aid the development of novel plant-based GH9 endoglucanases that could extract and utilise potential fermentable carbohydrates from biomass. Crystalline cellulose digestion by plant class C GH9 endoglucanases - an in silico assessment of function.
Affiliation(s)
- Siddhartha Kundu
- Department of Biochemistry, Army College of Medical Sciences, Brar Square, Delhi Cantt., New Delhi, 110010, India.
|
4
|
Zhang W, Sun P, He Q, Shu F, Deng H. Transcriptome analysis of near-isogenic line provides novel insights into genes associated with panicle traits regulation in rice. PLoS One 2018; 13:e0199077. [PMID: 29924832 PMCID: PMC6010284 DOI: 10.1371/journal.pone.0199077] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/15/2018] [Accepted: 05/31/2018] [Indexed: 11/18/2022] Open
Abstract
Panicle traits in rice impact yield and quality. The OsGRF4 gene encodes a growth-regulating factor controlling panicle traits, and was recently cloned. Gene expression profiling analysis can be used to study the molecular mechanisms underlying OsGRF4 regulation. Use of near-isogenic lines (NILs) reduces genetic background noise in omics studies. We compared the transcriptome profiles of 7 cm long young panicles of NIL-Osgrf4 and NIL-OsGRF4 using RNA sequence analyses. Eighty differentially expressed genes (DEGs) were identified. Our target gene OsGRF4 was up-regulated in NIL-OsGRF4 plants, which is consistent with a previous qPCR analysis. Hierarchical cluster analysis showed that OsGRF4 is tightly clustered with the up-regulated DEG LOC_Os02g47320. Gene Ontology (GO) and KEGG analysis suggested that DEGs were primarily involved in somatic embryogenesis and chitinase activity. Two up-regulated DEGs, LOC_Os04g41680 and LOC_Os04g41620, were significantly enriched in the top 8 GO terms, were over-represented in terms of seed development, and may play key roles in grain shape regulation. The transcription factor Osmyb1 also exhibited differential expression between NILs, and may be an important regulator of panicle traits. By searching reported functions of DEGs and by co-localization with previously identified quantitative trait loci (QTL), we determined that the pleiotropic gene OsGRF4 may also be involved in abiotic stress resistance. This study provides new candidate genes for further understanding the molecular mechanisms underlying rice panicle trait regulation.
Affiliation(s)
- Wuhan Zhang
- State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha, China
- Collaborative Innovation Center of Grain and Oil Crops in South China, Changsha, China
- China National Japonica Rice Research and Development Center, Tianjin, China
- Pingyong Sun
- State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha, China
- Qiang He
- State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha, China
- Collaborative Innovation Center of Grain and Oil Crops in South China, Changsha, China
- Fu Shu
- State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha, China
- Collaborative Innovation Center of Grain and Oil Crops in South China, Changsha, China
- Huafeng Deng
- State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Hunan Academy of Agricultural Sciences, Changsha, China
- Collaborative Innovation Center of Grain and Oil Crops in South China, Changsha, China
|
5
|
Mathematical Basis of Predicting Dominant Function in Protein Sequences by a Generic HMM-ANN Algorithm. Acta Biotheor 2018; 66:135-148. [PMID: 29700659 PMCID: PMC7250805 DOI: 10.1007/s10441-018-9327-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/18/2017] [Accepted: 04/16/2018] [Indexed: 12/11/2022]
Abstract
The accurate annotation of an unknown protein sequence depends on extant data of template sequences. These data could be empirical or comprise sets of reference sequences, and provide an exhaustive pool of probable functions. Individual methods of predicting dominant function possess shortcomings such as varying degrees of inter-sequence redundancy, arbitrary domain inclusion thresholds, heterogeneous parameterization protocols, and ill-conditioned input channels. Here, I present a rigorous theoretical derivation of various steps of a generic algorithm that integrates and utilizes several statistical methods to predict the dominant function in unknown protein sequences. The accompanying mathematical proofs, interval definitions, analysis, and numerical computations are meant not only to offer insights into the specificity and accuracy of predictions, but also to provide details of the operative mechanisms involved in the integration and its ensuing rigor. The algorithm uses numerically modified raw hidden Markov model scores of well-defined sets of training sequences and clusters them on the basis of known function. The results are then fed into an artificial neural network, the predictions of which can be refined using the available data. This pipeline is trained recursively and can be used to discern the dominant principal function and, thereby, annotate an unknown protein sequence. Whilst the approach is complex, the specificity of the final predictions can help laboratory workers design their experiments with greater confidence.
|
6
|
Kundu S, Sharma R. Origin, evolution, and divergence of plant class C GH9 endoglucanases. BMC Evol Biol 2018; 18:79. [PMID: 29848310 PMCID: PMC5977491 DOI: 10.1186/s12862-018-1185-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 07/24/2017] [Accepted: 04/18/2018] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Glycoside hydrolases of the GH9 family encode cellulases that predominantly function as endoglucanases and have wide applications in the food, paper, pharmaceutical, and biofuel industries. The partitioning of plant GH9 endoglucanases into classes A, B, and C is based on the differential presence of transmembrane, signal peptide, and carbohydrate binding module (CBM49) regions. There is considerable debate on the distribution and the functions of these enzymes, which may vary in different organisms. In light of these findings, we examined the origin, emergence, and subsequent divergence of plant GH9 endoglucanases, with an emphasis on elucidating the role of CBM49 in the digestion of crystalline cellulose by class C members. RESULTS Since the digestion of crystalline cellulose mandates the presence of a well-defined set of aromatic and polar amino acids and/or an attributable domain that can mediate this conversion, we hypothesize a vertical mode of transfer of genes that could favour the emergence of class C-like GH9 endoglucanase activity in land plants from potentially ancestral non-plant taxa. We demonstrated the concomitant occurrence of a GH9 domain with CBM49 and other homologous carbohydrate binding modules in putative endoglucanase sequences from several non-plant taxa. In the absence of comparable full-length CBMs, we have characterized several low-strength patterns that could approximate the CBM49, thereby extending support for digestion of crystalline cellulose to other segments of the protein. We also provide data suggestive of the ancestral role of putative class C GH9 endoglucanases in land plants, which includes detailed phylogenetics and the presence and subsequent loss of CBM49, transmembrane, and signal peptide regions in certain populations of early land plants. These findings suggest that classes A and B of modern vascular land plants may have emerged by diverging directly from CBM49-encompassing putative class C enzymes.
CONCLUSION Our detailed phylogenetic and bioinformatics analysis of putative GH9 endoglucanase sequences across major taxa suggests that plant class C enzymes, despite their recent discovery, could function as the last common ancestor of classes A and B. Additionally, research into their ability to digest or inter-convert crystalline and amorphous forms of cellulose could make them lucrative candidates for engineering biofuel feedstock.
Affiliation(s)
- Siddhartha Kundu
- Department of Biochemistry, Government of NCT of Delhi, Dr. Baba Saheb Ambedkar Medical College & Hospital, New Delhi, 110085, India; Crop Genetics and Informatics Group, School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, 110067, India.
- Rita Sharma
- Crop Genetics and Informatics Group, School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, 110067, India.
|
7
|
Mathematical basis of improved protein subfamily classification by a HMM-based sequence filter. Math Biosci 2017; 293:75-80. [PMID: 28916136 DOI: 10.1016/j.mbs.2017.09.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/09/2016] [Revised: 06/14/2017] [Accepted: 09/11/2017] [Indexed: 11/22/2022]
Abstract
Informative phylogenetic analysis is dependent on the presence of curated and annotated sequences. This may be complemented by the simultaneous availability of empirical data pertaining to their in vivo function. Confounding sequences, with their similarity to more than one functional cluster, can therefore render any categorization ambiguous, subjective, and imprecise. Here, I analyze and discuss the development of a mathematical expression that can characterize a potential confounding protein sequence. Specifically, statistical descriptors of combinatorially arranged profile HMM scores are computed and evaluated. The resultant data are then incorporated into an index of sequence suitability. The sequence may then be either recommended as suitable for inclusion or excluded altogether. The index is independent of experimental data and can be computed from the primary structure of the protein sequence. This can be utilized to trim previously grouped sequences and can either finalize the composition of the training set or reduce the search space of sequences to be tested.
|
8
|
Mathur S, Umakanth AV, Tonapi VA, Sharma R, Sharma MK. Sweet sorghum as biofuel feedstock: recent advances and available resources. Biotechnology for Biofuels 2017; 10:146. [PMID: 28603553 PMCID: PMC5465577 DOI: 10.1186/s13068-017-0834-9] [Citation(s) in RCA: 61] [Impact Index Per Article: 8.7] [Received: 12/13/2016] [Accepted: 05/30/2017] [Indexed: 05/08/2023]
Abstract
Sweet sorghum is a promising target for biofuel production. It is a C4 crop with low input requirements and accumulates high levels of sugars in its stalks. However, large-scale planting on marginal lands would require improved varieties with optimized biofuel-related traits and tolerance to biotic and abiotic stresses. Considering this, many studies have been carried out to generate genetic and genomic resources for sweet sorghum. In this review, we discuss various attributes of sweet sorghum that make it an ideal candidate for biofuel feedstock, and provide an overview of genetic diversity, tools, and resources available for engineering and/or marker-assisted breeding of sweet sorghum. Finally, we discuss the progress made so far in the identification of genes/quantitative trait loci (QTLs) important for agronomic traits, and ongoing molecular breeding efforts to generate improved varieties.
Affiliation(s)
- Supriya Mathur
- Crop Genetics & Informatics Group, School of Biotechnology, Jawaharlal Nehru University, New Delhi, India
- A. V. Umakanth
- Indian Council of Agricultural Research-Indian Institute of Millets Research, Hyderabad, India
- V. A. Tonapi
- Indian Council of Agricultural Research-Indian Institute of Millets Research, Hyderabad, India
- Rita Sharma
- Crop Genetics & Informatics Group, School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
- Manoj K. Sharma
- Crop Genetics & Informatics Group, School of Biotechnology, Jawaharlal Nehru University, New Delhi, India
|