1
Lv P, Wan J, Zhang C, Hina A, Al Amin GM, Begum N, Zhao T. Unraveling the Diverse Roles of Neglected Genes Containing Domains of Unknown Function (DUFs): Progress and Perspective. Int J Mol Sci 2023; 24:ijms24044187. [PMID: 36835600 PMCID: PMC9966272 DOI: 10.3390/ijms24044187] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/29/2022] [Revised: 02/06/2023] [Accepted: 02/08/2023] [Indexed: 02/22/2023] [Open Access]
Abstract
Domain of unknown function (DUF) is a general term for many uncharacterized domains with two distinct features: a relatively conserved amino acid sequence and an unknown function. In the Pfam 35.0 database, 4795 (24%) gene families belong to the DUF type, yet their functions remain to be explored. This review summarizes the characteristics of the DUF protein families and their functions in regulating plant growth and development, generating responses to biotic and abiotic stress, and other regulatory roles in plant life. Although very limited information is available about these proteins, by taking advantage of emerging omics and bioinformatic tools, functional studies of DUF proteins could be utilized in future molecular studies.
Affiliation(s)
- Peiyun Lv
- National Center for Soybean Improvement, Key Laboratory of Biology and Genetics and Breeding for Soybean, Ministry of Agriculture, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing 210095, China
- Jinlu Wan
- National Center for Soybean Improvement, Key Laboratory of Biology and Genetics and Breeding for Soybean, Ministry of Agriculture, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing 210095, China
- Chunting Zhang
- National Center for Soybean Improvement, Key Laboratory of Biology and Genetics and Breeding for Soybean, Ministry of Agriculture, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing 210095, China
- Aiman Hina
- National Center for Soybean Improvement, Key Laboratory of Biology and Genetics and Breeding for Soybean, Ministry of Agriculture, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing 210095, China
- G M Al Amin
- Department of Botany, Jagannath University, Dhaka 1100, Bangladesh
- Naheeda Begum
- National Center for Soybean Improvement, Key Laboratory of Biology and Genetics and Breeding for Soybean, Ministry of Agriculture, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing 210095, China
- Correspondence: (N.B.); (T.Z.)
- Tuanjie Zhao
- National Center for Soybean Improvement, Key Laboratory of Biology and Genetics and Breeding for Soybean, Ministry of Agriculture, State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing 210095, China
- Correspondence: (N.B.); (T.Z.)
2
Liu B, Nan J, Zu X, Zhang X, Xiao Q. Identification of Genome Sequences of Polyphosphate-Accumulating Organisms by Machine Learning. Front Cell Dev Biol 2021; 8:626221. [PMID: 33537313 PMCID: PMC7848102 DOI: 10.3389/fcell.2020.626221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2020] [Accepted: 12/15/2020] [Indexed: 11/13/2022] [Open Access]
Abstract
In the field of sewage treatment, the identification of polyphosphate-accumulating organisms (PAOs) usually relies on biological experiments. However, biological experiments are not only complicated and time-consuming, but also costly. In recent years, machine learning has been widely used in many fields, but it is seldom used in water treatment. The present work presents a high-accuracy support vector machine (SVM) algorithm for the rapid identification and prediction of PAOs. We obtained 6,318 genome sequences of microorganisms from the publicly available microbial genome database for comparative analysis (MBGD). Minimap2 was used to compare the genomes of the obtained microorganisms in pairs and to read the overlaps. The SVM model was established using the similarity of the genome sequences. In this SVM model, the average accuracy is 0.9628 ± 0.019 with 10-fold cross-validation. By predicting 2,652 microorganisms, 22 potential PAOs were obtained. Analysis of the predicted potential PAOs showed that the phosphorus removal characteristics of most of them could be indirectly verified from previous reports. The SVM model we built shows high prediction accuracy and good stability.
Affiliation(s)
- Bohan Liu
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin, China
- Jun Nan
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin, China
- Xuehui Zu
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin, China
- Xinhui Zhang
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin, China
- Qiliang Xiao
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin, China
3
Zhong H, Zhang H, Guo R, Wang Q, Huang X, Liao J, Li Y, Huang Y, Wang Z. Characterization and Functional Divergence of a Novel DUF668 Gene Family in Rice Based on Comprehensive Expression Patterns. Genes (Basel) 2019; 10:genes10120980. [PMID: 31795257 PMCID: PMC6969926 DOI: 10.3390/genes10120980] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 10/19/2019] [Revised: 11/24/2019] [Accepted: 11/26/2019] [Indexed: 12/22/2022] [Open Access]
Abstract
The domain of unknown function (DUF) superfamily encodes proteins of unknown function in plants. Among them, DUF668 family members in plants possess a 29-amino-acid conserved domain, and this family has not been described previously. Here, we report this novel plant-specific DUF668 gene family, containing 12 OsDUF668 genes in rice (Oryza sativa) and 91 DUF668s in seven other plant species. In our study, DUF668 genes were present in both dicot and monocot plants, indicating that DUF668 is a conserved gene family that predates the dicot–monocot divergence. Based on gene structure and motif composition, the DUF668 family consists of two distinct clades, I and II, in the phylogenetic tree. Remarkably, OsDUF668 genes clustered on the chromosomes rarely show close phylogenetic relationships, suggesting that gene duplications or collinearity seldom happened. Cis-element prediction shows that over 80% of DUF668s contain phytohormone and light responsiveness factors. Further comprehensive experimental analyses of the OsDUF668 family were carried out in 22 different tissues and under five hormone treatments, seven environmental stresses, and two pathogen-defense related stresses. The OsDUF668 genes are expressed ubiquitously in the analyzed rice tissues, and seven genes show tissue-specific high expression profiles. All OsDUF668s respond to drought, and some of the Avr9/Cf-9 rapidly elicited genes respond to salt, wounding, and rice blast with rapidly altered expression patterns. These findings imply that OsDUF668 is essential for drought tolerance and plant defense. Together, our results bring the important role of the DUF668 gene family in rice development and fitness to the fore.
Affiliation(s)
- Hua Zhong
- State Key Laboratory for Hybrid Rice, College of Life Sciences, Wuhan University, Wuhan 430072, China
- Hongyu Zhang
- Key Laboratory of Crop Physiology, Ecology and Genetic Breeding (Jiangxi Agricultural University), Ministry of Education of the P.R. China, Nanchang 330045, China
- Rong Guo
- Key Laboratory of Crop Physiology, Ecology and Genetic Breeding (Jiangxi Agricultural University), Ministry of Education of the P.R. China, Nanchang 330045, China
- Qiang Wang
- Key Laboratory of Crop Physiology, Ecology and Genetic Breeding (Jiangxi Agricultural University), Ministry of Education of the P.R. China, Nanchang 330045, China
- Xiaoping Huang
- Key Laboratory of Crop Physiology, Ecology and Genetic Breeding (Jiangxi Agricultural University), Ministry of Education of the P.R. China, Nanchang 330045, China
- Jianglin Liao
- Key Laboratory of Crop Physiology, Ecology and Genetic Breeding (Jiangxi Agricultural University), Ministry of Education of the P.R. China, Nanchang 330045, China
- Southern Regional Collaborative Innovation Center for Grain and Oil Crops in China, Changsha 410128, China
- Yangsheng Li
- State Key Laboratory for Hybrid Rice, College of Life Sciences, Wuhan University, Wuhan 430072, China
- Yingjin Huang
- Key Laboratory of Crop Physiology, Ecology and Genetic Breeding (Jiangxi Agricultural University), Ministry of Education of the P.R. China, Nanchang 330045, China
- Southern Regional Collaborative Innovation Center for Grain and Oil Crops in China, Changsha 410128, China
- Zhaohai Wang
- Key Laboratory of Crop Physiology, Ecology and Genetic Breeding (Jiangxi Agricultural University), Ministry of Education of the P.R. China, Nanchang 330045, China
- Southern Regional Collaborative Innovation Center for Grain and Oil Crops in China, Changsha 410128, China
- Correspondence:
4
Ghosh S, Chatterji D. Two zinc finger proteins from Mycobacterium smegmatis: DNA binding and activation of transcription. Genes Cells 2017. [PMID: 28639742 DOI: 10.1111/gtc.12507] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/16/2022]
Abstract
Proteins containing a single zinc finger domain are very few in number. Of the numerous zinc finger proteins in eukaryotes, only three, GAGA, Superman and DNA binding by one finger (Dof), have a single zinc finger domain. Although a few zinc finger proteins have been described in eubacteria, no protein with a single C4 zinc finger has been described in detail in any of them. In this article, we describe two novel C-terminal C4 zinc finger proteins, Msmeg_0118 and Msmeg_3613, from Mycobacterium smegmatis. We have named these proteins Mszfp1 (Mycobacterial Single Zinc Finger Protein 1) and Mszfp2 (Mycobacterial Single Zinc Finger Protein 2). Both proteins are expressed constitutively, can bind to DNA and regulate transcription. It appears that Mszfp1 and Mszfp2 may activate transcription by interacting with RNA polymerase.
Affiliation(s)
- Subho Ghosh
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, 560012, India
- Dipankar Chatterji
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, 560012, India
5
Wang L, Shen R, Chen LT, Liu YG. Characterization of a novel DUF1618 gene family in rice. Journal of Integrative Plant Biology 2014; 56:151-158. [PMID: 24237627 DOI: 10.1111/jipb.12130] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 07/24/2013] [Accepted: 11/05/2013] [Indexed: 06/02/2023]
Abstract
Domain of unknown function (DUF) proteins represent a number of gene families that encode functionally uncharacterized proteins in eukaryotes. For example, DUF1618 family members in plants possess a 56-199-amino acid conserved domain and this family has not been described previously. Here, we report the characterization of 121 DUF1618 genes identified in the rice genome. Based on phylogenetic analysis, the rice DUF1618 family was divided into two major groups, each group consisting of two clades. Most DUF1618 genes with close phylogenetic relationships are located in gene clusters on the chromosomes, indicating that gene duplications increased the number of DUF1618 genes. A search for DUF1618 genes in genomic and/or expressed sequence tag databases for 35 other plant species showed that DUF1618 genes are only present in several monocot plants, suggesting that DUF1618 is a new gene family that originated after the dicot-monocot divergence. Based on public microarray databases, most rice DUF1618 genes are expressed at relatively low levels. Further experimental analysis showed that the transcriptional levels of some DUF1618 genes varied in different cultivars, and some responded to stress and hormone conditions, suggesting their important roles for development and fitness in rice (Oryza sativa L.).
Affiliation(s)
- Lan Wang
- State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, College of Life Sciences, South China Agricultural University, Guangzhou, 510642, China; College of Agriculture, South China Agricultural University, Guangzhou, 510642, China
6
Kumar A, Chiu HJ, Axelrod HL, Morse A, Elsliger MA, Wilson IA, Deacon A. Ligands in PSI structures. Acta Crystallogr Sect F Struct Biol Cryst Commun 2010; 66:1309-16. [PMID: 20944227 PMCID: PMC2954221 DOI: 10.1107/s1744309110008092] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 09/01/2009] [Accepted: 03/03/2010] [Indexed: 11/17/2022]
Abstract
Approximately 65% of PSI structures report some type of ligand bound in the crystal structure. Here, we describe how such ligands are handled and analyzed at the JCSG, and survey the types, variety and frequency of ligands observed in the PSI structures, including illustrations of how these bound ligands have provided functional clues for the annotation of proteins with little or no previous experimental characterization. Furthermore, a web server was developed as a tool to mine and analyze the PSI structures for bound ligands and other identifying features.
Affiliation(s)
- Abhinav Kumar
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, CA, USA
- Hsiu-Ju Chiu
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, CA, USA
- Herbert L. Axelrod
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, CA, USA
- Andrew Morse
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Center for Research in Biological Systems, University of California, San Diego, La Jolla, CA, USA
- Marc-André Elsliger
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA, USA
- Ian A. Wilson
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA, USA
- Ashley Deacon
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, CA, USA
7
Krishna SS, Weekes D, Bakolitsa C, Elsliger MA, Wilson IA, Godzik A, Wooley J. TOPSAN: use of a collaborative environment for annotating, analyzing and disseminating data on JCSG and PSI structures. Acta Crystallogr Sect F Struct Biol Cryst Commun 2010; 66:1143-7. [PMID: 20944203 PMCID: PMC2954197 DOI: 10.1107/s1744309110035736] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Received: 06/04/2010] [Accepted: 09/06/2010] [Indexed: 11/17/2022]
Abstract
The NIH Protein Structure Initiative centers, such as the Joint Center for Structural Genomics (JCSG), have developed highly efficient technological platforms that are capable of experimentally determining the three-dimensional structures of hundreds of proteins per year. However, the overwhelming majority of the almost 5000 protein structures determined by these centers have yet to be described in the peer-reviewed literature. In a high-throughput structural genomics environment, the process of structure determination occurs independently of any associated experimental characterization of function, which creates a challenge for the annotation and analysis of structures and the publication of these results. This challenge has been addressed by developing TOPSAN ('The Open Protein Structure Annotation Network'), which enables the generation of knowledge via collaborations among globally distributed contributors supported by automated amalgamation of available information. TOPSAN currently provides annotations for all protein structures determined by the JCSG in addition to preliminary annotations on a large number of structures from the other PSI production centers. TOPSAN-enabled collaborations have resulted in insightful structure-function analysis for many proteins and have led to numerous peer-reviewed publications, as exemplified by the articles included in this issue of Acta Crystallographica Section F.
Affiliation(s)
- S. Sri Krishna
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Center for Research in Biological Systems, University of California, San Diego, La Jolla, California, USA
- Dana Weekes
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Center for Research in Biological Systems, University of California, San Diego, La Jolla, California, USA
- Constantina Bakolitsa
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Center for Research in Biological Systems, University of California, San Diego, La Jolla, California, USA
- Marc-André Elsliger
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, California, USA
- Ian A. Wilson
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, California, USA
- Adam Godzik
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Center for Research in Biological Systems, University of California, San Diego, La Jolla, California, USA
- Program on Bioinformatics and Systems Biology, Sanford–Burnham Medical Research Institute, La Jolla, California, USA
- John Wooley
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Center for Research in Biological Systems, University of California, San Diego, La Jolla, California, USA
- Program on Bioinformatics and Systems Biology, Sanford–Burnham Medical Research Institute, La Jolla, California, USA
8
Bateman A, Coggill P, Finn RD. DUFs: families in search of function. Acta Crystallogr Sect F Struct Biol Cryst Commun 2010; 66:1148-52. [PMID: 20944204 PMCID: PMC2954198 DOI: 10.1107/s1744309110001685] [Citation(s) in RCA: 172] [Impact Index Per Article: 12.3] [Received: 10/09/2009] [Accepted: 01/13/2010] [Indexed: 11/30/2022]
Abstract
Domains of unknown function (DUFs) are a large set of uncharacterized protein families that are found in the Pfam database. Here, the scale and growth of functionally uncharacterized families in biological databases are surveyed and the prospects for discovering their function are examined. In particular, the important role that structural genomics can play in identifying potential function is evaluated.
Affiliation(s)
- Alex Bateman
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, England.
9
Andreeva A, Murzin AG. Structural classification of proteins and structural genomics: new insights into protein folding and evolution. Acta Crystallogr Sect F Struct Biol Cryst Commun 2010; 66:1190-7. [PMID: 20944210 PMCID: PMC2954204 DOI: 10.1107/s1744309110007177] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 01/11/2010] [Accepted: 02/24/2010] [Indexed: 11/10/2022]
Abstract
During the past decade, the Protein Structure Initiative (PSI) centres have become major contributors of new families, superfamilies and folds to the Structural Classification of Proteins (SCOP) database. The PSI results have increased the diversity of protein structural space and accelerated our understanding of it. This review article surveys a selection of protein structures determined by the Joint Center for Structural Genomics (JCSG). It presents previously undescribed β-sheet architectures such as the double barrel and spiral β-roll and discusses new examples of unusual topologies and peculiar structural features observed in proteins characterized by the JCSG and other Structural Genomics centres.
Affiliation(s)
- Antonina Andreeva
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, England
- Alexey G. Murzin
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, England