1
Yang H, Zhu L, Wang X, Song Y, Dong Y, Xu W. Extension characteristics of TdT and its application in biosensors. Crit Rev Biotechnol 2024; 44:981-995. [PMID: 37880088 DOI: 10.1080/07388551.2023.2270772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Revised: 08/18/2023] [Accepted: 09/14/2023] [Indexed: 10/27/2023]
Abstract
The advantages of rapid amplification of nucleic acid without a template based on terminal deoxyribonucleotidyl transferase (TdT) have been widely used in the field of biosensors. However, the catalytic efficiency of TdT is affected by extension conditions. The sensitivity of TdT-mediated biosensors can be improved only under appropriate conditions. Therefore, in this review, we provide a comprehensive overview of TdT extension characteristics and its applications in biosensors. We focus on the relationship between TdT extension conditions and extension efficiency. Furthermore, the construction strategy of TdT-mediated biosensors according to five different recognition types and their applications in targets are discussed and, finally, several current challenges and prospects in the field are taken into consideration.
Affiliation(s)
- He Yang
  - Department of Nutrition and Health, Ministry of Education, Key Laboratory of Precision Nutrition and Food Quality, Food Laboratory of Zhongyuan, China Agricultural University, Beijing, China
- Longjiao Zhu
  - Department of Nutrition and Health, Ministry of Education, Key Laboratory of Precision Nutrition and Food Quality, Food Laboratory of Zhongyuan, China Agricultural University, Beijing, China
- Xinxin Wang
  - Department of Nutrition and Health, Ministry of Education, Key Laboratory of Precision Nutrition and Food Quality, Food Laboratory of Zhongyuan, China Agricultural University, Beijing, China
- Yuhan Song
  - Department of Nutrition and Health, Ministry of Education, Key Laboratory of Precision Nutrition and Food Quality, Food Laboratory of Zhongyuan, China Agricultural University, Beijing, China
  - College of Food Science and Nutritional Engineering, Key Laboratory of Safety Assessment of Genetically Modified Organism (Food Safety), China Agricultural University, Beijing, China
- Yulan Dong
  - Department of Nutrition and Health, Ministry of Education, Key Laboratory of Precision Nutrition and Food Quality, Food Laboratory of Zhongyuan, China Agricultural University, Beijing, China
  - College of Veterinary Medicine, China Agricultural University, Beijing, China
- Wentao Xu
  - Department of Nutrition and Health, Ministry of Education, Key Laboratory of Precision Nutrition and Food Quality, Food Laboratory of Zhongyuan, China Agricultural University, Beijing, China
  - College of Food Science and Nutritional Engineering, Key Laboratory of Safety Assessment of Genetically Modified Organism (Food Safety), China Agricultural University, Beijing, China
2
Dutta A, Kanaujia SP. The Structural Features of MlaD Illuminate its Unique Ligand-Transporting Mechanism and Ancestry. Protein J 2024; 43:298-315. [PMID: 38347327 DOI: 10.1007/s10930-023-10179-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/22/2023] [Indexed: 05/01/2024]
Abstract
The membrane-associated solute-binding protein (SBP) MlaD of the maintenance of lipid asymmetry (Mla) system has been reported to help the transport of phospholipids (PLs) between the outer and inner membranes of Gram-negative bacteria. Despite the availability of structural information, the molecular mechanism underlying the transport of PLs and the ancestry of the protein MlaD remain unclear. In this study, we report the crystal structures of the periplasmic region of MlaD from Escherichia coli (EcMlaD) at a resolution range of 2.3-3.2 Å. The EcMlaD protomer consists of two distinct regions, viz. N-terminal β-barrel fold consisting of seven strands (referred to as MlaD domain) and C-terminal α-helical domain (HD). The protein EcMlaD oligomerizes to give rise to a homo-hexameric ring with a central channel that is hydrophobic and continuous with a variable diameter. Interestingly, the structural analysis revealed that the HD, instead of the MlaD domain, plays a critical role in determining the oligomeric state of the protein. Based on the analysis of available structural information, we propose a working mechanism of PL transport, viz. "asymmetric protomer movement (APM)", wherein half of the EcMlaD hexamer would rise in the periplasmic side along with an outward movement of pore loops, resulting in the change of the central channel geometry. Furthermore, this study highlights that, unlike typical SBPs, EcMlaD possesses a fold similar to EF/AMT-type beta(6)-barrel and a unique ancestry. Altogether, the findings firmly establish EcMlaD to be a non-canonical SBP with a unique ligand-transport mechanism.
Affiliation(s)
- Angshu Dutta
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Guwahati, Guwahati, Assam, 781039, India
- Shankar Prasad Kanaujia
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Guwahati, Guwahati, Assam, 781039, India
3
Hou Z, Yang Y, Ma Z, Wong KC, Li X. Learning the protein language of proteome-wide protein-protein binding sites via explainable ensemble deep learning. Commun Biol 2023; 6:73. [PMID: 36653447 PMCID: PMC9849350 DOI: 10.1038/s42003-023-04462-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 06/20/2022] [Accepted: 01/11/2023] [Indexed: 01/20/2023] Open
Abstract
Protein-protein interactions (PPIs) govern cellular pathways and processes, by significantly influencing the functional expression of proteins. Therefore, accurate identification of protein-protein interaction binding sites has become a key step in the functional analysis of proteins. However, since most computational methods are designed based on biological features, there are no available protein language models to directly encode amino acid sequences into distributed vector representations to model their characteristics for protein-protein binding events. Moreover, the number of experimentally detected protein interaction sites is much smaller than that of protein-protein interactions or protein sites in protein complexes, resulting in unbalanced data sets that leave room for improvement in their performance. To address these problems, we develop an ensemble deep learning model (EDLM)-based protein-protein interaction (PPI) site identification method (EDLMPPI). Evaluation results show that EDLMPPI outperforms state-of-the-art techniques including several PPI site prediction models on three widely-used benchmark datasets including Dset_448, Dset_72, and Dset_164, which demonstrated that EDLMPPI is superior to those PPI site prediction models by nearly 10% in terms of average precision. In addition, the biological and interpretable analyses provide new insights into protein binding site identification and characterization mechanisms from different perspectives. The EDLMPPI webserver is available at http://www.edlmppi.top:5002/ .
Affiliation(s)
- Zilong Hou
  - School of Artificial Intelligence, Jilin University, Jilin, China
- Yuning Yang
  - Information Science and Technology, Northeast Normal University, Jilin, China
- Zhiqiang Ma
  - Information Science and Technology, Northeast Normal University, Jilin, China
- Ka-chun Wong
  - Department of Computer Science, City University of Hong Kong, Hong Kong SAR, China
- Xiangtao Li
  - School of Artificial Intelligence, Jilin University, Jilin, China
4
Savino S, Desmet T, Franceus J. Insertions and deletions in protein evolution and engineering. Biotechnol Adv 2022; 60:108010. [PMID: 35738511 DOI: 10.1016/j.biotechadv.2022.108010] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 05/05/2022] [Revised: 06/15/2022] [Accepted: 06/16/2022] [Indexed: 11/17/2022]
Abstract
Protein evolution or engineering studies are traditionally focused on amino acid substitutions and the way these contribute to fitness. Meanwhile, the insertion and deletion of amino acids is often overlooked, despite being one of the most common sources of genetic variation. Recent methodological advances and successful engineering stories have demonstrated that the time is ripe for greater emphasis on these mutations and their understudied effects. This review highlights the evolutionary importance and biotechnological relevance of insertions and deletions (indels). We provide a comprehensive overview of approaches that can be employed to include indels in random, (semi)-rational or computational protein engineering pipelines. Furthermore, we discuss the tolerance to indels at the structural level, address how domain indels can link the function of unrelated proteins, and feature studies that illustrate the surprising and intriguing potential of frameshift mutations.
Affiliation(s)
- Simone Savino
  - Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
- Tom Desmet
  - Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
- Jorick Franceus
  - Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
5
García-Cebollada H, López A, Sancho J. Protposer: the web server that readily proposes protein stabilizing mutations with high PPV. Comput Struct Biotechnol J 2022; 20:2415-2433. [PMID: 35664235 PMCID: PMC9133766 DOI: 10.1016/j.csbj.2022.05.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/16/2022] [Revised: 05/05/2022] [Accepted: 05/05/2022] [Indexed: 01/23/2023] Open
Abstract
Protein stability is a requisite for most biotechnological and medical applications of proteins. As natural proteins tend to suffer from a low conformational stability ex vivo, great efforts have been devoted toward increasing their stability through rational design and engineering of appropriate mutations. Unfortunately, even the best currently used predictors fail to compute the stability of protein variants with sufficient accuracy and their usefulness as tools to guide the rational stabilisation of proteins is limited. We present here Protposer, a protein stabilising tool based on a different approach. Instead of quantifying changes in stability, Protposer uses structure- and sequence-based screening modules to nominate candidate mutations for subsequent evaluation by a logistic regression model, carefully trained to avoid overfitting. Thus, Protposer analyses PDB files in search for stabilization opportunities and provides a ranked list of promising mutations with their estimated success rates (eSR), their probabilities of being stabilising by at least 0.5 kcal/mol. The agreement between eSRs and actual positive predictive values (PPV) on external datasets of mutations is excellent. When Protposer is used with its Optimal kappa selection threshold, its PPV is above 0.7. Even with less stringent thresholds, Protposer largely outperforms FoldX, Rosetta and PoPMusiC. Indicating the PDB file of the protein suffices to obtain a ranked list of mutations, their eSRs and hints on the likely source of the stabilization expected. Protposer is a distinct, straightforward and highly successful tool to design protein stabilising mutations, and it is freely available for academic use at http://webapps.bifi.es/the-protposer.
6
Kumar K, Srivastava H, Das A, Tribhuvan KU, Durgesh K, Joshi R, Sevanthi AM, Jain PK, Singh NK, Gaikwad K. Identification and characterization of MADS box gene family in pigeonpea for their role during floral transition. 3 Biotech 2021; 11:108. [PMID: 33569264 DOI: 10.1007/s13205-020-02605-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/05/2020] [Accepted: 12/23/2020] [Indexed: 12/16/2022] Open
Abstract
MADS box genes are a class of transcription factors involved in various physiological and developmental processes in plants. To understand their role in floral transition-related pathways, a genome-wide identification was done in Cajanus cajan, identifying 102 members which were classified into two different groups based on their gene structure. The status of all these genes was further analyzed in three wild species i.e. C. scarabaeoides, C. platycarpus and C. cajanifolius which revealed absence of 31-34 MADS box genes in them hinting towards their role in domestication and evolution. We could locate only a single copy of both FLOWERING LOCUS C (FLC) and SHORT VEGETATIVE PHASE (SVP) genes, while three paralogs of SUPPRESSOR OF ACTIVATION OF CONSTANS 1 (SOC1) were found in C. cajan genome. One of those SOC1 paralogs i.e. CcMADS1.5 was found to be missing in all three wild relatives, also forming separate clade in phylogeny. This SOC1 gene was also lacking the characteristic MADS box domain in it. Expression profiling of major MADS box genes involved in flowering was done in different tissues viz. shoot apical meristem, vegetative leaf, reproductive meristem, and reproductive bud. Gene-based time tree of FLC and SOC1 gene dictates their divergence from Arabidopsis before 71 and 23 million years ago (mya), respectively. This study provides valuable insights into the functional characteristics, expression pattern, and evolution of MADS box proteins in grain legumes with emphasis on C. cajan, which may help in further characterizing these genes. SUPPLEMENTARY INFORMATION The online version contains supplementary material available at 10.1007/s13205-020-02605-7.
Affiliation(s)
- Kuldeep Kumar
  - ICAR-National Institute for Plant Biotechnology, New Delhi, 110012 India
  - ICAR-Indian Institute of Pulses Research, Kanpur, 208024 Uttar Pradesh India
- Harsha Srivastava
  - ICAR-National Institute for Plant Biotechnology, New Delhi, 110012 India
- Antara Das
  - ICAR-National Institute for Plant Biotechnology, New Delhi, 110012 India
- Kishor U Tribhuvan
  - ICAR-National Institute for Plant Biotechnology, New Delhi, 110012 India
  - ICAR-Indian Institute of Agricultural Biotechnology, Ranchi, 834010 Jharkhand India
- Kumar Durgesh
  - Division of Genetics, ICAR-Indian Agricultural Research Institute, New Delhi, 110012 India
- Rekha Joshi
  - Division of Genetics, ICAR-Indian Agricultural Research Institute, New Delhi, 110012 India
- Pradeep Kumar Jain
  - ICAR-National Institute for Plant Biotechnology, New Delhi, 110012 India
- Kishor Gaikwad
  - ICAR-National Institute for Plant Biotechnology, New Delhi, 110012 India
7
MrHex1 is Required for Woronin Body Formation, Fungal Development and Virulence in Metarhizium robertsii. J Fungi (Basel) 2020; 6:jof6030172. [PMID: 32937856 PMCID: PMC7559983 DOI: 10.3390/jof6030172] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 08/02/2020] [Revised: 09/10/2020] [Accepted: 09/11/2020] [Indexed: 02/07/2023] Open
Abstract
The Woronin body (WB) is a peroxisome-derived dense-core vesicle, a self-assembling hexagonal crystal of a single protein Hex1. This organelle is specific to the ascomycete fungi belonging to the Pezizomycotina subphylum by functioning in sealing septal pores in response to mycelium damage and the control of cell heterogeneity. We retrieved all available Hex1-domain containing proteins of different fungi from the GenBank database and found considerable length variations among 460 obtained Hex1 proteins. However, a highly conserved Hex1 domain containing 75 amino acid residues with a specific S/A-R/S-L consensus motif for targeting peroxisome is present at the carboxy-terminus of each protein. A homologous Hex1 gene, named MrHex1, was deleted in the entomopathogenic fungus Metarhizium robertsii. It was found that MrHex1 was responsible for WB formation in M. robertsii and involved in sealing septal pores to maintain cell integrity and heterogeneity. Different assays indicated that, relative to the wild-type (WT) strain, ∆MrHex1 demonstrated a growth defect on a solid medium and substantial reductions of conidiation, appressorium formation and topical infectivity against insect hosts. However, there was no obvious virulence difference between WT and mutants during injection of insects. We also found that ∆MrHex1 could tolerate different stress conditions like the WT and the gene-rescued mutant of M. robertsii, which is in contrast to the reports of the stress-response defects of the Hex1 null mutants of other fungal species. In addition to revealing the phenotypic/functional alterations of the Hex1 deletion mutants between different pathotype fungi, the results of this study may benefit the understanding of the evolution and WB-control of fungal entomopathogenicity.
8
Mahajan S, Ramya TNC. Nature-inspired engineering of an F-type lectin for increased binding strength. Glycobiology 2019; 28:933-948. [PMID: 30202877 DOI: 10.1093/glycob/cwy082] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 12/14/2017] [Accepted: 09/07/2018] [Indexed: 11/13/2022] Open
Abstract
Individual lectin-carbohydrate interactions are usually of low affinity. However, high avidity is frequently attained by the multivalent presentation of glycans on biological surfaces coupled with the occurrence of high order lectin oligomers or tandem repeats of lectin domains in the polypeptide. F-type lectins are L-fucose binding lectins with a typical sequence motif, HX(26)RXDX(4)R/K, whose residues participate in L-fucose binding. We previously reported the presence of a few eukaryotic F-type lectin domains with partial sequence duplication that results in the presence of two L-fucose-binding sequence motifs. We hypothesized that such partial sequence duplication would result in greater avidity of lectin-ligand interactions. Inspired by this example from Nature, we attempted to engineer a bacterial F-type lectin domain from Streptosporangium roseum to attain avid binding by mimicking partial duplication. The engineered lectin demonstrated 12-fold greater binding strength than the wild-type lectin to multivalent fucosylated glycoconjugates. However, the affinity to the monosaccharide L-fucose in solution was similar and partial sequence duplication did not result in an additional functional L-fucose binding site. We also cloned, expressed and purified a Branchiostoma floridae F-type lectin domain with naturally occurring partial sequence duplication and confirmed that the duplicated region with the F-type lectin sequence motif did not participate in L-fucose binding. We found that the greater binding strength of the engineered lectin from S. roseum was instead due to increased oligomerization. We believe that this Nature-inspired strategy might be useful for engineering lectins to improve binding strength in various applications.
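The sequence motif quoted in this abstract, HX(26)RXDX(4)R/K, can be read as a pattern over one-letter amino acid codes. The following is a minimal sketch and not the authors' method: it assumes that X stands for any of the 20 standard residues and that X(n) means exactly n of them. The function name `find_f_type_motifs` and the toy sequence are hypothetical and serve only as an illustration.

```python
import re

# Assumed reading of HX(26)RXDX(4)R/K: His, 26 arbitrary residues, Arg,
# one arbitrary residue, Asp, 4 arbitrary residues, then Arg or Lys.
# The lookahead lets overlapping candidate sites be reported as well.
AA = "[ACDEFGHIKLMNPQRSTVWY]"
F_TYPE_MOTIF = re.compile(rf"(?=(H{AA}{{26}}R{AA}D{AA}{{4}}[RK]))")


def find_f_type_motifs(sequence: str) -> list[int]:
    """Return 0-based start positions of every motif match in `sequence`."""
    return [m.start() for m in F_TYPE_MOTIF.finditer(sequence.upper())]


# Hypothetical toy sequence: 5 flanking residues, then one motif instance.
toy_motif = "H" + "A" * 26 + "R" + "A" + "D" + "A" * 4 + "K"
toy_sequence = "GGGGG" + toy_motif + "GGG"
positions = find_f_type_motifs(toy_sequence)
```

Under these assumptions, a domain carrying the partial duplication described in the abstract would be expected to yield two start positions rather than one. A pattern match of this kind says nothing about whether the site actually binds L-fucose, which is the point the abstract makes about the duplicated region.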
Affiliation(s)
- Sonal Mahajan
  - Institute of Microbial Technology, Sector 39-A, Chandigarh, India
- T N C Ramya
  - Institute of Microbial Technology, Sector 39-A, Chandigarh, India
9
You C, Liu C, Li Y, Jiang P, Ma Q. Structural and enzymatic analysis of the cytochrome b5 reductase domain of Ulva prolifera nitrate reductase. Int J Biol Macromol 2018; 111:1175-1182. [PMID: 29371148 DOI: 10.1016/j.ijbiomac.2018.01.140] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 12/03/2017] [Revised: 01/17/2018] [Accepted: 01/20/2018] [Indexed: 10/18/2022]
Abstract
Rapid accumulations of unattached green macroalgae, referred to as blooms, constitute ecological disasters and occur in many coastal regions. Ulva are a major cause of blooms, owing to their high nitrogen utilization capacity, which requires nitrate reductase (NR) activity; however, molecular characterization of Ulva NR remains lacking. Herein we determined the crystal structure and performed an enzymatic analysis of the cytochrome b5 reductase domain of Ulva prolifera NR (UpCbRNR). The structural analysis revealed an N-terminal FAD-binding domain primarily consisting of six antiparallel β strands, a C-terminal NADH-binding domain forming a Rossmann fold, and a three β-stranded linker region connecting these two domains. The FAD cofactor was located in the cleft between the two domains and interacted primarily with the FAD-binding domain. UpCbRNR shares similarities in overall structure and cofactor interactions with homologs, and its catalytic ability is comparable to that of higher plant CbRNRs. Structure and sequence comparisons of homologs revealed two regions of sequence length variation potentially useful for phylogenetic analysis: one in the FAD-binding domain, specific to U. prolifera, and another in the linker region that may be used to differentiate between plant, fungi, and animal homologs. Our data will facilitate molecular-level understanding of nitrate assimilation in Ulva.
Affiliation(s)
- Cai You
  - Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Nanhai Road 7, Qingdao 266071, China; Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao 266237, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Changshui Liu
  - Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Nanhai Road 7, Qingdao 266071, China; Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao 266237, China
- Yingjie Li
  - Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Nanhai Road 7, Qingdao 266071, China; Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao 266237, China
- Peng Jiang
  - Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Nanhai Road 7, Qingdao 266071, China; Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao 266237, China
- Qingjun Ma
  - Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Nanhai Road 7, Qingdao 266071, China; Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao 266237, China
10
Do H, Kang E, Yang B, Cha HJ, Choi YS. A tyrosinase, mTyr-CNK, that is functionally available as a monophenol monooxygenase. Sci Rep 2017; 7:17267. [PMID: 29222480 PMCID: PMC5722948 DOI: 10.1038/s41598-017-17635-0] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 09/04/2017] [Accepted: 11/28/2017] [Indexed: 12/20/2022] Open
Abstract
Tyrosinase efficiently catalyzes the ortho-hydroxylation of monophenols and the oxidation of diphenols without any additional cofactors. Although it is of significant interest for the biosynthesis of catechol derivatives, the rapid catechol oxidase activity and inactivation of tyrosinase have hampered its practical utilization as a monophenol monooxygenase. Here, we prepared a functional tyrosinase that exhibited a distinguished monophenolase/diphenolase activity ratio (Vmax,mono/Vmax,di = 3.83) and enhanced catalytic efficiency against L-tyrosine (kcat = 3.33 ± 0.18 s⁻¹, Km = 2.12 ± 0.14 mM at 20 °C and pH 6.0). This enzyme was still highly active in ice water (>80%), and its activity was well conserved below 30 °C. In vitro DOPA modification, with a remarkably high yield as a monophenol monooxygenase, was achieved by the enzyme taking advantage of these biocatalytic properties. These results demonstrate the strong potential for this enzyme’s use as a monophenol monooxygenase in biomedical and industrial applications.
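The kinetic constants quoted in this abstract can be combined into the conventional specificity constant, kcat/Km. The ratio below is a worked illustration using only the central values given above. The abstract itself does not report this number, and the stated uncertainties are not propagated here.

```latex
\[
\frac{k_{\mathrm{cat}}}{K_{\mathrm{m}}}
  = \frac{3.33\ \mathrm{s^{-1}}}{2.12\ \mathrm{mM}}
  \approx 1.57\ \mathrm{mM^{-1}\,s^{-1}}
  = 1.57 \times 10^{3}\ \mathrm{M^{-1}\,s^{-1}}
\]
```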
Affiliation(s)
- Hyunsu Do
  - Department of Chemical Engineering and Applied Chemistry, Chungnam National University, Daejeon, 34134, South Korea
- Eungsu Kang
  - Department of Chemical Engineering and Applied Chemistry, Chungnam National University, Daejeon, 34134, South Korea
- Byeongseon Yang
  - Department of Chemical Engineering, Pohang University of Science and Technology, Pohang, 37673, South Korea
- Hyung Joon Cha
  - Department of Chemical Engineering, Pohang University of Science and Technology, Pohang, 37673, South Korea
- Yoo Seong Choi
  - Department of Chemical Engineering and Applied Chemistry, Chungnam National University, Daejeon, 34134, South Korea
11
Measuring Accelerated Rates of Insertions and Deletions Independent of Rates of Nucleotide Substitution. J Mol Evol 2016; 83:137-146. [PMID: 27770175 PMCID: PMC5080320 DOI: 10.1007/s00239-016-9761-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/06/2016] [Accepted: 10/11/2016] [Indexed: 11/16/2022]
Abstract
Evolutionary constraint for insertions and deletions (indels) is not necessarily equal to constraint for nucleotide substitutions for any given region of a genome. Knowing the variation in indel-specific evolutionary rates across the sequence will aid our understanding of evolutionary constraints on indels, and help us infer how indels have contributed to the evolution of the sequence. However, unlike for nucleotide substitutions, there has been no phylogenetic method that can statistically infer significantly different rates of indels across the sequence space independent of substitution rates. Here, we have developed software that will find sites with accelerated evolutionary rates specific to indels, by introducing a scaling parameter that only applies to the indel rates and not to the nucleotide substitution rates. Using the software, we show that we can find regions of accelerated rates of indels in the protein alignments of primate genomes. We also confirm that the sites that have high rates of indels are different from the sites that have high rates of nucleotide substitutions within the protein sequences. By identifying regions with accelerated rates of indels independent of nucleotide substitutions, we will be able to better understand the impact of indel mutations on protein sequence evolution.
12
Molecular Dynamics Simulations and Structural Analysis to Decipher Functional Impact of a Twenty Residue Insert in the Ternary Complex of Mus musculus TdT Isoform. PLoS One 2016; 11:e0157286. [PMID: 27311013 PMCID: PMC4911049 DOI: 10.1371/journal.pone.0157286] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 09/21/2015] [Accepted: 05/26/2016] [Indexed: 01/08/2023] Open
Abstract
Insertions/deletions are common evolutionary tools employed to alter the structural and functional repertoire of protein domains. An insert situated proximal to the active site or ligand binding site frequently impacts protein function; however, the effects of distal indels on protein activity and/or stability are often not studied. In this paper, we have investigated a distal insert, which influences the function and stability of a unique DNA polymerase, called terminal deoxynucleotidyl transferase (TdT). TdT (EC:2.7.7.31) is a monomeric 58 kDa protein belonging to family X of eukaryotic DNA polymerases and known for its role in V(D)J recombination as well as in non-homologous end-joining (NHEJ) pathways. Two murine isoforms of TdT, with a length difference of twenty residues and having different biochemical properties, have been studied. All-atom molecular dynamics simulations at different temperatures and interaction network analyses were performed on the short and long-length isoforms. We observed conformational changes in the regions distal to the insert position (thumb subdomain) in the longer isoform, which indirectly affects the activity and stability of the enzyme through a mediating loop (Loop1). A structural rationale could be provided to explain the reduced polymerization rate as well as increased thermosensitivity of the longer isoform caused by peripherally located length variations within a DNA polymerase. These observations increase our understanding of the roles of length variants in introducing functional diversity in protein families in general.
13
Das S, Dawson NL, Orengo CA. Diversity in protein domain superfamilies. Curr Opin Genet Dev 2015; 35:40-9. [PMID: 26451979 PMCID: PMC4686048 DOI: 10.1016/j.gde.2015.09.005] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 07/09/2015] [Revised: 09/07/2015] [Accepted: 09/08/2015] [Indexed: 01/25/2023]
Abstract
Whilst ∼93% of domain superfamilies appear to be relatively structurally and functionally conserved based on the available data from the CATH-Gene3D domain classification resource, the remainder are much more diverse. In this review, we consider how domains in some of the most ubiquitous and promiscuous superfamilies have evolved, in particular the plasticity in their functional sites and surfaces which expands the repertoire of molecules they interact with and actions performed on them. To what extent can we identify a core function for these superfamilies which would allow us to develop a ‘domain grammar of function’ whereby a protein's biological role can be proposed from its constituent domains? Clearly the first step is to understand the extent to which these components vary and how changes in their molecular make-up modify function.
Affiliation(s)
- Sayoni Das
  - Institute of Structural and Molecular Biology, UCL, 627 Darwin Building, Gower Street, WC1E 6BT, UK
- Natalie L Dawson
  - Institute of Structural and Molecular Biology, UCL, 627 Darwin Building, Gower Street, WC1E 6BT, UK
- Christine A Orengo
  - Institute of Structural and Molecular Biology, UCL, 627 Darwin Building, Gower Street, WC1E 6BT, UK
14
Surkont J, Diekmann Y, Ryder PV, Pereira-Leal JB. Coiled-coil length: Size does matter. Proteins 2015; 83:2162-9. [PMID: 26387794 DOI: 10.1002/prot.24932] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 04/09/2015] [Revised: 08/23/2015] [Accepted: 09/14/2015] [Indexed: 11/09/2022]
Abstract
Protein evolution is governed by processes that alter primary sequence but also the length of proteins. Protein length may change in different ways, but insertions, deletions and duplications are the most common. An optimal protein size is a trade-off between sequence extension, which may change protein stability or lead to acquisition of a new function, and shrinkage that decreases metabolic cost of protein synthesis. Despite the general tendency for length conservation across orthologous proteins, the propensity to accept insertions and deletions is heterogeneous along the sequence. For example, protein regions rich in repetitive peptide motifs are well known to extensively vary their length across species. Here, we analyze length conservation of coiled-coils, domains formed by a ubiquitous, repetitive peptide motif present in all domains of life, that frequently plays a structural role in the cell. We observed that, despite the repetitive nature, the length of coiled-coil domains is generally highly conserved throughout the tree of life, even when the remaining parts of the protein change, including globular domains. Length conservation is independent of primary amino acid sequence variation, and represents a conservation of domain physical size. This suggests that the conservation of domain size is due to functional constraints.
Affiliation(s)
- Yoan Diekmann
- Instituto Gulbenkian de Ciência, Oeiras, 2780-156, Portugal; Physiology Course, Marine Biological Laboratory, Woods Hole, Massachusetts, 02543
- Pearl V Ryder
- Physiology Course, Marine Biological Laboratory, Woods Hole, Massachusetts, 02543; Emory University School of Medicine, Atlanta, Georgia, 30322
- Jose B Pereira-Leal
- Instituto Gulbenkian de Ciência, Oeiras, 2780-156, Portugal; Physiology Course, Marine Biological Laboratory, Woods Hole, Massachusetts, 02543
|
15
|
Andreatta ME, Levine JA, Foy SG, Guzman LD, Kosinski LJ, Cordes MHJ, Masel J. The Recent De Novo Origin of Protein C-Termini. Genome Biol Evol 2015; 7:1686-701. [PMID: 26002864 PMCID: PMC4494051 DOI: 10.1093/gbe/evv098] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 01/12/2023] Open
Abstract
Protein-coding sequences can arise either from duplication and divergence of existing sequences, or de novo from noncoding DNA. Unfortunately, recently evolved de novo genes can be hard to distinguish from false positives, making their study difficult. Here, we study a more tractable version of the process of conversion of noncoding sequence into coding: the co-option of short segments of noncoding sequence into the C-termini of existing proteins via the loss of a stop codon. Because we study recent additions to potentially old genes, we are able to apply a variety of stringent quality filters to our annotations of what is a true protein-coding gene, discarding the putative proteins of unknown function that are typical of recent fully de novo genes. We identify 54 examples of C-terminal extensions in Saccharomyces and 28 in Drosophila, all of them recent enough to still be polymorphic. We find one putative gene fusion that turns out, on close inspection, to be the product of replicated assembly errors, further highlighting the issue of false positives in the study of rare events. Four of the Saccharomyces C-terminal extensions (to ADH1, ARP8, TPM2, and PIS1) that survived our quality filters are predicted to lead to significant modification of a protein domain structure.
Affiliation(s)
- Matthew E Andreatta
- Department of Ecology & Evolutionary Biology, University of Arizona. Present address: Aegis Sciences, Nashville, TN
- Joshua A Levine
- Department of Ecology & Evolutionary Biology, University of Arizona
- Scott G Foy
- Department of Ecology & Evolutionary Biology, University of Arizona
- Lynette D Guzman
- Department of Ecology & Evolutionary Biology, University of Arizona. Present address: Program in Mathematics Education, Michigan State University, MI
- Luke J Kosinski
- Biochemistry and Molecular & Cellular Biology Graduate Program, University of Arizona
- Joanna Masel
- Department of Ecology & Evolutionary Biology, University of Arizona
|
16
|
Prakash A, Bateman A. Domain atrophy creates rare cases of functional partial protein domains. Genome Biol 2015; 16:88. [PMID: 25924720 PMCID: PMC4432964 DOI: 10.1186/s13059-015-0655-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 11/20/2014] [Accepted: 04/15/2015] [Indexed: 01/12/2023] Open
Abstract
Background: Protein domains display a range of structural diversity, with numerous additions and deletions of secondary structural elements between related domains. We have observed a small number of cases of surprising large-scale deletions of core elements of structural domains. We propose a new concept called domain atrophy, where protein domains lose a significant number of core structural elements. Results: Here, we implement a new pipeline to systematically identify new cases of domain atrophy across all known protein sequences. The output of this pipeline was carefully checked by hand, which filtered out partial domain instances that were unlikely to represent true domain atrophy due to misannotations or un-annotated sequence fragments. We identify 75 cases of domain atrophy, of which eight cases are found in a three-dimensional protein structure and 67 cases have been inferred based on mapping to a known homologous structure. Domains with structural variations include ancient folds such as the TIM-barrel and Rossmann folds. Most of these domains are observed to show structural loss that does not affect their functional sites. Conclusion: Our analysis has significantly increased the known cases of domain atrophy. We discuss specific instances of domain atrophy and see that there has often been a compensatory mechanism that helps to maintain the stability of the partial domain. Our study indicates that although domain atrophy is an extremely rare phenomenon, protein domains under certain circumstances can tolerate extreme mutations giving rise to partial, but functional, domains.
Affiliation(s)
- Ananth Prakash
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK.
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK.
|
17
|
Chen CR, Makhatadze GI. ProteinVolume: calculating molecular van der Waals and void volumes in proteins. BMC Bioinformatics 2015; 16:101. [PMID: 25885484 PMCID: PMC4379742 DOI: 10.1186/s12859-015-0531-2] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Received: 12/28/2014] [Accepted: 03/10/2015] [Indexed: 11/20/2022] Open
Abstract
Background: Voids and cavities in the native protein structure determine the pressure unfolding of proteins. In addition, the volume changes due to the interaction of newly exposed atoms with solvent upon protein unfolding also contribute to the pressure unfolding of proteins. Quantitative understanding of these effects is important for predicting and designing proteins with predefined response to changes in hydrostatic pressure using computational approaches. The molecular surface volume is a useful metric that describes contribution of geometrical volume, which includes van der Waals volume and volume of the voids, to the total volume of a protein in solution, thus isolating the effects of hydration for separate calculations. Results: We developed ProteinVolume, a highly robust and easy-to-use tool to compute geometric volumes of proteins. ProteinVolume generates the molecular surface of a protein and uses an innovative flood-fill algorithm to calculate the individual components of the molecular surface volume, van der Waals and intramolecular void volumes. ProteinVolume is user friendly and is available as a web-server or a platform-independent command-line version. Conclusions: ProteinVolume is a highly accurate and fast application to interrogate geometric volumes of proteins. ProteinVolume is a free web server available on http://gmlab.bio.rpi.edu. Free-standing platform-independent Java-based ProteinVolume executable is also freely available at this web site.
Affiliation(s)
- Calvin R Chen
- Department of Biological Sciences and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY, 12180, USA.
- George I Makhatadze
- Department of Biological Sciences and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY, 12180, USA.
|
18
|
Mahajan S, de Brevern AG, Sanejouand YH, Srinivasan N, Offmann B. Use of a structural alphabet to find compatible folds for amino acid sequences. Protein Sci 2014; 24:145-53. [PMID: 25297700 DOI: 10.1002/pro.2581] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 07/15/2014] [Accepted: 10/06/2014] [Indexed: 01/01/2023]
Abstract
The structural annotation of proteins with no detectable homologs of known 3D structure identified using sequence-search methods is a major challenge today. We propose an original method that computes the conditional probabilities for the amino-acid sequence of a protein to fit to known protein 3D structures using a structural alphabet, known as "Protein Blocks" (PBs). PBs constitute a library of 16 local structural prototypes that approximate every part of protein backbone structures. It is used to encode 3D protein structures into 1D PB sequences and to capture sequence to structure relationships. Our method relies on amino acid occurrence matrices, one for each PB, to score global and local threading of query amino acid sequences to protein folds encoded into PB sequences. It does not use any information from residue contacts or sequence-search methods or explicit incorporation of hydrophobic effect. The performance of the method was assessed with independent test datasets derived from SCOP 1.75A. With a Z-score cutoff that achieved 95% specificity (i.e., less than 5% false positives), global and local threading showed sensitivity of 64.1% and 34.2%, respectively. We further tested its performance on 57 difficult CASP10 targets that had no known homologs in PDB: 38 compatible templates were identified by our approach and 66% of these hits yielded correctly predicted structures. This method scales-up well and offers promising perspectives for structural annotations at genomic level. It has been implemented in the form of a web-server that is freely available at http://www.bo-protscience.fr/forsa.
Affiliation(s)
- Swapnil Mahajan
- Université de La Réunion, DSIMB, UMR-S S1134, Saint Denis Messag Cedex 09, La Réunion, F-97715, France; INSERM, UMR-S 1134, DSIMB, F-75739, Paris, France; Laboratoire d'Excellence, GR-Ex, Paris, F-75739, France; Université de Nantes, UFIP CNRS UMR 6286 Faculté des Sciences et Techniques, 2 rue de la Houssinière, 44392, Nantes Cedex 03, France
|
19
|
Loss of quaternary structure is associated with rapid sequence divergence in the OSBS family. Proc Natl Acad Sci U S A 2014; 111:8535-40. [PMID: 24872444 DOI: 10.1073/pnas.1318703111] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Indexed: 01/03/2023] Open
Abstract
The rate of protein evolution is determined by a combination of selective pressure on protein function and biophysical constraints on protein folding and structure. Determining the relative contributions of these properties is an unsolved problem in molecular evolution with broad implications for protein engineering and function prediction. As a case study, we examined the structural divergence of the rapidly evolving o-succinylbenzoate synthase (OSBS) family, which catalyzes a step in menaquinone synthesis in diverse microorganisms and plants. On average, the OSBS family is much more divergent than other protein families from the same set of species, with the most divergent family members sharing <15% sequence identity. Comparing 11 representative structures revealed that loss of quaternary structure and large deletions or insertions are associated with the family's rapid evolution. Neither of these properties has been investigated in previous studies to identify factors that affect the rate of protein evolution. Intriguingly, one subfamily retained a multimeric quaternary structure and has small insertions and deletions compared with related enzymes that catalyze diverse reactions. Many proteins in this subfamily catalyze both OSBS and N-succinylamino acid racemization (NSAR). Retention of ancestral structural characteristics in the NSAR/OSBS subfamily suggests that the rate of protein evolution is not proportional to the capacity to evolve new protein functions. Instead, structural features that are conserved among proteins with diverse functions might contribute to the evolution of new functions.
|
20
|
Mutt E, Rani SS, Sowdhamini R. Structural updates of alignment of protein domains and consequences on evolutionary models of domain superfamilies. BioData Min 2013; 6:20. [PMID: 24237883 PMCID: PMC4175504 DOI: 10.1186/1756-0381-6-20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2012] [Accepted: 09/24/2013] [Indexed: 11/10/2022] Open
Abstract
Background: Influx of newly determined crystal structures into primary structural databases is increasing at a rapid pace. This leads to updating of primary databases and their dependent secondary databases, which makes large-scale analysis of structures even more challenging. Hence, it becomes essential to compare and appreciate replacement of data and inclusion of new data that is critical between two updates. PASS2 is a database that retains structure-based sequence alignments of protein domain superfamilies and relies on SCOP database for its hierarchy and definition of superfamily members. Since accurate alignments of distantly related proteins are useful evolutionary models for depicting variations within protein superfamilies, this study aims to trace the changes in data in between PASS2 updates. Results: In this study, differences in superfamily compositions, family constituents and length variations between different versions of PASS2 have been tracked. Studying length variations in protein domains, which have been introduced by indels (insertions/deletions), is important because these indels act as evolutionary signatures in introducing variations in substrate specificity, domain interactions and sometimes even regulating protein stability. With this objective of classifying the nature and source of variations in the superfamilies during transitions (between the different versions of PASS2), increasing length-rigidity of the superfamilies in the recent version is observed. In order to study such length-variant superfamilies in detail, an improved classification approach is also presented, which divides the superfamilies into distinct groups based on their extent of length variation. Conclusions: An objective study in terms of transition between the database updates, detailed investigation of the new/old members and examination of their structural alignments is non-trivial and will help researchers in designing experiments on specific superfamilies, in various modelling studies, in linking representative superfamily members to rapidly expanding sequence space and in evaluating the effects of length variations of new members in drug target proteins. The improved objective classification scheme developed here would be useful in future for automatic analysis of length variation in cases of updates of databases or even within different secondary databases.
Affiliation(s)
- Eshita Mutt
- National Centre for Biological Sciences (TIFR), UAS-GKVK Campus, Bellary Road, 560 065 Bangalore, India.
|
21
|
Mutt E, Mathew OK, Sowdhamini R. LenVarDB: database of length-variant protein domains. Nucleic Acids Res 2013; 42:D246-50. [PMID: 24194591 PMCID: PMC3964994 DOI: 10.1093/nar/gkt1014] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/14/2022] Open
Abstract
Protein domains are functionally and structurally independent modules, which add to the functional variety of proteins. This array of functional diversity has been enabled by evolutionary changes, such as amino acid substitutions or insertions or deletions, occurring in these protein domains. Length variations (indels) can introduce changes at structural, functional and interaction levels. LenVarDB (freely available at http://caps.ncbs.res.in/lenvardb/) traces these length variations, starting from structure-based sequence alignments in our Protein Alignments organized as Structural Superfamilies (PASS2) database, across 731 structural classification of proteins (SCOP)-based protein domain superfamilies connected to 2 730 625 sequence homologues. Alignment of sequence homologues corresponding to a structural domain is available, starting from a structure-based sequence alignment of the superfamily. Orientation of the length-variant (indel) regions in protein domains can be visualized by mapping them on the structure and on the alignment. Knowledge about location of length variations within protein domains and their visual representation will be useful in predicting changes within structurally or functionally relevant sites, which may ultimately regulate protein function. Non-technical summary: Evolutionary changes bring about natural changes to proteins that may be found in many organisms. Such changes could be reflected as amino acid substitutions or insertions–deletions (indels) in protein sequences. LenVarDB is a database that provides an early overview of observed length variations that were set among 731 protein families and after examining >2 million sequences. Indels are followed up to observe if they are close to the active site such that they can affect the activity of proteins. Inclusion of such information can aid the design of bioengineering experiments.
Affiliation(s)
- Eshita Mutt
- International Institute of Information Technology-Hyderabad, Gachibowli, Hyderabad 500032, Andhra Pradesh, India; National Centre for Biological Sciences (TIFR), UAS-GKVK Campus, Bellary Road, Bangalore 560065, Karnataka, India; and SASTRA University, Tirumalaisamudram, Thanjavur 613401, Tamil Nadu, India
|
22
|
Mahajan S, Agarwal G, Iftekhar M, Offmann B, de Brevern AG, Srinivasan N. DoSA: Database of Structural Alignments. Database (Oxford) 2013; 2013:bat048. [PMID: 23846594 PMCID: PMC3708618 DOI: 10.1093/database/bat048] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/02/2022]
Abstract
Protein structure alignment is a crucial step in protein structure–function analysis. Despite the advances in protein structure alignment algorithms, some of the local conformationally similar regions are mislabeled as structurally variable regions (SVRs). These regions are not well superimposed because of differences in their spatial orientations. The Database of Structural Alignments (DoSA) addresses this gap in identification of local structural similarities obscured in global protein structural alignments by realigning SVRs using an algorithm based on protein blocks. A set of protein blocks is a structural alphabet that abstracts protein structures into 16 unique local structural motifs. DoSA provides unique information about 159 780 conformationally similar and 56 140 conformationally dissimilar SVRs in 74 705 pairwise structural alignments of homologous proteins. The information provided on conformationally similar and dissimilar SVRs can be helpful to model loop regions. It is also conceivable that conformationally similar SVRs with conserved residues could potentially contribute toward functional integrity of homologues, and hence identifying such SVRs could be helpful in understanding the structural basis of protein function. Database URL: http://bo-protscience.fr/dosa/
Affiliation(s)
- Swapnil Mahajan
- Dynamique des Structures et Interactions des Macromolécules Biologiques, UMR-S INSERM S665, Faculté des Sciences et Technologies, Université de La Réunion, F-97715 Saint Denis Messag Cedex 09, La Réunion, France
|
23
|
Peng FY, Weselake RJ. Genome-wide identification and analysis of the B3 superfamily of transcription factors in Brassicaceae and major crop plants. Theor Appl Genet 2013; 126:1305-19. [PMID: 23377560 DOI: 10.1007/s00122-013-2054-4] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 06/28/2012] [Accepted: 01/09/2013] [Indexed: 05/04/2023]
Abstract
The plant-specific B3 superfamily of transcription factors has diverse functions in plant growth and development. Using a genome-wide domain analysis, we identified 92, 187, 58, 90, 81, 55, and 77 B3 transcription factor genes in the sequenced genome of Arabidopsis, Brassica rapa, castor bean (Ricinus communis), cocoa (Theobroma cacao), soybean (Glycine max), maize (Zea mays), and rice (Oryza sativa), respectively. The B3 superfamily has substantially expanded during the evolution in eudicots particularly in Brassicaceae, as compared to monocots in the analysis. We observed domain duplication in some of these B3 proteins, forming more complex domain architectures than currently understood. We found that the length of B3 domains exhibits a large variation, which may affect their exact number of α-helices and β-sheets in the core structure of B3 domains, and possibly have functional implications. Analysis of the public microarray data indicated that most of the B3 gene pairs encoding Arabidopsis-rice orthologs are preferentially expressed in different tissues, suggesting their different roles in these two species. Using ESTs in crops, we identified many B3 genes preferentially expressed in reproductive tissues. In a sequence-based quantitative trait loci analysis in rice and maize, we have found many B3 genes associated with traits such as grain yield, seed weight and number, and protein content. Our results provide a framework for future studies into the function of B3 genes in different phases of plant development, especially the ones related to traits in major crops.
Affiliation(s)
- Fred Y Peng
- Agricultural Lipid Biotechnology Program, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB T6G 2P5, Canada
|
24
|
Residue mutations and their impact on protein structure and function: detecting beneficial and pathogenic changes. Biochem J 2013; 449:581-94. [DOI: 10.1042/bj20121221] [Citation(s) in RCA: 131] [Impact Index Per Article: 11.9] [Indexed: 12/29/2022]
Abstract
The present review focuses on the evolution of proteins and the impact of amino acid mutations on function from a structural perspective. Proteins evolve under the law of natural selection and undergo alternating periods of conservative evolution and of relatively rapid change. The likelihood of mutations being fixed in the genome depends on various factors, such as the fitness of the phenotype or the position of the residues in the three-dimensional structure. For example, co-evolution of residues located close together in three-dimensional space can occur to preserve global stability. Whereas point mutations can fine-tune the protein function, residue insertions and deletions (‘decorations’ at the structural level) can sometimes modify functional sites and protein interactions more dramatically. We discuss recent developments and tools to identify such episodic mutations, and examine their applications in medical research. Such tools have been tested on simulated data and applied to real data such as viruses or animal sequences. Traditionally, there has been little if any cross-talk between the fields of protein biophysics, protein structure–function and molecular evolution. However, the last several years have seen some exciting developments in combining these approaches to obtain an in-depth understanding of how proteins evolve. For example, a better understanding of how structural constraints affect protein evolution will greatly help us to optimize our models of sequence evolution. The present review explores this new synthesis of perspectives.
|
25
|
Bolshoy A, Tatarinova T. Methods of combinatorial optimization to reveal factors affecting gene length. Bioinform Biol Insights 2012; 6:317-27. [PMID: 23300345 PMCID: PMC3528112 DOI: 10.4137/bbi.s10525] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/21/2022] Open
Abstract
In this paper we present a novel method for genome ranking according to gene lengths. The main outcomes described in this paper are the following: the formulation of the genome ranking problem, presentation of relevant approaches to solve it, and the demonstration of preliminary results from prokaryotic genomes ordering. Using a subset of prokaryotic genomes, we attempted to uncover factors affecting gene length. We have demonstrated that hyperthermophilic species have shorter genes as compared with mesophilic organisms, which probably means that environmental factors affect gene length. Moreover, these preliminary results show that environmental factors group together in ranking evolutionary distant species.
Affiliation(s)
- Alexander Bolshoy
- Department of Evolutionary and Environmental Biology and Institute of Evolution, University of Haifa, Israel; Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
|
26
|
Joseph AP, Valadié H, Srinivasan N, de Brevern AG. Local structural differences in homologous proteins: specificities in different SCOP classes. PLoS One 2012; 7:e38805. [PMID: 22745680 PMCID: PMC3382195 DOI: 10.1371/journal.pone.0038805] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 07/26/2011] [Accepted: 05/10/2012] [Indexed: 11/19/2022] Open
Abstract
The constant increase in the number of solved protein structures is of great help in understanding the basic principles behind protein folding and evolution. 3-D structural knowledge is valuable in designing and developing methods for comparison, modelling and prediction of protein structures. These approaches for structure analysis can be directly implicated in studying protein function and for drug design. The backbone of a protein structure favours certain local conformations which include α-helices, β-strands and turns. Libraries of limited number of local conformations (Structural Alphabets) were developed in the past to obtain a useful categorization of backbone conformation. Protein Block (PB) is one such Structural Alphabet that gave a reasonable structure approximation of 0.42 Å. In this study, we use PB description of local structures to analyse conformations that are preferred sites for structural variations and insertions, among groups of related folds. This knowledge can be utilized in improving tools for structure comparison that work by analysing local structure similarities. Conformational differences between homologous proteins are known to occur often in the regions comprising turns and loops. Interestingly, these differences are found to have specific preferences depending upon the structural classes of proteins. Such class-specific preferences are mainly seen in the all-β class with changes involving short helical conformations and hairpin turns. A test carried out on a benchmark dataset also indicates that the use of knowledge on the class-specific variations can improve the performance of a PB-based structure comparison approach. The preference for the indel sites also seems to be confined to a few backbone conformations involving β-turns and helix C-caps. These are mainly associated with short loops joining the regular secondary structures that mediate a reversal in the chain direction. Rare β-turns of type I’ and II’ are also identified as preferred sites for insertions.
Affiliation(s)
- Agnel Praveen Joseph
- INSERM, UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, UMR 665, Paris, France
- Institut National de la Transfusion Sanguine (INTS), Paris, France
- Hélène Valadié
- INSERM UMR-S 726, DSIMB, Université Paris Diderot - Paris 7, Paris, France
- Alexandre G. de Brevern
- INSERM, UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, UMR 665, Paris, France
- Institut National de la Transfusion Sanguine (INTS), Paris, France
|
27
|
Hydrophobic forces and the length limit of foldable protein domains. Proc Natl Acad Sci U S A 2012; 109:9851-6. [PMID: 22665780 DOI: 10.1073/pnas.1207382109] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022] Open
Abstract
To find the native conformation (fold), proteins sample a subspace that is typically hundreds of orders of magnitude smaller than their full conformational space. Whether such fast folding is intrinsic or the result of natural selection, and what is the longest foldable protein, are open questions. Here, we derive the average conformational degeneracy of a lattice polypeptide chain in water and quantitatively show that the constraints associated with hydrophobic forces are themselves sufficient to reduce the effective conformational space to a size compatible with the folding of proteins up to approximately 200 amino acids long within a biologically reasonable amount of time. This size range is in general agreement with the experimental protein domain length distribution obtained from approximately 1,200 proteins. Molecular dynamics simulations of the Trp-cage protein confirm this picture on the free energy landscape. Our analytical and computational results are consistent with a model in which the length and time scales of protein folding, as well as the modular nature of large proteins, are dictated primarily by inherent physical forces, whereas natural selection determines the native state.
|
28
|
Gandhimathi A, Nair AG, Sowdhamini R. PASS2 version 4: an update to the database of structure-based sequence alignments of structural domain superfamilies. Nucleic Acids Res 2011; 40:D531-4. [PMID: 22123743 PMCID: PMC3245109 DOI: 10.1093/nar/gkr1096] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 11/15/2022]
Abstract
Accurate structure-based sequence alignments of distantly related proteins are crucial for gaining insight into protein domains that belong to a superfamily. The PASS2 database provides alignments of proteins that are related at the superfamily level and characterized by low sequence identity. We report an automated, updated version of the superfamily alignment database, PASS2.4, consisting of 1961 superfamilies and 10 569 protein domains, in direct correspondence with the SCOP (1.75) database. Database organization, improved methods for efficient structure-based sequence alignment, and the analysis of extremely distantly related proteins within superfamilies formed the focus of this update. Alignment of family-specific functional residues can be realized using such alignments and is shown using one superfamily as an example. The database of alignments and other related features can be accessed at http://caps.ncbs.res.in/pass2/.
Affiliation(s)
- A Gandhimathi
- National centre for Biological Sciences, TIFR, GKVK campus, Bangalore 560 065, Karnataka, India
|
29
|
Agarwal G, Mahajan S, Srinivasan N, de Brevern AG. Identification of local conformational similarity in structurally variable regions of homologous proteins using protein blocks. PLoS One 2011; 6:e17826. [PMID: 21445259 PMCID: PMC3060819 DOI: 10.1371/journal.pone.0017826] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 09/09/2010] [Accepted: 02/15/2011] [Indexed: 11/18/2022]
Abstract
Structure comparison tools can be used to align related protein structures to identify structurally conserved and variable regions and to infer functional and evolutionary relationships. While the conserved regions often superimpose well, the variable regions appear non-superimposable. Differences in homologous protein structures are thought to reflect evolutionary plasticity that accommodates diverged sequences. One kind of difference between 3-D structures of homologous proteins is rigid body displacement. A glaring example is equivalent regions of homologous proteins that adopt α-helical conformations with different spatial orientations and therefore superimpose poorly. In a rigid body superimposition, these regions would appear variable although they may contain local similarity. Also, because of the high spatial deviation in the variable region, one-to-one correspondence at the residue level cannot be determined accurately. Another kind of difference is conformational variability; the most common example is topologically equivalent loops that adopt different conformations in two homologues. In the current study, we present a refined view of the "structurally variable" regions, which may contain local similarity obscured in global alignment of homologous protein structures. Because a structural alphabet can describe local protein structures precisely through the Protein Blocks approach, conformational similarity has been identified in a substantial number of "variable" regions in a large data set of protein structural alignments; optimal residue-residue equivalences could be achieved on the basis of Protein Blocks, which led to improved local alignments. Also, through an example, we have demonstrated how the additional information on local backbone structures provided by protein blocks can aid in comparative modeling of a loop region. In addition, understanding of sequence-structure relationships can be enhanced through our approach. This has been illustrated through examples where the equivalent regions in homologous protein structures share sequence similarity to varying extents but do not preserve local structure.
Affiliation(s)
- Garima Agarwal
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Swapnil Mahajan
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, UAS-GKVK Campus, Bangalore, India
- Alexandre G. de Brevern
- Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), INSERM, U665, Paris, France
- Université Paris Diderot - Paris 7, UMR-S665, Paris, France
- Institut National de la Transfusion Sanguine (INTS), Paris, France
|
30
|
Dessailly BH, Redfern OC, Cuff AL, Orengo CA. Detailed analysis of function divergence in a large and diverse domain superfamily: toward a refined protocol of function classification. Structure 2011; 18:1522-35. [PMID: 21070951 DOI: 10.1016/j.str.2010.08.017] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 01/19/2010] [Revised: 08/06/2010] [Accepted: 08/13/2010] [Indexed: 10/18/2022]
Abstract
Some superfamilies contain large numbers of protein domains with very different functions. The ability to refine the functional classification of domains within these superfamilies is necessary for better understanding the evolution of functions and to guide function prediction of new relatives. To achieve this, a suitable starting point is the detailed analysis of functional divisions and mechanisms of functional divergence in a single superfamily. Here, we present such a detailed analysis in the superfamily of HUP domains. A biologically meaningful functional classification of HUP domains is obtained manually. Mechanisms of function diversification are investigated in detail using this classification. We observe that structural motifs play an important role in shaping broad functional divergence, whereas residue-level changes shape diversity at a more specific level. In parallel we examine the ability of an automated protocol to capture the biologically meaningful classification, with a view to automatically extending this classification in the future.
Affiliation(s)
- Benoit H Dessailly
- Department of Structural and Molecular Biology, University College of London, Gower Street, London WC1E6BT, UK.
|
31
|
Pugalenthi G, Kandaswamy KK, Suganthan PN, Sowdhamini R, Martinetz T, Kolatkar PR. SMpred: a support vector machine approach to identify structural motifs in protein structure without using evolutionary information. J Biomol Struct Dyn 2011; 28:405-14. [PMID: 20919755 DOI: 10.1080/07391102.2010.10507369] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 10/28/2022]
Abstract
Knowledge of three-dimensional structure is essential to understand the function of a protein. Although the overall fold is determined by the details of the whole sequence, a small group of residues, often called structural motifs, plays a crucial role in determining the protein fold and its stability. Identification of such structural motifs requires a sufficient number of sequence and structural homologs to define conservation and evolutionary information. Unfortunately, many structures in the protein structure databases have no homologous structures or sequences. In this work, we report an SVM method, SMpred, to identify structural motifs from a single protein structure without using sequence and structural homologs. SMpred was trained and tested using 132 protein domains containing 581 motifs. SMpred achieved 78.79% accuracy with 79.06% sensitivity and 78.53% specificity. The performance of SMpred was evaluated with MegaMotifBase using 188 proteins containing 1161 motifs. Out of 1161 motifs, SMpred correctly identified 1503 structural motifs reported in MegaMotifBase. Further, we showed that SMpred is a useful approach for the length-deviant superfamilies and single-member superfamilies. This result suggests the usefulness of our approach for facilitating the identification of structural motifs in protein structure in the absence of sequence and structural homologs. The dataset and executable for the SMpred algorithm are available at http://www3.ntu.edu.sg/home/EPNSugan/index_files/SMpred.htm.
Affiliation(s)
- Ganesan Pugalenthi
- Laboratory of Structural Biochemistry, Genome Institute of Singapore, 60 Biopolis Street, Singapore 138672
|