1. Middendorf L, Eicholt LA. Random, de novo, and conserved proteins: How structure and disorder predictors perform differently. Proteins 2024; 92:757-767. [PMID: 38226524 DOI: 10.1002/prot.26652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 10/18/2023] [Accepted: 12/01/2023] [Indexed: 01/17/2024]
Abstract
Understanding the emergence and structural characteristics of de novo and random proteins is crucial for unraveling protein evolution and designing novel enzymes. However, experimental determination of their structures remains challenging. Recent advancements in protein structure prediction, particularly with AlphaFold2 (AF2), have expanded our knowledge of protein structures, but their applicability to de novo and random proteins is unclear. In this study, we investigate the structural predictions and confidence scores of AF2 and the protein language model-based predictor ESMFold for de novo and conserved proteins from Drosophila and for a dataset of comparable random proteins. We find that the structural predictions for de novo and random proteins differ significantly from those for conserved proteins. Interestingly, a positive correlation between disorder and confidence scores (pLDDT) is observed for de novo and random proteins, in contrast to the negative correlation observed for conserved proteins. Furthermore, the performance of structure predictors for de novo and random proteins is hampered by the lack of sequence identity. We also observe fluctuating median predicted disorder across sequence-length quartiles for random proteins, suggesting that sequence length influences disorder predictions. In conclusion, while structure predictors provide initial insights into the structural composition of de novo and random proteins, their accuracy and applicability to such proteins remain limited. Experimental determination of their structures is necessary for a comprehensive understanding. The positive correlation between disorder and pLDDT could imply a potential for conditional folding and transient binding interactions of de novo and random proteins.
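The two quantitative observations in this abstract, a correlation between predicted disorder and pLDDT and median disorder compared across sequence-length quartiles, can be illustrated with a short Python sketch. It is not the authors' pipeline: the per-protein lengths, disorder fractions and mean pLDDT values are made-up toy numbers, Spearman's rank correlation is an assumed choice of statistic, and the helper names (`spearman`, `median_by_length_quartile`) are hypothetical.

```python
from statistics import median, quantiles


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def median_by_length_quartile(lengths, scores):
    """Median score within each sequence-length quartile (keys 1 to 4)."""
    q1, q2, q3 = quantiles(lengths, n=4)  # three cut points
    groups = {1: [], 2: [], 3: [], 4: []}
    for length, score in zip(lengths, scores):
        if length <= q1:
            groups[1].append(score)
        elif length <= q2:
            groups[2].append(score)
        elif length <= q3:
            groups[3].append(score)
        else:
            groups[4].append(score)
    return {q: median(v) for q, v in groups.items() if v}


# Hypothetical toy data: one entry per protein, not values from the study.
lengths = [60, 85, 110, 140, 175, 200, 230, 260]
disorder = [0.10, 0.35, 0.20, 0.50, 0.15, 0.40, 0.30, 0.25]  # fraction of residues
plddt = [45.0, 62.0, 50.0, 70.0, 48.0, 66.0, 52.0, 55.0]     # mean pLDDT

print(round(spearman(disorder, plddt), 3))
print(median_by_length_quartile(lengths, disorder))
```

With these toy values the correlation is positive (about 0.98) and the quartile medians rise and fall rather than trending monotonically; real inputs would come from a disorder predictor and from the per-residue pLDDT reported by AF2 or ESMFold.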
Affiliation(s)
- Lasse Middendorf
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
- Lars A Eicholt
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, Germany
2. Wang Y, Li S, Zhou Z, Sun L, Sun J, Shen C, Gao R, Song J, Pu X. The Functional Characteristics and Soluble Expression of Saffron CsCCD2. Int J Mol Sci 2023; 24:15090. [PMID: 37894770 PMCID: PMC10606151 DOI: 10.3390/ijms242015090] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/13/2023] [Revised: 10/10/2023] [Accepted: 10/10/2023] [Indexed: 10/29/2023]
Abstract
Crocins are important natural products, obtained predominantly from the stigma of saffron, that can be utilized as medicinal compounds, spices, and colorants, with significant promise in the pharmaceutical, food, and cosmetic industries. Carotenoid cleavage dioxygenase 2 (CsCCD2) is a crucial limiting enzyme that has been reported to be responsible for the cleavage of zeaxanthin in the crocin biosynthetic pathway. However, the catalytic activity of CsCCD2 toward β-carotene and lycopene remains elusive, and soluble expression of CsCCD2 remains a major challenge. In this study, we report the functional characteristics of CsCCD2, which can catalyze the cleavage not only of zeaxanthin but also of β-carotene and lycopene. The molecular basis of the divergent functionality of CsCCD2 was elucidated using bioinformatic analysis and truncation studies. Protein expression optimization demonstrated that using a maltose-binding protein (MBP) tag and optimizing the induction conditions increased the production of soluble protein. Correspondingly, the catalytic efficiency of soluble CsCCD2 was higher than that of the insoluble form, which further validated its function. This study not only broadened the substrate profile of CsCCD2 but also achieved its soluble expression. It provides a firm platform for resolving the CsCCD2 crystal structure and facilitates the synthesis of crocetin and crocins.
Affiliation(s)
- Ying Wang
- Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Anhui Institute of Innovative Drugs, School of Pharmacy, Anhui Medical University, Hefei 230032, China
- Center of Traditional Chinese Medicine Formula Granule, Anhui Medical University, Hefei 230032, China
- Siqi Li
- Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Anhui Institute of Innovative Drugs, School of Pharmacy, Anhui Medical University, Hefei 230032, China
- Center of Traditional Chinese Medicine Formula Granule, Anhui Medical University, Hefei 230032, China
- Ze Zhou
- Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Anhui Institute of Innovative Drugs, School of Pharmacy, Anhui Medical University, Hefei 230032, China
- Center of Traditional Chinese Medicine Formula Granule, Anhui Medical University, Hefei 230032, China
- Lifen Sun
- Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Anhui Institute of Innovative Drugs, School of Pharmacy, Anhui Medical University, Hefei 230032, China
- Center of Traditional Chinese Medicine Formula Granule, Anhui Medical University, Hefei 230032, China
- Jing Sun
- Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Anhui Institute of Innovative Drugs, School of Pharmacy, Anhui Medical University, Hefei 230032, China
- Center of Traditional Chinese Medicine Formula Granule, Anhui Medical University, Hefei 230032, China
- Chuanpu Shen
- Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Anhui Institute of Innovative Drugs, School of Pharmacy, Anhui Medical University, Hefei 230032, China
- Center of Traditional Chinese Medicine Formula Granule, Anhui Medical University, Hefei 230032, China
- Ranran Gao
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Jingyuan Song
- Key Lab of Chinese Medicine Resources Conservation, State Administration of Traditional Chinese Medicine of the People’s Republic of China, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100193, China
- Xiangdong Pu
- Inflammation and Immune Mediated Diseases Laboratory of Anhui Province, Anhui Institute of Innovative Drugs, School of Pharmacy, Anhui Medical University, Hefei 230032, China
- Center of Traditional Chinese Medicine Formula Granule, Anhui Medical University, Hefei 230032, China
3. Jomrit J, Suhardi S, Summpunn P. Effects of Signal Peptide and Chaperone Co-Expression on Heterologous Protein Production in Escherichia coli. Molecules 2023; 28:5594. [PMID: 37513466 PMCID: PMC10384211 DOI: 10.3390/molecules28145594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Revised: 07/17/2023] [Accepted: 07/21/2023] [Indexed: 07/30/2023]
Abstract
Various host systems have been employed to increase the yield of recombinant proteins. However, some recombinant proteins have been produced at high yields but without functional activity. To achieve both high protein yield and high activity, molecular biological strategies have been continuously developed. This work describes the effect of signal peptides (SPs) and co-expression of molecular chaperones on the production of active recombinant protein in Escherichia coli. Extracellular enzymes from Bacillus subtilis, including β-1,4-xylanase, β-1,4-glucanase, and β-mannanase, constructed with and without their signal peptides, and intracellular enzymes from Pseudomonas stutzeri ST201, including benzoylformate decarboxylase (BFDC), benzaldehyde dehydrogenase (BADH), and d-phenylglycine aminotransferase (d-PhgAT), were cloned and overexpressed in E. coli BL21(DE3). Co-expression of molecular chaperones with all enzymes studied was also investigated. Yields of β-1,4-xylanase (Xyn), β-1,4-glucanase (Cel), and β-mannanase (Man) constructed without their N-terminal signal peptides increased 1112.61-, 1.75-, and 1.12-fold, respectively, compared with spXyn, spCel, and spMan constructed with their signal peptides. For the naturally intracellular enzymes, the GroEL-GroES chaperone complex increased yields of active BFDC, BADH, and d-PhgAT up to 1.31-, 4.94-, and 37.93-fold, respectively, and also increased yields of Man and Xyn up to 1.53- and 3.46-fold, respectively, while other chaperones, including DnaK-DnaJ-GrpE and trigger factor (Tf), showed variable effects with these enzymes. This study successfully cloned and overexpressed extracellular and intracellular enzymes in E. coli BL21(DE3). When the signal peptide regions of the secretory enzymes were removed, yields of active enzymes were higher than with intact signal peptides.
In addition, a higher yield of active enzymes was generally obtained when these enzymes were co-expressed with appropriate chaperones. Therefore, E. coli can produce cytoplasmic and secretory enzymes effectively if the enzyme coding sequence is used without its signal peptide and appropriate chaperones are co-expressed to assist correct folding.
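The fold changes quoted above are ratios of a construct's yield to a reference construct's yield, so values below 1 indicate a decrease. A minimal Python sketch of that arithmetic follows; the yields and the `fold_change` and `best_condition` helpers are hypothetical and only illustrate how an "up to N-fold" figure picks the best of several conditions.

```python
def fold_change(test_yield: float, reference_yield: float) -> float:
    """Yield of a test construct relative to a reference construct."""
    if reference_yield <= 0:
        raise ValueError("reference yield must be positive")
    return test_yield / reference_yield


def best_condition(reference_yield: float, yields: dict[str, float]) -> tuple[str, float]:
    """Return the condition with the largest fold change over the reference."""
    folds = {name: fold_change(y, reference_yield) for name, y in yields.items()}
    name = max(folds, key=folds.get)
    return name, folds[name]


# Hypothetical active-enzyme yields (arbitrary units); not data from the study.
no_chaperone = 20.0
with_chaperone = {"GroEL-GroES": 50.0, "DnaK-DnaJ-GrpE": 16.0, "Tf": 24.0}

for name, y in with_chaperone.items():
    print(f"{name}: {fold_change(y, no_chaperone):.2f}-fold")
print(best_condition(no_chaperone, with_chaperone))
```

With these made-up numbers, DnaK-DnaJ-GrpE gives 0.80-fold (a decrease), the kind of "variable effect" that a ratio below 1 captures.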
Affiliation(s)
- Juntratip Jomrit
- School of Pharmacy, Walailak University, Nakhon Si Thammarat 80160, Thailand
- Suhardi Suhardi
- Department of Animal Science, Faculty of Agriculture, Mulawarman University, Samarinda 75123, Indonesia
- Pijug Summpunn
- Food Technology and Innovation Research Center of Excellence, School of Agricultural Technology and Food Industry, Walailak University, Nakhon Si Thammarat 80160, Thailand
4. Heames B, Buchel F, Aubel M, Tretyachenko V, Loginov D, Novák P, Lange A, Bornberg-Bauer E, Hlouchová K. Experimental characterization of de novo proteins and their unevolved random-sequence counterparts. Nat Ecol Evol 2023; 7:570-580. [PMID: 37024625 PMCID: PMC10089919 DOI: 10.1038/s41559-023-02010-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 01/14/2022] [Accepted: 02/10/2023] [Indexed: 04/08/2023]
Abstract
De novo gene emergence provides a route for new proteins to be formed from previously non-coding DNA. Proteins born in this way are considered random sequences and typically assumed to lack defined structure. While it remains unclear how likely a de novo protein is to assume a soluble and stable tertiary structure, intersecting evidence from random sequence and de novo-designed proteins suggests that native-like biophysical properties are abundant in sequence space. Taking putative de novo proteins identified in human and fly, we experimentally characterize a library of these sequences to assess their solubility and structure propensity. We compare this library to a set of synthetic random proteins with no evolutionary history. Bioinformatic prediction suggests that de novo proteins may have remarkably similar distributions of biophysical properties to unevolved random sequences of a given length and amino acid composition. However, upon expression in vitro, de novo proteins exhibit moderately higher solubility which is further induced by the DnaK chaperone system. We suggest that while synthetic random sequences are a useful proxy for de novo proteins in terms of structure propensity, de novo proteins may be better integrated in the cellular system than random expectation, given their higher solubility.
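Random sequences "of a given length and amino acid composition" can be generated in several ways. The Python sketch below assumes one of them, sampling each residue independently from the pooled composition of a reference set, and is not necessarily how the study built its library. The input sequences and the names `composition` and `random_counterparts` are hypothetical.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seqs):
    """Pooled amino acid frequencies over all sequences (standard residues only)."""
    counts = Counter(aa for s in seqs for aa in s if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def random_counterparts(seqs, seed=0):
    """One random sequence per input, matching its length and the pooled composition."""
    rng = random.Random(seed)  # seeded for reproducibility
    freqs = composition(seqs)
    weights = [freqs[aa] for aa in AMINO_ACIDS]
    return [
        "".join(rng.choices(AMINO_ACIDS, weights=weights, k=len(s)))
        for s in seqs
    ]


# Hypothetical input sequences, not proteins from the study.
reference = ["MKVLAAGLLSEEK", "MSTNPKPQRKTKR"]
for original, rand_seq in zip(reference, random_counterparts(reference)):
    print(len(original), rand_seq)
```

Shuffling each sequence with `random.shuffle` would instead preserve each sequence's exact composition; which null model is appropriate depends on the comparison being made.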
Affiliation(s)
- Brennen Heames
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Filip Buchel
- Department of Cell Biology, Charles University, BIOCEV, Prague, Czech Republic
- Department of Biochemistry, Charles University, Prague, Czech Republic
- Margaux Aubel
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Dmitry Loginov
- Institute of Microbiology, Czech Academy of Sciences, Prague, Czech Republic
- Petr Novák
- Institute of Microbiology, Czech Academy of Sciences, Prague, Czech Republic
- Andreas Lange
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Department of Protein Evolution, MPI for Developmental Biology, Tübingen, Germany
- Klára Hlouchová
- Department of Cell Biology, Charles University, BIOCEV, Prague, Czech Republic
- Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague, Czech Republic
5. Aubel M, Eicholt L, Bornberg-Bauer E. Assessing structure and disorder prediction tools for de novo emerged proteins in the age of machine learning. F1000Res 2023; 12:347. [PMID: 37113259 PMCID: PMC10126731 DOI: 10.12688/f1000research.130443.1] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Accepted: 03/17/2023] [Indexed: 03/31/2023]
Abstract
Background: De novo protein-coding genes emerge from scratch in the non-coding regions of the genome and, by definition, have no homology to other genes. Their encoded de novo proteins therefore belong to the so-called "dark protein space". So far, only four de novo protein structures have been experimentally approximated. Low homology, presumed high disorder and limited structural data result in low-confidence structural predictions for de novo proteins in most cases. Here, we examine the most widely used structure and disorder predictors and assess their applicability to de novo emerged proteins. Since AlphaFold2 relies on multiple sequence alignments and was trained on solved structures of largely conserved and globular proteins, its performance on de novo proteins remains unknown. More recently, protein language models have been used for alignment-free structure prediction, potentially making them more suitable for de novo proteins than AlphaFold2.
Methods: We applied different disorder predictors (IUPred3 short/long, flDPnn) and structure predictors, AlphaFold2 on the one hand and language model-based predictors (OmegaFold, ESMFold, RGN2) on the other, to four de novo proteins with experimental evidence on structure. We compared the resulting predictions between the different predictors as well as with the existing experimental evidence.
Results: Results from IUPred, the most widely used disorder predictor, depend heavily on the choice of parameters and differ significantly from those of flDPnn, which a recent comparative assessment study found to outperform most other predictors. Similarly, different structure predictors yielded varying results and confidence scores for de novo proteins.
Conclusions: We suggest that, while protein language model-based approaches might in some cases be more accurate than AlphaFold2, structure prediction for de novo emerged proteins remains a difficult task for any predictor, whether of disorder or of structure.
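The parameter sensitivity reported for IUPred can be made concrete by converting per-residue scores into binary disorder calls. The Python sketch below assumes the commonly used 0.5 cut-off (a residue is called disordered when its score exceeds it); the two score profiles stand in for hypothetical "short" and "long" mode outputs for the same protein, and the function names are illustrative, not part of IUPred.

```python
def disorder_calls(scores, threshold=0.5):
    """Binary per-residue calls: True where the score exceeds the threshold."""
    return [s > threshold for s in scores]


def disordered_fraction(scores, threshold=0.5):
    """Fraction of residues called disordered."""
    calls = disorder_calls(scores, threshold)
    return sum(calls) / len(calls)


def call_agreement(scores_a, scores_b, threshold=0.5):
    """Fraction of residues on which two predictions make the same call."""
    if len(scores_a) != len(scores_b):
        raise ValueError("score profiles must cover the same residues")
    calls_a = disorder_calls(scores_a, threshold)
    calls_b = disorder_calls(scores_b, threshold)
    return sum(a == b for a, b in zip(calls_a, calls_b)) / len(calls_a)


# Hypothetical per-residue scores for one 8-residue protein.
short_mode = [0.20, 0.30, 0.55, 0.60, 0.40, 0.45, 0.70, 0.35]
long_mode = [0.40, 0.52, 0.65, 0.70, 0.58, 0.61, 0.80, 0.48]

print(disordered_fraction(short_mode), disordered_fraction(long_mode))
print(call_agreement(short_mode, long_mode))
```

With these toy profiles, 37.5% versus 75% of residues are called disordered, and the two modes agree on only 5 of 8 residues.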
Affiliation(s)
- Margaux Aubel
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, 48149, Germany
- Lars Eicholt
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, 48149, Germany
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, University of Muenster, Muenster, 48149, Germany
- Department of Protein Evolution, Max Planck Institute for Biology, Tuebingen, 72076, Germany
6. Evolution and implications of de novo genes in humans. Nat Ecol Evol 2023:10.1038/s41559-023-02014-y. [PMID: 36928843 DOI: 10.1038/s41559-023-02014-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 07/24/2022] [Accepted: 02/06/2023] [Indexed: 03/18/2023]
Abstract
Genes and translated open reading frames (ORFs) that emerged de novo from previously non-coding sequences provide species with opportunities for adaptation. When aberrantly activated, some human-specific de novo genes and ORFs have disease-promoting properties, for instance driving tumour growth. Thousands of putative de novo coding sequences have been described in humans, but we still do not know what fraction of those ORFs has readily acquired a function. Here, we discuss the challenges and controversies surrounding the detection, mechanisms of origin, annotation, validation and characterization of de novo genes and ORFs. Through manual curation of literature and databases, we provide a comprehensive table of most de novo genes reported for humans to date. We re-evaluate each locus by tracing the enabling mutations and list proposed disease associations, protein characteristics and supporting evidence for translation and protein detection. This work will support future explorations of de novo genes and ORFs in humans.