1
Yang W, Du Q, Zhou X, Wu C, Bao J. PDFll: Predictors of Disorder and Function of Proteins from the Language of Life. J Comput Biol 2024. [PMID: 39246251 DOI: 10.1089/cmb.2024.0506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/10/2024] Open
Abstract
The identification of intrinsically disordered proteins and their functional roles is largely dependent on the performance of computational predictors, necessitating a high standard of accuracy in these tools. In this context, we introduce a novel series of computational predictors, termed PDFll (Predictors of Disorder and Function of proteins from the Language of Life), which are designed to offer precise predictions of protein disorder and associated functional roles based on protein sequences. PDFll is developed through a two-step process. Initially, it leverages large-scale protein language models (pLMs), trained on an extensive dataset comprising billions of protein sequences. Subsequently, the embeddings derived from pLMs are integrated into streamlined, yet sophisticated, deep-learning models to generate predictions. These predictions notably surpass the performance of existing state-of-the-art predictors, particularly those that forecast disorder and function without utilizing evolutionary information.
Affiliation(s)
- Wanyi Yang
  - College of Life Sciences, Sichuan University, Chengdu, China
- Qingsong Du
  - College of Life Sciences, Sichuan University, Chengdu, China
- Xunyu Zhou
  - College of Life Sciences, Sichuan University, Chengdu, China
- Chuanfang Wu
  - College of Life Sciences, Sichuan University, Chengdu, China
- Jinku Bao
  - College of Life Sciences, Sichuan University, Chengdu, China
2
Paquay S, Duraffourd J, Bury M, Heremans IP, Caligiore F, Gerin I, Stroobant V, Jacobs J, Pinon A, Graff J, Vertommen D, Van Schaftingen E, Dewulf JP, Bommer GT. ACAD10 and ACAD11 allow entry of 4-hydroxy fatty acids into β-oxidation. Cell Mol Life Sci 2024; 81:367. [PMID: 39174697 PMCID: PMC11342911 DOI: 10.1007/s00018-024-05397-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2024] [Revised: 08/02/2024] [Accepted: 08/05/2024] [Indexed: 08/24/2024]
Abstract
Hydroxylated fatty acids are important intermediates in lipid metabolism and signaling. Surprisingly, the metabolism of 4-hydroxy fatty acids remains largely unexplored. We found that both ACAD10 and ACAD11 unite two enzymatic activities to introduce these metabolites into mitochondrial and peroxisomal β-oxidation, respectively. First, they phosphorylate 4-hydroxyacyl-CoAs via a kinase domain, followed by an elimination of the phosphate to form enoyl-CoAs catalyzed by an acyl-CoA dehydrogenase (ACAD) domain. Studies in knockout cell lines revealed that ACAD10 preferentially metabolizes shorter chain 4-hydroxy fatty acids than ACAD11 (i.e. 6 carbons versus 10 carbons). Yet, recombinant proteins showed comparable activity on the corresponding 4-hydroxyacyl-CoAs. This suggests that the localization of ACAD10 and ACAD11 to mitochondria and peroxisomes, respectively, might influence their physiological substrate spectrum. Interestingly, we observed that ACAD10 is cleaved internally during its maturation generating a C-terminal part consisting of the ACAD domain, and an N-terminal part comprising the kinase domain and a haloacid dehalogenase (HAD) domain. HAD domains often exhibit phosphatase activity, but negligible activity was observed in the case of ACAD10. Yet, inactivation of a presumptive key residue in this domain significantly increased the kinase activity, suggesting that this domain might have acquired a regulatory function to prevent accumulation of the phospho-hydroxyacyl-CoA intermediate. Taken together, our work reveals that 4-hydroxy fatty acids enter mitochondrial and peroxisomal fatty acid β-oxidation via two enzymes with an overlapping substrate repertoire.
Affiliation(s)
- Stéphanie Paquay
  - Metabolic Research Group, de Duve Institute & WELRI, Université Catholique de Louvain, 1200, Brussels, Belgium
  - WELBIO Department, WEL Research Institute, avenue Pasteur, 6, 1300, Wavre, Belgium
  - Department of Pediatric Neurology and Metabolic Diseases, Cliniques Universitaires St. Luc, Université Catholique de Louvain, 1200, Brussels, Belgium
- Julia Duraffourd
  - Metabolic Research Group, de Duve Institute & WELRI, Université Catholique de Louvain, 1200, Brussels, Belgium
  - WELBIO Department, WEL Research Institute, avenue Pasteur, 6, 1300, Wavre, Belgium
- Marina Bury
  - Metabolic Research Group, de Duve Institute & WELRI, Université Catholique de Louvain, 1200, Brussels, Belgium
  - WELBIO Department, WEL Research Institute, avenue Pasteur, 6, 1300, Wavre, Belgium
- Isaac P Heremans
  - Metabolic Research Group, de Duve Institute & WELRI, Université Catholique de Louvain, 1200, Brussels, Belgium
  - WELBIO Department, WEL Research Institute, avenue Pasteur, 6, 1300, Wavre, Belgium
- Francesco Caligiore
  - Metabolic Research Group, de Duve Institute & WELRI, Université Catholique de Louvain, 1200, Brussels, Belgium
  - WELBIO Department, WEL Research Institute, avenue Pasteur, 6, 1300, Wavre, Belgium
- Isabelle Gerin
  - Metabolic Research Group, de Duve Institute & WELRI, Université Catholique de Louvain, 1200, Brussels, Belgium
  - WELBIO Department, WEL Research Institute, avenue Pasteur, 6, 1300, Wavre, Belgium
- Jean Jacobs
  - Metabolic Research Group, de Duve Institute & WELRI, Université Catholique de Louvain, 1200, Brussels, Belgium
  - WELBIO Department, WEL Research Institute, avenue Pasteur, 6, 1300, Wavre, Belgium
- Aymeric Pinon
  - Metabolic Research Group, de Duve Institute & WELRI, Université Catholique de Louvain, 1200, Brussels, Belgium
  - WELBIO Department, WEL Research Institute, avenue Pasteur, 6, 1300, Wavre, Belgium
- Julie Graff
  - Metabolic Research Group, de Duve Institute & WELRI, Université Catholique de Louvain, 1200, Brussels, Belgium
  - WELBIO Department, WEL Research Institute, avenue Pasteur, 6, 1300, Wavre, Belgium
- Didier Vertommen
  - Protein Phosphorylation Unit, de Duve Institute & MASSPROT Platform, Université Catholique de Louvain, 1200, Brussels, Belgium
- Emile Van Schaftingen
  - Metabolic Research Group, de Duve Institute & WELRI, Université Catholique de Louvain, 1200, Brussels, Belgium
  - WELBIO Department, WEL Research Institute, avenue Pasteur, 6, 1300, Wavre, Belgium
- Joseph P Dewulf
  - Metabolic Research Group, de Duve Institute & WELRI, Université Catholique de Louvain, 1200, Brussels, Belgium
  - WELBIO Department, WEL Research Institute, avenue Pasteur, 6, 1300, Wavre, Belgium
  - Department of Laboratory Medicine, Cliniques Universitaires St. Luc, Université Catholique de Louvain, 1200, Brussels, Belgium
- Guido T Bommer
  - Metabolic Research Group, de Duve Institute & WELRI, Université Catholique de Louvain, 1200, Brussels, Belgium
  - WELBIO Department, WEL Research Institute, avenue Pasteur, 6, 1300, Wavre, Belgium
3
Mindel V, Brodsky S, Cohen A, Manadre W, Jonas F, Carmi M, Barkai N. Intrinsically disordered regions of the Msn2 transcription factor encode multiple functions using interwoven sequence grammars. Nucleic Acids Res 2024; 52:2260-2272. [PMID: 38109289 PMCID: PMC10954448 DOI: 10.1093/nar/gkad1191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Revised: 11/04/2023] [Accepted: 12/11/2023] [Indexed: 12/20/2023] Open
Abstract
Intrinsically disordered regions (IDRs) are abundant in eukaryotic proteins, but their sequence-function relationship remains poorly understood. IDRs of transcription factors (TFs) can direct promoter selection and recruit coactivators, as shown for the budding yeast TF Msn2. To examine how IDRs encode both these functions, we compared genomic binding specificity, coactivator recruitment, and gene induction amongst a large set of designed Msn2-IDR mutants. We find that both functions depend on multiple regions across the > 600AA IDR. Yet, transcription activity was readily disrupted by mutations that showed no effect on the Msn2 binding specificity. Our data attribute this differential sensitivity to the integration of a relaxed, composition-based code directing binding specificity with a more stringent, motif-based code controlling the recruitment of coactivators and transcription activity. Therefore, Msn2 utilizes interwoven sequence grammars for encoding multiple functions, suggesting a new IDR design paradigm of potentially general use.
Affiliation(s)
- Vladimir Mindel
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Sagie Brodsky
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Aileen Cohen
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Wajd Manadre
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Felix Jonas
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Miri Carmi
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
- Naama Barkai
  - Department of Molecular Genetics, Weizmann Institute of Science, Rehovot 76100, Israel
4
Peng J, Zhao L. The origin and structural evolution of de novo genes in Drosophila. Nat Commun 2024; 15:810. [PMID: 38280868 PMCID: PMC10821953 DOI: 10.1038/s41467-024-45028-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Accepted: 01/09/2024] [Indexed: 01/29/2024] Open
Abstract
Recent studies reveal that de novo gene origination from previously non-genic sequences is a common mechanism for gene innovation. These young genes provide an opportunity to study the structural and functional origins of proteins. Here, we combine high-quality base-level whole-genome alignments and computational structural modeling to study the origination, evolution, and protein structures of lineage-specific de novo genes. We identify 555 de novo gene candidates in D. melanogaster that originated within the Drosophilinae lineage. Sequence composition, evolutionary rates, and expression patterns indicate possible gradual functional or adaptive shifts with their gene ages. Surprisingly, we find little overall protein structural changes in candidates from the Drosophilinae lineage. We identify several candidates with potentially well-folded protein structures. Ancestral sequence reconstruction analysis reveals that most potentially well-folded candidates are often born well-folded. Single-cell RNA-seq analysis in testis shows that although most de novo gene candidates are enriched in spermatocytes, several young candidates are biased towards the early spermatogenesis stage, indicating potentially important but less emphasized roles of early germline cells in the de novo gene origination in testis. This study provides a systematic overview of the origin, evolution, and protein structural changes of Drosophilinae-specific de novo genes.
Affiliation(s)
- Junhui Peng
  - Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
- Li Zhao
  - Laboratory of Evolutionary Genetics and Genomics, The Rockefeller University, New York, NY, USA
5
Le NQK. Leveraging transformers-based language models in proteome bioinformatics. Proteomics 2023; 23:e2300011. [PMID: 37381841 DOI: 10.1002/pmic.202300011] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 05/14/2023] [Revised: 06/13/2023] [Accepted: 06/13/2023] [Indexed: 06/30/2023]
Abstract
In recent years, the rapid growth of biological data has increased interest in using bioinformatics to analyze and interpret this data. Proteomics, which studies the structure, function, and interactions of proteins, is a crucial area of bioinformatics. Using natural language processing (NLP) techniques in proteomics is an emerging field that combines machine learning and text mining to analyze biological data. Recently, transformer-based NLP models have gained significant attention for their ability to process variable-length input sequences in parallel, using self-attention mechanisms to capture long-range dependencies. In this review paper, we discuss the recent advancements in transformer-based NLP models in proteome bioinformatics and examine their advantages, limitations, and potential applications to improve the accuracy and efficiency of various tasks. Additionally, we highlight the challenges and future directions of using these models in proteome bioinformatics research. Overall, this review provides valuable insights into the potential of transformer-based NLP models to revolutionize proteome bioinformatics.
Affiliation(s)
- Nguyen Quoc Khanh Le
  - Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
  - AIBioMed Research Group, Taipei Medical University, Taipei, Taiwan
  - Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei, Taiwan
  - Translational Imaging Research Center, Taipei Medical University Hospital, Taipei, Taiwan
6
Samel A, Väärtnõu F, Verk L, Kurg K, Mutso M, Kurg R. How the Intrinsically Disordered N-Terminus of Cancer/Testis Antigen MAGEA10 Is Responsible for Its Expression, Nuclear Localisation and Aberrant Migration. Biomolecules 2023; 13:1704. [PMID: 38136576 PMCID: PMC10741916 DOI: 10.3390/biom13121704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Revised: 11/16/2023] [Accepted: 11/22/2023] [Indexed: 12/24/2023] Open
Abstract
Melanoma-associated antigen A (MAGEA) subfamily proteins are normally expressed in testis and/or placenta. However, aberrant expression is detected in the tumour cells of multiple types of human cancer. MAGEA expression is mainly observed in cancers that have acquired malignant phenotypes, invasiveness and metastasis, and the expression of MAGEA family proteins has been linked to poor prognosis in cancer patients. All MAGE proteins share the common MAGE homology domain (MHD) which encompasses up to 70% of the protein; however, the areas flanking the MHD region vary between family members and are poorly conserved. To investigate the molecular basis of MAGEA10 expression and anomalous mobility in gel, deletion and point-mutation, analyses of the MAGEA10 protein were performed. Our data show that the intrinsically disordered N-terminal domain and, specifically, the first seven amino acids containing a unique linear motif, PRAPKR, are responsible for its expression, aberrant migration in SDS-PAGE and nuclear localisation. The aberrant migration in gel and nuclear localisation are not related to each other. Hiding the N-terminus with an epitope tag strongly affected its mobility in gel and expression in cells. Our results suggest that the intrinsically disordered domains flanking the MHD determine the unique properties of individual MAGEA proteins.
Affiliation(s)
- Reet Kurg
  - Institute of Technology, University of Tartu, 50411 Tartu, Estonia; (A.S.); (F.V.); (L.V.); (K.K.); (M.M.)