1
Kohyama S, Frohn BP, Babl L, Schwille P. Machine learning-aided design and screening of an emergent protein function in synthetic cells. Nat Commun 2024; 15:2010. [PMID: 38443351 PMCID: PMC10914801 DOI: 10.1038/s41467-024-46203-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 02/16/2024] [Indexed: 03/07/2024] Open
Abstract
Recently, utilization of Machine Learning (ML) has led to astonishing progress in computational protein design, bringing into reach the targeted engineering of proteins for industrial and biomedical applications. However, the design of proteins for emergent functions of core relevance to cells, such as the ability to spatiotemporally self-organize and thereby structure the cellular space, is still extremely challenging. While on the generative side conditional generative models and multi-state design are on the rise, for emergent functions there is a lack of tailored screening methods as typically needed in a protein design project, both computational and experimental. Here we describe a proof-of-principle of how such screening, in silico and in vitro, can be achieved for ML-generated variants of a protein that forms intracellular spatiotemporal patterns. For computational screening we use a structure-based divide-and-conquer approach to find the most promising candidates, while for the subsequent in vitro screening we use synthetic cell-mimics as established by Bottom-Up Synthetic Biology. We then show that the best screened candidate can indeed completely substitute the wildtype gene in Escherichia coli. These results raise great hopes for the next level of synthetic biology, where ML-designed synthetic proteins will be used to engineer cellular functions.
Affiliation(s)
- Shunshi Kohyama
- Dept. Cellular and Molecular Biophysics, Max Planck Institute of Biochemistry, Martinsried, D-82152, Germany
- Béla P Frohn
- Dept. Cellular and Molecular Biophysics, Max Planck Institute of Biochemistry, Martinsried, D-82152, Germany
- Leon Babl
- Dept. Cellular and Molecular Biophysics, Max Planck Institute of Biochemistry, Martinsried, D-82152, Germany
- Petra Schwille
- Dept. Cellular and Molecular Biophysics, Max Planck Institute of Biochemistry, Martinsried, D-82152, Germany.
2
de Crécy-Lagard V, Swairjo MA. On the necessity to include multiple types of evidence when predicting molecular function of proteins. bioRxiv 2023:2023.12.18.571875. [PMID: 38187591 PMCID: PMC10769224 DOI: 10.1101/2023.12.18.571875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Machine learning-based platforms are currently revolutionizing many fields of molecular biology, including structure prediction for monomers or complexes, prediction of the consequences of mutations, and prediction of protein function. However, these platforms use training sets based on currently available knowledge and, in essence, are not built to discover novelty. Hence, claims of discovering novel functions for protein families using artificial intelligence should be carefully dissected, as the dangers of overprediction are real, as we show in a detailed analysis of the prediction made by Kim et al. [1] on the function of the YciO protein in the model organism Escherichia coli.
3
Cai P, Liu S, Zhang D, Xing H, Han M, Liu D, Gong L, Hu QN. SynBioTools: a one-stop facility for searching and selecting synthetic biology tools. BMC Bioinformatics 2023; 24:152. [PMID: 37069545 PMCID: PMC10111727 DOI: 10.1186/s12859-023-05281-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Accepted: 04/11/2023] [Indexed: 04/19/2023] Open
Abstract
BACKGROUND: The rapid development of synthetic biology relies heavily on the use of databases and computational tools, which are also developing rapidly. While many tool registries have been created to facilitate tool retrieval, sharing, and reuse, no relatively comprehensive tool registry or catalog addresses all aspects of synthetic biology. RESULTS: We constructed SynBioTools, a comprehensive collection of synthetic biology databases, computational tools, and experimental methods, as a one-stop facility for searching and selecting synthetic biology tools. SynBioTools includes databases, computational tools, and methods extracted from reviews via SCIentific Table Extraction, a scientific table-extraction tool that we built. Approximately 57% of the resources that we located and included in SynBioTools are not mentioned in bio.tools, the dominant tool registry. To improve users' understanding of the tools and to enable them to make better choices, the tools are grouped into nine modules (each with subdivisions) based on their potential biosynthetic applications. Detailed comparisons of similar tools in every classification are included. The URLs, descriptions, source references, and the number of citations of the tools are also integrated into the system. CONCLUSIONS: SynBioTools is freely available at https://synbiotools.lifesynther.com/. It provides end-users and developers with a useful resource of categorized synthetic biology databases, tools, and methods to facilitate tool retrieval and selection.
Affiliation(s)
- Pengli Cai
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Sheng Liu
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Dachuan Zhang
- Ecological Systems Design, Institute of Environmental Engineering, ETH Zurich, 8093, Zurich, Switzerland
- Huadong Xing
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Mengying Han
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Dongliang Liu
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Linlin Gong
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Qian-Nan Hu
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China.
4
Zheng Y, Young ND, Song J, Chang BC, Gasser RB. An informatic workflow for the enhanced annotation of excretory/secretory proteins of Haemonchus contortus. Comput Struct Biotechnol J 2023; 21:2696-2704. [PMID: 37143762 PMCID: PMC10151223 DOI: 10.1016/j.csbj.2023.03.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Revised: 03/16/2023] [Accepted: 03/16/2023] [Indexed: 03/19/2023] Open
Abstract
Major advances in genomic and associated technologies have demanded reliable bioinformatic tools and workflows for the annotation of genes and their products via comparative analyses using well-curated reference data sets, accessible in public repositories. However, the accurate in silico annotation of molecules (proteins) encoded in organisms (e.g., multicellular parasites) which are evolutionarily distant from those for which these extensive reference data sets are available, including invertebrate model organisms (e.g., Caenorhabditis elegans - free-living nematode, and Drosophila melanogaster - the vinegar fly) and vertebrate species (e.g., Homo sapiens and Mus musculus), remains a major challenge. Here, we constructed an informatic workflow for the enhanced annotation of biologically-important, excretory/secretory (ES) proteins ("secretome") encoded in the genome of a parasitic roundworm, called Haemonchus contortus (commonly known as the barber's pole worm). We critically evaluated the performance of five distinct methods, refined some of them, and then combined the use of all five methods to comprehensively annotate ES proteins, according to gene ontology, biological pathways and/or metabolic (enzymatic) processes. Then, using optimised parameter settings, we applied this workflow to comprehensively annotate 2591 of all 3353 proteins (77.3%) in the secretome of H. contortus. This result is a substantial improvement (10-25%) over previous annotations using individual, "off-the-shelf" algorithms and default settings, indicating the ready applicability of the present, refined workflow to gene/protein sequence data sets from a wide range of organisms in the Tree-of-Life.
5
Romero M, Nakano FK, Finke J, Rocha C, Vens C. Leveraging class hierarchy for detecting missing annotations on hierarchical multi-label classification. Comput Biol Med 2023; 152:106423. [PMID: 36529023 DOI: 10.1016/j.compbiomed.2022.106423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 11/09/2022] [Accepted: 12/11/2022] [Indexed: 12/15/2022]
Abstract
With the development of new sequencing technologies, the availability of genomic data has grown exponentially. Over the past decade, numerous studies have used genomic data to identify associations between genes and biological functions. While these studies have shown success in annotating genes with functions, they often assume that genes are completely annotated and fail to take into account that datasets are sparse and noisy. This work proposes a method to detect missing annotations in the context of hierarchical multi-label classification. More precisely, our method exploits the relations of functions, represented as a hierarchy, by computing probabilities based on the paths of functions in the hierarchy. By performing several experiments on a variety of rice (Oryza sativa Japonica), we showcase that the proposed method accurately detects missing annotations and yields superior results when compared to state-of-the-art methods from the literature.
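The idea of scoring a function by the path that leads to it in a hierarchy can be illustrated with a minimal sketch. This is not the authors' method: it assumes a tree-shaped hierarchy, assumes each term already has a conditional score given its parent, and simply multiplies those scores along the path to the root. The hierarchy, the scores, the threshold, and all function names are hypothetical.

```python
from math import prod

# Hypothetical toy hierarchy: child -> parent (None marks the root).
PARENT = {
    "root": None,
    "metabolism": "root",
    "lipid_metabolism": "metabolism",
    "transport": "root",
}

# Hypothetical conditional scores P(term | parent term) for one gene,
# standing in for the output of any per-term classifier.
COND_PROB = {
    "root": 1.0,
    "metabolism": 0.9,
    "lipid_metabolism": 0.8,
    "transport": 0.2,
}


def path_to_root(term):
    """Return the terms on the path from `term` up to the root, inclusive."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path


def path_probability(term):
    """Multiply the conditional scores of every term on the path to the root."""
    return prod(COND_PROB[t] for t in path_to_root(term))


def missing_candidates(annotated, threshold=0.5):
    """List unannotated terms whose path score reaches the threshold."""
    return sorted(
        t for t in PARENT
        if t not in annotated and path_probability(t) >= threshold
    )


if __name__ == "__main__":
    # The gene is annotated with "metabolism" but not with its child term.
    print(missing_candidates({"root", "metabolism"}))
```

In this toy setting, a term is flagged as a possible missing annotation when it is absent from the gene's annotations but its whole path scores highly; a low score anywhere on the path pulls the product down.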
Affiliation(s)
- Miguel Romero
- Department of Electronics and Computer Science, Pontificia Universidad Javeriana, Calle 18 N 118-250, Cali, 760031, Colombia.
- Felipe Kenji Nakano
- Department of Public Health and Primary Care, KU Leuven Campus KULAK, Etienne Sabbelaan 53, Kortrijk, 8500, Belgium; Itec, imec research group at KU Leuven, Etienne Sabbelaan 53, Kortrijk, 8500, Belgium.
- Jorge Finke
- Department of Electronics and Computer Science, Pontificia Universidad Javeriana, Calle 18 N 118-250, Cali, 760031, Colombia.
- Camilo Rocha
- Department of Electronics and Computer Science, Pontificia Universidad Javeriana, Calle 18 N 118-250, Cali, 760031, Colombia.
- Celine Vens
- Department of Public Health and Primary Care, KU Leuven Campus KULAK, Etienne Sabbelaan 53, Kortrijk, 8500, Belgium; Itec, imec research group at KU Leuven, Etienne Sabbelaan 53, Kortrijk, 8500, Belgium.
6
Escudeiro P, Henry CS, Dias RP. Functional characterization of prokaryotic dark matter: the road so far and what lies ahead. Current Research in Microbial Sciences 2022; 3:100159. [PMID: 36561390 PMCID: PMC9764257 DOI: 10.1016/j.crmicr.2022.100159] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/11/2022] [Revised: 07/18/2022] [Accepted: 08/05/2022] [Indexed: 12/25/2022] Open
Abstract
Eight-hundred thousand to one trillion prokaryotic species may inhabit our planet. Yet, fewer than two-hundred thousand prokaryotic species have been described. This uncharted fraction of microbial diversity, and its undisclosed coding potential, is known as the "microbial dark matter" (MDM). Next-generation sequencing has made it possible to collect massive amounts of genome sequence data, leading to unprecedented advances in the field of genomics. Still, harnessing new functional information from the genomes of uncultured prokaryotes is often limited by standard classification methods. These methods often rely on sequence similarity searches against reference genomes from cultured species. This hinders the discovery of unique genetic elements that are missing from the cultivated realm. It also contributes to the accumulation of prokaryotic gene products of unknown function in public sequence data repositories, highlighting the need for new approaches to sequencing data analysis and classification. Increasing evidence indicates that these proteins of unknown function might be a treasure trove of biotechnological potential. Here, we outline the challenges, opportunities, and the potential hidden within the functional dark matter (FDM) of prokaryotes. We also discuss the pitfalls surrounding molecular and computational approaches currently used to probe these uncharted waters, and discuss future opportunities for research and applications.
Affiliation(s)
- Pedro Escudeiro
- BioISI - Instituto de Biosistemas e Ciências Integrativas, Faculdade de Ciências, Universidade de Lisboa, Lisboa 1749-016, Portugal
- Christopher S. Henry
- Argonne National Laboratory, Lemont, Illinois, USA; University of Chicago, Chicago, Illinois, USA
- Ricardo P.M. Dias
- BioISI - Instituto de Biosistemas e Ciências Integrativas, Faculdade de Ciências, Universidade de Lisboa, Lisboa 1749-016, Portugal; iXLab - Innovation for National Biological Resilience, Faculdade de Ciências, Universidade de Lisboa, Lisboa 1749-016, Portugal; Corresponding author.
7
Merino GA, Saidi R, Milone DH, Stegmayer G, Martin MJ. Hierarchical deep learning for predicting GO annotations by integrating protein knowledge. Bioinformatics 2022; 38:4488-4496. [PMID: 35929781 PMCID: PMC9524999 DOI: 10.1093/bioinformatics/btac536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2021] [Revised: 07/18/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: Experimental testing and manual curation are the most precise ways of assigning Gene Ontology (GO) terms describing protein functions. However, they are expensive, time-consuming and cannot cope with the exponential growth of data generated by high-throughput sequencing methods. Hence, researchers need reliable computational systems to help fill the gap with automatic function prediction. The results of the last Critical Assessment of Function Annotation challenge revealed that GO term prediction remains a very challenging task. Recent developments in deep learning are pushing the frontiers of protein research, leading to new knowledge thanks to the integration of data from multiple sources. However, the deep models developed so far for functional prediction focus mainly on sequence data and have not yet achieved breakthrough performance. RESULTS: We propose DeeProtGO, a novel deep-learning model for predicting GO annotations by integrating protein knowledge. DeeProtGO was trained to solve 18 different prediction problems, defined by the three GO sub-ontologies, the type of proteins, and the taxonomic kingdom. Our experiments showed higher prediction quality when more protein knowledge was integrated. We also benchmarked DeeProtGO against state-of-the-art methods on public datasets, and showed that it can effectively improve the prediction of GO annotations. AVAILABILITY AND IMPLEMENTATION: DeeProtGO and a use case are available at https://github.com/gamerino/DeeProtGO. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rabie Saidi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Diego H Milone
- Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL, CONICET, Ciudad Universitaria UNL, Santa Fe 3000, Argentina
- Georgina Stegmayer
- Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), FICH-UNL, CONICET, Ciudad Universitaria UNL, Santa Fe 3000, Argentina
- Maria J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
8
Fenoy E, Edera AA, Stegmayer G. Transfer learning in proteins: evaluating novel protein learned representations for bioinformatics tasks. Brief Bioinform 2022; 23:6618242. [PMID: 35758229 DOI: 10.1093/bib/bbac232] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/25/2022] [Revised: 05/13/2022] [Accepted: 05/18/2022] [Indexed: 11/13/2022] Open
Abstract
A representation method is an algorithm that calculates numerical feature vectors for samples in a dataset. Such vectors, also known as embeddings, define a relatively low-dimensional space able to efficiently encode high-dimensional data. Very recently, many types of learned data representations based on machine learning have appeared and are being applied to several tasks in bioinformatics. In particular, protein representation learning methods integrate different types of protein information (sequence, domains, etc.), in supervised or unsupervised learning approaches, and provide embeddings of protein sequences that can be used for downstream tasks. One task of special interest is the automatic function prediction of the huge number of novel proteins that are being discovered nowadays and are still totally uncharacterized. However, despite its importance, to date there has been no fair benchmark study of the predictive performance of existing proposals on the same large set of proteins and for concrete, common bioinformatics tasks. This lack of benchmark studies prevents the community from using adequate predictive methods for accelerating the functional characterization of proteins. In this study, we performed a detailed comparison of protein sequence representation learning methods, explaining each approach and comparing them with an experimental benchmark on several bioinformatics tasks: (i) determining protein sequence similarity in the embedding space; (ii) inferring protein domains and (iii) predicting ontology-based protein functions. We examine the advantages and disadvantages of each representation approach over the benchmark results. We hope the results and the discussion of this study can help the community to select the most adequate machine learning-based technique for protein representation according to the bioinformatics task at hand.
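Task (i) above, similarity in the embedding space, can be sketched in a few lines. The example below assumes cosine similarity as the measure, which the abstract does not specify, and uses made-up three-dimensional vectors in place of real protein embeddings; the protein names and function names are hypothetical.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query, embeddings):
    """Return the name of the stored embedding closest to `query` by cosine similarity."""
    return max(embeddings, key=lambda name: cosine_similarity(query, embeddings[name]))


# Hypothetical toy embeddings; real ones would come from a representation model
# and have hundreds or thousands of dimensions.
EMBEDDINGS = {
    "protA": [0.9, 0.1, 0.0],
    "protB": [0.0, 0.2, 0.9],
    "protC": [0.8, 0.3, 0.1],
}

if __name__ == "__main__":
    print(most_similar([1.0, 0.2, 0.0], EMBEDDINGS))
```

A nearest-neighbour lookup of this kind is one simple way to transfer annotations from characterized proteins to an uncharacterized query, provided the embedding space places functionally related proteins close together.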
Affiliation(s)
- Emilio Fenoy
- Research Institute for Signals, Systems and Computational Intelligence sinc(i) (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Alejandro A Edera
- Research Institute for Signals, Systems and Computational Intelligence sinc(i) (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
- Georgina Stegmayer
- Research Institute for Signals, Systems and Computational Intelligence sinc(i) (CONICET-UNL), Ciudad Universitaria, Santa Fe, Argentina
9
Reijnders MJMF, Waterhouse RM. CrowdGO: Machine learning and semantic similarity guided consensus Gene Ontology annotation. PLoS Comput Biol 2022; 18:e1010075. [PMID: 35560159 PMCID: PMC9132264 DOI: 10.1371/journal.pcbi.1010075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2021] [Revised: 05/25/2022] [Accepted: 04/04/2022] [Indexed: 11/29/2022] Open
Abstract
Characterising gene function for the ever-increasing number and diversity of species with annotated genomes relies almost entirely on computational prediction methods. These software tools are also numerous and diverse, each with different strengths and weaknesses as revealed through community benchmarking efforts. Meta-predictors that assess consensus and conflict from individual algorithms should deliver enhanced functional annotations. To exploit the benefits of meta-approaches, we developed CrowdGO, an open-source consensus-based Gene Ontology (GO) term meta-predictor that employs machine learning models with GO term semantic similarities and information contents. By re-evaluating each gene-term annotation, a consensus dataset is produced with high-scoring confident annotations and low-scoring rejected annotations. Applying CrowdGO to results from a deep learning-based method, a sequence similarity-based method, and two protein domain-based methods delivers consensus annotations with improved precision and recall. Furthermore, using standard evaluation measures, CrowdGO performance matches that of the community’s best performing individual methods. CrowdGO therefore offers a model-informed approach to leverage strengths of individual predictors and produce comprehensive and accurate gene functional annotations.
New technologies mean that we are able to read the genetic blueprints in the form of complete genome sequences from many different species. We are also able to use computational methods combined with evidence from experiments to map out the locations in the genomes of many thousands of genes and other important regions. However, discovering and characterising the biological functions of all these genes and their protein products requires considerably more experimental work. In order to gain insights into the possible functions of the many genes currently lacking functional information from experiments we must therefore rely on methods that computationally predict protein functions. Many different software tools have been developed to tackle this challenge, each with their own strengths and weaknesses as shown by several community-based competitions that assess the performance of the predictors. Taking advantage of powerful modern machine learning techniques, we developed CrowdGO, a new software tool that aims to combine predictions from several tools and produce comprehensive and accurate gene functional annotations. CrowdGO is able to computationally assess agreements and conflicts amongst annotations from different predictors to then re-evaluate the results and deliver enhanced predictions of protein functions.
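The notion of assessing agreement and conflict between predictors can be sketched with a much simpler stand-in. CrowdGO itself re-scores annotations with trained models that use semantic similarity and information content; the example below replaces all of that with a plain agreement fraction across tools, so it illustrates only the consensus idea, not the published method. Tool names, gene names, the cut-off, and the function name are hypothetical.

```python
from collections import Counter


def consensus(predictions, min_fraction=0.5):
    """Keep (gene, term) pairs predicted by at least `min_fraction` of the tools.

    `predictions` maps a tool name to the collection of (gene, GO term) pairs
    it predicted. Returns a dict mapping each accepted pair to its support,
    i.e. the fraction of tools that predicted it.
    """
    n_tools = len(predictions)
    if n_tools == 0:
        return {}
    # set(pairs) makes sure a tool counts at most once per pair.
    counts = Counter(pair for pairs in predictions.values() for pair in set(pairs))
    return {
        pair: count / n_tools
        for pair, count in counts.items()
        if count / n_tools >= min_fraction
    }


# Hypothetical toy predictions from three tools.
PREDICTIONS = {
    "tool_a": {("gene1", "GO:0008150"), ("gene1", "GO:0003674")},
    "tool_b": {("gene1", "GO:0008150")},
    "tool_c": {("gene1", "GO:0008150"), ("gene2", "GO:0005575")},
}

if __name__ == "__main__":
    for pair, support in consensus(PREDICTIONS).items():
        print(pair, round(support, 2))
```

Simple voting of this kind treats every tool as equally reliable and ignores how specific a term is, which is the gap that a model-informed meta-predictor is meant to close.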
Affiliation(s)
- Maarten J. M. F. Reijnders
- Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Robert M. Waterhouse
- Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, Lausanne, Switzerland
10
Törönen P, Holm L. PANNZER-A practical tool for protein function prediction. Protein Sci 2022; 31:118-128. [PMID: 34562305 PMCID: PMC8740830 DOI: 10.1002/pro.4193] [Citation(s) in RCA: 40] [Impact Index Per Article: 20.0] [Received: 08/03/2021] [Revised: 09/22/2021] [Accepted: 09/22/2021] [Indexed: 01/03/2023]
Abstract
The facility of next-generation sequencing has led to an explosion of gene catalogs for novel genomes, transcriptomes and metagenomes, which are functionally uncharacterized. Computational inference has emerged as a necessary substitute for first-hand experimental evidence. PANNZER (Protein ANNotation with Z-scoRE) is a high-throughput functional annotation web server that stands out among similar publicly accessible web servers in supporting submission of up to 100,000 protein sequences at once and providing both Gene Ontology (GO) annotations and free text description predictions. Here, we demonstrate the use of PANNZER and discuss future plans and challenges. We present two case studies to illustrate problems related to data quality and method evaluation. Some commonly used evaluation metrics and evaluation datasets promote methods that favor unspecific and broad functional classes over more informative and specific classes. We argue that this can bias the development of automated function prediction methods. The PANNZER web server and source code are available at http://ekhidna2.biocenter.helsinki.fi/sanspanz/.
Affiliation(s)
- Petri Törönen
- Institute of Biotechnology, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki, Finland
- Liisa Holm
- Institute of Biotechnology, Helsinki Institute of Life Sciences, University of Helsinki, Helsinki, Finland; Organismal and Evolutionary Biology Research Program, Faculty of Biosciences, University of Helsinki, Helsinki, Finland
11
Torres M, Yang H, Romero AE, Paccanaro A. Protein function prediction for newly sequenced organisms. Nat Mach Intell 2021. [DOI: 10.1038/s42256-021-00419-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/30/2022]
12
RNA seq and quantitative proteomic analysis of Dictyostelium knock-out cells lacking the core autophagy proteins ATG9 and/or ATG16. BMC Genomics 2021; 22:444. [PMID: 34126926 PMCID: PMC8204557 DOI: 10.1186/s12864-021-07756-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/22/2021] [Accepted: 05/26/2021] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND: Autophagy is an evolutionarily ancient mechanism that sequesters substrates for degradation within autolysosomes. The process is driven by many autophagy-related (ATG) proteins, including the core members ATG9 and ATG16. However, the functions of these two core ATG proteins still need further elucidation. Here, we applied RNAseq and tandem mass tag (TMT) proteomic approaches to identify differentially expressed genes (DEGs) and proteins (DEPs) in Dictyostelium discoideum ATG9‾, ATG16‾ and ATG9‾/16‾ strains in comparison to AX2 wild-type cells. RESULTS: In total, we identified 332 (279 up and 53 down), 639 (487 up and 152 down) and 260 (114 up and 146 down) DEGs and 124 (83 up and 41 down), 431 (238 up and 193 down) and 677 (347 up and 330 down) DEPs in ATG9‾, ATG16‾ and ATG9‾/16‾ strains, respectively. Thus, in the single knock-out strains, the number of DEGs was higher than the number of DEPs, while in the double knock-out strain the number of DEPs was higher. Comparison of RNAseq and proteomic data further revealed that only a small proportion of the transcriptional changes were reflected at the protein level. Gene ontology (GO) analysis revealed an enrichment of DEPs involved in lipid metabolism and oxidative phosphorylation. Furthermore, we found increased expression of the anti-oxidant enzymes glutathione reductase (gsr) and catalase A (catA) in ATG16‾ and ATG9‾/16‾ cells, respectively, indicating adaptation to excess reactive oxygen species (ROS). CONCLUSIONS: Our study provides the first combined transcriptome and proteome analysis of ATG9‾, ATG16‾ and ATG9‾/16‾ cells. Our results suggest that most changes in protein abundance were not caused by transcriptional changes, but were rather due to changes in protein homeostasis. In particular, knock-out of atg9 and/or atg16 appears to cause dysregulation of lipid metabolism and oxidative phosphorylation.
13
Abstract
INTRODUCTION: Knowledge graphs have proven to be promising systems of information storage and retrieval. Due to the recent explosion of heterogeneous multimodal data sources generated in the biomedical domain, and an industry shift toward a systems biology approach, knowledge graphs have emerged as attractive methods of data storage and hypothesis generation. AREAS COVERED: In this review, the author summarizes the applications of knowledge graphs in drug discovery. The author evaluates their utility, differentiating between academic exercises in graph theory and useful tools for deriving novel insights, and highlights target identification and drug repurposing as two areas showing particular promise. The review provides a case study on COVID-19, summarizing the research that used knowledge graphs to identify repurposable drug candidates. It also describes the dangers of degree and literature bias, and discusses mitigation strategies. EXPERT OPINION: Whilst knowledge graphs and graph-based machine learning have certainly shown promise, they remain relatively immature technologies. Many popular link prediction algorithms fail to address strong biases in biomedical data, and only highlight biological associations, failing to model causal relationships in complex dynamic biological systems. These problems need to be addressed before knowledge graphs reach their true potential in drug discovery.
Affiliation(s)
- Finlay MacLean
- Target Identification, BenevolentAI, United Kingdom of Great Britain and Northern Ireland