1
Hasibi R, Michoel T, Oyarzún DA. Integration of graph neural networks and genome-scale metabolic models for predicting gene essentiality. NPJ Syst Biol Appl 2024; 10:24. [PMID: 38448436 PMCID: PMC10917767 DOI: 10.1038/s41540-024-00348-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Accepted: 02/08/2024] [Indexed: 03/08/2024]
Abstract
Genome-scale metabolic models are powerful tools for understanding cellular physiology. Flux balance analysis (FBA), in particular, is an optimization-based approach widely employed for predicting metabolic phenotypes. In model microbes such as Escherichia coli, FBA has been successful at predicting essential genes, i.e. genes whose deletion impairs survival. A central assumption in this approach is that both wild type and deletion strains optimize the same fitness objective. Although the optimality assumption may hold for the wild type metabolic network, deletion strains are not subject to the same evolutionary pressures, and knock-out mutants may steer their metabolism to meet other objectives for survival. Here, we present FlowGAT, a hybrid FBA-machine learning strategy for predicting essentiality directly from wild type metabolic phenotypes. The approach is based on a graph-structured representation of metabolic fluxes predicted by FBA, where nodes correspond to enzymatic reactions and edges quantify the propagation of metabolite mass flow between a reaction and its neighbours. We integrate this information into a graph neural network that can be trained on knock-out fitness assay data. Comparisons across different model architectures reveal that FlowGAT predictions for E. coli are close to those of FBA for several growth conditions. This suggests that the essentiality of enzymatic genes can be predicted by exploiting the inherent network structure of metabolism. Our approach demonstrates the benefits of combining the mechanistic insights afforded by genome-scale models with the ability of deep learning to infer patterns from complex datasets.
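A minimal sketch of how an FBA flux vector can be turned into a weighted reaction-to-reaction graph of the kind described in this abstract. It assumes that reversible reactions have already been split into two irreversible ones (so all fluxes are nonnegative), and that each metabolite produced by a reaction is shared among its consumers in proportion to their consumption rates. The function name `mass_flow_graph` and the toy pathway are hypothetical, and FlowGAT's exact edge definition may differ.

```python
import numpy as np


def mass_flow_graph(S: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Reaction-to-reaction mass-flow weights from stoichiometry and a flux vector.

    S: (metabolites x reactions) stoichiometric matrix; negative entries are substrates.
    v: flux vector, assumed nonnegative (reversible reactions pre-split; an assumption).
    Returns M where M[i, j] is the flux of metabolites produced by reaction i and
    consumed by reaction j.
    """
    S = np.asarray(S, dtype=float)
    v = np.asarray(v, dtype=float)
    if np.any(v < 0):
        raise ValueError("this sketch assumes irreversible reactions with nonnegative fluxes")
    produced = np.clip(S, 0.0, None) * v   # rate at which each reaction produces each metabolite
    consumed = np.clip(-S, 0.0, None) * v  # rate at which each reaction consumes each metabolite
    total = produced.sum(axis=1)           # total production rate of each metabolite
    # Fraction of each metabolite's production taken by each consumer (0 if never produced).
    share = np.divide(consumed, total[:, None], out=np.zeros_like(consumed), where=total[:, None] > 0)
    return produced.T @ share


# Toy linear pathway: R1 (uptake) -> A, R2: A -> B, R3: B -> (secretion), each with flux 2.
S = np.array([[1, -1, 0],
              [0, 1, -1]])
v = np.array([2.0, 2.0, 2.0])
M = mass_flow_graph(S, v)
```

In the toy pathway, M is nonzero only for R1 → R2 and R2 → R3 (both 2.0). Each nonzero entry would become a directed, weighted edge in the graph given to a graph neural network, with reaction-level quantities as node features.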
Affiliation(s)
- Ramin Hasibi
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Tom Michoel
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Diego A Oyarzún
- School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- School of Informatics, University of Edinburgh, Edinburgh, UK
2
Liang Y, Luo H, Lin Y, Gao F. Recent advances in the characterization of essential genes and development of a database of essential genes. IMETA 2024; 3:e157. [PMID: 38868518 PMCID: PMC10989110 DOI: 10.1002/imt2.157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 10/09/2023] [Indexed: 06/14/2024]
Abstract
Over the past few decades, there has been significant interest in the study of essential genes, which are crucial for the survival of an organism under specific environmental conditions and thus have practical applications in synthetic biology and medicine. With the continuous development of technological methods, an increasing amount of experimental data on essential genes has been obtained. Meanwhile, various computational prediction methods, related databases and web servers have emerged. To facilitate the study of essential genes, we established the Database of Essential Genes (DEG), which, through continuous updates, has become a popular resource for essential gene feature analysis and prediction, drug and vaccine development, and artificial genome design and construction. In this article, we summarized studies of essential genes, reviewed the relevant databases, and discussed their practical applications. Furthermore, we provided an overview of the main applications of DEG and conducted comprehensive analyses based on its latest version. However, it should be noted that essentiality is a dynamic rather than a binary concept, which presents both opportunities and challenges for future development.
Affiliation(s)
- Hao Luo
- Department of Physics, Tianjin University, Tianjin, China
- Yan Lin
- Department of Physics, Tianjin University, Tianjin, China
- Feng Gao
- Department of Physics, Tianjin University, Tianjin, China
- Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin, China
- SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), Tianjin, China
3
Freischem LJ, Oyarzún DA. A Machine Learning Approach for Predicting Essentiality of Metabolic Genes. Methods Mol Biol 2024; 2760:345-369. [PMID: 38468098 DOI: 10.1007/978-1-0716-3658-9_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/13/2024]
Abstract
The identification of essential genes is a key challenge in systems and synthetic biology, particularly for engineering metabolic pathways that convert feedstocks into valuable products. Assessment of gene essentiality at a genome scale requires large and costly growth assays of knockout strains. Here we describe a strategy to predict the essentiality of metabolic genes using binary classification algorithms. The approach combines elements from genome-scale metabolic models, directed graphs, and machine learning into a predictive model that can be trained on small knockout datasets. We demonstrate the efficacy of this approach using the most complete metabolic model of Escherichia coli and various machine learning algorithms for binary classification.
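The binary-classification step can be illustrated with a short sketch. It assumes each gene has already been summarised as a numeric feature vector (for example, features derived from the metabolic model's graph) and uses plain logistic regression with class-balanced weights, since essential genes are usually the minority class. The chapter compares several algorithms; this particular choice, the toy feature values, and the function names are illustrative assumptions.

```python
import numpy as np


def train_balanced_logreg(X, y, lr=0.5, epochs=3000):
    """Logistic regression fitted by batch gradient descent with class-balanced sample weights.

    X: (genes x features) matrix of hypothetical per-gene features.
    y: 1 for essential, 0 for non-essential.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    n_pos = y.sum()
    # Weight each class by n / (2 * class_count) so the rarer class is not ignored.
    weights = np.where(y == 1, n / (2 * n_pos), n / (2 * (n - n_pos)))
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability of essentiality
        g = weights * (p - y) / n                # weighted residuals
        w -= lr * (X.T @ g)
        b -= lr * g.sum()
    return w, b


def predict(X, w, b, threshold=0.5):
    """Label genes as essential (1) when the predicted probability reaches the threshold."""
    p = 1.0 / (1.0 + np.exp(-(np.asarray(X, dtype=float) @ w + b)))
    return (p >= threshold).astype(int)


# Hypothetical two-feature toy data: four non-essential genes and two essential genes.
X = np.array([[0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.15, 0.85], [0.9, 0.1], [0.8, 0.2]])
y = np.array([0, 0, 0, 0, 1, 1])
w, b = train_balanced_logreg(X, y)
```

With small knockout datasets, a simple, heavily regularisable model like this is a reasonable baseline before trying more flexible classifiers.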
Affiliation(s)
- Diego A Oyarzún
- School of Informatics, University of Edinburgh, Edinburgh, UK
- School of Biological Sciences, University of Edinburgh, Edinburgh, UK
4
Giordano M, Falbo E, Maddalena L, Piccirillo M, Granata I. Untangling the Context-Specificity of Essential Genes by Means of Machine Learning: A Constructive Experience. Biomolecules 2023; 14:18. [PMID: 38254618 PMCID: PMC10813179 DOI: 10.3390/biom14010018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 11/29/2023] [Accepted: 12/20/2023] [Indexed: 01/24/2024]
Abstract
Gene essentiality is a genetic concept crucial for a comprehensive understanding of life and evolution. In the last decade, many essential genes (EGs) have been determined using different experimental and computational approaches, and this information has been used to reduce the genomes of model organisms. A growing amount of evidence highlights that essentiality is a context-dependent property. Because of their importance in vital biological processes, recognising context-specific EGs (csEGs) could help identify new potential pharmacological targets and improve precision therapeutics. Since most of the computational procedures proposed to identify and predict EGs neglect their context-specificity, we focused on this aspect, providing a theoretical and experimental overview of the literature, data and computational methods dedicated to recognising csEGs. To this end, we adapted existing computational methods to exploit a specific context (the kidney tissue) and experimented with four different prediction methods using the labels provided by four different identification approaches. The conclusions drawn from these results, further confirmed by experiments in a different tissue context, provide the reader with guidance on exploiting existing tools to identify and predict csEGs.
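One simple way to turn per-cell-line knockout scores into context-specific labels is sketched below, assuming a table of gene-by-cell-line fitness scores (more negative means stronger fitness loss) and a mapping from cell line to tissue. The -0.5 threshold, the 50% majority rule, and the helper name are hypothetical illustration choices, not the identification approaches compared in the study.

```python
from typing import Dict, Set


def context_specific_essential(
    scores: Dict[str, Dict[str, float]],
    tissue_of: Dict[str, str],
    tissue: str,
    threshold: float = -0.5,
    min_frac: float = 0.5,
) -> Set[str]:
    """Genes essential in most cell lines of `tissue` but not in most lines of other tissues.

    scores[gene][cell_line] is a knockout effect score; threshold and min_frac are
    hypothetical cut-offs chosen only for illustration.
    """
    def frac_essential(per_line: Dict[str, float], in_context: bool) -> float:
        # Fraction of in-context (or out-of-context) lines where the gene scores below threshold.
        lines = [c for c in per_line if (tissue_of[c] == tissue) == in_context]
        if not lines:
            return 0.0
        return sum(per_line[c] < threshold for c in lines) / len(lines)

    return {
        gene
        for gene, per_line in scores.items()
        if frac_essential(per_line, True) >= min_frac and frac_essential(per_line, False) < min_frac
    }


tissue_of = {"K1": "kidney", "K2": "kidney", "L1": "lung", "L2": "lung"}
scores = {
    "GENE_A": {"K1": -1.2, "K2": -0.9, "L1": 0.1, "L2": -0.1},   # depleted only in kidney lines
    "GENE_B": {"K1": -1.5, "K2": -1.4, "L1": -1.3, "L2": -1.6},  # depleted everywhere (core)
    "GENE_C": {"K1": 0.0, "K2": 0.2, "L1": -0.1, "L2": 0.1},     # not depleted
}
kidney_specific = context_specific_essential(scores, tissue_of, "kidney")  # {'GENE_A'}
```

GENE_B is excluded because it is essential in every context, which is the distinction between core and context-specific essential genes that the abstract emphasises.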
Affiliation(s)
- Maurizio Giordano
- Institute for High-Performance Computing and Networking (ICAR), National Research Council (CNR), V. Pietro Castellino 111, 80131 Naples, Italy
5
Lung cancer prediction using multi-gene genetic programming by selecting automatic features from amino acid sequences. Comput Biol Chem 2022; 98:107638. [DOI: 10.1016/j.compbiolchem.2022.107638] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/03/2021] [Revised: 12/22/2021] [Accepted: 02/01/2022] [Indexed: 02/07/2023]
6
Campos TL, Korhonen PK, Hofmann A, Gasser RB, Young ND. Harnessing model organism genomics to underpin the machine learning-based prediction of essential genes in eukaryotes - Biotechnological implications. Biotechnol Adv 2021; 54:107822. [PMID: 34461202 DOI: 10.1016/j.biotechadv.2021.107822] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/09/2021] [Revised: 08/17/2021] [Accepted: 08/24/2021] [Indexed: 12/17/2022]
Abstract
The availability of high-quality genomes and advances in functional genomics have enabled large-scale studies of essential genes in model eukaryotes, including the 'elegant worm' (Caenorhabditis elegans; Nematoda) and the 'vinegar fly' (Drosophila melanogaster; Arthropoda). However, this is not the case for other, much less-studied organisms, such as socioeconomically important parasites, for which functional genomic platforms usually do not exist. Thus, there is a need to develop innovative techniques or approaches for the prediction, identification and investigation of essential genes. A key approach that could enable the prediction of such genes is machine learning (ML). Here, we undertake a historical review of experimental and computational approaches employed for the characterisation of essential genes in eukaryotes, with a particular focus on model ecdysozoans (C. elegans and D. melanogaster), and discuss the possible applicability of ML approaches to organisms such as socioeconomically important parasites. We highlight some recent results showing that high-performance ML, combined with feature engineering, allows reliable prediction of essential genes from extensive, publicly available 'omic data sets, with major potential to prioritise such genes (with statistical confidence) for subsequent functional genomic validation. These findings could 'open the door' to fundamental and applied research areas. Evidence of some commonality in the essential gene complement between these two organisms indicates that an ML-engineering approach could find broader applicability to ecdysozoans such as parasitic nematodes or arthropods, provided that suitably large and informative data sets become available for proper feature engineering and for the robust training and validation of algorithms.
This area warrants detailed exploration to, for example, facilitate the identification and characterisation of essential molecules as novel targets for drugs and vaccines against parasitic diseases. This focus is particularly important, given the substantial impact that such diseases have worldwide, and the current challenges associated with their prevention and control and with drug resistance in parasite populations.
Affiliation(s)
- Tulio L Campos
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia; Bioinformatics Core Facility, Instituto Aggeu Magalhães, Fundação Oswaldo Cruz (IAM-Fiocruz), Recife, Pernambuco, Brazil
- Pasi K Korhonen
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Andreas Hofmann
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Robin B Gasser
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Neil D Young
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
7
de Souza ID, Reis CF, Morais DAA, Fernandes VGS, Cavalcante JVF, Dalmolin RJS. Ancestry analysis indicates two different sets of essential genes in eukaryotic model species. Funct Integr Genomics 2021; 21:523-531. [PMID: 34279742 DOI: 10.1007/s10142-021-00794-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/28/2020] [Revised: 06/02/2021] [Accepted: 06/10/2021] [Indexed: 11/28/2022]
Abstract
Essential genes are so called because they are crucial for organism perpetuation. These genes are usually related to essential functions of cellular metabolism or multicellular homeostasis. Deleterious alterations in essential genes produce a spectrum of phenotypes in multicellular organisms, ranging from impaired fertilization and disrupted fetal development to loss of reproductive capacity. Essential genes are described as more evolutionarily conserved than non-essential genes. However, there is no consensus about the relationship between gene essentiality and gene age. Here, we identified essential genes in five model eukaryotic species (Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster, Caenorhabditis elegans, and Mus musculus) and estimated their evolutionary ancestry and their network properties. We observed that essential genes are, on average, older than other genes in all species investigated. The relationship between network properties and gene essentiality agrees with previous findings, showing essential genes as important nodes in biological networks. As expected, we also observed that the essential orthologs shared by the five species are old. However, each species also has a specific set of young essential genes not shared with the others. Additionally, these two groups of essential genes are involved in distinct biological functions, suggesting two sets of essential genes: (i) a set of old essential genes common to all the evaluated species, regulating basic cellular functions, and (ii) a set of young essential genes exclusive to each species, which perform specific essential functions in that species.
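The gene-age comparison can be illustrated with a simple phylostratigraphy-style sketch: each gene is assigned to the oldest clade in which a homolog is detected, and mean ages are compared between essential and non-essential genes. The strata list, homolog sets, and helper names are hypothetical; the abstract does not describe the study's actual ancestry pipeline.

```python
from statistics import mean
from typing import Dict, List, Set

# Hypothetical phylostrata for a yeast-like focal species, ordered from oldest to youngest.
STRATA: List[str] = ["cellular_organisms", "Eukaryota", "Opisthokonta", "Fungi", "Saccharomycetaceae"]


def gene_age(homolog_clades: Set[str], strata: List[str] = STRATA) -> int:
    """Age rank: len(strata) if homologs reach the oldest stratum, 1 if found only in the youngest.

    Genes with no recognised clade are treated as belonging to the youngest stratum.
    """
    oldest = min((strata.index(c) for c in homolog_clades if c in strata), default=len(strata) - 1)
    return len(strata) - oldest


def mean_age(homologs: Dict[str, Set[str]], genes: Set[str]) -> float:
    """Mean age rank over a subset of genes."""
    return mean(gene_age(homologs[g]) for g in genes)


homologs = {
    "g1": {"cellular_organisms", "Eukaryota", "Fungi"},
    "g2": {"Eukaryota", "Opisthokonta"},
    "g3": {"Saccharomycetaceae"},
    "g4": {"Fungi", "Saccharomycetaceae"},
}
essential = {"g1", "g2"}
other = set(homologs) - essential
```

Here the essential genes have a mean age rank of 4.5 and the others 1.5; a real analysis would use many genes and a statistical test rather than a bare comparison of means.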
Affiliation(s)
- Iara D de Souza
- Bioinformatics Multidisciplinary Environment - IMD, Federal University of Rio Grande Do Norte, Av. Odilon Gomes de Lima, 1722, Capim Macio, Natal, RN, 59078-400, Brazil
- Clovis F Reis
- Bioinformatics Multidisciplinary Environment - IMD, Federal University of Rio Grande Do Norte, Av. Odilon Gomes de Lima, 1722, Capim Macio, Natal, RN, 59078-400, Brazil
- Diego A A Morais
- Bioinformatics Multidisciplinary Environment - IMD, Federal University of Rio Grande Do Norte, Av. Odilon Gomes de Lima, 1722, Capim Macio, Natal, RN, 59078-400, Brazil
- Vítor G S Fernandes
- Bioinformatics Multidisciplinary Environment - IMD, Federal University of Rio Grande Do Norte, Av. Odilon Gomes de Lima, 1722, Capim Macio, Natal, RN, 59078-400, Brazil
- João Vitor F Cavalcante
- Bioinformatics Multidisciplinary Environment - IMD, Federal University of Rio Grande Do Norte, Av. Odilon Gomes de Lima, 1722, Capim Macio, Natal, RN, 59078-400, Brazil
- Rodrigo J S Dalmolin
- Bioinformatics Multidisciplinary Environment - IMD, Federal University of Rio Grande Do Norte, Av. Odilon Gomes de Lima, 1722, Capim Macio, Natal, RN, 59078-400, Brazil
- Department of Biochemistry - CB, Federal University of Rio Grande Do Norte, Campus Universitário UFRN, Lagoa Nova, Natal, RN, 59078-970, Brazil
8
Caldu-Primo JL, Verduzco-Martínez JA, Alvarez-Buylla ER, Davila-Velderrain J. In vivo and in vitro human gene essentiality estimations capture contrasting functional constraints. NAR Genom Bioinform 2021; 3:lqab063. [PMID: 34268495 PMCID: PMC8276763 DOI: 10.1093/nargab/lqab063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2021] [Revised: 06/18/2021] [Accepted: 07/07/2021] [Indexed: 11/28/2022]
Abstract
Gene essentiality estimation is a popular empirical approach to link genotypes to phenotypes. In humans, essentiality is estimated based on loss-of-function (LoF) mutation intolerance, either from population exome sequencing (in vivo) data or from CRISPR-based in vitro perturbation experiments. Both approaches identify genes presumed to have detrimental consequences on the organism upon mutation. Are these genes constrained by having key cellular/organismal roles? Do in vivo and in vitro estimations equally recover these constraints? Insights into these questions have important implications for generalizing observations from cell models and interpreting disease risk genes. To empirically address these questions, we integrate genome-scale datasets and compare structural, functional and evolutionary features of essential genes versus genes with extremely high mutational tolerance. We found that essentiality estimates do recover functional constraints. However, the organismal or cellular context of estimation leads to functionally contrasting properties underlying the constraint. Our results suggest that depletion of LoF mutations in human populations effectively captures organismal-level functional constraints not experimentally accessible through CRISPR-based screens. Finally, we identify a set of genes (OrgEssential), which are mutationally intolerant in vivo but highly tolerant in vitro. These genes drive observed functional constraint differences and have an unexpected preference for nervous system expression.
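The contrast between the two estimates can be made concrete with a small filtering sketch. It assumes one per-gene population constraint metric (here a gnomAD-style LOEUF value, where lower means stronger depletion of LoF variants) and one per-gene CRISPR knockout effect score. Both cut-offs and the function name are hypothetical and are not the criteria used to define OrgEssential in the study.

```python
from typing import Dict, Set


def intolerant_in_vivo_tolerant_in_vitro(
    loeuf: Dict[str, float],
    crispr_effect: Dict[str, float],
    loeuf_max: float = 0.35,
    effect_min: float = -0.2,
) -> Set[str]:
    """Genes that look constrained in human populations but dispensable in cell-line screens.

    loeuf: per-gene LoF observed/expected upper bound (lower = more LoF-intolerant in vivo).
    crispr_effect: per-gene knockout effect (values near 0 = little fitness loss in vitro).
    Both cut-offs are illustrative assumptions only.
    """
    shared = loeuf.keys() & crispr_effect.keys()  # genes measured by both approaches
    return {g for g in shared if loeuf[g] < loeuf_max and crispr_effect[g] > effect_min}


loeuf = {"GENE_A": 0.10, "GENE_B": 0.20, "GENE_C": 0.90, "GENE_D": 0.15}
crispr_effect = {"GENE_A": -1.50, "GENE_B": 0.00, "GENE_C": 0.05}
candidates = intolerant_in_vivo_tolerant_in_vitro(loeuf, crispr_effect)
```

GENE_A is constrained in both settings, GENE_C in neither, and GENE_D is skipped because it lacks a screen measurement, leaving GENE_B as the only in vivo-specific candidate.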
Affiliation(s)
- Jose Luis Caldu-Primo
- Instituto de Ecología, Universidad Nacional Autónoma de México, Cd. Universitaria, CDMX, 04510, México
- Jorge Armando Verduzco-Martínez
- Departamento de Biología Celular y Genética, Facultad de Ciencias Biológicas, Universidad Autónoma de Nuevo León, San Nicolás de los Garza, Nuevo León, 66400, México
- Elena R Alvarez-Buylla
- Instituto de Ecología, Universidad Nacional Autónoma de México, Cd. Universitaria, CDMX, 04510, México
9
Campos TL, Korhonen PK, Young ND. Cross-Predicting Essential Genes between Two Model Eukaryotic Species Using Machine Learning. Int J Mol Sci 2021; 22:5056. [PMID: 34064595 PMCID: PMC8150380 DOI: 10.3390/ijms22105056] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/03/2021] [Revised: 05/07/2021] [Accepted: 05/08/2021] [Indexed: 12/24/2022]
Abstract
Experimental studies of Caenorhabditis elegans and Drosophila melanogaster have contributed substantially to our understanding of molecular and cellular processes in metazoans at large. Since the publication of their genomes, functional genomic investigations have identified genes that are essential or non-essential for survival in each species. Recently, a range of features linked to gene essentiality have been inferred using a machine learning (ML)-based approach, allowing essentiality predictions within a species. Nevertheless, predictions between species are still elusive. Here, we undertake a comprehensive study using ML to discover and validate features of essential genes common to both C. elegans and D. melanogaster. We demonstrate that the cross-species prediction of gene essentiality is possible using a subset of features linked to nucleotide/protein sequences, protein orthology and subcellular localisation, single-cell RNA-seq, and histone methylation markers. Complementary analyses showed that essential genes are enriched for transcription and translation functions and are preferentially located away from heterochromatin regions of C. elegans and D. melanogaster chromosomes. The present work should enable the cross-prediction of essential genes between model and non-model metazoans.
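As an illustration of the sequence-derived part of such a feature set, the sketch below computes a handful of simple features from a coding sequence and its protein. The abstract does not specify which sequence features were used or how they were encoded; this set (lengths, GC content, amino acid composition) and the function name are assumptions chosen for clarity.

```python
from collections import Counter
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_features(cds: str, protein: str) -> Dict[str, float]:
    """Illustrative nucleotide- and protein-level features for one gene (hypothetical set)."""
    cds = cds.upper()
    protein = protein.upper().rstrip("*")  # drop a trailing stop symbol if present
    features: Dict[str, float] = {
        "cds_length": float(len(cds)),
        "gc_content": (cds.count("G") + cds.count("C")) / len(cds) if cds else 0.0,
        "protein_length": float(len(protein)),
    }
    counts = Counter(protein)
    for aa in AMINO_ACIDS:
        # Amino acid composition as fractions of the protein length.
        features[f"aa_{aa}"] = counts[aa] / len(protein) if protein else 0.0
    return features


example = sequence_features("ATGGCCTAA", "MA*")
```

Because features computed this way are defined identically for both species, a classifier trained on one species' labelled genes can, in principle, be applied to the other species' genes.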
Affiliation(s)
- Tulio L. Campos
- Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC 3010, Australia
- Bioinformatics Core Facility, Instituto Aggeu Magalhães, Fundação Oswaldo Cruz (IAM-Fiocruz), Recife 50740-465, PE, Brazil
- Pasi K. Korhonen
- Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC 3010, Australia
- Neil D. Young
- Department of Veterinary Biosciences, Melbourne Veterinary School, Faculty of Veterinary and Agricultural Sciences, The University of Melbourne, Parkville, VIC 3010, Australia
10
Reynolds KA, Rosa-Molinar E, Ward RE, Zhang H, Urbanowicz BR, Settles AM. Accelerating biological insight for understudied genes. Integr Comp Biol 2021; 61:2233-2243. [PMID: 33970251 DOI: 10.1093/icb/icab029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
Abstract
The rapid expansion of genome sequence data is increasing the discovery of protein-coding genes across all domains of life. Annotating these genes with reliable functional information is necessary to understand evolution, to define the full biochemical space accessed by nature, and to identify target genes for biotechnology improvements. The vast majority of proteins are annotated based on sequence conservation, with no specific biological, biochemical, genetic, or cellular function identified. Recent technical advances throughout the biological sciences enable experimental research on these understudied protein-coding genes in a broader collection of species. However, scientists have incentives and biases to continue focusing on well-documented genes within their preferred model organism. This perspective suggests a research model that seeks to break historic silos of research bias by enabling interdisciplinary teams to accelerate biological functional annotation. We propose an initiative to develop coordinated projects of collaborating evolutionary biologists, cell biologists, geneticists, and biochemists that will focus on subsets of target genes in multiple model organisms. Concurrent analysis in multiple organisms takes advantage of evolutionary divergence and selection, which cause individual species to be better suited as experimental models for specific genes. Most importantly, multisystem approaches would encourage transdisciplinary critical thinking and hypothesis testing, which are inherently slow in current biological research.
Affiliation(s)
- Kimberly A Reynolds
- The Green Center for Systems Biology and the Department of Biophysics, The University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Eduardo Rosa-Molinar
- Department of Pharmacology & Toxicology, The University of Kansas, Lawrence, KS 66047, USA
- Robert E Ward
- Department of Biology, Case Western Reserve University, Cleveland, OH 44106, USA
- Hongbin Zhang
- Department of Soil and Crop Sciences, Texas A&M University, College Station, TX 77843, USA
- Breeanna R Urbanowicz
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, Georgia 30602, USA
- A Mark Settles
- Bioengineering Branch, NASA Ames Research Center, Moffett Field, CA, USA
11
Henkel L, Rauscher B, Schmitt B, Winter J, Boutros M. Genome-scale CRISPR screening at high sensitivity with an empirically designed sgRNA library. BMC Biol 2020; 18:174. [PMID: 33228647 PMCID: PMC7686728 DOI: 10.1186/s12915-020-00905-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 08/12/2020] [Accepted: 10/20/2020] [Indexed: 12/22/2022]
Abstract
Background: In recent years, large-scale genetic screens using the CRISPR/Cas9 system have emerged as scalable approaches able to interrogate gene function with unprecedented efficiency and specificity in various biological contexts. By this means, functional dependencies on both the protein-coding and noncoding genome of numerous cell types in different organisms have been interrogated. However, screening designs vary greatly, and criteria for optimal experimental implementation and library composition are still emerging. Given their broad utility in functionally annotating genomes, the application and interpretation of genome-scale CRISPR screens would greatly benefit from consistent and optimal design criteria. Results: We report advantages of conducting viability screens in selected Cas9 single-cell clones in contrast to Cas9 bulk populations. We further systematically analyzed published CRISPR screens in human cells to identify single-guide RNAs (sgRNAs) with consistently high on-target and low off-target activity. Selected guides were collected in a novel genome-scale sgRNA library, which efficiently identifies core and context-dependent essential genes. Conclusion: We show how empirically designed libraries in combination with an optimized experimental design increase the dynamic range in gene essentiality screens at reduced library coverage. Supplementary information: The online version contains supplementary material available at 10.1186/s12915-020-00905-1.
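A basic way to score genes in a pooled viability screen of this kind is sketched below: sgRNA read counts at the start and end of the screen are normalised to reads per million, converted to log2 fold changes with a pseudocount, and summarised per gene by the median across its guides. This is a generic scoring scheme assumed for illustration, not the analysis pipeline used in the study; the guide and gene names are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List


def gene_log_fold_changes(
    guide_to_gene: Dict[str, str],
    counts_start: Dict[str, int],
    counts_end: Dict[str, int],
    pseudocount: float = 0.5,
) -> Dict[str, float]:
    """Median per-gene log2 fold change (end vs. start) of sgRNA abundance."""
    total_start = sum(counts_start.values())
    total_end = sum(counts_end.values())
    per_gene: Dict[str, List[float]] = {}
    for guide, gene in guide_to_gene.items():
        # Normalise to reads per million so that differences in sequencing depth cancel out.
        rpm_start = counts_start[guide] / total_start * 1e6 + pseudocount
        rpm_end = counts_end[guide] / total_end * 1e6 + pseudocount
        per_gene.setdefault(gene, []).append(math.log2(rpm_end / rpm_start))
    # Negative medians indicate guide depletion, as expected for essential genes.
    return {gene: median(lfcs) for gene, lfcs in per_gene.items()}


guide_to_gene = {"sg1": "ESSENTIAL_1", "sg2": "ESSENTIAL_1", "sg3": "NONESSENTIAL_1", "sg4": "NONESSENTIAL_1"}
counts_start = {"sg1": 250, "sg2": 250, "sg3": 250, "sg4": 250}
counts_end = {"sg1": 25, "sg2": 25, "sg3": 475, "sg4": 475}
scores = gene_log_fold_changes(guide_to_gene, counts_start, counts_end)
```

In this simplified picture, a larger dynamic range roughly means a wider gap between the depletion scores of essential and non-essential genes.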
Affiliation(s)
- Luisa Henkel
- German Cancer Research Center (DKFZ), Division Signaling and Functional Genomics and Heidelberg University, BioQuant and Medical Faculty Mannheim, D-69120, Heidelberg, Germany
- Benedikt Rauscher
- German Cancer Research Center (DKFZ), Division Signaling and Functional Genomics and Heidelberg University, BioQuant and Medical Faculty Mannheim, D-69120, Heidelberg, Germany
- Barbara Schmitt
- German Cancer Research Center (DKFZ), Division Signaling and Functional Genomics and Heidelberg University, BioQuant and Medical Faculty Mannheim, D-69120, Heidelberg, Germany
- Jan Winter
- German Cancer Research Center (DKFZ), Division Signaling and Functional Genomics and Heidelberg University, BioQuant and Medical Faculty Mannheim, D-69120, Heidelberg, Germany
- Michael Boutros
- German Cancer Research Center (DKFZ), Division Signaling and Functional Genomics and Heidelberg University, BioQuant and Medical Faculty Mannheim, D-69120, Heidelberg, Germany
12
Weiße J, Rosemann J, Krauspe V, Kappler M, Eckert AW, Haemmerle M, Gutschner T. RNA-Binding Proteins as Regulators of Migration, Invasion and Metastasis in Oral Squamous Cell Carcinoma. Int J Mol Sci 2020; 21:E6835. [PMID: 32957697 PMCID: PMC7555251 DOI: 10.3390/ijms21186835] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 08/26/2020] [Revised: 09/14/2020] [Accepted: 09/17/2020] [Indexed: 02/06/2023]
Abstract
Nearly 7.5% of all human protein-coding genes have been assigned to the class of RNA-binding proteins (RBPs), and over the past decade, RBPs have been increasingly recognized as important regulators of molecular and cellular homeostasis. RBPs regulate the post-transcriptional processing of their target RNAs, i.e., alternative splicing, polyadenylation, stability and turnover, localization, or translation as well as editing and chemical modification, thereby tuning gene expression programs of diverse cellular processes such as cell survival and malignant spread. Importantly, metastases are the major cause of cancer-associated deaths in general, and particularly in oral cancers, which account for 2% of global cancer mortality. However, the roles and architecture of RBPs and RBP-controlled expression networks during the diverse steps of the metastatic cascade are only incompletely understood. In this review, we offer a brief overview of RBPs and their general contribution to post-transcriptional regulation of gene expression. Subsequently, we highlight selected examples of RBPs that have been shown to play a role in oral cancer cell migration, invasion, and metastasis. Finally, we present targeting strategies that have been developed to interfere with the function of some of these RBPs.
Affiliation(s)
- Jonas Weiße
- Junior Research Group ‘RNA Biology and Pathogenesis’, Medical Faculty, Martin-Luther University Halle-Wittenberg, 06120 Halle/Saale, Germany
- Julia Rosemann
- Junior Research Group ‘RNA Biology and Pathogenesis’, Medical Faculty, Martin-Luther University Halle-Wittenberg, 06120 Halle/Saale, Germany
- Vanessa Krauspe
- Junior Research Group ‘RNA Biology and Pathogenesis’, Medical Faculty, Martin-Luther University Halle-Wittenberg, 06120 Halle/Saale, Germany
- Matthias Kappler
- Department of Oral and Maxillofacial Plastic Surgery, Medical Faculty, Martin Luther University Halle-Wittenberg, 06120 Halle (Saale), Germany
- Alexander W. Eckert
- Department of Cranio Maxillofacial Surgery, Paracelsus Medical University, 90471 Nuremberg, Germany
- Monika Haemmerle
- Institute of Pathology, Section for Experimental Pathology, Medical Faculty, Martin-Luther University Halle-Wittenberg, 06120 Halle/Saale, Germany
- Tony Gutschner
- Junior Research Group ‘RNA Biology and Pathogenesis’, Medical Faculty, Martin-Luther University Halle-Wittenberg, 06120 Halle/Saale, Germany
13
Campos TL, Korhonen PK, Hofmann A, Gasser RB, Young ND. Combined use of feature engineering and machine-learning to predict essential genes in Drosophila melanogaster. NAR Genom Bioinform 2020; 2:lqaa051. [PMID: 33575603 PMCID: PMC7671374 DOI: 10.1093/nargab/lqaa051] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/13/2020] [Revised: 06/05/2020] [Accepted: 07/04/2020] [Indexed: 12/17/2022]
Abstract
Characterizing genes that are critical for the survival of an organism (i.e. essential) is important to gain a deep understanding of the fundamental cellular and molecular mechanisms that sustain life. Functional genomic investigations of the vinegar fly, Drosophila melanogaster, have unravelled the functions of numerous genes of this model species, but results from phenomic experiments can sometimes be ambiguous. Moreover, the features underlying gene essentiality are poorly understood, posing challenges for computational prediction. Here, we harnessed comprehensive genomic-phenomic datasets publicly available for D. melanogaster and a machine-learning-based workflow to predict essential genes of this fly. We discovered strong predictors of such genes, paving the way for computational predictions of essentiality in less-studied arthropod pests and vectors of infectious diseases.
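Predictions from a workflow like this are commonly assessed with the area under the ROC curve; the abstract does not say which metrics were used, so this is an assumed example. The sketch computes the AUC directly from its probabilistic definition (the chance that a randomly chosen essential gene is scored above a randomly chosen non-essential one). It is quadratic in the number of genes, so it suits small checks rather than genome-wide evaluation.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random essential gene (label 1) outscores a random
    non-essential gene (label 0); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one essential and one non-essential gene")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for two essential and two non-essential genes.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])  # 3 of 4 pairs correctly ordered -> 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of essential from non-essential genes.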
Affiliation(s)
- Tulio L Campos
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Pasi K Korhonen
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Andreas Hofmann
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Robin B Gasser
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Neil D Young
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
14
Campos TL, Korhonen PK, Sternberg PW, Gasser RB, Young ND. Predicting gene essentiality in Caenorhabditis elegans by feature engineering and machine-learning. Comput Struct Biotechnol J 2020; 18:1093-1102. [PMID: 32489524 PMCID: PMC7251299 DOI: 10.1016/j.csbj.2020.05.008] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 03/23/2020] [Revised: 05/01/2020] [Accepted: 05/06/2020] [Indexed: 02/08/2023]
Abstract
Defining genes that are essential for life has major implications for understanding critical biological processes and mechanisms. Although essential genes have been identified and characterised experimentally using functional genomic tools, it is challenging to predict such genes with confidence from molecular and phenomic data sets using computational methods. Using extensive data sets available for the model organism Caenorhabditis elegans, we constructed a machine-learning (ML)-based workflow for the prediction of essential genes on a genome-wide scale. We identified strong predictors for such genes and showed that trained ML models consistently achieve highly accurate classifications. Complementary analyses revealed an association between essential genes and chromosomal location. Our findings reveal that essential genes in C. elegans tend to be located in or near the centre of autosomal chromosomes; are positively correlated with low single nucleotide polymorphism (SNP) densities and epigenetic markers in promoter regions; are involved in protein and nucleotide processing; are transcribed in most cells; and are enriched in reproductive tissues or are targets for small RNAs bound to the Argonaute CSR-1. Based on these results, we hypothesise an interplay between epigenetic markers and small RNA pathways in the germline, with transcription-based memory; this hypothesis warrants testing. From a technical perspective, further work is needed to evaluate whether the present ML-based approach will be applicable to other metazoans (including Drosophila melanogaster) for which comprehensive data sets (i.e. genomic, transcriptomic, proteomic, variomic, epigenetic and phenomic) are available.
Key Words
- CDS, coding sequence
- CRISPR, Clustered Regularly Interspaced Short Palindromic Repeats
- Caenorhabditis elegans
- ES, Essentiality Score
- EST, expressed sequence tag
- Essential genes
- Essentiality predictions
- GBM, Gradient Boosting Method
- GFF, general feature format
- GLM, Generalised Linear Model
- GO, gene ontology
- ML, machine-learning
- Machine-learning
- NN, Artificial Neural Network
- PPI, protein-protein interaction
- PR-AUC, Area Under the Precision-Recall Curve
- RF, Random Forest
- RNAi, RNA interference
- ROC-AUC, Area Under the Receiver Operating Characteristic Curve
- SNP, single nucleotide polymorphism
- SPLS, Sparse Partial Least Squares
- SVM, Support-Vector Machine
- TEA, Tissue Enrichment Analysis tool (WormBase)
- TSS, transcription start site
- VCF, variant call file
Affiliation(s)
- Tulio L Campos
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia; Instituto Aggeu Magalhães, Fundação Oswaldo Cruz (IAM-Fiocruz), Recife, Pernambuco, Brazil
- Pasi K Korhonen
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Paul W Sternberg
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, United States
- Robin B Gasser
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
- Neil D Young
- Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia
15
Pei J, Kinch LN, Otwinowski Z, Grishin NV. Mutation severity spectrum of rare alleles in the human genome is predictive of disease type. PLoS Comput Biol 2020; 16:e1007775. [PMID: 32413045 PMCID: PMC7255613 DOI: 10.1371/journal.pcbi.1007775] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/23/2019] [Revised: 05/28/2020] [Accepted: 03/06/2020] [Indexed: 12/19/2022]
Abstract
The human genome harbors a variety of genetic variations. Single-nucleotide changes that alter amino acids in protein-coding regions are one of the major causes of human phenotypic variation and disease. These single-amino acid variations (SAVs) are routinely found in whole genome and exome sequencing. Evaluating the functional impact of such genomic alterations is crucial for the diagnosis of genetic disorders. We developed DeepSAV, a deep-learning convolutional neural network that differentiates disease-causing from benign SAVs based on a variety of protein sequence, structural and functional properties. Our method outperforms most stand-alone programs, and the version incorporating population and gene-level information (DeepSAV+PG) has predictive power similar to that of some of the best available methods. We transformed DeepSAV scores of rare SAVs in the human population into a quantity termed the "mutation severity measure" for each human protein-coding gene. It reflects a gene's tolerance to deleterious missense mutations and serves as a useful tool for studying gene-disease associations. Genes implicated in cancer, autism, and viral interaction are identified by this measure as intolerant to mutations, while genes associated with a number of other diseases are scored as tolerant. Among known disease-associated genes, those that are mutation-intolerant are likely to function in development and signal transduction pathways, while those that are mutation-tolerant tend to encode metabolic and mitochondrial proteins.
Affiliation(s)
- Jimin Pei
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Lisa N. Kinch
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Zbyszek Otwinowski
- Departments of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Nick V. Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Departments of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
16
Liu Y, Wu M, Liu C, Li XL, Zheng J. SL2MF: Predicting Synthetic Lethality in Human Cancers via Logistic Matrix Factorization. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:748-757. [PMID: 30969932 DOI: 10.1109/tcbb.2019.2909908] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Indexed: 06/09/2023]
Abstract
Synthetic lethality (SL) is a promising concept for discovering novel anti-cancer drug targets. However, wet-lab experiments for detecting SLs face various challenges, such as high cost and low consistency across platforms or cell lines. Computational prediction methods are therefore needed to address these issues. This paper proposes a novel SL prediction method, named SL2MF, which employs logistic matrix factorization to learn latent representations of genes from the observed SL data. The probability that two genes form an SL pair is modeled by the linear combination of gene latent vectors. Because known SL pairs are more trustworthy than unknown pairs, we design importance weighting schemes that assign higher importance weights to known SL pairs and lower importance weights to unknown pairs in SL2MF. We also incorporate biological knowledge about genes from protein-protein interaction (PPI) data and Gene Ontology (GO). In particular, we calculate the similarity between genes based on their GO annotations and topological properties in the PPI network. Extensive experiments on the SL interaction data from the SynLethDB database demonstrate the effectiveness of SL2MF.
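The pair-scoring and importance-weighting idea in the SL2MF abstract can be illustrated with a generic weighted logistic matrix factorization. This is a minimal sketch under stated assumptions, not the SL2MF implementation: it assumes one latent vector per gene, scores a pair as the sigmoid of the dot product of the two vectors, gives known SL pairs a fixed weight and all other pairs weight 1.0, trains by stochastic gradient ascent with an L2 penalty, and omits the PPI and GO similarity terms. The names `train_logistic_mf`, `sl_probability`, `weight_known`, `lr` and `reg` are hypothetical.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def train_logistic_mf(n_genes, known_pairs, dim=4, weight_known=5.0,
                      lr=0.05, reg=0.01, epochs=300, seed=0):
    """Learn one latent vector per gene (hypothetical sketch, not SL2MF).

    Known SL pairs are positives with weight `weight_known`; every other
    gene pair is an "unknown" (label 0, weight 1.0), not a confirmed negative.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_genes)]
    known = {frozenset(p) for p in known_pairs}
    pairs = [(i, j) for i in range(n_genes) for j in range(i + 1, n_genes)]
    for _ in range(epochs):
        for i, j in pairs:
            y = 1.0 if frozenset((i, j)) in known else 0.0
            w = weight_known if y == 1.0 else 1.0
            p = sigmoid(sum(a * b for a, b in zip(U[i], U[j])))
            g = w * (y - p)  # derivative of the weighted log-likelihood w.r.t. the pair score
            for k in range(dim):
                ui, uj = U[i][k], U[j][k]
                # Gradient ascent step with L2 regularization on both gene vectors.
                U[i][k] += lr * (g * uj - reg * ui)
                U[j][k] += lr * (g * ui - reg * uj)
    return U


def sl_probability(U, i, j):
    """Predicted probability that genes i and j form an SL pair."""
    return sigmoid(sum(a * b for a, b in zip(U[i], U[j])))


if __name__ == "__main__":
    toy_pairs = [(0, 1), (2, 3), (4, 5)]
    U = train_logistic_mf(6, toy_pairs)
    for i in range(6):
        for j in range(i + 1, 6):
            tag = "known" if (i, j) in toy_pairs else "unknown"
            print(f"genes {i}-{j} ({tag}): {sl_probability(U, i, j):.3f}")
```

On this toy set of six genes with three known SL pairs, the known pairs end up with higher predicted probabilities than the unobserved pairs. Raising `weight_known` places more emphasis on the trusted positives relative to the unknown pairs, which is the intuition behind weighting known and unknown pairs differently.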
17
Meinke DW. Genome-wide identification of EMBRYO-DEFECTIVE (EMB) genes required for growth and development in Arabidopsis. New Phytol 2020; 226:306-325. [PMID: 31334862 DOI: 10.1111/nph.16071] [Citation(s) in RCA: 95] [Impact Index Per Article: 23.8] [Received: 05/02/2019] [Accepted: 07/10/2019] [Indexed: 05/20/2023]
Abstract
With the emergence of high-throughput methods in plant biology, the importance of long-term projects characterized by incremental advances involving multiple laboratories can sometimes be overlooked. Here, I highlight my 40-year effort to isolate and characterize the most common class of mutants encountered in Arabidopsis (Arabidopsis thaliana): those defective in embryo development. I present an updated dataset of 510 EMBRYO-DEFECTIVE (EMB) genes identified throughout the Arabidopsis community; include important details on 2200 emb mutants and 241 pigment-defective embryo (pde) mutants analyzed in my laboratory; provide curated datasets with key features and publication links for each EMB gene identified; revisit past estimates of 500-1000 total EMB genes in Arabidopsis; document 83 double mutant combinations reported to disrupt embryo development; emphasize the importance of following established nomenclature guidelines and acknowledging allele history in research publications; and consider how best to extend community-based curation and screening efforts to approach saturation for this diverse class of mutants in the future. Continued advances in identifying EMB genes and characterizing their loss-of-function mutant alleles are needed to understand genotype-to-phenotype relationships in Arabidopsis on a broad scale, and to document the contributions of large numbers of essential genes to plant growth and development.
Affiliation(s)
- David W Meinke
- Department of Plant Biology, Ecology, and Evolution, Oklahoma State University, Stillwater, OK, 74078, USA
18
An Evaluation of Machine Learning Approaches for the Prediction of Essential Genes in Eukaryotes Using Protein Sequence-Derived Features. Comput Struct Biotechnol J 2019; 17:785-796. [PMID: 31312416 PMCID: PMC6607062 DOI: 10.1016/j.csbj.2019.05.008] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 03/27/2019] [Revised: 05/23/2019] [Accepted: 05/26/2019] [Indexed: 12/23/2022]
Abstract
The availability of whole-genome sequences and associated multi-omics data sets, combined with advances in gene knockout and knockdown methods, has enabled large-scale annotation and exploration of gene and protein functions in eukaryotes. Knowing which genes are essential for the survival of eukaryotic organisms is paramount for an understanding of the basic mechanisms of life, and could assist in identifying intervention targets in eukaryotic pathogens and cancer. Here, we studied essential gene orthologs among selected species of eukaryotes, and then employed a systematic machine-learning approach, using protein sequence-derived features and selection procedures, to investigate essential gene predictions within and among species. We showed that essential gene orthologs make up only a small fraction of the total number of orthologs among the eukaryotic species studied. In addition, we demonstrated that machine-learning models trained with subsets of essentiality-related data performed better than random guessing of gene essentiality for a particular species. Consistent with our gene ortholog analysis, the prediction of essential genes among multiple (including distantly related) species is possible, yet challenging, suggesting that most essential genes are unique to a species. The present work provides a foundation for the expansion of genome-wide essentiality investigations in eukaryotes using machine learning approaches.
Key Words
- CRISPR, Clustered regularly interspaced short palindromic repeats
- Essential genes
- Essentiality prediction
- Eukaryotes
- GBM, Gradient boosting method
- GI, Genetic interaction
- GLM, Generalised linear model
- GO, Gene ontology
- ML, Machine-learning
- Machine-learning
- NN, Artificial neural network
- OGEE, Online GEne essentiality database
- PPI, Protein-protein interaction
- PR-AUC, Area under the precision-recall curve
- RF, Random Forest
- RNAi, RNA interference
- ROC-AUC, Area under the receiver operating characteristic curve
- SPLS, Sparse partial least squares
- SVM, Support-Vector machine
19
Karakitsou E, Foguet C, de Atauri P, Kultima K, Khoonsari PE, Martins dos Santos VA, Saccenti E, Rosato A, Cascante M. Metabolomics in systems medicine: an overview of methods and applications. Curr Opin Syst Biol 2019. [DOI: 10.1016/j.coisb.2019.03.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/13/2022]
20
Price C, Gill S, Ho ZV, Davidson SM, Merkel E, McFarland JM, Leung L, Tang A, Kost-Alimova M, Tsherniak A, Jonas O, Vazquez F, Hahn WC. Genome-Wide Interrogation of Human Cancers Identifies EGLN1 Dependency in Clear Cell Ovarian Cancers. Cancer Res 2019; 79:2564-2579. [PMID: 30898838 DOI: 10.1158/0008-5472.can-18-2674] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 08/28/2018] [Revised: 01/18/2019] [Accepted: 03/14/2019] [Indexed: 12/17/2022]
Abstract
We hypothesized that candidate dependencies for which there are small molecules that are either approved or in advanced development for a nononcology indication may represent potential therapeutic targets. To test this hypothesis, we performed genome-scale loss-of-function screens in hundreds of cancer cell lines. We found that knockout of EGLN1, which encodes prolyl hydroxylase domain-containing protein 2 (PHD2), reduced the proliferation of a subset of clear cell ovarian cancer cell lines in vitro. EGLN1-dependent cells exhibited sensitivity to the pan-EGLN inhibitor FG-4592. The response to FG-4592 was reversed by deletion of HIF1A, demonstrating that EGLN1 dependency was related to negative regulation of HIF1A. We also found that ovarian clear cell tumors susceptible to both genetic and pharmacologic inhibition of EGLN1 required intact HIF1A. Collectively, these observations identify EGLN1 as a cancer target with therapeutic potential. SIGNIFICANCE: These findings reveal a differential dependency of clear cell ovarian cancers on EGLN1, thus identifying EGLN1 as a potential therapeutic target in clear cell ovarian cancer patients.
Affiliation(s)
- Colles Price
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts; Harvard Medical School, Boston, Massachusetts
- Stanley Gill
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts
- Zandra V Ho
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts
- Shawn M Davidson
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts; Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, Massachusetts
- Erin Merkel
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts
- Lisa Leung
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts
- Andrew Tang
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts
- Aviad Tsherniak
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts
- Oliver Jonas
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts; Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, Massachusetts
- Francisca Vazquez
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts
- William C Hahn
- Broad Institute of Harvard and MIT, Cambridge, Massachusetts; Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts; Harvard Medical School, Boston, Massachusetts; Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts
21
Systematic analysis reveals the prevalence and principles of bypassable gene essentiality. Nat Commun 2019; 10:1002. [PMID: 30824696 PMCID: PMC6397241 DOI: 10.1038/s41467-019-08928-1] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 10/24/2018] [Accepted: 02/07/2019] [Indexed: 12/12/2022]
Abstract
Gene essentiality is a variable phenotypic trait, but to what extent and how essential genes can become dispensable for viability remain unclear. Here, we investigate 'bypass of essentiality (BOE)' - an underexplored type of digenic genetic interaction that renders essential genes dispensable. Through analyzing essential genes on one of the six chromosome arms of the fission yeast Schizosaccharomyces pombe, we find that, remarkably, as many as 27% of them can be converted to non-essential genes by BOE interactions. Using this dataset we identify three principles of essentiality bypass: bypassable essential genes tend to have lower importance, tend to exhibit differential essentiality between species, and tend to act with other bypassable genes. In addition, we delineate mechanisms underlying bypassable essentiality, including the previously unappreciated mechanism of dormant redundancy between paralogs. The new insights gained on bypassable essentiality deepen our understanding of genotype-phenotype relationships and will facilitate drug development related to essential genes.
22
Sierzputowska K, Baxter CR, Housden BE. Variable Dose Analysis: A Novel High-throughput RNAi Screening Method for Drosophila Cells. Bio Protoc 2018; 8:e3112. [PMID: 34532554 DOI: 10.21769/bioprotoc.3112] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/13/2018] [Revised: 10/07/2018] [Accepted: 10/10/2018] [Indexed: 11/02/2022]
Abstract
Genetic screens are a powerful approach to identify previously uncharacterized genes involved in specific biological processes. Several technologies have been developed for high-throughput screens using reagents such as RNAi or CRISPR, and each approach is associated with specific advantages and disadvantages. Variable Dose Analysis (VDA) is an RNAi-based method developed in Drosophila cells that improves the signal-to-noise ratio compared with previous methods. VDA assays are performed by co-transfecting cells with a plasmid expressing an shRNA (a type of RNAi reagent that can be easily expressed from a DNA plasmid) against a gene of interest and a second plasmid expressing a fluorescent reporter protein. Fluorescent protein expression can be used as an indirect readout of shRNA expression and therefore of target gene knockdown efficiency. Using this approach, we can measure phenotypes over a range of knockdown efficiencies in a single sample. When applied to genetic interaction screens, VDA results in improved consistency between screens and reliable detection of known interactions. Furthermore, because phenotypes are analyzed over a range of target gene knockdown efficiencies, VDA allows the detection of phenotypes and genetic interactions involving essential genes at sub-lethal knockdown efficiency. VDA therefore represents a powerful approach to high-throughput screening that is applicable to a wide range of biological questions.
Affiliation(s)
- Katarzyna Sierzputowska
- Living Systems Institute, University of Exeter, Exeter, United Kingdom; College of Life and Environmental Sciences, University of Exeter, Exeter, United Kingdom; College of Medicine and Health, University of Exeter, Exeter, United Kingdom
- Chris R Baxter
- Living Systems Institute, University of Exeter, Exeter, United Kingdom; College of Medicine and Health, University of Exeter, Exeter, United Kingdom
- Benjamin E Housden
- Living Systems Institute, University of Exeter, Exeter, United Kingdom; College of Medicine and Health, University of Exeter, Exeter, United Kingdom
23
Yu S, Zheng C, Zhou F, Baillie DL, Rose AM, Deng Z, Chu JSC. Genomic identification and functional analysis of essential genes in Caenorhabditis elegans. BMC Genomics 2018; 19:871. [PMID: 30514206 PMCID: PMC6278001 DOI: 10.1186/s12864-018-5251-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 02/21/2018] [Accepted: 11/14/2018] [Indexed: 11/27/2022]
Abstract
Background: Essential genes are required for an organism’s viability, and their functions vary greatly, spanning many pathways. Because of the importance of essential genes, large-scale efforts have been undertaken to identify the complete set of essential genes and to understand their functions. Studies of genome architecture and organization have found that genes are not randomly distributed in the genome. Results: Using combined genetic mapping, Illumina sequencing, and bioinformatics analyses, we identified 44 essential genes with 130 lethal mutations in a genomic region of approximately 7.3 Mb on C. elegans Chromosome I (left). Six of the 44 essential genes had not previously been characterized by mutant alleles: let-633/let-638 (B0261.1), let-128 (C53H9.2), let-511 (W09C3.4), let-162 (Y47G6A.18), let-510 (Y47G6A.19), and let-131 (Y71G12B.6). Examination of essential genes with Hi-C data shows that they tend to cluster within TAD units rather than near TAD boundaries. We have also shown that essential genes in the left half of chromosome I in C. elegans function in enzyme and nucleic acid binding activities during fundamental processes, such as DNA replication, transcription, and translation. In protein-protein interaction networks, essential genes exhibit more protein connectivity than non-essential genes in the genome. In addition, many of the essential genes are strongly expressed in embryos or early larval stages, indicating that they are important for early development. Conclusions: Our results provide a more comprehensive picture of essential genes and their functional characterization. These genetic resources will offer important tools for further health and disease research.
Affiliation(s)
- Shicheng Yu
- Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Ministry of Education, School of Pharmaceutical Sciences, Wuhan University, Wuhan, 430071, China; Wuhan Frasergen Bioinformatics, Wuhan East Lake High-tech Zone, Wuhan, 430075, China
- Chaoran Zheng
- Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Ministry of Education, School of Pharmaceutical Sciences, Wuhan University, Wuhan, 430071, China
- Fan Zhou
- Wuhan Frasergen Bioinformatics, Wuhan East Lake High-tech Zone, Wuhan, 430075, China
- David L Baillie
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
- Ann M Rose
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Zixin Deng
- Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Ministry of Education, School of Pharmaceutical Sciences, Wuhan University, Wuhan, 430071, China
24
Rani S, Sharma A, Goel M. Insights into archaeal chaperone machinery: a network-based approach. Cell Stress Chaperones 2018; 23:1257-1274. [PMID: 30178307 PMCID: PMC6237683 DOI: 10.1007/s12192-018-0933-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/14/2018] [Revised: 08/03/2018] [Accepted: 08/20/2018] [Indexed: 11/30/2022]
Abstract
Molecular chaperones are a diverse group of proteins that ensure proteome integrity by helping proteins fold correctly and maintain their native state, thus preventing their misfolding and subsequent aggregation. The chaperone machinery of archaeal organisms has been thought to closely resemble that found in humans, at least in terms of its constituent players. Very few studies have ventured into system-level analysis of chaperones and their functioning in archaeal cells. In this study, we attempted such an analysis of chaperone-assisted protein folding in archaeal organisms through a network approach, using Picrophilus torridus as a model system. The study revealed that the DnaK protein of the Hsp70 system acts as a hub in the protein-protein interaction network. However, DnaK was present only in a subset of archaeal organisms and absent from many archaea, especially members of the phylum Crenarchaeota. Therefore, a similar network was created for another archaeal organism, Sulfolobus solfataricus, a member of Crenarchaeota. The chaperone network of S. solfataricus suggested that thermosomes act as integral hub proteins in archaeal organisms that lack DnaK. We further compared the chaperone network of archaea with that found in eukaryotic systems by creating a similar network for Homo sapiens. In the human chaperone network, the UBC protein, part of the ubiquitination system, was the most important module; interestingly, this system is known to be absent in archaeal organisms. A comprehensive comparison of these networks led to several interesting conclusions regarding similarities and differences between the archaeal and human chaperone machinery.
Affiliation(s)
- Shikha Rani
- Department of Biophysics, University of Delhi South Campus, Benito Juarez Road, New Delhi, 110021, India
- Ankush Sharma
- Department of Molecular Genetics, Erasmus University Medical Center, Rotterdam, The Netherlands
- Center for Computational Science, University of Miami, Miami, FL, USA
- Manisha Goel
- Department of Biophysics, University of Delhi South Campus, Benito Juarez Road, New Delhi, 110021, India
25
Czeizler E, Wu KC, Gratie C, Kanhaiya K, Petre I. Structural Target Controllability of Linear Networks. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:1217-1228. [PMID: 29994605 DOI: 10.1109/tcbb.2018.2797271] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 06/08/2023]
Abstract
Computational analysis of the structure of intra-cellular molecular interaction networks can suggest novel therapeutic approaches for systemic diseases like cancer. Recent research in network science has shown that network control theory can be a powerful tool for understanding and manipulating such bio-medical networks. In 2011, Liu et al. developed a polynomial-time algorithm for computing the size of the minimal set of nodes controlling a linear network. In 2014, Gao et al. generalized the problem to target control, minimizing the set of nodes controlling a target within a linear network. The authors developed a greedy approximation algorithm but left open the complexity of the optimization problem. We prove here that the target controllability problem is NP-hard in all practical setups, i.e., when the control power of any individual input is bounded by some constant. We also show that the algorithm provided by Gao et al. fails to provide a valid solution in some special cases, and that an additional validation step is required. We fix and improve their algorithm using several heuristics, ultimately obtaining up to a 10-fold decrease in running time as well as a decrease in the size of solutions.
26
Rauscher B, Heigwer F, Henkel L, Hielscher T, Voloshanenko O, Boutros M. Toward an integrated map of genetic interactions in cancer cells. Mol Syst Biol 2018; 14:e7656. [PMID: 29467179 PMCID: PMC5820685 DOI: 10.15252/msb.20177656] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Received: 03/26/2017] [Revised: 01/20/2018] [Accepted: 01/23/2018] [Indexed: 12/13/2022]
Abstract
Cancer genomes often harbor hundreds of molecular aberrations. Such genetic variants can be drivers or passengers of tumorigenesis and create vulnerabilities for potential therapeutic exploitation. To identify genotype-dependent vulnerabilities, forward genetic screens in different genetic backgrounds have been conducted. We devised MINGLE, a computational framework to integrate CRISPR/Cas9 screens originating from different libraries building on approaches pioneered for genetic network discovery in model organisms. We applied this method to integrate and analyze data from 85 CRISPR/Cas9 screens in human cancer cells combining functional data with information on genetic variants to explore more than 2.1 million gene-background relationships. In addition to known dependencies, we identified new genotype-specific vulnerabilities of cancer cells. Experimental validation of predicted vulnerabilities identified GANAB and PRKCSH as new positive regulators of Wnt/β-catenin signaling. By clustering genes with similar genetic interaction profiles, we drew the largest genetic network in cancer cells to date. Our scalable approach highlights how diverse genetic screens can be integrated to systematically build informative maps of genetic interactions in cancer, which can grow dynamically as more data are included.
Affiliation(s)
- Benedikt Rauscher
- Division of Signaling and Functional Genomics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of Cell and Molecular Biology, Medical Faculty Mannheim, Heidelberg University, Heidelberg, Germany
- Florian Heigwer
- Division of Signaling and Functional Genomics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of Cell and Molecular Biology, Medical Faculty Mannheim, Heidelberg University, Heidelberg, Germany
- Luisa Henkel
- Division of Signaling and Functional Genomics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of Cell and Molecular Biology, Medical Faculty Mannheim, Heidelberg University, Heidelberg, Germany
- Thomas Hielscher
- Division of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Oksana Voloshanenko
- Division of Signaling and Functional Genomics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of Cell and Molecular Biology, Medical Faculty Mannheim, Heidelberg University, Heidelberg, Germany
- Michael Boutros
- Division of Signaling and Functional Genomics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of Cell and Molecular Biology, Medical Faculty Mannheim, Heidelberg University, Heidelberg, Germany
27
Abstract
Gene essentiality is a founding concept of genetics with important implications in both fundamental and applied research. Multiple screens have been performed over the years in bacteria, yeasts, animals and more recently in human cells to identify essential genes. A mounting body of evidence suggests that gene essentiality, rather than being a static and binary property, is both context dependent and evolvable in all kingdoms of life. This concept of a non-absolute nature of gene essentiality changes our fundamental understanding of essential biological processes and could directly affect future treatment strategies for cancer and infectious diseases.
28
Kanhaiya K, Czeizler E, Gratie C, Petre I. Controlling Directed Protein Interaction Networks in Cancer. Sci Rep 2017; 7:10327. [PMID: 28871116 PMCID: PMC5583175 DOI: 10.1038/s41598-017-10491-y] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 04/05/2017] [Accepted: 08/09/2017] [Indexed: 02/06/2023]
Abstract
Control theory is a well-established approach in network science, with applications in bio-medicine and cancer research. We build on recent results for structural controllability of directed networks, which identifies a set of driver nodes able to control an a priori defined part of the network. We develop a novel and efficient approach for the (targeted) structural controllability of cancer networks and demonstrate it for the analysis of breast, pancreatic, and ovarian cancer. We build in each case a protein-protein interaction network and focus on the survivability-essential proteins specific to each cancer type. We show that these essential proteins are efficiently controllable from a relatively small computable set of driver nodes. Moreover, we adjust the method to find the driver nodes among FDA-approved drug-target nodes. We find that, while many of the drugs acting on the driver nodes are part of known cancer therapies, some of them are not used for the cancer types analyzed here; some drug-target driver nodes identified by our algorithms are not known to be used in any cancer therapy. Overall, we show that a better understanding of the control dynamics of cancer through computational modelling can pave the way for new efficient therapeutic approaches and personalized medicine.
Affiliation(s)
- Krishna Kanhaiya
- Computational Biomodeling Laboratory, Turku Centre for Computer Science, and Department of Computer Science, Åbo Akademi University, Turku, 20500, Finland
- Eugen Czeizler
- Computational Biomodeling Laboratory, Turku Centre for Computer Science, and Department of Computer Science, Åbo Akademi University, Turku, 20500, Finland
- National Institute for Research and Development for Biological Sciences, Bucharest, Romania
- Cristian Gratie
- Computational Biomodeling Laboratory, Turku Centre for Computer Science, and Department of Computer Science, Åbo Akademi University, Turku, 20500, Finland
- Ion Petre
- Computational Biomodeling Laboratory, Turku Centre for Computer Science, and Department of Computer Science, Åbo Akademi University, Turku, 20500, Finland
29
Chakravorty S, Hegde M. Gene and Variant Annotation for Mendelian Disorders in the Era of Advanced Sequencing Technologies. Annu Rev Genomics Hum Genet 2017; 18:229-256. [PMID: 28415856 DOI: 10.1146/annurev-genom-083115-022545] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Indexed: 11/09/2022]
Abstract
Comprehensive annotations of genetic and noncoding regions and corresponding accurate variant classification for Mendelian diseases are the next big challenge in the new genomic era of personalized medicine. Progress in the development of faster and more accurate pipelines for genome annotation and variant classification will lead to the discovery of more novel disease associations and candidate therapeutic targets. This ultimately will facilitate better patient recruitment in clinical trials. In this review, we describe the trends in research at the intersection of basic and clinical genomics that aims to increase understanding of overall genomic complexity, complex inheritance patterns of disease, and patient-phenotype-specific genomic associations. We describe the emerging field of translational functional genomics, which integrates other functional "-omics" approaches that support next-generation sequencing genomic data in order to facilitate personalized diagnostics, disease management, biomarker discovery, and medicine. We also discuss the utility of this integrated approach for diagnostic clinics and medical databases and its role in the future of personalized medicine.
Affiliation(s)
- Samya Chakravorty
- Department of Human Genetics, Emory University School of Medicine, Atlanta, Georgia 30322
- Madhuri Hegde
- Department of Human Genetics, Emory University School of Medicine, Atlanta, Georgia 30322
30
Housden BE, Muhar M, Gemberling M, Gersbach CA, Stainier DYR, Seydoux G, Mohr SE, Zuber J, Perrimon N. Loss-of-function genetic tools for animal models: cross-species and cross-platform differences. Nat Rev Genet 2016; 18:24-40. [PMID: 27795562 DOI: 10.1038/nrg.2016.118] [Citation(s) in RCA: 125] [Impact Index Per Article: 15.6] [Indexed: 12/14/2022]
Abstract
Our understanding of the genetic mechanisms that underlie biological processes has relied extensively on loss-of-function (LOF) analyses. LOF methods target DNA, RNA or protein to reduce or to ablate gene function. By analysing the phenotypes caused by these perturbations, the wild-type function of genes can be elucidated. Although all LOF methods reduce gene activity, the choice of approach (for example, mutagenesis, CRISPR-based gene editing, RNA interference, morpholinos or pharmacological inhibition) can have a major effect on phenotypic outcomes. Interpretation of the LOF phenotype must take into account the biological process that is targeted by each method. The practicality and efficiency of LOF methods also vary considerably between model systems. We describe parameters for choosing the optimal combination of method and system, and for interpreting phenotypes within the constraints of each method.
Affiliation(s)
- Benjamin E Housden
- Department of Genetics, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, Massachusetts 02115, USA
- Matthias Muhar
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Vienna 1030, Austria
- Matthew Gemberling
- Department of Biomedical Engineering and the Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA
- Charles A Gersbach
- Department of Biomedical Engineering and the Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA
- Didier Y R Stainier
- Department of Developmental Genetics, Max Planck Institute for Heart and Lung Research, 43 Ludwigstrasse, Bad Nauheim 61231, Germany
- Geraldine Seydoux
- Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, 725 North Wolfe Street, Baltimore, Maryland 21218, USA
- Howard Hughes Medical Institute, 725 North Wolfe Street, Baltimore, Maryland 21218, USA
- Stephanie E Mohr
- Department of Genetics, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, Massachusetts 02115, USA
- Johannes Zuber
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Vienna 1030, Austria
- Norbert Perrimon
- Department of Genetics, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, Massachusetts 02115, USA
- Howard Hughes Medical Institute, 77 Avenue Louis Pasteur, Boston, Massachusetts 02115, USA
31
Peterson A. CRISPR: express delivery to any DNA address. Oral Dis 2016; 23:5-11. [PMID: 27040868 DOI: 10.1111/odi.12487] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 03/23/2016] [Accepted: 03/25/2016] [Indexed: 12/26/2022]
Abstract
The sudden emergence and worldwide adoption of CRISPR gene-editing technology confronts humanity with unprecedented opportunities and choices. CRISPR's transformative impact on our future understanding of biology, along with its potential to unleash control over the most fundamental of biological processes, is predictable by already achieved applications. Although its origin, composition, and function were revealed only recently, close to 3000 CRISPR-based publications have appeared including insightful and diversely focused reviews referenced here. Adding further to scientific and public awareness, a recent symposium addressed the ethical implications of interfacing CRISPR technology and human biology. However, the magnitude of CRISPR's rapidly emerging power mandates its broadest assessment. Only with the participation of a diverse and informed community can the most effective and humanity-positive CRISPR applications be defined. This brief review is aimed at those with little previous exposure to the CRISPR revolution. The molecules that constitute CRISPR's core components and their functional organization are described along with how the mechanism has been harnessed to edit genome structure and modulate gene function. Additionally, a glimpse into CRISPR's potential to unleash genetic changes with far-reaching consequences is presented.
Affiliation(s)
- A Peterson
- Laboratory of Developmental Biology, Departments of Oncology, Human Genetics, Neurology & Neurosurgery, McGill University, Montreal, QC, Canada
32
Czeizler E, Gratie C, Chiu WK, Kanhaiya K, Petre I. Target Controllability of Linear Networks. Computational Methods in Systems Biology 2016. [DOI: 10.1007/978-3-319-45177-0_5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 12/13/2022]