1
Landon S, Chalkley O, Breese G, Grierson C, Marucci L. Understanding Metabolic Flux Behaviour in Whole-Cell Model Output. Front Mol Biosci 2021; 8:732079. [PMID: 34977150 PMCID: PMC8718694 DOI: 10.3389/fmolb.2021.732079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2021] [Accepted: 10/28/2021] [Indexed: 11/30/2022] Open
Abstract
Whole-cell modelling is a newly expanding field that has many applications in lab experiment design and predictive drug testing. Although whole-cell model output contains a wealth of information, it is complex and high dimensional and thus hard to interpret. Here, we present an analysis pipeline that combines machine learning, dimensionality reduction, and network analysis to interpret and visualise metabolic reaction fluxes from a set of single gene knockouts simulated in the Mycoplasma genitalium whole-cell model. We found that the reaction behaviours show trends that correlate with phenotypic classes of the simulation output, highlighting particular cellular subsystems that malfunction after gene knockouts. From a graphical representation of the metabolic network, we saw that there is a set of reactions that can be used as markers of a phenotypic class, showing their importance within the network. Our analysis pipeline can support the understanding of the complexity of in silico cells without detailed knowledge of the constituent parts, which can help to understand the effects of gene knockouts and, as whole-cell models become more widely built and used, aid genome design.
Affiliation(s)
- Sophie Landon
- BrisSynBio, University of Bristol, Bristol, United Kingdom
- Department of Engineering Mathematics, University of Bristol, Bristol, United Kingdom
- Oliver Chalkley
- BrisSynBio, University of Bristol, Bristol, United Kingdom
- Department of Engineering Mathematics, University of Bristol, Bristol, United Kingdom
- Bristol Centre for Complexity Science, Department of Engineering Mathematics, University of Bristol, Bristol, United Kingdom
- Gus Breese
- Department of Engineering Mathematics, University of Bristol, Bristol, United Kingdom
- Claire Grierson
- BrisSynBio, University of Bristol, Bristol, United Kingdom
- School of Biological Sciences, University of Bristol, Bristol, United Kingdom
- Lucia Marucci
- BrisSynBio, University of Bristol, Bristol, United Kingdom
- Department of Engineering Mathematics, University of Bristol, Bristol, United Kingdom
- School of Cellular and Molecular Medicine, University of Bristol, Bristol, United Kingdom
2
An Evaluation of Machine Learning Approaches for the Prediction of Essential Genes in Eukaryotes Using Protein Sequence-Derived Features. Comput Struct Biotechnol J 2019; 17:785-796. [PMID: 31312416 PMCID: PMC6607062 DOI: 10.1016/j.csbj.2019.05.008] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 03/27/2019] [Revised: 05/23/2019] [Accepted: 05/26/2019] [Indexed: 12/23/2022] Open
Abstract
The availability of whole-genome sequences and associated multi-omics data sets, combined with advances in gene knockout and knockdown methods, has enabled large-scale annotation and exploration of gene and protein functions in eukaryotes. Knowing which genes are essential for the survival of eukaryotic organisms is paramount for an understanding of the basic mechanisms of life, and could assist in identifying intervention targets in eukaryotic pathogens and cancer. Here, we studied essential gene orthologs among selected species of eukaryotes, and then employed a systematic machine-learning approach, using protein sequence-derived features and selection procedures, to investigate essential gene predictions within and among species. We showed that essential gene orthologs make up only a small fraction of the total number of orthologs among the eukaryotic species studied. In addition, we demonstrated that machine-learning models trained with subsets of essentiality-related data performed better than random guessing of gene essentiality for a particular species. Consistent with our gene ortholog analysis, predicting essential genes across multiple (including distantly related) species is possible, yet challenging, suggesting that most essential genes are unique to a species. The present work provides a foundation for the expansion of genome-wide essentiality investigations in eukaryotes using machine learning approaches.
Key Words
- CRISPR, Clustered regularly interspaced short palindromic repeats
- Essential genes
- Essentiality prediction
- Eukaryotes
- GBM, Gradient boosting method
- GI, Genetic interaction
- GLM, Generalised linear model
- GO, Gene ontology
- ML, Machine-learning
- Machine-learning
- NN, Artificial neural network
- OGEE, Online GEne essentiality database
- PPI, Protein-protein interaction
- PR-AUC, Area under the precision-recall curve
- RF, Random Forest
- RNAi, RNA interference
- ROC-AUC, Area under the receiver operating characteristic curve
- SPLS, Sparse partial least squares
- SVM, Support-Vector machine
3
Kabir M, Barradas A, Tzotzos GT, Hentges KE, Doig AJ. Properties of genes essential for mouse development. PLoS One 2017; 12:e0178273. [PMID: 28562614 PMCID: PMC5451031 DOI: 10.1371/journal.pone.0178273] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 02/21/2017] [Accepted: 05/10/2017] [Indexed: 12/20/2022] Open
Abstract
Essential genes are those that are critical for life. In the specific case of the mouse, they are the set of genes whose deletion means that a mouse is unable to survive after birth. As such, they are the key minimal set of genes needed for all the steps of development to produce an organism capable of life ex utero. We explored a wide range of sequence and functional features to characterise essential (lethal) and non-essential (viable) genes in mice. Manually curated experimental data identified 1301 essential genes and 3451 viable genes. Many sequence features show highly significant differences between essential and viable mouse genes. Essential genes generally encode complex proteins, with multiple domains and many introns. These genes tend to be long, highly expressed, old and evolutionarily conserved. They tend to encode ligases, transferases, phosphorylated proteins, intracellular proteins, nuclear proteins, and hubs in protein-protein interaction networks. They are involved with regulating protein-protein interactions, gene expression and metabolic processes, cell morphogenesis, cell division, cell proliferation, DNA replication, cell differentiation, DNA repair and transcription, and embryonic development. Viable genes tend to encode membrane proteins or secreted proteins, and are associated with functions such as cellular communication, apoptosis, behaviour and immune response, as well as housekeeping and tissue-specific functions. Viable genes are linked to transport, ion channels, signal transduction, calcium binding and lipid binding, consistent with their location in membranes and involvement with cell-cell communication. From the analysis of the composite features of essential and viable genes, we conclude that essential genes tend to be required for intracellular functions, and viable genes tend to be involved with extracellular functions and cell-cell communication. Knowledge of the features that are over-represented in essential genes allows for a deeper understanding of the functions and processes implemented during mammalian development.
Affiliation(s)
- Mitra Kabir
- Faculty of Biology, Medicine, and Health, University of Manchester, Manchester, United Kingdom
- Manchester Institute of Biotechnology and Department of Chemistry, Faculty of Science and Engineering, The University of Manchester, Manchester, United Kingdom
- Ana Barradas
- Faculty of Biology, Medicine, and Health, University of Manchester, Manchester, United Kingdom
- George T. Tzotzos
- Department of Agriculture, Food and Environmental Sciences, Marche Polytechnic University, Ancona, Italy
- Kathryn E. Hentges
- Faculty of Biology, Medicine, and Health, University of Manchester, Manchester, United Kingdom
- Andrew J. Doig
- Manchester Institute of Biotechnology and Department of Chemistry, Faculty of Science and Engineering, The University of Manchester, Manchester, United Kingdom
4
Huang X, Xu J, Chen L, Wang Y, Gu X, Peng X, Yang G. Analysis of transcriptome data reveals multifactor constraint on codon usage in Taenia multiceps. BMC Genomics 2017; 18:308. [PMID: 28427327 PMCID: PMC5397707 DOI: 10.1186/s12864-017-3704-8] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 09/27/2016] [Accepted: 04/12/2017] [Indexed: 12/04/2022] Open
Abstract
Background Codon usage bias (CUB) is an important evolutionary feature in genomes that has been widely observed in many organisms. However, the synonymous codon usage pattern in the genome of T. multiceps remains to be clarified. In this study, we analyzed the codon usage of T. multiceps based on the transcriptome data to reveal the constraint factors and to gain an improved understanding of the mechanisms that shape synonymous CUB. Results Analysis of a total of 8,620 annotated mRNA sequences from T. multiceps indicated only a weak codon bias, with mean GC and GC3 content values of 49.29% and 51.43%, respectively. Our analysis indicated that nucleotide composition, mutational pressure, natural selection, gene expression level, amino acids with grand average of hydropathicity (GRAVY) and aromaticity (Aromo), and the effective selection of amino acids all contributed to the codon usage in T. multiceps. Among these factors, natural selection was implicated as the major factor affecting the codon usage variation in T. multiceps. The codon usage of ribosome genes was affected mainly by mutations, while the essential genes were affected mainly by selection. In addition, 21 codons were identified as "optimal codons". Overall, the optimal codons were GC-rich (GC:AU, 41:22), and ended with G or C (except CGU). Furthermore, different degrees of variation in codon usage were found between T. multiceps and Escherichia coli, yeast, and Homo sapiens. However, little difference was found between T. multiceps and Taenia pisiformis. Conclusions In this study, the codon usage pattern of T. multiceps was analyzed systematically and the factors affecting CUB were identified. This is the first study of codon biology in T. multiceps. Understanding the codon usage pattern in T. multiceps can be helpful for the discovery of new genes, molecular genetic engineering and evolutionary studies.
Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-3704-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Xing Huang
- Department of Parasitology, College of Veterinary Medicine, Sichuan Agricultural University, Chengdu, 611130, China; Chengdu Agricultural College, Chengdu, 611130, China
- Jing Xu
- Department of Parasitology, College of Veterinary Medicine, Sichuan Agricultural University, Chengdu, 611130, China
- Lin Chen
- Meat-processing Application Key Laboratory of Sichuan Province, College of Pharmacy and Biological Engineering, Chengdu University, Chengdu, 610106, China
- Yu Wang
- Department of Parasitology, College of Veterinary Medicine, Sichuan Agricultural University, Chengdu, 611130, China
- Xiaobin Gu
- Department of Parasitology, College of Veterinary Medicine, Sichuan Agricultural University, Chengdu, 611130, China
- Xuerong Peng
- College of Science, Sichuan Agricultural University, Ya'an, 625014, China
- Guangyou Yang
- Department of Parasitology, College of Veterinary Medicine, Sichuan Agricultural University, Chengdu, 611130, China.
5
An Approach for Predicting Essential Genes Using Multiple Homology Mapping and Machine Learning Algorithms. BioMed Research International 2016; 2016:7639397. [PMID: 27660763 PMCID: PMC5021884 DOI: 10.1155/2016/7639397] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 05/30/2016] [Revised: 07/25/2016] [Accepted: 08/04/2016] [Indexed: 11/17/2022]
Abstract
Investigating essential genes is important for understanding the minimal gene set of a cell and for discovering potential drug targets. In this study, a novel approach based on multiple homology mapping and machine learning was introduced to predict essential genes. We focused on 25 bacteria with characterized essential genes. The predictions yielded a highest area under the receiver operating characteristic (ROC) curve (AUC) of 0.9716 in tenfold cross-validation. Suitable features were then used to construct models for making predictions in distantly related bacteria. The accuracy of these predictions was evaluated by their consistency with the known essential genes of the target species. A highest AUC of 0.9552 and an average AUC of 0.8314 were achieved when making predictions across organisms. A recently released independent dataset from Synechococcus elongatus was used to further assess the performance of our model. The AUC of these predictions was 0.7855, which is higher than that of other methods. This research shows that features obtained by homology mapping alone can achieve results comparable to, or even better than, those of integrated features. The work also indicates that a machine learning-based method can assign more effective weight coefficients than an empirical formula based on biological knowledge.
6
Mehdizadeh Aghdam E, Hejazi MS, Barzegar A. Riboswitches: From living biosensors to novel targets of antibiotics. Gene 2016; 592:244-59. [PMID: 27432066 DOI: 10.1016/j.gene.2016.07.035] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Received: 01/23/2016] [Revised: 07/11/2016] [Accepted: 07/14/2016] [Indexed: 12/24/2022]
Abstract
Riboswitches are generally located in the 5'-UTR region of mRNAs and specifically bind small ligands. Following ligand binding, gene expression is controlled mostly by transcription termination, translation inhibition or mRNA degradation. More than 30 classes of riboswitches have been identified to date. Most riboswitches consist of an aptamer domain and an expression platform. The aptamer domain of each class of riboswitch is a conserved structure and stabilizes specific structures of the expression platform by binding to specific compounds. In this review, we highlight most aspects of riboswitch research, including novel riboswitch discoveries, routine methods for discovering and investigating riboswitches, newly discovered classes, and the mechanistic principles of riboswitch-mediated gene expression control. Moreover, we give an overview of the potential of riboswitches as therapeutic targets for antibiotic design and of their use as biosensors for molecular detection.
Affiliation(s)
- Elnaz Mehdizadeh Aghdam
- Department of Pharmaceutical Biotechnology, Faculty of Pharmacy, Tabriz University of Medical Sciences, Tabriz, Iran; Molecular Medicine Research Center, Tabriz University of Medical Sciences, Tabriz, Iran; Student Research Committee, Tabriz University of Medical Sciences, Tabriz, Iran.
- Mohammad Saeid Hejazi
- Department of Pharmaceutical Biotechnology, Faculty of Pharmacy, Tabriz University of Medical Sciences, Tabriz, Iran; Molecular Medicine Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Abolfazl Barzegar
- Research Institute for Fundamental Sciences (RIFS), University of Tabriz, Tabriz, Iran; The School of Advanced Biomedical Sciences (SABS), Tabriz University of Medical Sciences, Tabriz, Iran
7
Abstract
Essential genes are indispensable for the target organism's survival. Large-scale identification and characterization of essential genes has been shown to be beneficial in both fundamental biology and medicine. Existing genome-scale experimental screens for essential genes are time consuming and costly, and sometimes produce erroneous essential gene annotations. To circumvent these difficulties, many research groups turn to computational approaches as an alternative way to identify essential genes. Here, we developed an integrative machine-learning based statistical framework to accurately predict essential genes in microorganisms. First, we extracted a variety of relevant features derived from different aspects of an organism's genomic sequences. Then we selected a subset of features that have high predictive power for gene essentiality through a carefully designed feature selection system. Using the selected features as input, we constructed an ensemble classifier and trained the model on a well-studied microorganism. After fine-tuning the model parameters in cross-validation, we tested the model on another microorganism. We found that tenfold cross-validation within the same organism achieves high predictive accuracy (AUC ~0.9), and cross-organism predictions between distantly related organisms yield AUC scores from 0.69 to 0.89, which significantly outperformed homology mapping.
8
Grazziotin AL, Vidal NM, Venancio TM. Uncovering major genomic features of essential genes in Bacteria and a methanogenic Archaea. FEBS J 2015; 282:3395-3411. [PMID: 26084810 DOI: 10.1111/febs.13350] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 02/02/2015] [Revised: 06/02/2015] [Accepted: 06/15/2015] [Indexed: 12/19/2022]
Abstract
Identification of essential genes is critical to understanding the physiology of a species, proposing novel drug targets and uncovering minimal gene sets required for life. Although essential gene sets of several organisms have been determined using large-scale mutagenesis techniques, systematic studies addressing their conservation, genomic context and functions remain scant. Here we integrate 17 essential gene sets from genome-wide in vitro screenings and three gene collections required for growth in vivo, encompassing 15 Bacteria and one Archaea. We refine and generalize important theories proposed using Escherichia coli. Essential genes are typically monogenic and more conserved than nonessential genes. Genes required in vivo are less conserved than those essential in vitro, suggesting that more divergent strategies are deployed when the organism is stressed by the host immune system and unstable nutrient availability. We identified essential analogous pathways that would probably be missed by orthology-based essentiality prediction strategies. For example, Streptococcus sanguinis carries horizontally transferred isoprenoid biosynthesis genes that are widespread in Archaea. Genes specifically essential in Mycobacterium tuberculosis and Burkholderia pseudomallei are reported as potential drug targets. Moreover, essential genes are not only preferentially located in operons, but also occupy the first position therein, supporting the influence of their regulatory regions in driving transcription of whole operons. Finally, these important genomic features are shared between Bacteria and at least one Archaea, suggesting that high order properties of gene essentiality and genome architecture were probably present in the last universal common ancestor or evolved independently in the prokaryotic domains.
Affiliation(s)
- Ana Laura Grazziotin
- Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, RJ, Brazil; National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Newton Medeiros Vidal
- Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, RJ, Brazil; National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Thiago Motta Venancio
- Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, RJ, Brazil
9
A Pipeline for Screening Small Molecules with Growth Inhibitory Activity against Burkholderia cenocepacia. PLoS One 2015; 10:e0128587. [PMID: 26053039 PMCID: PMC4460083 DOI: 10.1371/journal.pone.0128587] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 01/23/2015] [Accepted: 04/28/2015] [Indexed: 12/02/2022] Open
Abstract
Infections with bacteria of the Burkholderia cepacia complex (Bcc) are very difficult to eradicate in cystic fibrosis patients due to the intrinsic resistance of Bcc to most available antibiotics and the emergence of multiple antibiotic-resistant strains during antibiotic treatment. In this work, we used a whole-cell based assay to screen a diverse collection of small molecules for growth inhibitors of a relevant strain of Bcc, B. cenocepacia K56-2. The primary screen used bacterial growth in 96-well plate format and identified 206 primary actives among 30,259 compounds. From 100 compounds with no previous record of antibacterial activity, secondary screening and data mining selected a total of Bce bioactives that were further analyzed. An experimental pipeline evaluating in vitro antibacterial and antibiofilm activity, toxicity, and in vivo antibacterial activity using C. elegans was used to prioritize compounds with better chances of being further investigated as potential Bcc antibacterial drugs. This high-throughput screen, along with the in vitro and in vivo analysis, highlights the utility of this experimental method for quickly identifying bioactives as a starting point for antibacterial drug discovery.
10
Abstract
Essential genes are those genes indispensable for the survival of any living cell. Bacterial essential genes constitute the cornerstones of synthetic biology and are often attractive targets in the development of antibiotics and vaccines. Because identifying essential genes with wet-lab methods is often expensive and labor-intensive, scientists have turned to computational prediction as an alternative. To help address this issue, our research group (CEFG: group of Computational, Comparative, Evolutionary and Functional Genomics, http://cefg.uestc.edu.cn) has constructed three online services to predict essential genes in bacterial genomes. These freely available tools are applicable to single gene sequences without annotated functions, single genes with definite names, and complete genomes of bacterial strains. To ensure reliable predictions, the investigated species should belong to the same family (for EGP) or phylum (for CEG_Match and Geptop) as one of the reference species. As pilot software for this problem, the tools have had their prediction accuracies assessed and compared with those of existing algorithms; note that none of the other published algorithms has been made available as an online service. We hope these services at CEFG will help scientists and researchers in the field of essential genes.
11
Paik H, Heo HS, Ban HJ, Cho SB. Unraveling human protein interaction networks underlying co-occurrences of diseases and pathological conditions. J Transl Med 2014; 12:99. [PMID: 24731539 PMCID: PMC4021415 DOI: 10.1186/1479-5876-12-99] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 01/21/2014] [Accepted: 04/03/2014] [Indexed: 12/18/2022] Open
Abstract
Background Human diseases frequently cause complications such as obesity-induced diabetes and share numbers of pathological conditions, such as inflammation, by dysfunctions of common functional modules, such as protein–protein interactions (PPIs). Methods Our developed pipeline, ICod (Interaction analysis for disease Comorbidity), grades similarities between pairs of disease-related PPIs including comorbid diseases and pathological conditions. ICod displayed a disease similarity network consisting of nodes of disease PPIs and edges of similarity value. As a proof of concept, eight complex diseases and pathological conditions, such as type 2 diabetes, obesity, inflammation, and cancers, were examined to discover whether PPIs shared between diseases were associated with comorbidities. Results By comparing Medicare reports of disease co-occurrences from 31 million patients, the disease similarity network shows that PPIs of pathological conditions, including insulin resistance, and inflammation, overlap significantly with PPIs of various comorbid diseases, including diabetes, obesity, and cancers (p < 0.05). Interestingly, maintaining connectivity between essential genes was more drastically perturbed by removing a node of a disease-related gene rather than a pathological condition-related gene, such as one related to inflammations. Conclusion Thus, PPIs of pathological symptoms are underlying functional modules across diseases accompanying comorbidity phenomena, whereas they contribute only marginally to maintaining interactions between essential genes.
Affiliation(s)
- Seong Beom Cho
- Division of Bio-Medical Informatics, Center for Genome Science, National Institute of Health, OHTAC, 187 Osongsaengmyeong2(i)-ro, Gangoe-myeon, Cheongwon-gun, ChoongchungBuk-do, South Korea.
12
Abstract
The increasing emergence of antimicrobial multiresistant bacteria is of great concern to public health. While these bacteria are becoming an ever more prominent cause of nosocomial and community-acquired infections worldwide, the antibiotic discovery pipeline has been stalled in the last few years with very few efforts in the research and development of novel antibacterial therapies. Some of the root causes that have hampered current antibiotic drug development are the lack of understanding of the mode of action (MOA) of novel antibiotic molecules and the poor characterization of the bacterial physiological response to antibiotics that ultimately causes resistance. Here, we review how bacterial genetic tools can be applied at the genomic level with the goal of profiling resistance to antibiotics and elucidating antibiotic MOAs. Specifically, we highlight how chemical genomic detection of the MOA of novel antibiotic molecules and antibiotic profiling by next-generation sequencing are leveraging basic antibiotic research to unprecedented levels with great opportunities for knowledge translation.
Affiliation(s)
- Silvia T Cardona
- Department of Microbiology, University of Manitoba, Winnipeg, Canada; Department of Medical Microbiology & Infectious Disease, University of Manitoba, Winnipeg, Canada
- Carrie Selin
- Department of Microbiology, University of Manitoba, Winnipeg, Canada
- April S Gislason
- Department of Microbiology, University of Manitoba, Winnipeg, Canada
13
Lu Y, Deng J, Rhodes JC, Lu H, Lu LJ. Predicting essential genes for identifying potential drug targets in Aspergillus fumigatus. Comput Biol Chem 2014; 50:29-40. [PMID: 24569026 DOI: 10.1016/j.compbiolchem.2014.01.011] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Accepted: 12/23/2013] [Indexed: 12/31/2022]
Abstract
BACKGROUND Aspergillus fumigatus (Af) is a ubiquitous and opportunistic pathogen capable of causing acute, invasive pulmonary disease in susceptible hosts. Despite current therapeutic options, mortality associated with invasive Af infections remains unacceptably high, increasing 357% since 1980. Therefore, there is an urgent need for the development of novel therapeutic strategies, including more efficacious drugs acting on new targets. Thus, as noted in a recent review, "the identification of essential genes in fungi represents a crucial step in the development of new antifungal drugs". Expanding the target space by rapidly identifying new essential genes has thus been described as "the most important task of genomics-based target validation". RESULTS In previous research, we were the first to show that essential gene annotation can be reliably transferred between four distantly related Prokaryotic species. In this study, we extend our machine learning approach to the much more complex Eukaryotic fungal species. A compendium of essential genes is predicted in Af by transferring known essential gene annotations from another filamentous fungus, Neurospora crassa. This approach predicts essential genes by integrating diverse types of intrinsic and context-dependent genomic features encoded in microbial genomes. The predicted essential datasets contained 1674 genes. We validated our results by comparing our predictions with known essential genes in Af, comparing our predictions with those predicted by homology mapping, and conducting conditional expressed alleles. We applied several layers of filters and selected a set of potential drug targets from the predicted essential genes. Finally, we conducted wet lab knockout experiments to verify our predictions, which further validates the accuracy and wide applicability of the machine learning approach.
CONCLUSIONS The approach presented here significantly extended our ability to predict essential genes beyond orthologs and made it possible to predict an inventory of essential genes in Eukaryotic fungal species, amongst which a preferred subset of suitable drug targets may be selected. By selecting the best new targets, we believe that resultant drugs would exhibit an unparalleled clinical impact against a naive pathogen population. Additional benefits that a compendium of essential genes can provide are important information on cell function and evolutionary biology. Furthermore, mapping essential genes to pathways may also reveal critical check points in the pathogen's metabolism. Finally, this approach is highly reproducible and portable, and can be easily applied to predict essential genes in many more pathogenic microbes, especially those unculturable.
Affiliation(s)
- Yao Lu
- Shanghai Institute of Medical Genetics, Shanghai Children's Hospital, Shanghai Jiao Tong University, 24/1400 Beijing (W) Road, Shanghai 200040, PR China
- Jingyuan Deng
- Division of Biomedical Informatics, Cincinnati Children's Hospital Research Foundation, 3333 Burnet Avenue, MLC7024, Cincinnati, OH 45229, USA
| | - Judith C Rhodes
- Department of Pathology and Laboratory Medicine, University of Cincinnati, 2600 Clifton Avenue, Cincinnati, OH 45221, USA
| | - Hui Lu
- Shanghai Institute of Medical Genetics, Shanghai Children's Hospital, Shanghai Jiao Tong University, 24/1400 Beijing (W) Road, Shanghai 200040, PR China; Department of Bioengineering (MC 063), University of Illinois at Chicago, 851 S Morgan St, 218 SEO, Chicago, IL 60607, USA.
| | - Long Jason Lu
- Division of Biomedical Informatics, Cincinnati Children's Hospital Research Foundation, 3333 Burnet Avenue, MLC7024, Cincinnati, OH 45229, USA; Division of Epidemiology and Biostatistics, Cincinnati Children's Hospital Research Foundation, 3333 Burnet Avenue, MLC7024, Cincinnati, OH 45229, USA; Department of Computer Science, University of Cincinnati, 2600 Clifton Avenue, Cincinnati, OH 45221, USA; Department of Environmental Health, University of Cincinnati, 2600 Clifton Avenue, Cincinnati, OH 45221, USA; Department of Biomedical Engineering, University of Cincinnati, 2600 Clifton Avenue, Cincinnati, OH 45221, USA.
| |
Collapse
|
14
|
Panjkovich A, Gibert I, Daura X. antibacTR: dynamic antibacterial-drug-target ranking integrating comparative genomics, structural analysis and experimental annotation. BMC Genomics 2014; 15:36. [PMID: 24438389 PMCID: PMC3932961 DOI: 10.1186/1471-2164-15-36] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 09/30/2013] [Accepted: 01/11/2014] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND Development of novel antibacterial drugs is both an urgent healthcare necessity and a partially neglected field. The last decades have seen a substantial decrease in the discovery of novel antibiotics, which, combined with the recent rise of multi-drug-resistant pathogens, has generated a scenario of general concern. The procedures involved in the discovery and development of novel antibiotics are economically challenging and time consuming, and carry no guarantee of success. Furthermore, the return on investment for an antibacterial drug is usually marginal when compared to other therapeutics, which in part explains the decrease in private investment. RESULTS In this work we present antibacTR, a computational pipeline designed to aid researchers in the selection of potential drug targets, one of the initial steps in antibacterial-drug discovery. The approach was designed and implemented as part of two publicly funded initiatives aimed at discovering novel antibacterial targets, mechanisms and drugs for a priority list of Gram-negative pathogens: Acinetobacter baumannii, Escherichia coli, Helicobacter pylori, Pseudomonas aeruginosa and Stenotrophomonas maltophilia. This list has since been extended to cover a total of 74 fully sequenced Gram-negative pathogens. antibacTR is based on sequence comparisons and queries to multiple databases (e.g. gene essentiality, virulence factors) to rank proteins according to their potential as antibacterial targets. The dynamic ranking of potential drug targets can easily be executed, customized and accessed by the user through a web interface, which also integrates computational analyses performed in-house and visualizable on-site. These include three-dimensional modeling of protein structures and prediction of active sites, among other functionally relevant ligand-binding sites.
CONCLUSIONS Given its versatility and ease of use in integrating both experimental annotation and computational analyses, antibacTR may effectively assist microbiologists, medicinal chemists and other researchers working in the field of antibacterial drug discovery. The public web interface for antibacTR is available at 'http://bioinf.uab.cat/antibactr'.
Affiliation(s)
- Xavier Daura
- Institute of Biotechnology and Biomedicine (IBB), Universitat Autònoma de Barcelona (UAB), 08193 Bellaterra, Spain.
|
15
|
Hwang KB, Ha BY, Ju S, Kim S. Partial AUC maximization for essential gene prediction using genetic algorithms. BMB Rep 2013; 46:41-6. [PMID: 23351383 PMCID: PMC4133830 DOI: 10.5483/bmbrep.2013.46.1.159] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/04/2022] Open
Abstract
Identifying genes indispensable for an organism's life and their characteristics is one of the central questions in current biological research, and hence it would be helpful to develop computational approaches towards the prediction of essential genes. The performance of a predictor is usually measured by the area under the receiver operating characteristic curve (AUC). We propose a novel method that uses genetic algorithms to maximize the partial AUC restricted to a specific interval of low false positive rate (FPR), the region relevant to follow-up experimental validation. Our predictor uses various features based on sequence information, protein-protein interaction network topology, and gene expression profiles. A feature selection wrapper was developed to alleviate the over-fitting problem and to weigh each feature's relevance to prediction. We evaluated our method using the proteome of budding yeast. Our implementation of genetic algorithms maximizing the partial AUC below 0.05 or 0.10 of FPR outperformed other popular classification methods. [BMB Reports 2013; 46(1): 41-46]
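To make the objective named in this abstract concrete, the following is a minimal Python sketch of how a partial AUC restricted to a low-FPR interval can be computed. It is not the authors' implementation and does not include the genetic algorithm or the feature selection wrapper; the function names `roc_points` and `partial_auc` and the toy labels and scores are illustrative assumptions. The value is left unnormalised, so a perfect ranking scores `max_fpr` rather than 1.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) lists, sweeping the threshold from high to low score.

    Assumes labels are 0/1 and that both classes are present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(order):
        # Items with tied scores cross the threshold together.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            if labels[order[j]] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
        i = j
    return fpr, tpr


def partial_auc(labels, scores, max_fpr):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr]."""
    fpr, tpr = roc_points(labels, scores)
    area = 0.0
    for k in range(1, len(fpr)):
        x0, y0, x1, y1 = fpr[k - 1], tpr[k - 1], fpr[k], tpr[k]
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Cut the last segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical toy data: 1 = essential gene, 0 = non-essential gene.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6]
toy_pauc = partial_auc(toy_labels, toy_scores, max_fpr=0.5)
```

An optimiser such as a genetic algorithm would treat `partial_auc` as the fitness of a candidate scoring function, which is why only the low-FPR part of the ranking influences the search.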
Affiliation(s)
- Kyu-Baek Hwang
- School of Computer Science and Engineering, Soongsil University, Seoul, Korea
|
16
|
Juhas M, Eberl L, Church GM. Essential genes as antimicrobial targets and cornerstones of synthetic biology. Trends Biotechnol 2012; 30:601-7. [DOI: 10.1016/j.tibtech.2012.08.002] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.1] [Received: 06/19/2012] [Revised: 08/02/2012] [Accepted: 08/02/2012] [Indexed: 11/15/2022]
|
17
|
Klein BA, Tenorio EL, Lazinski DW, Camilli A, Duncan MJ, Hu LT. Identification of essential genes of the periodontal pathogen Porphyromonas gingivalis. BMC Genomics 2012; 13:578. [PMID: 23114059 PMCID: PMC3547785 DOI: 10.1186/1471-2164-13-578] [Citation(s) in RCA: 114] [Impact Index Per Article: 9.5] [Received: 07/27/2012] [Accepted: 10/24/2012] [Indexed: 01/09/2023] Open
Abstract
Background Porphyromonas gingivalis is a Gram-negative anaerobic bacterium associated with periodontal disease onset and progression. Genetic tools for the manipulation of bacterial genomes allow for in-depth mechanistic studies of metabolism, physiology, interspecies and host-pathogen interactions. Analysis of the essential genes of P. gingivalis (protein-coding sequences necessary for survival) by transposon mutagenesis has not previously been attempted due to the limitations of available transposon systems for the organism. We adapted a Mariner transposon system for mutagenesis of P. gingivalis and created an insertion mutant library. By analyzing the location of insertions using massively parallel sequencing technology, we used this mutant library to define genes essential for P. gingivalis survival under in vitro conditions. Results In mutagenesis experiments we identified 463 genes in P. gingivalis strain ATCC 33277 that are putatively essential for viability in vitro. When the 463 P. gingivalis essential genes are compared with previous essential gene studies, 364 are homologues of essential genes in other species; 339 are shared with more than one other species. Twenty-five genes are known to be essential in P. gingivalis and B. thetaiotaomicron only. Significant enrichment of essential genes within Cluster of Orthologous Groups ‘D’ (cell division), ‘I’ (lipid transport and metabolism) and ‘J’ (translation/ribosome) was identified. Previously, the P. gingivalis core genome was shown to encode 1,476 proteins out of a possible 1,909; 434 of 463 essential genes are contained within the core genome. Thus, for the species P. gingivalis twenty-two, seventy-seven and twenty-three percent of the genome respectively are devoted to essential, core and accessory functions. Conclusions A Mariner transposon system can be adapted to create mutant libraries in P. gingivalis amenable to analysis by next-generation sequencing technologies. In silico analysis of genes essential for in vitro growth demonstrates that although the majority are homologous across bacterial species as a whole, species- and strain-specific subsets are apparent. Understanding the putative essential genes of P. gingivalis will provide insights into metabolic pathways and niche adaptations as well as clinical therapeutic strategies.
Affiliation(s)
- Brian A Klein
- Department of Molecular Biology and Microbiology, Tufts University Sackler School of Biomedical Sciences, Boston, MA 02111, USA
|
18
|
Klein CC, Cottret L, Kielbassa J, Charles H, Gautier C, Ribeiro de Vasconcelos AT, Lacroix V, Sagot MF. Exploration of the core metabolism of symbiotic bacteria. BMC Genomics 2012; 13:438. [PMID: 22938206 PMCID: PMC3543179 DOI: 10.1186/1471-2164-13-438] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 06/20/2012] [Accepted: 08/18/2012] [Indexed: 12/01/2022] Open
Abstract
Background A large number of genome-scale metabolic networks is now available for many organisms, mostly bacteria. Previous works on minimal gene sets, when analysing host-dependent bacteria, found small common sets of metabolic genes. When such analyses are restricted to bacteria with similar lifestyles, larger portions of metabolism are expected to be shared and their composition is worth investigating. Here we report a comparative analysis of the small molecule metabolism of symbiotic bacteria, exploring common and variable portions as well as the contribution of different lifestyle groups to the reduction of a common set of metabolic capabilities. Results We found no reaction shared by all the bacteria analysed. Disregarding those with the smallest genomes, we still found no reaction core; however, we did find a core of biochemical capabilities. While obligate intracellular symbionts have no core of reactions within their group, extracellular and cell-associated symbionts do have a small core composed of disconnected fragments. In agreement with previous findings in Escherichia coli, their cores are enriched in biosynthetic processes whereas the variable metabolisms have similar ratios of biosynthetic and degradation reactions. Conversely, the variable metabolism of obligate intracellular symbionts is enriched in anabolism. Conclusion Even when removing the symbionts with the most reduced genomes, there is no core of reactions common to the analysed symbiotic bacteria. The main reason is the very high specialisation of obligate intracellular symbionts; however, host-dependence alone does not explain this absence. The composition of the metabolism of cell-associated and extracellular bacteria shows that while they have similar needs in terms of the building blocks of their cells, they have to adapt to very distinct environments. On the other hand, in obligate intracellular bacteria, catabolism has largely disappeared, whereas synthetic routes appear to have been selected for depending on the nature of the symbiosis. As more genomes are added, we expect, based on our simulations, that the core of cell-associated and extracellular bacteria will continue to diminish, converging to approximately 60 reactions.
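The notion of a "core of reactions" in this abstract can be read as a set intersection over per-organism reaction sets. The sketch below illustrates that reading in Python and shows how the core can only shrink or stay the same as genomes are added. The organism names and reaction identifiers are hypothetical, and the functions `reaction_core` and `core_size_curve` are illustrative assumptions, not the authors' pipeline or simulations.

```python
def reaction_core(networks):
    """Reactions present in every network (plain set intersection)."""
    sets = [set(reactions) for reactions in networks.values()]
    return set.intersection(*sets) if sets else set()


def core_size_curve(networks, order):
    """Size of the shared core after adding organisms one by one in `order`."""
    sizes = []
    core = None
    for name in order:
        reactions = set(networks[name])
        # The first organism defines the core; each later one can only trim it.
        core = reactions if core is None else core & reactions
        sizes.append(len(core))
    return sizes


# Hypothetical toy networks: organism -> set of reaction identifiers.
toy_networks = {
    "symbiont_A": {"r1", "r2", "r3"},
    "symbiont_B": {"r2", "r3", "r4"},
    "symbiont_C": {"r3", "r5"},
}
toy_core = reaction_core(toy_networks)
toy_curve = core_size_curve(toy_networks, ["symbiont_A", "symbiont_B", "symbiont_C"])
```

Because intersection is monotone, one highly specialised organism is enough to empty the core, which matches the abstract's remark about obligate intracellular symbionts.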
|
19
|
Juhas M, Stark M, von Mering C, Lumjiaktase P, Crook DW, Valvano MA, Eberl L. High confidence prediction of essential genes in Burkholderia cenocepacia. PLoS One 2012; 7:e40064. [PMID: 22768221 PMCID: PMC3386938 DOI: 10.1371/journal.pone.0040064] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Received: 01/10/2012] [Accepted: 05/31/2012] [Indexed: 01/01/2023] Open
Abstract
Background Essential genes are absolutely required for the survival of an organism. The identification of essential genes, besides being one of the most fundamental questions in biology, is also of interest for the emerging science of synthetic biology and for the development of novel antimicrobials. New antimicrobial therapies are desperately needed to treat multidrug-resistant pathogens, such as members of the Burkholderia cepacia complex. Methodology/Principal Findings We hypothesize that essential genes may be highly conserved within a group of evolutionarily closely related organisms. Using a bioinformatics approach we determined that the core genome of the order Burkholderiales consists of 649 genes. All but two of these identified genes were located on chromosome 1 of Burkholderia cenocepacia. Although many of the 649 core genes of Burkholderiales have been shown to be essential in other bacteria, we were also able to identify a number of novel essential genes present mainly, or exclusively, within this order. The essentiality of some of the core genes, including the known essential genes infB, gyrB, ubiB, and valS, as well as the so far uncharacterized genes BCAL1882, BCAL2769, BCAL3142 and BCAL3369, has been confirmed experimentally in B. cenocepacia. Conclusions/Significance We report on the identification of essential genes using a novel bioinformatics strategy and provide bioinformatics and experimental evidence that the large majority of the identified genes are indeed essential. The essential genes identified here may represent valuable targets for the development of novel antimicrobials and their detailed study may shed new light on the functions required to support life.
Affiliation(s)
- Mario Juhas
- Department of Microbiology, Institute of Plant Biology, University of Zurich, Zurich, Switzerland
- * E-mail: (MJ); (LE)
- Manuel Stark
- Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
- Christian von Mering
- Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
- Derrick W. Crook
- Nuffield Department of Clinical Laboratory Sciences, John Radcliffe Hospital, University of Oxford, Oxford, United Kingdom
- Miguel A. Valvano
- Department of Microbiology and Immunology, University of Western Ontario, London, Ontario, Canada
- Leo Eberl
- Department of Microbiology, Institute of Plant Biology, University of Zurich, Zurich, Switzerland
- * E-mail: (MJ); (LE)
|
20
|
Exploring the optimal strategy to predict essential genes in microbes. Biomolecules 2011; 2:1-22. [PMID: 24970124 PMCID: PMC4030871 DOI: 10.3390/biom2010001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 11/11/2011] [Revised: 12/16/2011] [Accepted: 12/19/2011] [Indexed: 01/26/2023] Open
Abstract
Accurately predicting essential genes is important in many aspects of biology, medicine and bioengineering. In previous research, we developed a machine learning based integrative algorithm to predict essential genes in bacterial species. This algorithm lends itself to two approaches for predicting essential genes: learning the traits from known essential genes in the target organism, or transferring essential gene annotations from a closely related model organism. However, for an understudied microbe, each approach has its potential limitations. The first is constrained by the often small number of known essential genes. The second is limited by the availability of model organisms and by evolutionary distance. In this study, we aim to determine the optimal strategy for predicting essential genes by examining four microbes with well-characterized essential genes. Our results suggest that, unless the known essential genes are few, learning from the known essential genes in the target organism usually outperforms transferring essential gene annotations from a related model organism. In fact, the number of known essential genes required to make accurate predictions is surprisingly small. In prokaryotes, when the number of known essential genes is greater than 2% of total genes, this approach already comes close to its optimal performance. In eukaryotes, achieving the same best performance requires over 4% of total genes, reflecting the increased complexity of eukaryotic organisms. Combining the two approaches resulted in increased performance when the known essential genes are few. Our investigation thus provides key information on accurately predicting essential genes and will greatly facilitate annotations of microbial genomes.
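The thresholds reported in this abstract can be turned into a simple rule of thumb. The Python sketch below is one possible reading, not a procedure given by the authors: the 2% and 4% cut-offs come from the abstract, while the function name `choose_strategy`, the strategy labels, and the handling of the case with no known essential genes are assumptions made for illustration.

```python
# Fractions of total genes above which learning from the target organism
# alone is reported to approach its best performance (from the abstract).
THRESHOLDS = {"prokaryote": 0.02, "eukaryote": 0.04}


def choose_strategy(n_known_essential, n_total_genes, domain):
    """Suggest a prediction strategy from the fraction of known essential genes.

    The returned labels are hypothetical names for the approaches in the abstract.
    """
    if domain not in THRESHOLDS:
        raise ValueError(f"unknown domain: {domain!r}")
    if n_total_genes <= 0:
        raise ValueError("n_total_genes must be positive")
    if n_known_essential == 0:
        # Assumption: with nothing to learn from, only annotation transfer remains.
        return "transfer_from_model_organism"
    fraction = n_known_essential / n_total_genes
    if fraction > THRESHOLDS[domain]:
        return "learn_from_target"
    # Few known essential genes: the abstract reports a gain from combining both.
    return "combine_target_and_transfer"


# Hypothetical genome with 4000 genes, 100 of them known to be essential (2.5%).
toy_prokaryote_choice = choose_strategy(100, 4000, "prokaryote")
toy_eukaryote_choice = choose_strategy(100, 4000, "eukaryote")
```

The same 2.5% coverage leads to different suggestions for the two domains, which reflects the higher threshold the abstract reports for eukaryotes.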
|
21
|
Pei L, Schmidt M, Wei W. Synthetic biology: an emerging research field in China. Biotechnol Adv 2011; 29:804-14. [PMID: 21729747 PMCID: PMC3197886 DOI: 10.1016/j.biotechadv.2011.06.008] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 03/29/2011] [Revised: 05/20/2011] [Accepted: 06/11/2011] [Indexed: 12/27/2022]
Abstract
Synthetic biology is considered an emerging research field that will bring new opportunities to biotechnology. There is an expectation that synthetic biology will not only enhance knowledge in basic science, but will also have great potential for practical applications. Synthetic biology is still in an early developmental stage in China. We provide here a review of current Chinese research activities in synthetic biology and its different subfields, such as research on genetic circuits, minimal genomes, chemical synthetic biology, protocells and DNA synthesis, using literature reviews and personal communications with Chinese researchers. To meet the increasing demand for sustainable development, research on genetic circuits to harness biomass is the most pursued topic among Chinese researchers. Environmental concerns are the driving force behind research on genetic circuits for bioremediation. Research on minimal genomes focuses on identifying the smallest genome needed for engineering minimal cell factories, and research on chemical synthetic biology focuses on artificial proteins and an expanded genetic code. Research on protocells is mostly combined with research on molecular-scale motors. Research on DNA synthesis and its commercialisation is also reviewed. Potential future Chinese R&D activities are discussed in light of research capacity and governmental policy.
Affiliation(s)
- Lei Pei
- Organisation for International Dialogue and Conflict Management, Vienna, Austria.
|
22
|
Juhas M, Eberl L, Glass JI. Essence of life: essential genes of minimal genomes. Trends Cell Biol 2011; 21:562-8. [DOI: 10.1016/j.tcb.2011.07.005] [Citation(s) in RCA: 138] [Impact Index Per Article: 10.6] [Received: 06/09/2011] [Revised: 07/25/2011] [Accepted: 07/27/2011] [Indexed: 11/29/2022]
|
23
|
Deng J, Deng L, Su S, Zhang M, Lin X, Wei L, Minai AA, Hassett DJ, Lu LJ. Investigating the predictability of essential genes across distantly related organisms using an integrative approach. Nucleic Acids Res 2010; 39:795-807. [PMID: 20870748 PMCID: PMC3035443 DOI: 10.1093/nar/gkq784] [Citation(s) in RCA: 84] [Impact Index Per Article: 6.0] [Indexed: 02/03/2023] Open
Abstract
Rapid and accurate identification of new essential genes in under-studied microorganisms will significantly improve our understanding of how a cell works and the ability to re-engineer microorganisms. However, predicting essential genes across distantly related organisms remains a challenge. Here, we present a machine learning-based integrative approach that reliably transfers essential gene annotations between distantly related bacteria. We focused on four bacterial species that have well-characterized essential genes, and tested the transferability between three pairs among them. For each pair, we trained our classifier to learn traits associated with essential genes in one organism, and applied it to make predictions in the other. The predictions were then evaluated by examining the agreements with the known essential genes in the target organism. Ten-fold cross-validation in the same organism yielded AUC scores between 0.86 and 0.93. Cross-organism predictions yielded AUC scores between 0.69 and 0.89. The transferability is likely affected by growth conditions, quality of the training data set and the evolutionary distance. We are thus the first to report that gene essentiality can be reliably predicted using features trained and tested in a distantly related organism. Our approach proves more robust and portable than existing approaches, significantly extending our ability to predict essential genes beyond orthologs.
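The train-in-one-organism, test-in-another protocol described in this abstract can be sketched as follows. This Python example is a toy stand-in: the centroid-difference scorer replaces the authors' classifier, which the abstract does not specify, and all feature values, labels, and function names (`fit_centroid_direction`, `score_genes`, `auc`) are hypothetical. AUC is computed as the fraction of essential/non-essential gene pairs ranked correctly, with ties counted as one half.

```python
def auc(labels, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroid_direction(features, labels):
    """Toy 'training': direction from the non-essential to the essential centroid."""
    dim = len(features[0])

    def mean(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    mu_pos = mean([f for f, l in zip(features, labels) if l == 1])
    mu_neg = mean([f for f, l in zip(features, labels) if l == 0])
    return [p - n for p, n in zip(mu_pos, mu_neg)]


def score_genes(features, weights):
    """Project each gene's feature vector onto the learned direction."""
    return [sum(x * w for x, w in zip(row, weights)) for row in features]


# Hypothetical source organism: two features per gene, 1 = essential.
source_features = [[0.9, 0.8], [0.8, 0.7], [0.2, 0.1], [0.1, 0.3]]
source_labels = [1, 1, 0, 0]

# Hypothetical target organism with its own known labels, used only for evaluation.
target_features = [[0.7, 0.9], [0.3, 0.2], [0.6, 0.4], [0.2, 0.5]]
target_labels = [1, 0, 1, 0]

weights = fit_centroid_direction(source_features, source_labels)
cross_organism_auc = auc(target_labels, score_genes(target_features, weights))
```

The key point of the protocol is that the target organism's labels never enter training; they are used only to check agreement with the transferred predictions.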
Affiliation(s)
- Jingyuan Deng
- Division of Biomedical Informatics, Cincinnati Children's Hospital Research Foundation, Cincinnati, OH 45229, USA
|