1
Rahiminejad S, De Sanctis B, Pevzner P, Mushegian A. Synthetic lethality and the minimal genome size problem. mSphere 2024; 9:e0013924. [PMID: 38904396 PMCID: PMC11288024 DOI: 10.1128/msphere.00139-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Accepted: 05/13/2024] [Indexed: 06/22/2024] Open
Abstract
Gene knockout studies suggest that ~300 genes in a bacterial genome and ~1,100 genes in a yeast genome cannot be deleted without loss of viability. These single-gene knockout experiments do not account for negative genetic interactions, in which two or more genes can each be deleted without effect but their joint deletion is lethal. Thus, large-scale single-gene deletion studies underestimate the size of a minimal gene set compatible with cell survival. In the yeast Saccharomyces cerevisiae, the viability of all possible deletions of gene pairs (2-tuples), and of some deletions of gene triplets (3-tuples), has been experimentally tested. To estimate the size of a yeast minimal genome from those data, we first established that finding the size of a minimal gene set is equivalent to finding the minimum vertex cover in the lethality (hyper)graph, where the vertices are genes and (hyper)edges connect k-tuples of genes whose joint deletion is lethal. Using the Lovász-Johnson-Chvátal greedy approximation algorithm, we computed the minimum vertex cover of the synthetic-lethal 2-tuples graph to be 1,723 genes. We next simulated the genetic interactions in 3-tuples, extrapolating from the existing triplet sample, and again estimated minimum vertex covers. The size of a minimal gene set in yeast rapidly approaches the size of the entire genome even when considering only synthetic lethalities in k-tuples with small k. In contrast, several studies reported successful experimental reductions of yeast and bacterial genomes by simultaneous deletions of hundreds of genes, without eliciting synthetic lethality. We discuss possible reasons for this apparent contradiction.
IMPORTANCE
How can we estimate the smallest number of genes sufficient for a unicellular organism to survive on a rich medium? One approach is to remove genes one at a time and count how many of the resulting deletion strains are unable to grow. However, the single-gene knockout data are insufficient, because joint gene deletions may result in negative genetic interactions, also known as synthetic lethality. We used a technique from graph theory to estimate the size of a minimal yeast genome from partial data on synthetic lethality. The number of potential synthetic lethal interactions grows very fast when multiple genes are deleted, revealing a paradoxical contrast with the experimental reductions of the yeast genome by ~100 genes, and of bacterial genomes by several hundred genes.
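The vertex-cover formulation in this abstract can be illustrated with a minimal Python sketch of the greedy rule: repeatedly keep the gene that appears in the largest number of still-uncovered lethal tuples. This is not the authors' implementation. The function name `greedy_vertex_cover`, the toy gene names, and the choice to encode single-gene essentials as 1-tuples are assumptions made only for illustration.

```python
from collections import defaultdict


def greedy_vertex_cover(lethal_tuples):
    """Greedy approximation to a minimum vertex cover of a (hyper)graph.

    Each element of `lethal_tuples` is a collection of genes whose joint
    deletion is lethal. A tuple is "covered" once at least one of its genes
    is kept, so the returned set is a candidate minimal gene set.
    """
    uncovered = [frozenset(t) for t in lethal_tuples if t]
    cover = set()
    while uncovered:
        # Count how many uncovered tuples each gene belongs to.
        counts = defaultdict(int)
        for edge in uncovered:
            for gene in edge:
                counts[gene] += 1
        # Keep the gene that covers the most tuples; sorting first makes
        # ties resolve alphabetically, so the result is deterministic.
        best = max(sorted(counts), key=counts.get)
        cover.add(best)
        uncovered = [edge for edge in uncovered if best not in edge]
    return cover


# Toy lethality hypergraph (hypothetical genes): three lethal pairs that
# share gene A, one isolated pair, one essential gene, and one lethal triplet.
toy_tuples = [("A", "B"), ("A", "C"), ("A", "D"), ("E", "F"), ("G",), ("B", "C", "H")]
kept = greedy_vertex_cover(toy_tuples)
print(sorted(kept), len(kept))
```

The greedy rule gives an approximation, not a guaranteed optimum, so on real interaction data its output is better read as an estimate of the minimal gene set size than as an exact value.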
Affiliation(s)
- Sara Rahiminejad
  - Department of Bioengineering, University of California, San Diego, La Jolla, California, USA
- Bianca De Sanctis
  - Department of Genetics, University of Cambridge, Cambridge, United Kingdom
  - Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, California, USA
- Pavel Pevzner
  - Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California, USA
- Arcady Mushegian
  - Molecular and Cellular Biosciences Division, National Science Foundation, Alexandria, Virginia, USA
  - Clare Hall College, Cambridge, United Kingdom
2
Ma S, Su T, Lu X, Qi Q. Bacterial genome reduction for optimal chassis of synthetic biology: a review. Crit Rev Biotechnol 2024; 44:660-673. [PMID: 37380345 DOI: 10.1080/07388551.2023.2208285] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 07/27/2022] [Revised: 10/13/2022] [Accepted: 02/20/2023] [Indexed: 06/30/2023]
Abstract
Bacteria with streamlined genomes that harbor the complete set of functional genes for essential metabolic networks can synthesize desired products more effectively and thus have advantages as production platforms in industrial applications. To obtain streamlined chassis genomes, considerable effort has gone into reducing existing bacterial genomes. This work falls into two categories: rational and random reduction. The identification of essential gene sets and the emergence of various genome-deletion techniques have greatly promoted genome reduction in many bacteria over the past few decades. Some of the constructed genomes possess desirable properties for industrial applications, such as increased genome stability, transformation capacity, cell growth, and biomaterial productivity. However, the decreased growth and perturbed physiological phenotypes of some genome-reduced strains may limit their application as optimized cell factories. This review assesses the advances made to date in bacterial genome reduction to construct optimal chassis for synthetic biology, including the identification of essential gene sets, genome-deletion techniques, the properties and industrial applications of artificially streamlined genomes, the obstacles encountered in constructing reduced genomes, and future perspectives.
Affiliation(s)
- Shuai Ma
  - State Key Laboratory of Microbial Technology, Shandong University, Qingdao, P. R. China
- Tianyuan Su
  - State Key Laboratory of Microbial Technology, Shandong University, Qingdao, P. R. China
- Xuemei Lu
  - State Key Laboratory of Microbial Technology, Shandong University, Qingdao, P. R. China
- Qingsheng Qi
  - State Key Laboratory of Microbial Technology, Shandong University, Qingdao, P. R. China
3
Liang Y, Luo H, Lin Y, Gao F. Recent advances in the characterization of essential genes and development of a database of essential genes. IMETA 2024; 3:e157. [PMID: 38868518 PMCID: PMC10989110 DOI: 10.1002/imt2.157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 10/09/2023] [Indexed: 06/14/2024]
Abstract
Over the past few decades, there has been significant interest in the study of essential genes, which are crucial for the survival of an organism under specific environmental conditions and thus have practical applications in synthetic biology and medicine. As experimental techniques have developed, an increasing amount of experimental data on essential genes has been obtained. Meanwhile, various computational prediction methods, related databases, and web servers have emerged. To facilitate the study of essential genes, we established a database of essential genes (DEG), which is continuously updated and has become widely used for essential gene feature analysis and prediction, drug and vaccine development, and artificial genome design and construction. In this article, we summarize the studies of essential genes, give an overview of the relevant databases, and discuss their practical applications. Furthermore, we provide an overview of the main applications of DEG and conduct comprehensive analyses based on its latest version. However, it should be noted that gene essentiality is a dynamic concept rather than a binary one, which presents both opportunities and challenges for future studies.
Affiliation(s)
- Hao Luo
  - Department of Physics, Tianjin University, Tianjin, China
- Yan Lin
  - Department of Physics, Tianjin University, Tianjin, China
- Feng Gao
  - Department of Physics, Tianjin University, Tianjin, China
  - Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin, China
  - SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), Tianjin, China
4
Aromolaran OT, Isewon I, Adedeji E, Oswald M, Adebiyi E, Koenig R, Oyelade J. Heuristic-enabled active machine learning: A case study of predicting essential developmental stage and immune response genes in Drosophila melanogaster. PLoS One 2023; 18:e0288023. [PMID: 37556452 PMCID: PMC10411809 DOI: 10.1371/journal.pone.0288023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 06/18/2023] [Indexed: 08/11/2023] Open
Abstract
Computational prediction of absolute essential genes using machine learning has gained wide attention in recent years. However, essential genes are mostly conditional and not absolute. Experimental techniques provide a reliable approach to identifying conditionally essential genes, but they are laborious, time-consuming, and resource-intensive, so computational techniques have been used to complement them. Computational techniques such as supervised machine learning or flux balance analysis are severely limited by the unavailability of the data required to train the model or to simulate the conditions for gene essentiality. This study developed a heuristic-enabled active machine learning method based on a light gradient boosting model to predict essential immune response and embryonic developmental genes in Drosophila melanogaster. We proposed a new sampling selection technique and introduced a heuristic function that replaces the human component in traditional active learning models. The heuristic function dynamically selects the unlabelled samples to improve the performance of the classifier in the next iteration. When tested on four benchmark datasets, the proposed model outperformed traditional active learning models (random sampling and uncertainty sampling). Applying the model to identify conditionally essential genes, we identified four novel essential immune response genes and 48 novel genes that are essential under embryonic developmental conditions. We performed functional enrichment analysis of the predicted genes to elucidate their biological processes, and the results support our predictions. Immune response and embryonic development related processes were significantly enriched in the essential immune response and embryonic developmental genes, respectively. Finally, we propose the predicted essential genes for future experimental studies and the use of the developed tool, accessible at http://heal.covenantuniversity.edu.ng, for conditional essentiality predictions.
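The abstract does not describe the authors' heuristic function, so the minimal Python sketch below shows only the traditional uncertainty-sampling baseline it is compared against: from the unlabelled pool, pick the samples whose predicted probability of the positive class is closest to 0.5. The function name `uncertainty_sample`, the gene identifiers, and the probabilities are all hypothetical.

```python
def uncertainty_sample(probabilities, batch_size):
    """Select the most uncertain unlabelled samples for a binary classifier.

    `probabilities` maps a sample id to the predicted probability of the
    positive class (here, "essential"). The samples closest to 0.5 are the
    ones the classifier is least sure about; ties are broken by id.
    """
    ranked = sorted(
        probabilities,
        key=lambda sample: (abs(probabilities[sample] - 0.5), sample),
    )
    return ranked[:batch_size]


# Hypothetical classifier outputs: P(essential) for five unlabelled genes.
predicted = {"geneA": 0.97, "geneB": 0.52, "geneC": 0.10, "geneD": 0.45, "geneE": 0.70}
print(uncertainty_sample(predicted, 2))
```

In an active learning loop, the selected samples would be labelled, added to the training set, and the classifier retrained before the next round of selection.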
Affiliation(s)
- Olufemi Tony Aromolaran
  - Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Itunu Isewon
  - Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Eunice Adedeji
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
  - Department of Biochemistry, Covenant University, Ota, Ogun State, Nigeria
- Marcus Oswald
  - Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum, Jena, Germany
  - Institute of Infectious Diseases and Infection Control, Jena University Hospital, Am Klinikum, Jena, Germany
- Ezekiel Adebiyi
  - Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Rainer Koenig
  - Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum, Jena, Germany
  - Institute of Infectious Diseases and Infection Control, Jena University Hospital, Am Klinikum, Jena, Germany
- Jelili Oyelade
  - Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
5
Saxena P, Rauniyar S, Thakur P, Singh RN, Bomgni A, Alaba MO, Tripathi AK, Gnimpieba EZ, Lushbough C, Sani RK. Integration of text mining and biological network analysis: Identification of essential genes in sulfate-reducing bacteria. Front Microbiol 2023; 14:1086021. [PMID: 37125195 PMCID: PMC10133479 DOI: 10.3389/fmicb.2023.1086021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Accepted: 03/23/2023] [Indexed: 05/02/2023] Open
Abstract
The growth and survival of an organism in a particular environment depend heavily on certain indispensable genes, termed essential genes. Sulfate-reducing bacteria (SRB) are obligate anaerobes that rely on sulfate reduction for their energy requirements. The present study used Oleidesulfovibrio alaskensis G20 (OA G20) as a model SRB to categorize the essential genes based on their key metabolic pathways. Here, we report a feedback loop framework for gene-of-interest discovery, from bio-problem to gene set of interest, that combines expert annotation with computational prediction. The defined bio-problem was used to retrieve SRB genes from literature databases (PubMed and PubMed Central), and these genes were annotated to the genome of OA G20. The retrieved gene list was then used for protein-protein interaction enrichment and corroborated by pangenome analysis, to categorize the enriched gene sets and their respective pathways as essential or non-essential. Interestingly, the sat gene (dde_2265) from sulfur metabolism was the bridging gene between all the enriched pathways. Gene clusters involved in essential pathways were linked with genes from seleno-compound metabolism, amino acid metabolism, secondary metabolite synthesis, and cofactor biosynthesis. Furthermore, pangenome analysis showed the gene distribution: 69.83% of the 116 enriched genes were mapped under "persistent," suggesting that these genes are essential. Likewise, 21.55% of the enriched genes, notably the formate dehydrogenases and metallic hydrogenases, appeared under "shell." Our methodology suggests that semi-automated text mining and network analysis may play a crucial role in deciphering previously unexplored genes and key mechanisms, which can help to establish a baseline before performing any experimental studies.
Affiliation(s)
- Priya Saxena
  - Department of Chemical and Biological Engineering, South Dakota School of Mines and Technology, Rapid City, SD, United States
  - Data Driven Material Discovery Center for Bioengineering Innovation, South Dakota School of Mines and Technology, Rapid City, SD, United States
- Shailabh Rauniyar
  - Department of Chemical and Biological Engineering, South Dakota School of Mines and Technology, Rapid City, SD, United States
  - 2-Dimensional Materials for Biofilm Engineering, Science and Technology, South Dakota School of Mines and Technology, Rapid City, SD, United States
- Payal Thakur
  - Department of Chemical and Biological Engineering, South Dakota School of Mines and Technology, Rapid City, SD, United States
  - Data Driven Material Discovery Center for Bioengineering Innovation, South Dakota School of Mines and Technology, Rapid City, SD, United States
- Ram Nageena Singh
  - Department of Chemical and Biological Engineering, South Dakota School of Mines and Technology, Rapid City, SD, United States
  - 2-Dimensional Materials for Biofilm Engineering, Science and Technology, South Dakota School of Mines and Technology, Rapid City, SD, United States
- Alain Bomgni
  - Department of Biomedical Engineering, University of South Dakota, Sioux Falls, SD, United States
- Mathew O. Alaba
  - Department of Biomedical Engineering, University of South Dakota, Sioux Falls, SD, United States
- Abhilash Kumar Tripathi
  - Department of Chemical and Biological Engineering, South Dakota School of Mines and Technology, Rapid City, SD, United States
  - 2-Dimensional Materials for Biofilm Engineering, Science and Technology, South Dakota School of Mines and Technology, Rapid City, SD, United States
- Etienne Z. Gnimpieba
  - Department of Biomedical Engineering, University of South Dakota, Sioux Falls, SD, United States
  - *Correspondence: Etienne Z. Gnimpieba
- Carol Lushbough
  - Department of Biomedical Engineering, University of South Dakota, Sioux Falls, SD, United States
- Rajesh Kumar Sani
  - Department of Chemical and Biological Engineering, South Dakota School of Mines and Technology, Rapid City, SD, United States
  - Data Driven Material Discovery Center for Bioengineering Innovation, South Dakota School of Mines and Technology, Rapid City, SD, United States
  - 2-Dimensional Materials for Biofilm Engineering, Science and Technology, South Dakota School of Mines and Technology, Rapid City, SD, United States
  - BuG ReMeDEE Consortium, South Dakota School of Mines and Technology, Rapid City, SD, United States
  - *Correspondence: Rajesh Kumar Sani
6
LeBlanc N, Charles TC. Bacterial genome reductions: Tools, applications, and challenges. Front Genome Ed 2022; 4:957289. [PMID: 36120530 PMCID: PMC9473318 DOI: 10.3389/fgeed.2022.957289] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 05/30/2022] [Accepted: 07/29/2022] [Indexed: 11/16/2022] Open
Abstract
Bacterial cells are widely used to produce value-added products due to their versatility, ease of manipulation, and the abundance of genome engineering tools. However, the efficiency of producing these desired biomolecules is often hindered by the cells’ own metabolism, genetic instability, and the toxicity of the product. To overcome these challenges, genome reductions have been performed, making strains with the potential of serving as chassis for downstream applications. Here we review the current technologies that enable the design and construction of such reduced-genome bacteria as well as the challenges that limit their assembly and applicability. While genomic reductions have shown improvement of many cellular characteristics, a major challenge still exists in constructing these cells efficiently and rapidly. Computational tools have been created in attempts at minimizing the time needed to design these organisms, but gaps still exist in modelling these reductions in silico. Genomic reductions are a promising avenue for improving the production of value-added products, constructing chassis cells, and for uncovering cellular function but are currently limited by their time-consuming construction methods. With improvements to and the creation of novel genome editing tools and in silico models, these approaches could be combined to expedite this process and create more streamlined and efficient cell factories.
Affiliation(s)
- Nicole LeBlanc
  - Department of Biology, University of Waterloo, Waterloo, ON, Canada
  - *Correspondence: Nicole LeBlanc
- Trevor C. Charles
  - Department of Biology, University of Waterloo, Waterloo, ON, Canada
  - Metagenom Bio Life Science Inc., Waterloo, ON, Canada
7
Chowdhury ZM, Bhattacharjee A, Ahammad I, Hossain MU, Jaber AA, Rahman A, Dev PC, Salimullah M, Keya CA. Exploration of Streptococcus core genome to reveal druggable targets and novel therapeutics against S. pneumoniae. PLoS One 2022; 17:e0272945. [PMID: 35980906 PMCID: PMC9387852 DOI: 10.1371/journal.pone.0272945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2022] [Accepted: 07/29/2022] [Indexed: 11/18/2022] Open
Abstract
Streptococcus pneumoniae (S. pneumoniae), the major etiological agent of community-acquired pneumonia (CAP), contributes significantly to the global burden of infectious diseases and is becoming increasingly drug-resistant. Nearly 30% of S. pneumoniae genomes encode hypothetical proteins (HPs), and a better understanding of the roles of these HPs in virulence and pathogenicity could plausibly reveal new treatments. Some of the HPs are present across many Streptococcus species, and a systematic assessment of these unexplored HPs may disclose prospective drug targets. In this study, through a stringent bioinformatics analysis of the core genome and proteome of S. pneumoniae PCS8235, we identified and analyzed 28 HPs that are common to many Streptococcus species and might have a role in the virulence or pathogenesis of the bacteria. Functional annotation of the proteins was based on physicochemical properties, subcellular localization, virulence prediction, protein-protein interactions, and identification of essential genes, to find potentially druggable proteins among the 28 HPs. The majority of the HPs are involved in bacterial transcription and translation. In addition, some of them were homologs of enzymes, binding proteins, transporters, and regulators. Protein-protein interaction analysis revealed that HP PCS8235_RS05845 had the most interactions with other HPs and also has a TRP structural motif along with virulent and pathogenic properties, indicating that it has critical cellular functions and might undergo unconventional protein secretion. The second most highly interacting protein, HP PCS8235_RS02595, interacts with the Regulator of chromosomal segregation (RocS), which participates in chromosome segregation and nucleoid protection in S. pneumoniae. In this interaction network, 54% of the protein members have virulent properties and 40% have pathogenic properties. Most of these proteins localize to the cytoplasm and have hydrophilic properties. Finally, molecular docking and dynamics simulation demonstrated that the antimalarial drug Artenimol can act as a drug repurposing candidate against HP PCS8235_RS04650 of S. pneumoniae. Hence, the present study could aid the development of drugs against S. pneumoniae.
Affiliation(s)
- Ishtiaque Ahammad
  - Bioinformatics Division, National Institute of Biotechnology, Dhaka, Bangladesh
- Abdullah All Jaber
  - Department of Biochemistry & Microbiology, North South University, Dhaka, Bangladesh
- Anisur Rahman
  - Bioinformatics Division, National Institute of Biotechnology, Dhaka, Bangladesh
- Md. Salimullah
  - Molecular Biotechnology Division, National Institute of Biotechnology, Dhaka, Bangladesh
- Chaman Ara Keya
  - Department of Biochemistry & Microbiology, North South University, Dhaka, Bangladesh
8
de Crécy-lagard V, Amorin de Hegedus R, Arighi C, Babor J, Bateman A, Blaby I, Blaby-Haas C, Bridge AJ, Burley SK, Cleveland S, Colwell LJ, Conesa A, Dallago C, Danchin A, de Waard A, Deutschbauer A, Dias R, Ding Y, Fang G, Friedberg I, Gerlt J, Goldford J, Gorelik M, Gyori BM, Henry C, Hutinet G, Jaroch M, Karp PD, Kondratova L, Lu Z, Marchler-Bauer A, Martin MJ, McWhite C, Moghe GD, Monaghan P, Morgat A, Mungall CJ, Natale DA, Nelson WC, O’Donoghue S, Orengo C, O’Toole KH, Radivojac P, Reed C, Roberts RJ, Rodionov D, Rodionova IA, Rudolf JD, Saleh L, Sheynkman G, Thibaud-Nissen F, Thomas PD, Uetz P, Vallenet D, Carter EW, Weigele PR, Wood V, Wood-Charlson EM, Xu J. A roadmap for the functional annotation of protein families: a community perspective. Database (Oxford) 2022; 2022:baac062. [PMID: 35961013 PMCID: PMC9374478 DOI: 10.1093/database/baac062] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 06/07/2022] [Revised: 06/28/2022] [Accepted: 08/03/2022] [Indexed: 12/23/2022]
Abstract
Over the last 25 years, biology has entered the genomic era and is becoming a science of 'big data'. Most interpretations of genomic analyses rely on accurate functional annotations of the proteins encoded by more than 500 000 genomes sequenced to date. By different estimates, only half of the predicted sequenced proteins carry an accurate functional annotation, and this percentage varies drastically between organismal lineages. Such a large gap in knowledge hampers all aspects of the biological enterprise and thereby stands in the way of genomic biology reaching its full potential. A brainstorming meeting funded by the National Science Foundation was held on 3-4 February 2022 to address this issue. Bringing together data scientists, biocurators, computational biologists and experimentalists within the same venue allowed for a comprehensive assessment of the current state of functional annotations of protein families. Further, major issues obstructing the field were identified and discussed, which ultimately led to proposed solutions on how to move forward.
Collapse
Affiliation(s)
- Valérie de Crécy-lagard
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
| | | | - Cecilia Arighi
- Department of Computer and Information Sciences, University of Delaware, Newark, DE 19713, USA
| | - Jill Babor
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
| | - Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Ian Blaby
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Crysten Blaby-Haas
- Biology Department, Brookhaven National Laboratory, Upton, NY 11973, USA
| | - Alan J Bridge
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva 4 CH-1211, Switzerland
| | - Stephen K Burley
- RCSB Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Stacey Cleveland
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
| | - Lucy J Colwell
- Departmenf of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
| | - Ana Conesa
- Spanish National Research Council, Institute for Integrative Systems Biology, Paterna, Valencia 46980, Spain
| | - Christian Dallago
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology, i12, Boltzmannstr. 3, Garching/Munich 85748, Germany
| | - Antoine Danchin
- School of Biomedical Sciences, Li KaShing Faculty of Medicine, The University of Hong Kong, 21 Sassoon Road, Pokfulam, SAR Hong Kong 999077, China
| | - Anita de Waard
- Research Collaboration Unit, Elsevier, Jericho, VT 05465, USA
| | - Adam Deutschbauer
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Raquel Dias
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
| | - Yousong Ding
- Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development, University of Florida, Gainesville, FL 32610, USA
| | - Gang Fang
- NYU-Shanghai, Shanghai 200120, China
| | - Iddo Friedberg
- Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA 50011, USA
| | - John Gerlt
- Institute for Genomic Biology and Departments of Biochemistry and Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Joshua Goldford
- Physics of Living Systems, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Mark Gorelik
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
| | - Benjamin M Gyori
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, USA
| | - Christopher Henry
- Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA
| | - Geoffrey Hutinet
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
| | - Marshall Jaroch
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
| | - Peter D Karp
- Bioinformatics Research Group, SRI International, Menlo Park, CA 94025, USA
| | | | - Zhiyong Lu
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20817, USA
| | - Aron Marchler-Bauer
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20817, USA
| | - Maria-Jesus Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Claire McWhite
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA
| | - Gaurav D Moghe
- Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
| | - Paul Monaghan
- Department of Agricultural Education and Communication, University of Florida, Gainesville, FL 32611, USA
| | - Anne Morgat
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva 4 CH-1211, Switzerland
| | - Christopher J Mungall
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Darren A Natale
- Georgetown University Medical Center, Washington, DC 20007, USA
| | - William C Nelson
- Biological Sciences Division, Pacific Northwest National Laboratories, Richland, WA 99354, USA
| | - Seán O’Donoghue
- School of Biotechnology and Biomolecular Sciences, University of NSW, Sydney, NSW 2052, Australia
| | - Christine Orengo
- Department of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
| | | | - Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
| | - Colbie Reed
- Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
| | | | - Dmitri Rodionov
- Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
| | - Irina A Rodionova
- Department of Bioengineering, Division of Engineering, University of California at San Diego, La Jolla, CA 92093-0412, USA
| | - Jeffrey D Rudolf
- Department of Chemistry, University of Florida, Gainesville, FL 32611, USA
| | - Lana Saleh
- New England Biolabs, Ipswich, MA 01938, USA
| | - Gloria Sheynkman
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
| | - Francoise Thibaud-Nissen
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20817, USA
| | - Paul D Thomas
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA 90033, USA
| | - Peter Uetz
- Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA 23284, USA
| | - David Vallenet
- LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry 91057, France
| | - Erica Watson Carter
- Department of Plant Pathology, University of Florida Citrus Research and Education Center, 700 Experiment Station Rd., Lake Alfred, FL 33850, USA
| | | | - Valerie Wood
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
| | - Elisha M Wood-Charlson
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Jin Xu
- Department of Plant Pathology, University of Florida Citrus Research and Education Center, 700 Experiment Station Rd., Lake Alfred, FL 33850, USA
|
9
|
In silico Methods for Identification of Potential Therapeutic Targets. Interdiscip Sci 2022; 14:285-310. [PMID: 34826045 PMCID: PMC8616973 DOI: 10.1007/s12539-021-00491-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/19/2021] [Revised: 10/19/2021] [Accepted: 11/01/2021] [Indexed: 11/01/2022]
Abstract
At the initial stage of drug discovery, identifying novel targets with maximal efficacy and minimal side effects can improve the success rate and portfolio value of drug discovery projects while simultaneously reducing cycle time and cost. However, harnessing the full potential of big data to narrow the range of plausible targets through existing computational methods remains a key issue in this field. This paper reviews two categories of in silico methods—comparative genomics and network-based methods—for finding potential therapeutic targets among cellular functions based on understanding their related biological processes. In addition to describing the principles, databases, software, and applications, we discuss some recent studies and prospects of the methods. While comparative genomics is mostly applied to infectious diseases, network-based methods can be applied to infectious and non-infectious diseases. Nonetheless, the methods often complement each other in their advantages and disadvantages. The information reported here guides toward improving the application of big data-driven computational methods for therapeutic target discovery.
|
10
|
Damas MSF, Mazur FG, Freire CCDM, da Cunha AF, Pranchevicius MCDS. A Systematic Immuno-Informatic Approach to Design a Multiepitope-Based Vaccine Against Emerging Multiple Drug Resistant Serratia marcescens. Front Immunol 2022; 13:768569. [PMID: 35371033 PMCID: PMC8967166 DOI: 10.3389/fimmu.2022.768569] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/31/2021] [Accepted: 02/14/2022] [Indexed: 11/24/2022] Open
Abstract
Serratia marcescens is now an important opportunistic pathogen that can cause serious infections in hospitalized or immunocompromised patients. Here, we used extensive bioinformatic analyses based on reverse vaccinology and a subtractive proteomics-based approach to predict potential vaccine candidates against S. marcescens. We analyzed the complete proteome sequences of 49 isolates of Serratia marcescens and identified 5 proteins that were conserved, non-homologous to human and gut flora, extracellular or exported to the outer membrane, and antigenic. The identified proteins were used to select 5 CTL, 12 HTL, and 12 BCL epitopes that were antigenic, non-allergenic, conserved, hydrophilic, and non-toxic. In addition, HTL epitopes were able to induce an interferon-gamma immune response. The selected peptides were used to design 4 multi-epitope vaccine constructs (SMV1, SMV2, SMV3 and SMV4) with immune-modulating adjuvants, PADRE sequence, and linkers. Peptide cleavage analysis showed that antigen vaccines are processed and presented via MHC class molecules. Several physicochemical and immunological analyses revealed that all multiepitope vaccines were non-allergenic, stable, hydrophilic, and soluble and induced immunity with high antigenicity. The secondary structure analysis revealed that the designed vaccines contain mainly coil and alpha-helix structures. 3D analyses showed high-quality structure. Molecular docking analyses revealed SMV4 as the best vaccine construct among the four constructed vaccines, demonstrating high affinity with the immune receptor. Molecular dynamics simulation confirmed the low deformability and stability of the vaccine candidate. Discontinuous epitope residue analyses of SMV4 revealed that they are flexible and can interact with antibodies. In silico immune simulation indicated that the designed SMV4 vaccine triggers an effective immune response. In silico codon optimization and cloning in an expression vector indicate that the SMV4 vaccine can be efficiently expressed in an E. coli system. Overall, we showed that the SMV4 multi-epitope vaccine successfully elicited antigen-specific humoral and cellular immune responses and may be a potential vaccine candidate against S. marcescens. Further experimental validations could confirm its efficacy, safety and immunogenicity profile. Our findings bring a valuable addition to the development of new strategies to prevent and control the spread of multidrug-resistant Gram-negative bacteria with high clinical relevance.
Affiliation(s)
| | - Fernando Gabriel Mazur
- Departamento de Genética e Evolução, Universidade Federal de São Carlos, São Carlos, Brazil
| | | | | | - Maria-Cristina da Silva Pranchevicius
- Departamento de Genética e Evolução, Universidade Federal de São Carlos, São Carlos, Brazil
- Centro de Ciências Biológicas e da Saúde, Biodiversidade Tropical – BIOTROP, Universidade Federal de São Carlos, São Carlos, Brazil
|
11
|
Marques de Castro G, Hastenreiter Z, Silva Monteiro TA, Martins da Silva TT, Pereira Lobo F. Cross-species prediction of essential genes in insects. Bioinformatics 2022; 38:1504-1513. [PMID: 34999756 DOI: 10.1093/bioinformatics/btac009] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/30/2021] [Revised: 11/12/2021] [Accepted: 01/04/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Insects possess a vast phenotypic diversity and key ecological roles. Several insect species also have medical, agricultural and veterinary importance as parasites and disease vectors. Therefore, strategies to identify potential essential genes in insects may reduce the resources needed to find molecular players in central processes of insect biology. However, most predictors of essential genes in multicellular eukaryotes using machine learning rely on expensive and laborious experimental data to be used as gene features, such as gene expression profiles or protein-protein interactions, even though some of this information may not be available for the majority of insect species with genomic sequences available. RESULTS Here, we present and validate a machine learning strategy to predict essential genes in insects using sequence-based intrinsic attributes (statistical and physicochemical data) together with the predictions of subcellular location and transcriptomic data, if available. We gathered information available in public databases describing essential and non-essential genes for Drosophila melanogaster (fruit fly, Diptera) and Tribolium castaneum (red flour beetle, Coleoptera). We proceeded by computing intrinsic and extrinsic attributes that were used to train statistical models in one species and tested by their capability of predicting essential genes in the other. Even models trained using only intrinsic attributes are capable of predicting genes in the other insect species, including the prediction of lineage-specific essential genes. Furthermore, the inclusion of RNA-Seq data is a major factor to increase classifier performance. AVAILABILITY AND IMPLEMENTATION The code, data and final models produced in this study are freely available at https://github.com/g1o/GeneEssentiality/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Giovanni Marques de Castro
- Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
| | - Zandora Hastenreiter
- Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
| | - Thiago Augusto Silva Monteiro
- Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
| | - Thieres Tayroni Martins da Silva
- Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
| | - Francisco Pereira Lobo
- Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
|
12
|
Kania A. Harnessing the information theory and chaos game representation for pattern searching among essential and non-essential genes in Bacteria. J Theor Biol 2021; 531:110917. [PMID: 34563550 DOI: 10.1016/j.jtbi.2021.110917] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/28/2021] [Revised: 08/19/2021] [Accepted: 09/21/2021] [Indexed: 11/29/2022]
Abstract
Proteins encoded by genes are engaged in most of the processes within a cell. Determining a minimal set of genes required for survival is still a challenging task. Essential genes seem to be more conserved and are usually responsible for basic functions, for instance, genetic information flow or energy production. Despite persistent advances in experimental methods, computer predictions may constitute an important part of this investigation. Firstly, they may embrace a huge amount of data and provide some characteristic patterns. Furthermore, they enable scientists to build models for predicting essential genes which are not yet verified experimentally. Some papers indicate interesting dependencies within essential gene sequences using different computer models. In this paper, the author carries out a three-step analysis for a deeper understanding of the fundamentals of essential and non-essential genes. Beginning with simple nucleotide composition and finishing with long-range correlations, the analysis presents some characteristic patterns that are expected to be developed in future studies.
Affiliation(s)
- Adrian Kania
- Department of Computational Biophysics and Bioinformatics, Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University, Gronostajowa 7, Cracow 30-387, Poland
|
13
|
Geptop 2.0: Accurately Select Essential Genes from the List of Protein-Coding Genes in Prokaryotic Genomes. Methods Mol Biol 2021. [PMID: 34709630 DOI: 10.1007/978-1-0716-1720-5_23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2023]
Abstract
Computational tools offer an alternative way to identify essential genes that is low-cost and time-efficient. Using experimental essentiality sets deposited in the databases DEG and OGEE as reference, we developed an automated computational tool named Geptop to select essential genes from the set of protein-coding genes in a prokaryotic genome; it uses the reciprocal best hit strategy for homology search and evolutionary distance for weight assignment. The latest version of Geptop is 2.0 (http://guolab.whu.edu.cn/geptop), which can predict gene essentiality with a mean AUC of 0.84 in prokaryotes and is more stable. This chapter briefly introduces the tool and explains how to use it.
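The reciprocal best hit idea named in this abstract can be illustrated with a minimal sketch. This is not Geptop's code: the gene names and similarity scores below are hypothetical, and a real pipeline would derive the scores from sequence alignment searches between two proteomes.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    scores: dict of {(query, subject): similarity score}.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores between genes of genome A and genome B.
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0,
          ("a2", "b2"): 120.0, ("a3", "b1"): 200.0}
b_vs_a = {("b1", "a1"): 248.0, ("b1", "a3"): 199.0,
          ("b2", "a2"): 118.0, ("b2", "a1"): 88.0}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy input, gene a3 has b1 as its best hit, but b1's best hit is a1, so a3 is not paired. Requiring agreement in both directions is what makes the hit reciprocal.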
|
14
|
Sharma M, Singh DN, Budhraja R, Sood U, Rawat CD, Adrian L, Richnow HH, Singh Y, Negi RK, Lal R. Comparative proteomics unravelled the hexachlorocyclohexane (HCH) isomers specific responses in an archetypical HCH degrading bacterium Sphingobium indicum B90A. Environ Sci Pollut Res Int 2021; 28:41380-41395. [PMID: 33783707 DOI: 10.1007/s11356-021-13073-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/27/2020] [Accepted: 02/17/2021] [Indexed: 06/12/2023]
Abstract
Hexachlorocyclohexane (HCH) is a persistent organochlorine pesticide that poses a threat to different life forms. Sphingobium indicum B90A, a sphingomonad, is well known for its ability to degrade HCH isomers (α-, β-, γ-, δ-), but the effects of HCH isomers and the adaptive mechanisms of strain B90A under HCH load remain obscure. To investigate the responses of strain B90A to HCH isomers, we followed a proteomics approach, as this technique is considered a powerful tool to study the microbial response to environmental stress. Strain B90A culture was exposed to α-, β-, γ-, δ-HCH (5 mg/L), with a control (without HCH) taken for comparison, and changes in the whole-cell proteome were analyzed. In β- and δ-HCH-treated cultures, growth decreased significantly when compared to control, α-, and γ-HCH-treated cultures. HCH residue analysis corroborated previous observations depicting the complete depletion of α- and γ-HCH, while only 66% β-HCH and 34% δ-HCH were depleted from culture broth. Comparative proteome analyses showed that β- and δ-HCH induced the greatest systemic changes in the strain B90A proteome, wherein stress-alleviating proteins such as histidine kinases, molecular chaperones, DNA binding proteins, ABC transporters, TonB proteins, antioxidant enzymes, and transcriptional regulators were significantly affected. In addition, the study confirmed constitutive expression of the linA, linB, and linC genes that are crucial for the initiation of HCH isomer degradation, while increased abundance of LinM and LinN in the presence of β- and δ-HCH suggested an important role of the ABC transporter in depletion of these isomers. These results will help to understand the HCH-induced damage and adaptive strategies of strain B90A under HCH load, which have not been unravelled to date.
Affiliation(s)
- Monika Sharma
- Fish Molecular Biology Laboratory, Department of Zoology, University of Delhi, Delhi, 110007, India
| | | | - Rohit Budhraja
- Helmholtz Centre for Environmental Research-UFZ, 04318, Leipzig, Germany
| | - Utkarsh Sood
- Department of Zoology, University of Delhi, Delhi, 110007, India
- The Energy and Resources Institute, Darbari Seth Block, IHC Complex, Lodhi Road, New Delhi, 110003, India
| | - Charu Dogra Rawat
- Department of Zoology, Ramjas College, University of Delhi, Delhi, 110007, India
| | - Lorenz Adrian
- Helmholtz Centre for Environmental Research-UFZ, 04318, Leipzig, Germany
| | | | - Yogendra Singh
- Department of Zoology, University of Delhi, Delhi, 110007, India
| | - Ram Krishan Negi
- Fish Molecular Biology Laboratory, Department of Zoology, University of Delhi, Delhi, 110007, India
| | - Rup Lal
- Department of Zoology, University of Delhi, Delhi, 110007, India
- The Energy and Resources Institute, Darbari Seth Block, IHC Complex, Lodhi Road, New Delhi, 110003, India
|
15
|
Nlebedim VU, Chaudhuri RR, Walters K. Probabilistic Identification of Bacterial Essential Genes via insertion density using TraDIS Data with Tn5 libraries. Bioinformatics 2021; 37:4343-4349. [PMID: 34255819 PMCID: PMC8652038 DOI: 10.1093/bioinformatics/btab508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2021] [Revised: 06/24/2021] [Accepted: 07/23/2021] [Indexed: 11/29/2022] Open
Abstract
Motivation Probabilistic identification of bacterial essential genes using transposon-directed insertion-site sequencing (TraDIS) data based on Tn5 libraries has received relatively little attention in the literature; most methods are designed for mariner transposon insertions. Analysis of Tn5 transposon-based genomic data is challenging due to the high insertion density and genomic resolution. We present a novel probabilistic Bayesian approach for classifying bacterial essential genes using transposon insertion density derived from transposon insertion sequencing data. We implement a Markov chain Monte Carlo sampling procedure to estimate the posterior probability that any given gene is essential. We implement a Bayesian decision theory approach to selecting essential genes. We assess the effectiveness of our approach via analysis of both simulated data and three previously published Escherichia coli, Salmonella Typhimurium and Staphylococcus aureus datasets. These three bacteria have relatively well-characterized essential genes, which allows us to test our classification procedure using receiver operating characteristic curves and area under the curves. We compare the classification performance with that of Bio-Tradis, a standard tool for bacterial gene classification. Results Our method is able to classify genes in the three datasets with areas under the curves between 0.967 and 0.983. Our simulated synthetic datasets show that the number of insertions and the extent to which insertions are tolerated in the distal regions of essential genes are both important in determining classification accuracy. Importantly, our method gives the user the option of classifying essential genes based on the user-supplied costs of false discovery and false non-discovery. Availability and implementation An R package that implements the method presented in this paper is available for download from https://github.com/Kevin-walters/insdens.
Supplementary information Supplementary data are available at Bioinformatics online.
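The cost-based classification mentioned in this abstract can be sketched with a textbook Bayes decision rule. This is a generic illustration under stated assumptions, not the rule implemented in the authors' R package: the function name, the posterior probabilities, and the cost values are all hypothetical.

```python
def classify_essential(posterior, cost_fd, cost_fnd):
    """Call a gene essential when that decision has the lower expected loss.

    posterior: posterior probability that the gene is essential.
    cost_fd:   assumed cost of a false discovery (non-essential called essential).
    cost_fnd:  assumed cost of a false non-discovery (essential gene missed).
    """
    loss_if_called_essential = (1.0 - posterior) * cost_fd
    loss_if_called_nonessential = posterior * cost_fnd
    return loss_if_called_essential < loss_if_called_nonessential


# Hypothetical posterior probabilities, e.g. as estimated by MCMC sampling.
posteriors = {"geneA": 0.95, "geneB": 0.60, "geneC": 0.10}

# Equal costs: equivalent to thresholding the posterior at 0.5.
calls_equal = {g: classify_essential(p, 1.0, 1.0) for g, p in posteriors.items()}

# False discoveries four times as costly: the threshold rises to 4 / (4 + 1) = 0.8.
calls_strict = {g: classify_essential(p, 4.0, 1.0) for g, p in posteriors.items()}

print(calls_equal)
print(calls_strict)
```

Under this rule a gene is called essential when its posterior exceeds cost_fd / (cost_fd + cost_fnd), so raising the cost of false discoveries makes the classifier more conservative.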
Affiliation(s)
- Valentine U Nlebedim
- School of Mathematics and Statistics, University of Sheffield, Sheffield, S10 2TN, United Kingdom
| | - Roy R Chaudhuri
- Department of Molecular Biology and Biotechnology, University of Sheffield, Sheffield, S10 2TN, United Kingdom
| | - Kevin Walters
- School of Mathematics and Statistics, University of Sheffield, Sheffield, S10 2TN, United Kingdom
|
16
|
Kuang S, Wei Y, Wang L. Expression-based prediction of human essential genes and candidate lncRNAs in cancer cells. Bioinformatics 2021; 37:396-403. [PMID: 32790840 DOI: 10.1093/bioinformatics/btaa717] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/06/2020] [Revised: 07/21/2020] [Accepted: 08/06/2020] [Indexed: 01/12/2023] Open
Abstract
MOTIVATION Essential genes are required for the reproductive success at either cellular or organismal level. The identification of essential genes is important for understanding the core biological processes and identifying effective therapeutic drug targets. However, experimental identification of essential genes is costly, time consuming and labor intensive. Although several machine learning models have been developed to predict essential genes, these models are not readily applicable to lncRNAs. Moreover, the currently available models cannot be used to predict essential genes in a specific cancer type. RESULTS In this study, we have developed a new machine learning approach, XGEP (eXpression-based Gene Essentiality Prediction), to predict essential genes and candidate lncRNAs in cancer cells. The novelty of XGEP lies in the utilization of relevant features derived from the TCGA transcriptome dataset through collaborative embedding. When evaluated on the pan-cancer dataset, XGEP was able to accurately predict human essential genes and achieve significantly higher performance than previous models. Notably, several candidate lncRNAs selected by XGEP are reported to promote cell proliferation and inhibit cell apoptosis. Moreover, XGEP also demonstrated superior performance on cancer-type-specific datasets to identify essential genes. The comprehensive lists of candidate essential genes in specific cancer types may be used to guide experimental characterization and facilitate the discovery of drug targets for cancer therapy. AVAILABILITY AND IMPLEMENTATION The source code and datasets used in this study are freely available at https://github.com/BioDataLearning/XGEP. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shuzhen Kuang
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC 29634, USA
- Department of Biological Sciences, Clemson University, Clemson, SC 29634, USA
| | - Yanzhang Wei
- Department of Biological Sciences, Clemson University, Clemson, SC 29634, USA
| | - Liangjiang Wang
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC 29634, USA
- Center for Human Genetics, Clemson University, Clemson, SC 29634, USA
|
17
|
Aromolaran O, Aromolaran D, Isewon I, Oyelade J. Machine learning approach to gene essentiality prediction: a review. Brief Bioinform 2021; 22:6219158. [PMID: 33842944 DOI: 10.1093/bib/bbab128] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 01/18/2021] [Revised: 03/04/2021] [Accepted: 03/17/2021] [Indexed: 12/17/2022] Open
Abstract
Essential genes are critical for the growth and survival of any organism. The machine learning approach complements the experimental methods to minimize the resources required for essentiality assays. Previous studies revealed the need to discover relevant features that significantly classify essential genes, improve on the generalizability of prediction models across organisms, and construct a robust gold standard as the class label for the training data to enhance prediction. Findings also show that a significant limitation of the machine learning approach is predicting conditionally essential genes. The essentiality status of a gene can change due to a specific condition of the organism. This review examines various methods applied to the essential gene prediction task, their strengths, limitations and the factors responsible for effective computational prediction of essential genes. We discuss categories of features and how they contribute to the classification performance of essentiality prediction models. Five categories of features, namely, gene sequence, protein sequence, network topology, homology and gene ontology-based features, were generated for Caenorhabditis elegans to perform a comparative analysis of their essentiality prediction capacity. The gene ontology-based feature category outperformed other categories of features, mainly due to its high correlation with the genes' biological functions. However, the topology feature category provided the highest discriminatory power, making it more suitable for essentiality prediction. The major limiting factor of machine learning to predict essential genes conditionality is the unavailability of labeled data for conditions of interest that can train a classifier. Therefore, cooperative machine learning could further exploit models that can perform well in conditional essentiality predictions.
SHORT ABSTRACT Identification of essential genes is imperative because it provides an understanding of the core structure and function, accelerating drug targets' discovery, among other functions. Recent studies have applied machine learning to complement the experimental identification of essential genes. However, several factors are limiting the performance of machine learning approaches. This review aims to present the standard procedure and resources available for predicting essential genes in organisms, and also highlight the factors responsible for the current limitation in using machine learning for conditional gene essentiality prediction. The choice of features and ML technique was identified as an important factor to predict essential genes effectively.
Affiliation(s)
- Olufemi Aromolaran
- Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
| | - Damilare Aromolaran
- Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
| | - Itunuoluwa Isewon
- Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
| | - Jelili Oyelade
- Department of Computer and Information Sciences, Covenant University, Ota, Ogun State, Nigeria
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
|
18
|
Pan-genomics, drug candidate mining and ADMET profiling of natural product inhibitors screened against Yersinia pseudotuberculosis. Genomics 2020; 113:238-244. [PMID: 33321204 DOI: 10.1016/j.ygeno.2020.12.015] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 06/21/2020] [Revised: 11/13/2020] [Accepted: 12/10/2020] [Indexed: 12/12/2022]
Abstract
Yersinia pseudotuberculosis belongs to the family Enterobacteriaceae and is responsible for scarlatinoid fever, food poisoning, post-infectious complications like erythema nodosum/reactive arthritis as well as pseudoappendicitis in children. Sequences of 23 whole genomes from NCBI were used for conducting the pan-genomic analysis. Essential proteins from the core region were obtained and drug targets were identified using a hierarchical in silico approach. Among these, multidrug resistance protein sub-unit mdtC was chosen for further analysis. This protein unit confers resistance to antibiotics upon forming a tripartite complex with units A and B in Escherichia coli. Details of the function have not yet been elucidated experimentally in Yersinia spp. Computational structure modeling and validation were followed by screening against phytochemical libraries of traditional Indian (Ayurveda), North African, and traditional Chinese flora using Molecular Operating Environment software version 2019.0102. ADMET profiling and a descriptor study of the best-docked compounds were carried out. Since phytotherapy is the best resort against antibiotic resistance, these compounds should be tested experimentally to further validate the results. The obtained information could aid wet-lab scientists to work on the scaffold of screened drug-like compounds from natural resources. This could be useful in our quest for antibiotic-resistant therapy against Y. pseudotuberculosis.
|
19
|
Liu S, Wang SX, Liu W, Wang C, Zhang FZ, Ye YN, Wu CS, Zheng WX, Rao N, Guo FB. CEG 2.0: an updated database of clusters of essential genes including eukaryotic organisms. Database (Oxford) 2020; 2020:6031000. [PMID: 33306800 PMCID: PMC7731928 DOI: 10.1093/database/baaa112] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/24/2020] [Revised: 11/12/2020] [Accepted: 12/02/2020] [Indexed: 02/06/2023]
Abstract
Essential genes are key elements for organisms to maintain their living. Building databases that store essential genes in the form of homologous clusters, rather than storing them as singletons, can provide more enlightening information such as the general essentiality of homologous genes in multiple organisms. In 2013, the first database to store prokaryotic essential genes in clusters, CEG (Clusters of Essential Genes), was constructed. Since then, the amount of available data for essential genes has increased by a factor of >3. Herein, we updated CEG to version 2, including more prokaryotic essential genes (from 16 gene datasets to 29 gene datasets) and newly added eukaryotic essential genes (nine species), specifically the human essential genes of 12 cancer cell lines. For prokaryotes, information associated with drug targets, such as protein structure, ligand–protein interaction, virulence factor and matched drugs, is also provided. Finally, we provided the service of essential gene prediction for both prokaryotes and eukaryotes. We hope our updated database will benefit more researchers in drug targets and evolutionary genomics. Database URL: http://cefg.uestc.cn/ceg
Affiliation(s)
- Shuo Liu
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Ministry of Education and School of Pharmaceutical Sciences, Wuhan University, Wuhan 430071, China
| | - Shu-Xuan Wang
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Wei Liu
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Chen Wang
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Fa-Zhan Zhang
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Yuan-Nong Ye
- Bioinformatics and BioMedical Bigdata Mining Laboratory, Key Laboratory of Environmental Pollution Monitoring and Disease Control, Ministry of Education, Guizhou Medical University, Guiyang 550025, China
| | - Candy-S Wu
- Thomas Worthington High School, 300 West Granville Road, Worthington, OH 43085, USA
| | - Wen-Xin Zheng
- School of Biomedical Engineering, Capital Medical University, Beijing 100069, China
| | - Nini Rao
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Feng-Biao Guo
- Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Ministry of Education and School of Pharmaceutical Sciences, Wuhan University, Wuhan 430071, China
|
20
|
Nandi S, Ganguli P, Sarkar RR. Essential gene prediction using limited gene essentiality information-An integrative semi-supervised machine learning strategy. PLoS One 2020; 15:e0242943. [PMID: 33253254 PMCID: PMC7703937 DOI: 10.1371/journal.pone.0242943] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/10/2020] [Accepted: 11/12/2020] [Indexed: 11/24/2022] Open
Abstract
Essential gene prediction helps to find minimal genes indispensable for the survival of any organism. Machine learning (ML) algorithms have been useful for the prediction of gene essentiality. However, currently available ML pipelines perform poorly for organisms with limited experimental data. The objective is the development of a new ML pipeline to help in the annotation of essential genes of less explored disease-causing organisms for which minimal experimental data is available. The proposed strategy combines unsupervised feature selection technique, dimension reduction using the Kamada-Kawai algorithm, and semi-supervised ML algorithm employing Laplacian Support Vector Machine (LapSVM) for prediction of essential and non-essential genes from genome-scale metabolic networks using very limited labeled dataset. A novel scoring technique, Semi-Supervised Model Selection Score, equivalent to area under the ROC curve (auROC), has been proposed for the selection of the best model when supervised performance metrics calculation is difficult due to lack of data. The unsupervised feature selection followed by dimension reduction helped to observe a distinct circular pattern in the clustering of essential and non-essential genes. LapSVM then created a curve that dissected this circle for the classification and prediction of essential genes with high accuracy (auROC > 0.85) even with 1% labeled data for model training. After successful validation of this ML pipeline on both Eukaryotes and Prokaryotes that show high accuracy even when the labeled dataset is very limited, this strategy is used for the prediction of essential genes of organisms with inadequate experimentally known data, such as Leishmania sp. Using a graph-based semi-supervised machine learning scheme, a novel integrative approach has been proposed for essential gene prediction that shows universality in application to both Prokaryotes and Eukaryotes with limited labeled data. 
The essential genes predicted using the pipeline provide an important lead for the prediction of gene essentiality and identification of novel therapeutic targets for antibiotic and vaccine development against disease-causing parasites.
Affiliation(s)
- Sutanu Nandi
  - Chemical Engineering and Process Development, CSIR-National Chemical Laboratory, Pune, Maharashtra, India
  - Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, India
- Piyali Ganguli
  - Chemical Engineering and Process Development, CSIR-National Chemical Laboratory, Pune, Maharashtra, India
  - Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, India
- Ram Rup Sarkar
  - Chemical Engineering and Process Development, CSIR-National Chemical Laboratory, Pune, Maharashtra, India
  - Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, India
|
21
|
Rajamanickam K, Yang J, Chidambaram SB, Sakharkar MK. Enhancing Drug Efficacy against Mastitis Pathogens-An In Vitro Pilot Study in Staphylococcus aureus and Staphylococcus epidermidis. Animals (Basel) 2020; 10:E2117. [PMID: 33203170 PMCID: PMC7696410 DOI: 10.3390/ani10112117] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/05/2020] [Revised: 11/04/2020] [Accepted: 11/09/2020] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND: Bovine mastitis is one of the major infectious diseases in dairy cattle, resulting in large economic loss due to decreased milk production and increased production cost to the dairy industry. Antibiotics are commonly used to prevent/treat bovine mastitis infections. However, increased antibiotic resistance and consumers' concern regarding antibiotic overuse make it prudent and urgent to develop novel therapeutic protocols for this disease.
MATERIALS AND METHODS: Potential druggable targets were found in 20 mastitis-causing pathogens and conserved and unique targets were identified. Bacterial strains Staphylococcus aureus (ATCC 29213, and two clinical isolates CI 1 and CI 2) and Staphylococcus epidermidis (ATCC 12228, and two clinical isolates CI 1 and CI 2) were used in the present study for validation of an effective drug combination.
RESULTS: In the current study, we identified the common and the unique druggable targets for twenty mastitis-causing pathogens using an integrative approach. Furthermore, we showed that phosphorylcholine, a drug for a unique target gamma-hemolysin component B in Staphylococcus aureus, and ceftiofur, the mostly used veterinary antibiotic that is FDA approved for treating mastitis infections, exhibit a synergistic effect against S. aureus and a strong additive effect against Staphylococcus epidermidis in vitro.
CONCLUSION: Based on the data generated in this study, we propose that combination therapy with drugs that work synergistically against conserved and unique targets can help increase efficacy and lower the usage of antibiotics for treating bacterial infections. However, these data need further validations in animal models of infection.
Affiliation(s)
- Karthic Rajamanickam
  - College of Pharmacy and Nutrition, University of Saskatchewan, 107 Wiggins Road, Saskatoon, SK S7N 5E5, Canada
- Jian Yang
  - College of Pharmacy and Nutrition, University of Saskatchewan, 107 Wiggins Road, Saskatoon, SK S7N 5E5, Canada
- Saravana Babu Chidambaram
  - Department of Pharmacology, JSS College of Pharmacy, JSS Academy of Higher Education & Research (JSS AHER), Mysuru-570015, Karnataka, India
- Meena Kishore Sakharkar
  - College of Pharmacy and Nutrition, University of Saskatchewan, 107 Wiggins Road, Saskatoon, SK S7N 5E5, Canada
|
22
|
Yu X, Weng T, Gu C, Yang H. Comparison of gene regulatory networks to identify pathogenic genes for lymphoma. J Bioinform Comput Biol 2020; 18:2050029. [PMID: 33131362 DOI: 10.1142/s0219720020500298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Lymphoma is the most complicated cancer that can be divided into several tens of subtypes. It may occur in any part of body that has lymphocytes, and is closely correlated with diverse environmental factors such as the ionizing radiation, chemocarcinogenesis, and virus infection. All the environmental factors affect the lymphoma through genes. Identifying pathogenic genes for lymphoma is consequently an essential task to understand its complexity in a unified framework. In this paper, we propose a new method to expose high-confident edges in gene regulatory networks (GRNs) for a total of 32 organs, called Filtered GRNs (f-GRNs), comparison of which gives us a proper reference for the Lymphoma, i.e. the B-lymphocytes cells, whose f-GRN is closest with that for the Lymphoma. By using the Gene Ontology and Biological Process analysis we display the differences of the two networks' hubs in biological functions. Matching with the Genecards shows that most of the hubs take part in the genetic information transmission and expression, except a specific gene of Retinoic Acid Receptor Alpha (RARA) that encodes the retinoic acid receptor. In the lymphoma, the genes in the RARA ego-network are involved in two cancer pathways, and the RARA is present only in these cancer pathways. For the lymphoid B cells, however, the genes in the RARA ego-network do not participate in cancer-related pathways.
Affiliation(s)
- Xiao Yu
  - Department of Systems Science, University of Shanghai for Science and Technology, Jungong Road No. 516, Shanghai 200093, P. R. China
- Tongfeng Weng
  - Department of Systems Science, University of Shanghai for Science and Technology, Jungong Road No. 516, Shanghai 200093, P. R. China
- Changgui Gu
  - Department of Systems Science, University of Shanghai for Science and Technology, Jungong Road No. 516, Shanghai 200093, P. R. China
- Huijie Yang
  - Department of Systems Science, University of Shanghai for Science and Technology, Jungong Road No. 516, Shanghai 200093, P. R. China
|
23
|
Yan F, Gao F. A systematic strategy for the investigation of vaccines and drugs targeting bacteria. Comput Struct Biotechnol J 2020; 18:1525-1538. [PMID: 32637049 PMCID: PMC7327267 DOI: 10.1016/j.csbj.2020.06.008] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/28/2020] [Revised: 06/02/2020] [Accepted: 06/03/2020] [Indexed: 02/07/2023] Open
Abstract
Infectious and epidemic diseases induced by bacteria have historically caused great distress to people, and have even resulted in a large number of deaths worldwide. At present, many researchers are working on the discovery of viable drug and vaccine targets for bacteria through multiple methods, including the analyses of comparative subtractive genome, core genome, replication-related proteins, transcriptomics and riboswitches, which plays a significant part in the treatment of infectious and pandemic diseases. The 3D structures of the desired target proteins, drugs and epitopes can be predicted and modeled through target analysis. Meanwhile, molecular dynamics (MD) analysis of the constructed drug/epitope-protein complexes is an important standard for testing the suitability of these screened drugs and vaccines. Currently, target discovery, target analysis and MD analysis are integrated into a systematic set of drug and vaccine analysis strategy for bacteria. We hope that this comprehensive strategy will help in the design of high-performance vaccines and drugs.
Affiliation(s)
- Fangfang Yan
  - Department of Physics, School of Science, Tianjin University, Tianjin 300072, China
- Feng Gao
  - Department of Physics, School of Science, Tianjin University, Tianjin 300072, China
  - Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China
  - SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), Tianjin 300072, China
|
24
|
Ferreira LM, Sáfadi T, Ferreira JL. Evaluation of genome similarities using a wavelet-domain approach. Rev Soc Bras Med Trop 2020; 53:e20190470. [PMID: 32428175 PMCID: PMC7269520 DOI: 10.1590/0037-8682-0470-2019] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/11/2019] [Accepted: 03/10/2020] [Indexed: 11/21/2022] Open
Abstract
INTRODUCTION: Tuberculosis is listed among the top 10 causes of deaths worldwide. The resistant strains causing this disease have been considered to be responsible for public health emergencies and health security threats. As stated by the World Health Organization (WHO), around 558,000 different cases coupled with resistance to rifampicin (the most operative first-line drug) have been estimated to date. Therefore, in order to detect the resistant strains using the genomes of Mycobacterium tuberculosis (MTB), we propose a new methodology for the analysis of genomic similarities that associate the different levels of decomposition of the genome (discrete non-decimated wavelet transform) and the Hurst exponent.
METHODS: The signals corresponding to the ten analyzed sequences were obtained by assessing GC content, and then these signals were decomposed using the discrete non-decimated wavelet transform along with the Daubechies wavelet with four null moments at five levels of decomposition. The Hurst exponent was calculated at each decomposition level using five different methods. The cluster analysis was performed using the results obtained for the Hurst exponent.
RESULTS: The aggregated variance, differenced aggregated variance, and aggregated absolute value methods presented the formation of three groups, whereas the Peng and R/S methods presented the formation of two groups. The aggregated variance method exhibited the best results with respect to the group formation between similar strains.
CONCLUSION: The evaluation of Hurst exponent associated with discrete non-decimated wavelet transform can be used as a measure of similarity between genome sequences, thus leading to a refinement in the analysis.
Affiliation(s)
- Leila Maria Ferreira
  - Programa de Pós-Graduação Stricto Sensu em Estatística e Experimentação Agropecuária, Universidade Federal de Lavras, Lavras, MG, Brasil
- Thelma Sáfadi
  - Departamento de Estatística, Universidade Federal de Lavras, Lavras, MG, Brasil
|
25
|
Delineating Novel Therapeutic Drug and Vaccine Targets for Staphylococcus cornubiensis NW1T Through Computational Analysis. Int J Pept Res Ther 2020. [DOI: 10.1007/s10989-020-10076-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 10/24/2022]
|
26
|
Wen QF, Liu S, Dong C, Guo HX, Gao YZ, Guo FB. Geptop 2.0: An Updated, More Precise, and Faster Geptop Server for Identification of Prokaryotic Essential Genes. Front Microbiol 2019; 10:1236. [PMID: 31214154 PMCID: PMC6558110 DOI: 10.3389/fmicb.2019.01236] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 03/02/2019] [Accepted: 05/17/2019] [Indexed: 12/16/2022] Open
Abstract
Geptop has performed effectively in the identification of prokaryotic essential genes since its first release in 2013. It estimates gene essentiality for prokaryotes based on orthology and phylogeny. Genome-scale essentiality data of more prokaryotic species are available, and the information has been collected into public essential gene repositories such as DEG and OGEE. A faster and more accurate toolkit is needed to meet the increasing prokaryotic genome data. We updated Geptop by supplementing more validated essentiality data into reference set (from 19 to 37 species), and introducing multi-process technology to accelerate the computing speed. Compared with Geptop 1.0 and other gene essentiality prediction models, Geptop 2.0 can generate more stable predictions and finish the computation in a shorter time. The software is available both as an online server and a downloadable standalone application. We hope that the improved Geptop 2.0 will facilitate researches in gene essentiality and the development of novel antibacterial drugs. The gene essentiality prediction tool is available at http://cefg.uestc.cn/geptop.
Affiliation(s)
- Qing-Feng Wen
  - School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Shuo Liu
  - School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Chuan Dong
  - School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Hai-Xia Guo
  - School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Yi-Zhou Gao
  - School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Feng-Biao Guo
  - School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
27
|
Shields RC, Jensen PA. The bare necessities: Uncovering essential and condition-critical genes with transposon sequencing. Mol Oral Microbiol 2019; 34:39-50. [PMID: 30739386 DOI: 10.1111/omi.12256] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 12/02/2018] [Revised: 01/18/2019] [Accepted: 02/06/2019] [Indexed: 12/11/2022]
Abstract
Querying gene function in bacteria has been greatly accelerated by the advent of transposon sequencing (Tn-seq) technologies (related Tn-seq strategies are known as TraDIS, INSeq, RB-TnSeq, and HITS). Pooled populations of transposon mutants are cultured in an environment and next-generation sequencing tools are used to determine areas of the genome that are important for bacterial fitness. In this review we provide an overview of Tn-seq methodologies and discuss how Tn-seq has been applied, or could be applied, to the study of oral microbiology. These applications include studying the essential genome as a means to rationally design therapeutic agents. Tn-seq has also contributed to our understanding of well-studied biological processes in oral bacteria. Other important applications include in vivo pathogenesis studies and use of Tn-seq to probe the molecular basis of microbial interactions. We also highlight recent advancements in techniques that act in synergy with Tn-seq such as clustered regularly interspaced short palindromic repeats (CRISPR) interference and microfluidic chip platforms.
Affiliation(s)
- Robert C Shields
  - Department of Oral Biology, College of Dentistry, University of Florida, Gainesville, Florida
- Paul A Jensen
  - Department of Bioengineering and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois
|
28
|
Li X, Li W, Zeng M, Zheng R, Li M. Network-based methods for predicting essential genes or proteins: a survey. Brief Bioinform 2019; 21:566-583. [DOI: 10.1093/bib/bbz017] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 10/31/2018] [Revised: 01/21/2019] [Accepted: 01/22/2019] [Indexed: 12/14/2022] Open
Abstract
Genes that are thought to be critical for the survival of organisms or cells are called essential genes. The prediction of essential genes and their products (essential proteins) is of great value in exploring the mechanism of complex diseases, the study of the minimal required genome for living cells and the development of new drug targets. As laboratory methods are often complicated, costly and time-consuming, a great many of computational methods have been proposed to identify essential genes/proteins from the perspective of the network level with the in-depth understanding of network biology and the rapid development of biotechnologies. Through analyzing the topological characteristics of essential genes/proteins in protein–protein interaction networks (PINs), integrating biological information and considering the dynamic features of PINs, network-based methods have been proved to be effective in the identification of essential genes/proteins. In this paper, we survey the advanced methods for network-based prediction of essential genes/proteins and present the challenges and directions for future research.
Affiliation(s)
- Xingyi Li
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
- Wenkai Li
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
- Min Zeng
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
- Ruiqing Zheng
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan, China
|
29
|
Waman VP, Vedithi SC, Thomas SE, Bannerman BP, Munir A, Skwark MJ, Malhotra S, Blundell TL. Mycobacterial genomics and structural bioinformatics: opportunities and challenges in drug discovery. Emerg Microbes Infect 2019; 8:109-118. [PMID: 30866765 PMCID: PMC6334779 DOI: 10.1080/22221751.2018.1561158] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 10/15/2018] [Revised: 12/03/2018] [Accepted: 12/09/2018] [Indexed: 01/08/2023]
Abstract
Of the more than 190 distinct species of Mycobacterium genus, many are economically and clinically important pathogens of humans or animals. Among those mycobacteria that infect humans, three species namely Mycobacterium tuberculosis (causative agent of tuberculosis), Mycobacterium leprae (causative agent of leprosy) and Mycobacterium abscessus (causative agent of chronic pulmonary infections) pose concern to global public health. Although antibiotics have been successfully developed to combat each of these, the emergence of drug-resistant strains is an increasing challenge for treatment and drug discovery. Here we describe the impact of the rapid expansion of genome sequencing and genome/pathway annotations that have greatly improved the progress of structure-guided drug discovery. We focus on the applications of comparative genomics, metabolomics, evolutionary bioinformatics and structural proteomics to identify potential drug targets. The opportunities and challenges for the design of drugs for M. tuberculosis, M. leprae and M. abscessus to combat resistance are discussed.
Affiliation(s)
- Asma Munir
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
- Marcin J. Skwark
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
- Sony Malhotra
  - Institute of Structural and Molecular Biology, Department of Biological Sciences, Birkbeck College, University of London, London, UK
- Tom L. Blundell
  - Department of Biochemistry, University of Cambridge, Cambridge, UK
|
30
|
Martínez-Carranza E, Barajas H, Alcaraz LD, Servín-González L, Ponce-Soto GY, Soberón-Chávez G. Variability of Bacterial Essential Genes Among Closely Related Bacteria: The Case of Escherichia coli. Front Microbiol 2018; 9:1059. [PMID: 29910775 PMCID: PMC5992433 DOI: 10.3389/fmicb.2018.01059] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 02/22/2018] [Accepted: 05/04/2018] [Indexed: 11/23/2022] Open
Abstract
The definition of bacterial essential genes has been widely pursued using different approaches. Their study has impacted several fields of research such as synthetic biology, the construction of bacteria with minimal chromosomes, the search for new antibiotic targets, or the design of strains with biotechnological applications. Bacterial genomes are mosaics that only share a small subset of gene-sequences (core genome) even among members of the same species. It has been reported that the presence of essential genes is highly variable between closely related bacteria and even among members of the same species, due to the phenomenon known as “non-orthologous gene displacement” that refers to the coding for an essential function by genes with no sequence homology due to horizontal gene transfer (HGT). The existence of dormant forms among bacteria and the high incidence of HGT have been proposed to be driving forces of bacterial evolution, and they might have a role in the low level of conservation of essential genes among related bacteria by non-orthologous gene displacement, but this correlation has not been recognized. The aim of this mini-review is to give a brief overview of the approaches that have been taken to define and study essential genes, and the implications of non-orthologous gene displacement in bacterial evolution, focusing mainly in the case of Escherichia coli. To this end, we reviewed the available literature, and we searched for the presence of the essential genes defined by mutagenesis in the genomes of the 63 best-sequenced E. coli genomes that are available in NCBI database. We could not document specific cases of non-orthologous gene displacement among the E. coli strains analyzed, but we found that the quality of the genome-sequences in the database is not enough to make accurate predictions about the conservation of essential-genes among members of this bacterial species.
Affiliation(s)
- Enrique Martínez-Carranza
  - Departamento de Biología Molecular y Biotecnología, Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Hugo Barajas
  - Departamento de Biología Celular, Facultad de Ciencias, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Luis-David Alcaraz
  - Departamento de Biología Celular, Facultad de Ciencias, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Luis Servín-González
  - Departamento de Biología Molecular y Biotecnología, Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Gabriel-Yaxal Ponce-Soto
  - Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Gloria Soberón-Chávez
  - Departamento de Biología Molecular y Biotecnología, Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México, Mexico City, Mexico
|