1
Rasool D, Jan SA, Khan SU, Nahid N, Ashfaq UA, Umar A, Qasim M, Noor F, Rehman A, Shahzadi K, Alshammari A, Alharbi M, Nisar MA. Subtractive proteomics-based vaccine targets annotation and reverse vaccinology approaches to identify multiepitope vaccine against Plesiomonas shigelloides. Heliyon 2024; 10:e31304. [PMID: 38845922 PMCID: PMC11153098 DOI: 10.1016/j.heliyon.2024.e31304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Revised: 05/09/2024] [Accepted: 05/14/2024] [Indexed: 06/09/2024] Open
Abstract
Plesiomonas shigelloides, an aquatic bacterium belonging to the Enterobacteriaceae family, is a frequent cause of gastroenteritis with diarrhea and severe gastrointestinal disease. Despite decades of research, a licensed and globally accessible vaccine is still years away. A putative vaccine that can combat P. shigelloides infection by boosting population immunity is urgently needed. In the current study, the entire proteome of P. shigelloides was explored using subtractive genomics integrated with an immunoinformatics approach to design an effective vaccine construct against P. shigelloides. The overall stability of the multi-epitope vaccine (MEV) construct was evaluated using molecular docking, which demonstrated that the MEV showed higher binding affinities with toll-like receptors (TLR4: 51.5 ± 10.3, TLR2: 60.5 ± 9.2) and MHC receptors (MHCI: 79.7 ± 11.2 kcal/mol, MHCII: 70.4 ± 23.7). Further, the therapeutic efficacy of the vaccine construct for generating an efficient immune response was evaluated by computational immunological simulation. Finally, computer-based cloning and improvement in codon composition without altering the amino acid sequence led to the development of a proposed vaccine. In a nutshell, the findings of this study add to the existing knowledge about the pathogenesis of this infection. The designed MEV can be a possible prophylactic agent for individuals infected with P. shigelloides. Nevertheless, further validation is required to guarantee its safety and immunogenic potential.
Affiliation(s)
- Danish Rasool
  - Department of Bioinformatics and Biosciences, Capital University of Science and Technology, Islamabad, 44000, Pakistan
- Sohail Ahmad Jan
  - Department of Bioinformatics and Biosciences, Capital University of Science and Technology, Islamabad, 44000, Pakistan
- Nazia Nahid
  - Department of Bioinformatics and Biotechnology, Government College University Faisalabad, Faisalabad, 38000, Pakistan
- Usman Ali Ashfaq
  - Department of Bioinformatics and Biotechnology, Government College University Faisalabad, Faisalabad, 38000, Pakistan
- Ahitsham Umar
  - Department of Bioinformatics and Biotechnology, Government College University Faisalabad, Faisalabad, 38000, Pakistan
- Muhammad Qasim
  - Department of Bioinformatics and Biotechnology, Government College University Faisalabad, Faisalabad, 38000, Pakistan
- Fatima Noor
  - Department of Bioinformatics and Biotechnology, Government College University Faisalabad, Faisalabad, 38000, Pakistan
- Abdur Rehman
  - Center of Bioinformatics, College of Life Sciences, Northwest A & F University, Yangling, 712100, Shaanxi, China
- Kiran Shahzadi
  - Department of Bioinformatics and Biotechnology, Government College University Faisalabad, Faisalabad, 38000, Pakistan
- Abdulrahman Alshammari
  - Department of Pharmacology and Toxicology, College of Pharmacy, King Saud University, Post Box 2455, Riyadh, 11451, Saudi Arabia
- Metab Alharbi
  - Department of Pharmacology and Toxicology, College of Pharmacy, King Saud University, Post Box 2455, Riyadh, 11451, Saudi Arabia
- Muhammad Atif Nisar
  - College of Science and Engineering, Flinders University, Adelaide, 5042, Australia
2
Mukul Das M, Sarkar K. Evaluation of machine learning classifiers for predicting essential genes in Mycobacterium tuberculosis strains. Bioinformation 2022; 18:1126-1130. [PMID: 37701504 PMCID: PMC10492903 DOI: 10.6026/973206300181126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Revised: 12/20/2022] [Accepted: 12/31/2022] [Indexed: 09/14/2023] Open
Abstract
Accurate investigation and prediction of essential genes from bacterial genomes is very important, as such genes may be explored as effective targets for antimicrobial drugs and for understanding the biological mechanisms of a cell. A subset of key features was obtained from 14 genome sequence-based features of 20 strains of Mycobacterium tuberculosis, whose essential gene information was downloaded from the ePath and NCBI databases; essential genes were mapped and matched using a genome extraction program. The selection of key features was performed using a Genetic Algorithm. For each of three classifiers, 80%, 10% and 10% of the key-feature subset were used for training, validation and testing, respectively. Experimental results (10-fold cross-validation) illustrated that DNN (proposed), DT, and SVM achieved AUC of 0.98, 0.88 and 0.82, respectively. DNN (proposed) outperformed DT and SVM. The higher prediction accuracy of the classifiers was attributed to using only key features, which also supported better generalizability of the classifiers and the relevance of the key features to gene essentiality. In addition, DNN (proposed) showed the best prediction performance when compared with other predictors used in previous studies. The genome extraction program was developed for mapping and matching of essential genes between the ePath and NCBI databases.
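The evaluation protocol in this abstract (an 80/10/10 split and AUC as the score) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names `split_80_10_10` and `auc` and the toy labels and scores are assumptions, and the AUC is computed with the pairwise (Mann-Whitney) formulation rather than with any particular library.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle items and split them into 80% train, 10% validation, 10% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 8 // 10   # integer arithmetic avoids float rounding
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical gene identifiers, labels and classifier scores (illustration only).
genes = [f"gene_{i}" for i in range(100)]
train, val, test = split_80_10_10(genes)
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise form is quadratic in the number of examples, which is fine for a sketch; a rank-based computation would be the usual choice for genome-scale data.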
Affiliation(s)
- Monish Mukul Das
  - Department of Computer Science and Engineering, University of Kalyani, Kalyani, Nadia - 741235
- Keka Sarkar
  - Department of Microbiology, University of Kalyani, Kalyani, Nadia - 741235
3
LeBlanc N, Charles TC. Bacterial genome reductions: Tools, applications, and challenges. Front Genome Ed 2022; 4:957289. [PMID: 36120530 PMCID: PMC9473318 DOI: 10.3389/fgeed.2022.957289] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 05/30/2022] [Accepted: 07/29/2022] [Indexed: 11/16/2022] Open
Abstract
Bacterial cells are widely used to produce value-added products due to their versatility, ease of manipulation, and the abundance of genome engineering tools. However, the efficiency of producing these desired biomolecules is often hindered by the cells’ own metabolism, genetic instability, and the toxicity of the product. To overcome these challenges, genome reductions have been performed, making strains with the potential of serving as chassis for downstream applications. Here we review the current technologies that enable the design and construction of such reduced-genome bacteria as well as the challenges that limit their assembly and applicability. While genomic reductions have shown improvement of many cellular characteristics, a major challenge still exists in constructing these cells efficiently and rapidly. Computational tools have been created in attempts at minimizing the time needed to design these organisms, but gaps still exist in modelling these reductions in silico. Genomic reductions are a promising avenue for improving the production of value-added products, constructing chassis cells, and for uncovering cellular function but are currently limited by their time-consuming construction methods. With improvements to and the creation of novel genome editing tools and in silico models, these approaches could be combined to expedite this process and create more streamlined and efficient cell factories.
Affiliation(s)
- Nicole LeBlanc
  - Department of Biology, University of Waterloo, Waterloo, ON, Canada
  - *Correspondence: Nicole LeBlanc
- Trevor C. Charles
  - Department of Biology, University of Waterloo, Waterloo, ON, Canada
  - Metagenom Bio Life Science Inc., Waterloo, ON, Canada
4
Powell‐Romero F, Fountain‐Jones NM, Norberg A, Clark NJ. Improving the predictability and interpretability of co‐occurrence modelling through feature‐based joint species distribution ensembles. Methods Ecol Evol 2022. [DOI: 10.1111/2041-210x.13915] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/11/2023]
Affiliation(s)
- Anna Norberg
  - Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
- Nicholas J. Clark
  - School of Veterinary Science, The University of Queensland, Gatton, Qld, Australia
5
Anjos WF, Lanes GC, Azevedo VA, Santos AR. GENPPI: standalone software for creating protein interaction networks from genomes. BMC Bioinformatics 2021; 22:596. [PMID: 34915867 PMCID: PMC8680239 DOI: 10.1186/s12859-021-04501-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/11/2021] [Accepted: 11/30/2021] [Indexed: 11/30/2022] Open
Abstract
Background Bacterial genomes are being deposited into online databases at an increasing rate. Genome annotation represents one of the first efforts to understand organisms and their diseases. Some evolutionary relationships capable of being annotated only from genomes are conserved gene neighbourhoods (CNs), phylogenetic profiles (PPs), and gene fusions. At present, there is no standalone software that enables networks of interactions among proteins to be created using these three evolutionary characteristics with efficient and effective results. Results We developed GENPPI software for the ab initio prediction of interaction networks using predicted proteins from a genome. In our case study, we employed 50 genomes of the genus Corynebacterium. Based on the PP relationship, GENPPI differentiated genomes between the ovis and equi biovars of the species Corynebacterium pseudotuberculosis and created groups among the other species analysed. If we inspected only the CN relationship, we could not entirely separate biovars, only species. Our software GENPPI was determined to be efficient because, for example, it creates interaction networks from the central genomes of 50 species/lineages with an average size of 2200 genes in less than 40 min on a conventional computer. Moreover, the interaction networks that our software creates reflect correct evolutionary relationships between species, which we confirmed with average nucleotide identity analyses. Additionally, this software enables the user to define how he or she intends to explore the PP and CN characteristics through various parameters, enabling the creation of customized interaction networks. For instance, users can set parameters regarding the genus, metagenome, or pangenome. In addition to the parameterization of GENPPI, it is also the user’s choice regarding which set of genomes they are going to study.
Conclusions GENPPI can help fill the gap concerning the considerable number of novel genomes assembled monthly and our ability to process interaction networks considering the noncore genes for all completed genome versions. With GENPPI, a user dictates how many and how evolutionarily correlated the genomes answer a scientific query.
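The phylogenetic-profile (PP) idea mentioned in this abstract can be illustrated with a small sketch. The abstract does not describe GENPPI's actual scoring, so everything here is an assumption for illustration: each protein is represented by the set of genomes in which it occurs, Jaccard similarity stands in as the comparison measure, and the names `jaccard`, `profile_edges`, the 0.8 threshold, and the toy profiles are hypothetical.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity between two presence sets (genomes containing a protein)."""
    a, b = set(profile_a), set(profile_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def profile_edges(profiles, threshold=0.8):
    """Return (protein_1, protein_2, score) for pairs whose profiles are similar.

    profiles maps a protein name to the set of genomes in which it is present.
    """
    names = sorted(profiles)
    edges = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = jaccard(profiles[first], profiles[second])
            if score >= threshold:
                edges.append((first, second, score))
    return edges


# Hypothetical presence/absence data across four genomes (illustration only).
toy_profiles = {
    "protA": {"genome1", "genome2", "genome3"},
    "protB": {"genome1", "genome2", "genome3"},
    "protC": {"genome4"},
}
toy_edges = profile_edges(toy_profiles)
```

Proteins that are gained and lost together across genomes end up linked, which is the intuition behind using phylogenetic profiles as evidence of functional association.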
Affiliation(s)
- William F Anjos
  - Department of Computer Science, Federal University of Uberlândia, Uberlândia, Brazil
- Gabriel C Lanes
  - Biology Institute, Federal University of Uberlândia, Uberlândia, Brazil
- Vasco A Azevedo
  - Department of Genetics, Federal University of Minas Gerais, Belo Horizonte, Brazil
- Anderson R Santos
  - Department of Computer Science, Federal University of Uberlândia, Uberlândia, Brazil
6
Beder T, Aromolaran O, Dönitz J, Tapanelli S, Adedeji E, Adebiyi E, Bucher G, Koenig R. Identifying essential genes across eukaryotes by machine learning. NAR Genom Bioinform 2021; 3:lqab110. [PMID: 34859210 PMCID: PMC8634067 DOI: 10.1093/nargab/lqab110] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/21/2021] [Revised: 10/09/2021] [Accepted: 11/29/2021] [Indexed: 02/07/2023] Open
Abstract
Identifying essential genes on a genome scale is resource intensive and has been performed for only a few eukaryotes. For less studied organisms essentiality might be predicted by gene homology. However, this approach cannot be applied to non-conserved genes. Additionally, divergent essentiality information is obtained from studying single cells or whole, multi-cellular organisms, and particularly when derived from human cell line screens and human population studies. We employed machine learning across six model eukaryotes and 60 381 genes, using 41 635 features derived from the sequence, gene function information and network topology. Within a leave-one-organism-out cross-validation, the classifiers showed high generalizability with an average accuracy close to 80% in the left-out species. As a case study, we applied the method to Tribolium castaneum and Bombyx mori and validated predictions experimentally yielding similar performances. Finally, using the classifier based on the studied model organisms enabled linking the essentiality information of human cell line screens and population studies.
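The leave-one-organism-out cross-validation named in this abstract can be sketched as a fold generator. This is a hedged illustration of the general scheme only: the function name `leave_one_organism_out`, the `(organism, gene_id)` sample layout, and the toy data are assumptions, and no classifier is trained here.

```python
def leave_one_organism_out(samples):
    """Yield (held_out, train, test) folds, one per organism.

    samples is a list of (organism, gene_id) pairs. In each fold, every gene
    of one organism forms the test set and all remaining genes form the
    training set, so the model is always scored on an unseen species.
    """
    organisms = sorted({organism for organism, _ in samples})
    for held_out in organisms:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test


# Hypothetical samples from three organisms (illustration only).
toy_samples = [
    ("yeast", "g1"), ("yeast", "g2"),
    ("fly", "g3"),
    ("worm", "g4"), ("worm", "g5"),
]
folds = list(leave_one_organism_out(toy_samples))
```

Holding out a whole organism, rather than a random subset of genes, is what lets the reported accuracy speak to generalization across species.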
Affiliation(s)
- Thomas Beder
  - Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany
  - Institute of Infectious Diseases and Infection Control, Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany
  - Department of Internal Medicine II, University Medical Center Schleswig-Holstein, Campus Kiel, 24105 Kiel, Germany
- Olufemi Aromolaran
  - Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Jürgen Dönitz
  - Department of Evolutionary Developmental Genetics, GZMB, University of Göttingen, Justus-von-Liebig-Weg 11, 37077 Göttingen, Germany
  - Department of Medical Bioinformatics, University Medical Center Göttingen (UMG), 37099 Göttingen, Germany
- Sofia Tapanelli
  - Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
- Eunice O Adedeji
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
  - Department of Biochemistry, Covenant University, Ota, Ogun State, Nigeria
- Ezekiel Adebiyi
  - Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Gregor Bucher
  - Department of Evolutionary Developmental Genetics, GZMB, University of Göttingen, Justus-von-Liebig-Weg 11, 37077 Göttingen, Germany
- Rainer Koenig
  - Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany
  - Institute of Infectious Diseases and Infection Control, Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany
7
Umar A, Haque A, Alghamdi YS, Mashraqi MM, Rehman A, Shahid F, Khurshid M, Ashfaq UA. Development of a Candidate Multi-Epitope Subunit Vaccine against Klebsiella aerogenes: Subtractive Proteomics and Immuno-Informatics Approach. Vaccines (Basel) 2021; 9:vaccines9111373. [PMID: 34835304 PMCID: PMC8624419 DOI: 10.3390/vaccines9111373] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 10/17/2021] [Revised: 11/03/2021] [Accepted: 11/11/2021] [Indexed: 12/17/2022] Open
Abstract
Klebsiella aerogenes is a Gram-negative bacterium which has gained considerable importance in recent years. It is involved in 10% of nosocomial and community-acquired urinary tract infections and 12% of hospital-acquired pneumonia. This organism has an intrinsic ability to produce inducible chromosomal AmpC beta-lactamases, which confer high resistance. The drug resistance in K. aerogenes has been reported in China, Israel, Poland, Italy and the United States, with a high mortality rate (~50%). This study aims to combine immunological approaches with molecular docking approaches for three highly antigenic proteins to design vaccines against K. aerogenes. The synthesis of the B-cell, T-cell (CTL and HTL) and IFN-γ epitopes of the targeted proteins was performed and most conserved epitopes were chosen for future research studies. The vaccine was predicted by connecting the respective epitopes, i.e., B cells, CTL and HTL with KK, AAY and GPGPG linkers and all these were connected with N-terminal adjuvants with EAAAK linker. The humoral response of the constructed vaccine was measured through IFN-γ and B-cell epitopes. Before being used as vaccine candidate, all identified B-cell, HTL and CTL epitopes were tested for antigenicity, allergenicity and toxicity to check the safety profiles of our vaccine. To find out the compatibility of constructed vaccine with receptors, MHC-I, followed by MHC-II and TLR4 receptors, was docked with the vaccine. Lastly, in order to precisely certify the proper expression and integrity of our construct, in silico cloning was carried out. Further studies are needed to confirm the safety features and immunogenicity of the vaccine.
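The linker scheme described in this abstract (B-cell, CTL and HTL epitopes joined by KK, AAY and GPGPG linkers, with an N-terminal adjuvant attached through EAAAK) can be sketched as string assembly. The abstract does not say which linker sits at the boundary between two epitope classes, so this sketch assumes each epitope is followed by the linker of its own class; the function name `assemble_construct` and all sequences are hypothetical placeholders, not the published epitopes.

```python
# Linker per epitope class, as named in the abstract.
LINKERS = {"B": "KK", "CTL": "AAY", "HTL": "GPGPG"}
ADJUVANT_LINKER = "EAAAK"


def assemble_construct(adjuvant, epitopes):
    """Join an adjuvant and an ordered list of (kind, sequence) epitopes.

    Assumption: every epitope except the last is followed by the linker of
    its own class, including at the boundary between two classes.
    """
    pieces = [adjuvant, ADJUVANT_LINKER]
    for index, (kind, sequence) in enumerate(epitopes):
        pieces.append(sequence)
        if index < len(epitopes) - 1:
            pieces.append(LINKERS[kind])
    return "".join(pieces)


# Placeholder sequences (illustration only, not real epitopes).
toy_epitopes = [
    ("B", "AAAA"), ("B", "CCCC"),
    ("CTL", "DDDD"),
    ("HTL", "EEEE"), ("HTL", "FFFF"),
]
construct = assemble_construct("MAKL", toy_epitopes)
```

Keeping the linkers in a lookup table makes it easy to try a different boundary convention without touching the assembly loop.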
Affiliation(s)
- Ahitsham Umar
  - Department of Bioinformatics and Biotechnology, Government College University Faisalabad, Faisalabad 38000, Pakistan
- Asma Haque
  - Department of Bioinformatics and Biotechnology, Government College University Faisalabad, Faisalabad 38000, Pakistan
- Youssef Saeed Alghamdi
  - Department of Biology, Turabah University College, Taif University, Taif 21944, Saudi Arabia
- Mutaib M Mashraqi
  - Department of Clinical Laboratory Sciences, College of Applied Medical Science, Najran University, Najran 61441, Saudi Arabia
- Abdur Rehman
  - Department of Bioinformatics and Biotechnology, Government College University Faisalabad, Faisalabad 38000, Pakistan
- Farah Shahid
  - Department of Bioinformatics and Biotechnology, Government College University Faisalabad, Faisalabad 38000, Pakistan
- Mohsin Khurshid
  - Department of Microbiology, Government College University Faisalabad, Faisalabad 38000, Pakistan
- Usman Ali Ashfaq
  - Department of Bioinformatics and Biotechnology, Government College University Faisalabad, Faisalabad 38000, Pakistan
8
Her HL, Lin PT, Wu YW. PangenomeNet: a pan-genome-based network reveals functional modules on antimicrobial resistome for Escherichia coli strains. BMC Bioinformatics 2021; 22:548. [PMID: 34758735 PMCID: PMC8579557 DOI: 10.1186/s12859-021-04459-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/31/2021] [Accepted: 10/19/2021] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND Discerning genes crucial to antimicrobial resistance (AMR) mechanisms is becoming more and more important to accurately and swiftly identify AMR pathogenic strains. Pangenome-wide association studies (e.g. Scoary) identified numerous putative AMR genes. However, only a tiny proportion of the putative resistance genes are annotated by AMR databases or Gene Ontology. In addition, many putative resistance genes are of unknown function (termed hypothetical proteins). An annotation tool is crucially needed in order to reveal the functional organization of the resistome and expand our knowledge of the AMR gene repertoire. RESULTS We developed an approach (PangenomeNet) for building co-functional networks from pan-genomes to infer functions for hypothetical genes. Using Escherichia coli as an example, we demonstrated that it is possible to build co-functional network from its pan-genome using co-inheritance, domain-sharing, and protein-protein-interaction information. The investigation of the network revealed that it fits the characteristics of biological networks and can be used for functional inferences. The subgraph consisting of putative meropenem resistance genes consists of clusters of stress response genes and resistance gene acquisition pathways. Resistome subgraphs also demonstrate drug-specific AMR genes such as beta-lactamase, as well as functional roles shared among multiple classes of drugs, mostly in the stress-related pathways. CONCLUSIONS By demonstrating the idea of pan-genome-based co-functional network on the E. coli species, we showed that the network can infer functional roles of the genes, including those without functional annotations, and provides holistic views on the putative antimicrobial resistomes. We hope that the pan-genome network idea can help formulate hypothesis for targeted experimental works.
Affiliation(s)
- Hsuan-Lin Her
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, 92093, USA
- Po-Ting Lin
  - Department of Mechanical Engineering, National Taiwan University of Science and Technology, No.43, Keelung Rd., Sec.4, Da'an Dist., Taipei City, 10609, Taiwan
  - Center for Cyber-Physical System Innovation, National Taiwan University of Science and Technology, Taipei, 10609, Taiwan
- Yu-Wei Wu
  - Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, 250, Wuxing St., Sinyi District, Taipei, 11031, Taiwan
  - Clinical Big Data Research Center, Taipei Medical University Hospital, Taipei, 11031, Taiwan
9
Aldakheel FM, Abrar A, Munir S, Aslam S, Allemailem KS, Khurshid M, Ashfaq UA. Proteome-Wide Mapping and Reverse Vaccinology Approaches to Design a Multi-Epitope Vaccine against Clostridium perfringens. Vaccines (Basel) 2021; 9:1079. [PMID: 34696187 PMCID: PMC8539331 DOI: 10.3390/vaccines9101079] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/08/2021] [Revised: 09/16/2021] [Accepted: 09/20/2021] [Indexed: 12/30/2022] Open
Abstract
C. perfringens is a highly versatile bacterium of livestock and humans, causing enteritis (a common food-borne illness in humans), enterotoxaemia (in which toxins formed in the intestine damage and destroy organs, i.e., the brain), and gangrene (wound infection). There is no particular cure for the toxins of C. perfringens. Supportive care (medical control of pain, intravenous fluids) is the standard treatment. Therefore, a multiple-epitope vaccine (MEV) should be designed to battle against C. perfringens infection. The main objective of this in silico investigation is to design an MEV that targets C. perfringens. For this purpose, we selected the top three proteins that were highly antigenic using immuno-informatics approaches, including molecular docking. B-cell, IFN-gamma, and T-cell epitopes for the target proteins were predicted and the most conserved epitopes were selected for further investigation. For the development of the final MEV, epitopes of LBL5, CTL17, and HTL13 were linked to GPGPG, AAY, and KK linkers. The vaccine N-end was joined to an adjuvant through an EAAK linker to improve immunogenicity. After the attachment of linkers and adjuvants, the final construct was 415 amino acids long. B-cell and IFN-gamma epitopes demonstrate that the model structure is enhanced for humoral and cellular immune responses. To validate the immunogenicity and safety of the final construct, various physicochemical properties, and other properties such as antigenicity and non-allergenicity, were evaluated. Furthermore, molecular docking was carried out in silico to verify vaccine compatibility with the receptor. In silico cloning was also employed to verify the proper expression and credibility of the construct.
Affiliation(s)
- Fahad M. Aldakheel
  - Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, King Saud University, Riyadh 11564, Saudi Arabia
- Amna Abrar
  - Department of Bioinformatics and Biotechnology, Government College University, Faisalabad 38000, Pakistan
- Samman Munir
  - Department of Bioinformatics and Biotechnology, Government College University, Faisalabad 38000, Pakistan
- Sehar Aslam
  - Department of Bioinformatics and Biotechnology, Government College University, Faisalabad 38000, Pakistan
- Khaled S. Allemailem
  - Department of Medical Laboratories, College of Applied Medical Sciences, Qassim University, Buraydah 51452, Saudi Arabia
- Mohsin Khurshid
  - Department of Microbiology, Government College University, Faisalabad 38000, Pakistan
- Usman Ali Ashfaq
  - Department of Bioinformatics and Biotechnology, Government College University, Faisalabad 38000, Pakistan
10
Senthamizhan V, Ravindran B, Raman K. NetGenes: A Database of Essential Genes Predicted Using Features From Interaction Networks. Front Genet 2021; 12:722198. [PMID: 34630517 PMCID: PMC8495214 DOI: 10.3389/fgene.2021.722198] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/08/2021] [Accepted: 08/16/2021] [Indexed: 11/26/2022] Open
Abstract
Essential gene prediction models built so far are heavily reliant on sequence-based features, and the scope of network-based features has been narrow. Previous work from our group demonstrated the importance of using network-based features for predicting essential genes with high accuracy. Here, we apply our approach for the prediction of essential genes to organisms from the STRING database and host the results in a standalone website. Our database, NetGenes, contains essential gene predictions for 2,700+ bacteria predicted using features derived from STRING protein-protein functional association networks. Housing a total of over 2.1 million genes, NetGenes offers various features like essentiality scores, annotations, and feature vectors for each gene. NetGenes database is available from https://rbc-dsai-iitm.github.io/NetGenes/.
Affiliation(s)
- Vimaladhasan Senthamizhan
  - Centre for Integrative Biology and Systems mEdicine (IBSE), Indian Institute of Technology (IIT) Madras, Chennai, India
  - Robert Bosch Center for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, Chennai, India
- Balaraman Ravindran
  - Centre for Integrative Biology and Systems mEdicine (IBSE), Indian Institute of Technology (IIT) Madras, Chennai, India
  - Robert Bosch Center for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, Chennai, India
  - Department of Computer Science and Engineering, IIT Madras, Chennai, India
- Karthik Raman
  - Centre for Integrative Biology and Systems mEdicine (IBSE), Indian Institute of Technology (IIT) Madras, Chennai, India
  - Robert Bosch Center for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, Chennai, India
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, IIT Madras, Chennai, India
11
DELEAT: gene essentiality prediction and deletion design for bacterial genome reduction. BMC Bioinformatics 2021; 22:444. [PMID: 34537011 PMCID: PMC8449488 DOI: 10.1186/s12859-021-04348-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/12/2021] [Accepted: 08/26/2021] [Indexed: 11/10/2022] Open
Abstract
Background The study of gene essentiality is fundamental to understand the basic principles of life, as well as for applications in many fields. In recent decades, dozens of sets of essential genes have been determined using different experimental and bioinformatics approaches, and this information has been useful for genome reduction of model organisms. Multiple in silico strategies have been developed to predict gene essentiality, but no optimal algorithm or set of gene features has been found yet, especially for non-model organisms with incomplete functional annotation. Results We have developed DELEAT v0.1 (DELetion design by Essentiality Analysis Tool), an easy-to-use bioinformatic tool which integrates an in silico gene essentiality classifier in a pipeline allowing automatic design of large-scale deletions in any bacterial genome. The essentiality classifier consists of a novel logistic regression model based on only six gene features which are not dependent on experimental data or functional annotation. As a proof of concept, we have applied this pipeline to the determination of dispensable regions in the genome of Bartonella quintana str. Toulouse. In this already reduced genome, 35 possible deletions have been delimited, spanning 29% of the genome. Conclusions Built on in silico gene essentiality predictions, we have developed an analysis pipeline which assists researchers throughout multiple stages of bacterial genome reduction projects, and created a novel classifier which is simple, fast, and universally applicable to any bacterial organism with a GenBank annotation file. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04348-5.
12
Liu X, Luo Y, He T, Ren M, Xu Y. Predicting essential genes of 37 prokaryotes by combining information-theoretic features. J Microbiol Methods 2021; 188:106297. [PMID: 34343487 DOI: 10.1016/j.mimet.2021.106297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2021] [Revised: 07/30/2021] [Accepted: 07/30/2021] [Indexed: 10/20/2022]
Abstract
Essential genes are required for the reproduction and survival of an organism. Rapid identification of essential genes has practical application value in biomedicine. Information theory is a discipline that studies information transmission. Based on the similarity between heredity and information transmission, measures derived from information theory can be applied to genetic sequence analysis on different scales. In this study, we employed 114 features extracted by information theory methods to construct an essential gene prediction model. We applied a backpropagation neural network to construct a classifier and employed it to predict essential genes of 37 prokaryotes. The performance of the classifier was evaluated by applying intra-organism prediction and leave-one-species-out prediction. Among 37 prokaryotes, intra-organism prediction and leave-one-species-out prediction yielded average AUC scores of 0.791 and 0.717, respectively. Considering the potential redundancy in the feature set, we performed feature selection and constructed a key feature subset. In the above two prediction methods, the average AUC scores of 37 organisms obtained by using key features were 0.786 and 0.714, respectively. The results show the potential and universality of information-theoretic features in the study of prokaryotic essential gene prediction.
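As an illustration of what an information-theoretic sequence feature can look like, the sketch below computes the Shannon entropy of the k-mer distribution of a sequence. The abstract does not enumerate its 114 features, so this particular measure, the function name `kmer_entropy`, and the toy sequences are assumptions chosen for illustration rather than the authors' feature set.

```python
import math
from collections import Counter


def kmer_entropy(sequence, k=1):
    """Shannon entropy (in bits) of the overlapping k-mer distribution."""
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    if not kmers:
        raise ValueError("sequence is shorter than k")
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy sequences (illustration only).
uniform = kmer_entropy("ACGT")        # four equally frequent bases
repetitive = kmer_entropy("AAAA")     # a single repeated base
dinucleotide = kmer_entropy("ACGTACGT", k=2)
```

A sequence using all four bases equally reaches the 2-bit maximum for single nucleotides, while a homopolymer carries no information; features of this kind need no functional annotation, which is why they transfer across organisms.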
Affiliation(s)
- Xiao Liu
  - School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing 400044, China
- Yachuan Luo
  - School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing 400044, China
- Ting He
  - School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing 400044, China
- Meixiang Ren
  - School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing 400044, China
- Yuqiao Xu
  - School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing 400044, China
|
13
|
de Souza ID, Reis CF, Morais DAA, Fernandes VGS, Cavalcante JVF, Dalmolin RJS. Ancestry analysis indicates two different sets of essential genes in eukaryotic model species. Funct Integr Genomics 2021; 21:523-531. [PMID: 34279742 DOI: 10.1007/s10142-021-00794-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/28/2020] [Revised: 06/02/2021] [Accepted: 06/10/2021] [Indexed: 11/28/2022]
Abstract
Essential genes are so called because they are crucial for organism perpetuation. These genes are usually related to functions essential to cellular metabolism or multicellular homeostasis. Deleterious alterations in essential genes produce a spectrum of phenotypes in multicellular organisms. The effects range from impairment of the fertilization process and disruption of fetal development to loss of reproductive capacity. Essential genes are described as more evolutionarily conserved than non-essential genes. However, there is no consensus about the relationship between gene essentiality and gene age. Here, we identified essential genes in five model eukaryotic species (Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster, Caenorhabditis elegans, and Mus musculus) and estimated their evolutionary ancestry and their network properties. We observed that essential genes, on average, are older than other genes in all species investigated. The relationship between network properties and gene essentiality agrees with previous findings, showing essential genes as important nodes in biological networks. As expected, we also observed that essential orthologs shared by the five species evaluated here are old. However, all the species evaluated here have a specific set of young essential genes not shared among them. Additionally, these two groups of essential genes are involved with distinct biological functions, suggesting two sets of essential genes: (i) a set of old essential genes common to all the evaluated species, regulating basic cellular functions, and (ii) a set of young essential genes exclusive to each species, which perform specific essential functions in each species.
Affiliation(s)
- Iara D de Souza
  - Bioinformatics Multidisciplinary Environment - IMD, Federal University of Rio Grande Do Norte, Av. Odilon Gomes de Lima, 1722, Capim Macio, Natal, RN, 59078-400, Brazil
- Clovis F Reis
  - Bioinformatics Multidisciplinary Environment - IMD, Federal University of Rio Grande Do Norte, Av. Odilon Gomes de Lima, 1722, Capim Macio, Natal, RN, 59078-400, Brazil
- Diego A A Morais
  - Bioinformatics Multidisciplinary Environment - IMD, Federal University of Rio Grande Do Norte, Av. Odilon Gomes de Lima, 1722, Capim Macio, Natal, RN, 59078-400, Brazil
- Vítor G S Fernandes
  - Bioinformatics Multidisciplinary Environment - IMD, Federal University of Rio Grande Do Norte, Av. Odilon Gomes de Lima, 1722, Capim Macio, Natal, RN, 59078-400, Brazil
- João Vitor F Cavalcante
  - Bioinformatics Multidisciplinary Environment - IMD, Federal University of Rio Grande Do Norte, Av. Odilon Gomes de Lima, 1722, Capim Macio, Natal, RN, 59078-400, Brazil
- Rodrigo J S Dalmolin
  - Bioinformatics Multidisciplinary Environment - IMD, Federal University of Rio Grande Do Norte, Av. Odilon Gomes de Lima, 1722, Capim Macio, Natal, RN, 59078-400, Brazil
  - Department of Biochemistry - CB, Federal University of Rio Grande Do Norte, Campus Universitário UFRN, Lagoa Nova, Natal, RN, 59078-970, Brazil
|
14
|
Nandi S, Ganguli P, Sarkar RR. Essential gene prediction using limited gene essentiality information-An integrative semi-supervised machine learning strategy. PLoS One 2020; 15:e0242943. [PMID: 33253254 PMCID: PMC7703937 DOI: 10.1371/journal.pone.0242943] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/10/2020] [Accepted: 11/12/2020] [Indexed: 11/24/2022] Open
Abstract
Essential gene prediction helps to find the minimal set of genes indispensable for the survival of any organism. Machine learning (ML) algorithms have been useful for the prediction of gene essentiality. However, currently available ML pipelines perform poorly for organisms with limited experimental data. The objective is the development of a new ML pipeline to help in the annotation of essential genes of less explored disease-causing organisms for which minimal experimental data is available. The proposed strategy combines an unsupervised feature selection technique, dimension reduction using the Kamada-Kawai algorithm, and a semi-supervised ML algorithm employing a Laplacian Support Vector Machine (LapSVM) for prediction of essential and non-essential genes from genome-scale metabolic networks using a very limited labeled dataset. A novel scoring technique, Semi-Supervised Model Selection Score, equivalent to area under the ROC curve (auROC), has been proposed for the selection of the best model when supervised performance metrics calculation is difficult due to lack of data. The unsupervised feature selection followed by dimension reduction helped to observe a distinct circular pattern in the clustering of essential and non-essential genes. LapSVM then created a curve that dissected this circle for the classification and prediction of essential genes with high accuracy (auROC > 0.85) even with 1% labeled data for model training. After successful validation of this ML pipeline on both Eukaryotes and Prokaryotes that show high accuracy even when the labeled dataset is very limited, this strategy is used for the prediction of essential genes of organisms with inadequate experimentally known data, such as Leishmania sp. Using a graph-based semi-supervised machine learning scheme, a novel integrative approach has been proposed for essential gene prediction that shows universality in application to both Prokaryotes and Eukaryotes with limited labeled data. The essential genes predicted using the pipeline provide an important lead for the prediction of gene essentiality and identification of novel therapeutic targets for antibiotic and vaccine development against disease-causing parasites.
Affiliation(s)
- Sutanu Nandi
  - Chemical Engineering and Process Development, CSIR-National Chemical Laboratory, Pune, Maharashtra, India
  - Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, India
- Piyali Ganguli
  - Chemical Engineering and Process Development, CSIR-National Chemical Laboratory, Pune, Maharashtra, India
  - Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, India
- Ram Rup Sarkar
  - Chemical Engineering and Process Development, CSIR-National Chemical Laboratory, Pune, Maharashtra, India
  - Academy of Scientific & Innovative Research (AcSIR), Ghaziabad, India
|
15
|
Le NQK, Do DT, Hung TNK, Lam LHT, Huynh TT, Nguyen NTK. A Computational Framework Based on Ensemble Deep Neural Networks for Essential Genes Identification. Int J Mol Sci 2020; 21:E9070. [PMID: 33260643 PMCID: PMC7730808 DOI: 10.3390/ijms21239070] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 10/04/2020] [Revised: 11/25/2020] [Accepted: 11/26/2020] [Indexed: 01/13/2023] Open
Abstract
Essential genes contain key information of genomes that could be the key to a comprehensive understanding of life and evolution. Because of their importance, studies of essential genes have been considered a crucial problem in computational biology. Computational methods for identifying essential genes have become increasingly popular to reduce the cost and time consumption of traditional experiments. A few models have addressed this problem, but performance is still not satisfactory because of high dimensional features and the use of traditional machine learning algorithms. Thus, there is a need to create a novel model to improve the predictive performance of this problem from DNA sequence features. This study took advantage of a natural language processing (NLP) model in learning biological sequences by treating them as natural language words. To learn the NLP features, a supervised ensemble deep neural network was subsequently employed. Our proposed method could identify essential genes with sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC), and area under the receiver operating characteristic curve (AUC) values of 60.2%, 84.6%, 76.3%, 0.449, and 0.814, respectively. The overall performance outperformed the single models without ensemble, as well as the state-of-the-art predictors on the same benchmark dataset. This indicated the effectiveness of the proposed method in determining essential genes, in particular, and other sequencing problems, in general.
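For readers less familiar with the threshold-based metrics reported above, the sketch below computes sensitivity, specificity, accuracy, and MCC from a binary confusion matrix using their standard definitions. The function name `binary_metrics` and the example counts are made up for illustration and are not taken from the study; AUC is omitted because it requires ranked scores rather than counts.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts.
    Assumes both classes are present (tp + fn > 0 and tn + fp > 0)."""
    sensitivity = tp / (tp + fn)            # true positive rate (recall)
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts: 10 essential genes (6 found), 10 non-essential (8 rejected).
metrics = binary_metrics(tp=6, fp=2, tn=8, fn=4)
```

Because MCC uses all four cells of the confusion matrix, it is often preferred over accuracy when essential genes are a small minority of the genome.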
Affiliation(s)
- Nguyen Quoc Khanh Le
  - Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei 106, Taiwan
  - Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei 106, Taiwan
  - Translational Imaging Research Center, Taipei Medical University Hospital, Taipei 110, Taiwan
- Duyen Thi Do
  - Graduate Institute of Biomedical Informatics, Taipei Medical University, Taipei 106, Taiwan
- Truong Nguyen Khanh Hung
  - International Master/Ph.D. Program in Medicine, College of Medicine, Taipei Medical University, Taipei 110, Taiwan
  - Department of Orthopedic and Trauma, Cho Ray Hospital, Ho Chi Minh 70000, Vietnam
- Luu Ho Thanh Lam
  - International Master/Ph.D. Program in Medicine, College of Medicine, Taipei Medical University, Taipei 110, Taiwan
  - Intensive Care Unit, Children’s Hospital 2, Ho Chi Minh 70000, Vietnam
- Tuan-Tu Huynh
  - Department of Electrical Engineering, Yuan Ze University, Taoyuan 320, Taiwan
  - Department of Electrical Electronic and Mechanical Engineering, Lac Hong University, Dong Nai 76120, Vietnam
- Ngan Thi Kim Nguyen
  - School of Nutrition and Health Sciences, Taipei Medical University, Taipei 110, Taiwan
|
16
|
Abstract
BACKGROUND Essential genes are those genes that are critical for the survival of an organism. The prediction of essential genes in bacteria can provide targets for the design of novel antibiotic compounds or antimicrobial strategies. RESULTS We propose a deep neural network for predicting essential genes in microbes. Our architecture called DEEPLYESSENTIAL makes minimal assumptions about the input data (i.e., it only uses gene primary sequence and the corresponding protein sequence) to carry out the prediction, thus maximizing its practical application compared to existing predictors that require structural or topological features which might not be readily available. We also expose and study a hidden performance bias that affected previous classifiers. Extensive results show that DEEPLYESSENTIAL outperforms existing classifiers that either employ down-sampling to balance the training set or use clustering to exclude multiple copies of orthologous genes. CONCLUSION Deep neural network architectures can efficiently predict whether a microbial gene is essential (or not) using only its sequence information.
Affiliation(s)
- Md Abid Hasan
  - Department of Computer Science and Engineering, University of California Riverside, 900 University Ave, Riverside, 92507 CA USA
- Stefano Lonardi
  - Department of Computer Science and Engineering, University of California Riverside, 900 University Ave, Riverside, 92507 CA USA
|
17
|
Khorsand B, Savadi A, Naghibzadeh M. Comprehensive host-pathogen protein-protein interaction network analysis. BMC Bioinformatics 2020; 21:400. [PMID: 32912135 PMCID: PMC7488060 DOI: 10.1186/s12859-020-03706-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 02/01/2020] [Accepted: 07/31/2020] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND Infectious diseases are a cruel assassin with millions of victims around the world each year. Understanding the infectious mechanism of viruses is indispensable for their inhibition. One of the best ways of unveiling this mechanism is to investigate the host-pathogen protein-protein interaction network. In this paper we try to disclose many properties of this network. We focus on human as host and integrate 32,859 experimentally verified interactions between human proteins and virus proteins from several databases. We investigate different properties of human proteins targeted by virus proteins and find that most of them have considerably high centrality scores in the human intra-species protein-protein interaction network. Investigating the network properties of human proteins that are targeted by different virus proteins can help us to design multipurpose drugs. RESULTS As the host-pathogen protein-protein interaction network is a bipartite network and centrality measures for this type of network are scarce, we proposed seven new centrality measures for analyzing bipartite networks. Applying them to different virus strains reveals the non-randomness of the attack strategies of virus proteins, which could help us in drug design, hence elevating the quality of life. They could also be used in detecting host essential proteins. Essential proteins are those whose functions are critical for the survival of their host. One of the proposed centralities, named diversity of predators, outperforms the other existing centralities in terms of detecting essential proteins and could be used as an optimal essential protein marker. CONCLUSIONS Different centralities were applied to analyze the human protein-protein interaction network and to detect characteristics of human proteins targeted by virus proteins. Moreover, seven new centralities were proposed to analyze the host-pathogen protein-protein interaction network and to detect pathogens' favorite host protein victims. Comparing different centralities in detecting essential proteins reveals that diversity of predators (one of the proposed centralities) is the best essential protein marker.
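To make the bipartite setting concrete, the sketch below computes the simplest possible centrality on a host-pathogen network: for each host protein, the number of distinct virus proteins that target it. This plain degree count is not one of the seven measures proposed in the paper, whose definitions the abstract does not give; the function name and the protein identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def host_target_degree(interactions: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Degree of each host protein in a bipartite host-pathogen network.
    interactions: (virus_protein, host_protein) pairs; duplicate pairs are
    counted once. Baseline illustration only, not a measure from the paper."""
    partners: Dict[str, Set[str]] = defaultdict(set)
    for virus_protein, host_protein in interactions:
        partners[host_protein].add(virus_protein)
    return {host: len(viruses) for host, viruses in partners.items()}


# Hypothetical edges: two virus proteins converge on host protein hA.
edges = [("v1", "hA"), ("v2", "hA"), ("v1", "hB"), ("v1", "hA")]
degree = host_target_degree(edges)
```

Edges only run between the two node sets, so standard one-mode centralities do not transfer directly, which is the gap the proposed measures are meant to fill.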
Affiliation(s)
- Babak Khorsand
  - Computer Engineering Department, Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
- Abdorreza Savadi
  - Computer Engineering Department, Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
  - Ferdowsi University of Mashhad, Azadi Square, Mashhad, 9177948974 Iran
|
18
|
Liu X, He T, Guo Z, Ren M, Luo Y. Predicting essential genes of 41 prokaryotes by a semi-supervised method. Anal Biochem 2020; 609:113919. [PMID: 32827465 DOI: 10.1016/j.ab.2020.113919] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/13/2020] [Revised: 07/25/2020] [Accepted: 08/13/2020] [Indexed: 10/23/2022]
Abstract
Essential genes are vitally important to the survival and reproduction of organisms. Many machine learning methods have been widely employed to predict essential genes and have obtained satisfactory results. However, most of these methods are supervised methods and may not obtain the desired result when the labeled data are insufficient. In this paper, we proposed a learning with local and global consistency (LGC) method-based classifier, which was employed to predict the essential genes of 41 prokaryotes. LGC is a graph-based semi-supervised learning method that can construct a prediction model using finite label and constraint information. The performance of the proposed classifier was evaluated by employing intra-organism prediction and leave-one-species-out validation. The average AUC value of 41 organisms in intra-organism prediction was 0.723 when the labeled sample ratio was 0.5. The results of this study indicate that the proposed method can achieve acceptable prediction performance with limited labeled data. Additionally, the results demonstrate that this method has good universality.
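The label-spreading idea behind LGC can be sketched in a few lines. The code below follows the standard iterative formulation, F ← αSF + (1 − α)Y with S = D^(-1/2) W D^(-1/2), on a toy graph. How the authors build the gene similarity graph is not stated in the abstract, so the adjacency matrix, the value of `alpha`, and the function name `lgc_predict` are illustrative assumptions.

```python
import math
from typing import List


def lgc_predict(W: List[List[float]], Y: List[List[float]],
                alpha: float = 0.9, iterations: int = 200) -> List[int]:
    """Learning with local and global consistency on a small dense graph.
    W: symmetric non-negative affinity matrix with zero diagonal.
    Y: one row per node, one-hot for labeled nodes, all zeros if unlabeled.
    Returns the index of the highest-scoring class for every node."""
    n = len(W)
    n_classes = len(Y[0])
    degree = [sum(row) for row in W]
    # Symmetrically normalized affinity S = D^(-1/2) W D^(-1/2).
    S = [[W[i][j] / math.sqrt(degree[i] * degree[j]) if W[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iterations):
        # Each node takes information from neighbours and keeps part of its label.
        F = [[alpha * sum(S[i][j] * F[j][c] for j in range(n)) + (1 - alpha) * Y[i][c]
              for c in range(n_classes)] for i in range(n)]
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]


# Toy chain 0-1-2-3: node 0 labeled class 0, node 3 labeled class 1.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
Y = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
labels = lgc_predict(W, Y)
```

On this chain, each unlabeled node ends up with the class of its nearer labeled neighbour, which is the behaviour that lets the method work with few labels.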
Affiliation(s)
- Xiao Liu
  - School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, 400044, China
- Ting He
  - School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, 400044, China
- Zhirui Guo
  - School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, 400044, China
- Meixiang Ren
  - School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, 400044, China
- Yachuan Luo
  - School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, 400044, China
|
19
|
James K, Olson PD. The tapeworm interactome: inferring confidence scored protein-protein interactions from the proteome of Hymenolepis microstoma. BMC Genomics 2020; 21:346. [PMID: 32380953 PMCID: PMC7204028 DOI: 10.1186/s12864-020-6710-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/17/2019] [Accepted: 03/30/2020] [Indexed: 12/14/2022] Open
Abstract
Background Reference genome and transcriptome assemblies of helminths have reached a level of completion whereby secondary analyses that rely on accurate gene estimation or syntenic relationships can now be conducted with a high level of confidence. Recent public release of the v.3 assembly of the mouse bile-duct tapeworm, Hymenolepis microstoma, provides chromosome-level characterisation of the genome and a stabilised set of protein coding gene models underpinned by bioinformatic and empirical data. However, interactome data have not been produced. Conserved protein-protein interactions in other organisms, termed interologs, can be used to transfer interactions between species, allowing systems-level analysis in non-model organisms. Results Here, we describe a probabilistic, integrated network of interologs for the H. microstoma proteome, based on conserved protein interactions found in eukaryote model species. Almost a third of the 10,139 gene models in the v.3 assembly could be assigned interaction data and assessment of the resulting network indicates that topologically-important proteins are related to essential cellular pathways, and that the network clusters into biologically meaningful components. Moreover, network parameters are similar to those of single-species interaction networks that we constructed in the same way for S. cerevisiae, C. elegans and H. sapiens, demonstrating that information-rich, system-level analyses can be conducted even on species separated by a large phylogenetic distance from the major model organisms on which most protein interaction evidence is based. Using the interolog network, we then focused on sub-networks of interactions assigned to discrete suites of genes of interest, including signalling components and transcription factors, germline multipotency genes, and genes differentially-expressed between larval and adult worms. Results show not only an expected bias toward highly-conserved proteins, such as components of intracellular signal transduction, but in some cases predicted interactions with transcription factors that aid in identifying their target genes. Conclusions With key helminth genomes now complete, systems-level analyses can provide an important predictive framework to guide basic and applied research on helminths and will become increasingly informative as new protein-protein interaction data accumulate.
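The interolog transfer step described above can be sketched as a simple mapping: an interaction between two model-species proteins is carried over to every pair of their orthologs in the target species. This sketch omits the confidence scoring that the paper's title refers to; the function name, the exclusion of self-interactions, and all protein identifiers are assumptions for illustration.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def infer_interologs(model_interactions: Iterable[Tuple[str, str]],
                     orthologs: Dict[str, Set[str]]) -> Set[FrozenSet[str]]:
    """Transfer interactions from a model species to a target species.
    orthologs maps a model-species protein to its target-species orthologs.
    Returns undirected target-species pairs; self-pairs are skipped here
    as a simplifying assumption."""
    inferred: Set[FrozenSet[str]] = set()
    for protein_a, protein_b in model_interactions:
        for target_a in orthologs.get(protein_a, ()):
            for target_b in orthologs.get(protein_b, ()):
                if target_a != target_b:
                    inferred.add(frozenset((target_a, target_b)))
    return inferred


# Hypothetical identifiers: Y* are model-species proteins, H* target-species ones.
model_ppi = [("Y1", "Y2"), ("Y2", "Y3")]
ortholog_map = {"Y1": {"H1"}, "Y2": {"H2", "H2b"}}
predicted = infer_interologs(model_ppi, ortholog_map)
```

Interactions involving a protein without an ortholog (Y3 here) are dropped, which is one reason only part of a proteome can be assigned interaction data this way.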
Affiliation(s)
- Katherine James
  - Department of Applied Sciences, Northumbria University, Newcastle Upon Tyne, UK
  - Department of Life Sciences, The Natural History Museum, Cromwell Road, London, UK
- Peter D Olson
  - Department of Life Sciences, The Natural History Museum, Cromwell Road, London, UK
|
20
|
Aromolaran O, Beder T, Oswald M, Oyelade J, Adebiyi E, Koenig R. Essential gene prediction in Drosophila melanogaster using machine learning approaches based on sequence and functional features. Comput Struct Biotechnol J 2020; 18:612-621. [PMID: 32257045 PMCID: PMC7096750 DOI: 10.1016/j.csbj.2020.02.022] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 12/19/2019] [Revised: 02/27/2020] [Accepted: 02/27/2020] [Indexed: 12/11/2022] Open
Abstract
Genes are termed essential if their loss of function compromises viability or results in profound loss of fitness. On the genome scale, these genes can be determined experimentally employing RNAi or knockout screens, but this is very resource intensive. Computational methods for essential gene prediction can overcome this drawback, particularly when intrinsic (e.g. from the protein sequence) as well as extrinsic features (e.g. from transcription profiles) are considered. In this work, we employed machine learning to predict essential genes in Drosophila melanogaster. A total of 27,340 features were generated based on a large variety of different aspects comprising nucleotide and protein sequences, gene networks, protein-protein interactions, evolutionary conservation and functional annotations. Employing cross-validation, we obtained an excellent prediction performance. In D. melanogaster, the best model achieved a ROC-AUC of 0.90, a PR-AUC of 0.30 and an F1 score of 0.34. Our approach considerably outperformed a benchmark method in which only features derived from the protein sequences were used (P < 0.001). Investigating which features contributed to this success, we found all categories of features, most prominently network topological, functional and sequence-based features. To evaluate our approach we performed the same workflow for essential gene prediction in human and achieved a ROC-AUC of 0.97, a PR-AUC of 0.73, and an F1 score of 0.64. In summary, this study shows that using our well-elaborated assembly of features covering a broad range of intrinsic and extrinsic gene and protein features enabled intelligent systems to predict well the essentiality of genes in an organism.
Affiliation(s)
- Olufemi Aromolaran
  - Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Thomas Beder
  - Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany
- Marcus Oswald
  - Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany
- Jelili Oyelade
  - Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Ezekiel Adebiyi
  - Department of Computer & Information Sciences, Covenant University, Ota, Ogun State, Nigeria
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, Nigeria
- Rainer Koenig
  - Integrated Research and Treatment Center, Center for Sepsis Control and Care (CSCC), Jena University Hospital, Am Klinikum 1, 07747 Jena, Germany
|
21
|
Wen QF, Liu S, Dong C, Guo HX, Gao YZ, Guo FB. Geptop 2.0: An Updated, More Precise, and Faster Geptop Server for Identification of Prokaryotic Essential Genes. Front Microbiol 2019; 10:1236. [PMID: 31214154 PMCID: PMC6558110 DOI: 10.3389/fmicb.2019.01236] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 03/02/2019] [Accepted: 05/17/2019] [Indexed: 12/16/2022] Open
Abstract
Geptop has performed effectively in the identification of prokaryotic essential genes since its first release in 2013. It estimates gene essentiality for prokaryotes based on orthology and phylogeny. Genome-scale essentiality data of more prokaryotic species are now available, and the information has been collected into public essential gene repositories such as DEG and OGEE. A faster and more accurate toolkit is needed to keep pace with the growing volume of prokaryotic genome data. We updated Geptop by adding more validated essentiality data to the reference set (from 19 to 37 species), and by introducing multiprocessing to accelerate computation. Compared with Geptop 1.0 and other gene essentiality prediction models, Geptop 2.0 can generate more stable predictions and finish the computation in a shorter time. The software is available both as an online server and a downloadable standalone application. We hope that the improved Geptop 2.0 will facilitate research in gene essentiality and the development of novel antibacterial drugs. The gene essentiality prediction tool is available at http://cefg.uestc.cn/geptop.
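One way to combine orthology and phylogeny into a single essentiality score is sketched below: each reference species votes on whether the query gene has an essential ortholog there, and votes from phylogenetically closer species count more. This is only an illustration of the general idea. The weighting 1 / (1 + distance), the function name, and the species labels are assumptions and should not be read as the Geptop formula.

```python
from typing import Dict


def essentiality_score(has_essential_ortholog: Dict[str, bool],
                       distance: Dict[str, float]) -> float:
    """Distance-weighted fraction of reference species in which the query
    gene has an essential ortholog. Hypothetical weighting: closer species
    (smaller distance) get larger weights via 1 / (1 + distance)."""
    weights = {species: 1.0 / (1.0 + d) for species, d in distance.items()}
    total = sum(weights.values())
    if total == 0:
        return 0.0
    supporting = sum(w for species, w in weights.items()
                     if has_essential_ortholog.get(species, False))
    return supporting / total


# Hypothetical reference set: species A is identical in distance terms, B is farther.
dist = {"A": 0.0, "B": 1.0}
score_close = essentiality_score({"A": True, "B": False}, dist)
score_both = essentiality_score({"A": True, "B": True}, dist)
```

A gene would then be called essential when its score exceeds a chosen cutoff; the cutoff itself is a further parameter that this sketch does not set.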
Affiliation(s)
- Qing-Feng Wen
  - School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Shuo Liu
  - School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Chuan Dong
  - School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Hai-Xia Guo
  - School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Yi-Zhou Gao
  - School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Feng-Biao Guo
  - School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|