1
|
Kania A. Harnessing the information theory and chaos game representation for pattern searching among essential and non-essential genes in Bacteria. J Theor Biol 2021; 531:110917. [PMID: 34563550 DOI: 10.1016/j.jtbi.2021.110917] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2021] [Revised: 08/19/2021] [Accepted: 09/21/2021] [Indexed: 11/29/2022]
Abstract
Proteins encoded by genes are engaged in most of the processes within a cell. Typing a minimal set of genes required for survival is still a challenging task. Essential genes seem to be more conservative and are usually responsible for basic functions, for instance, genetic information flow or energy production. Despite persistent advances in experimental methods, computer predictions may constitute an important part of this investigation. Firstly, they may embrace a huge amount of data and provide some characteristic patterns. Furthermore, they enable scientists to build models for predicting essential genes which are not yet verified experimentally. Some papers indicate interesting dependencies within essential genes sequences using different computer models. In this paper, an author took a three-step analysis for a deeper understanding of the fundamentals of essential and non-essential genes. Beginning from a simple nucleotide composition and finishing at long-range correlations, presents some characteristic patterns that are expected to be developed in future studies.
Collapse
Affiliation(s)
- Adrian Kania
- Department of Computational Biophysics and Bioinformatics, Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University, Gronostajowa 7, Cracow 30-387, Poland
| |
Collapse
|
2
|
Liu S, Wang SX, Liu W, Wang C, Zhang FZ, Ye YN, Wu CS, Zheng WX, Rao N, Guo FB. CEG 2.0: an updated database of clusters of essential genes including eukaryotic organisms. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2020; 2020:6031000. [PMID: 33306800 PMCID: PMC7731928 DOI: 10.1093/database/baaa112] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/24/2020] [Revised: 11/12/2020] [Accepted: 12/02/2020] [Indexed: 02/06/2023]
Abstract
Essential genes are key elements for organisms to maintain their living. Building databases that store essential genes in the form of homologous clusters, rather than storing them as a singleton, can provide more enlightening information such as the general essentiality of homologous genes in multiple organisms. In 2013, the first database to store prokaryotic essential genes in clusters, CEG (Clusters of Essential Genes), was constructed. Afterward, the amount of available data for essential genes increased by a factor >3 since the last revision. Herein, we updated CEG to version 2, including more prokaryotic essential genes (from 16 gene datasets to 29 gene datasets) and newly added eukaryotic essential genes (nine species), specifically the human essential genes of 12 cancer cell lines. For prokaryotes, information associated with drug targets, such as protein structure, ligand–protein interaction, virulence factor and matched drugs, is also provided. Finally, we provided the service of essential gene prediction for both prokaryotes and eukaryotes. We hope our updated database will benefit more researchers in drug targets and evolutionary genomics. Database URL:http://cefg.uestc.cn/ceg
Collapse
Affiliation(s)
- Shuo Liu
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.,Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Ministry of Education and School of Pharmaceutical Sciences, Wuhan University, Wuhan 430071, China
| | - Shu-Xuan Wang
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Wei Liu
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Chen Wang
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Fa-Zhan Zhang
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Yuan-Nong Ye
- Bioinformatics and BioMedical Bigdata Mining Laboratory, Key Laboratory of Environmental Pollution Monitoring and Disease Control, Ministry of Education, Guizhou Medical University, Guiyang 550025, China
| | - Candy-S Wu
- Thomas Worthington High School, 300 West Granville Road, Worthington, OH 43085, USA
| | - Wen-Xin Zheng
- School of Biomedical Engineering, Capital Medical University, Beijing 100069, China
| | - Nini Rao
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Feng-Biao Guo
- Key Laboratory of Combinatorial Biosynthesis and Drug Discovery, Ministry of Education and School of Pharmaceutical Sciences, Wuhan University, Wuhan 430071, China
| |
Collapse
|
3
|
Liu X, He T, Guo Z, Ren M, Luo Y. Predicting essential genes of 41 prokaryotes by a semi-supervised method. Anal Biochem 2020; 609:113919. [PMID: 32827465 DOI: 10.1016/j.ab.2020.113919] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2020] [Revised: 07/25/2020] [Accepted: 08/13/2020] [Indexed: 10/23/2022]
Abstract
Essential genes are vitally important to the survival and reproduction of organisms. Many machine learning methods have been widely employed to predict essential genes and have obtained satisfactory results. However, most of these methods are supervised methods and may not obtain the desired result when the labeled data are insufficient. In this paper, we proposed a learning with local and global consistency (LGC) method-based classifier, which was employed to predict the essential genes of 41 prokaryotes. LGC is a graph-based semi-supervised learning method that can construct a prediction model using finite label and constraint information. The performance of the proposed classifier was evaluated by employing intra-organism prediction and leave-one-species-out validation. The average AUC value of 41 organisms in intra-organisms prediction was 0.723 when the labeled sample ratio was 0.5. The results of this study indicate that the proposed method can achieve acceptable prediction performance with limited labeled data. Additionally, the results demonstrate that this method has good universality.
Collapse
Affiliation(s)
- Xiao Liu
- School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, 400044, China.
| | - Ting He
- School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, 400044, China
| | - Zhirui Guo
- School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, 400044, China
| | - Meixiang Ren
- School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, 400044, China
| | - Yachuan Luo
- School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, 400044, China
| |
Collapse
|
4
|
ePath: an online database towards comprehensive essential gene annotation for prokaryotes. Sci Rep 2019; 9:12949. [PMID: 31506471 PMCID: PMC6737131 DOI: 10.1038/s41598-019-49098-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2019] [Accepted: 08/15/2019] [Indexed: 02/01/2023] Open
Abstract
Experimental techniques for identification of essential genes (EGs) in prokaryotes are usually expensive, time-consuming and sometimes unrealistic. Emerging in silico methods provide alternative methods for EG prediction, but often possess limitations including heavy computational requirements and lack of biological explanation. Here we propose a new computational algorithm for EG prediction in prokaryotes with an online database (ePath) for quick access to the EG prediction results of over 4,000 prokaryotes ( https://www.pubapps.vcu.edu/epath/ ). In ePath, gene essentiality is linked to biological functions annotated by KEGG Ortholog (KO). Two new scoring systems, namely, E_score and P_score, are proposed for each KO as the EG evaluation criteria. E_score represents appearance and essentiality of a given KO in existing experimental results of gene essentiality, while P_score denotes gene essentiality based on the principle that a gene is essential if it plays a role in genetic information processing, cell envelope maintenance or energy production. The new EG prediction algorithm shows prediction accuracy ranging from 75% to 91% based on validation from five new experimental studies on EG identification. Our overall goal with ePath is to provide a comprehensive and reliable reference for gene essentiality annotation, facilitating the study of those prokaryotes without experimentally derived gene essentiality information.
Collapse
|
5
|
Diagnosis of Breast Hyperplasia and Evaluation of RuXian-I Based on Metabolomics Deep Belief Networks. Int J Mol Sci 2019; 20:ijms20112620. [PMID: 31141969 PMCID: PMC6600413 DOI: 10.3390/ijms20112620] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2019] [Revised: 05/24/2019] [Accepted: 05/26/2019] [Indexed: 01/09/2023] Open
Abstract
Breast cancer is estimated to be the leading cancer type among new cases in American women. Core biopsy data have shown a close association between breast hyperplasia and breast cancer. The early diagnosis and treatment of breast hyperplasia are extremely important to prevent breast cancer. The Mongolian medicine RuXian-I is a traditional drug that has achieved a high level of efficacy and a low incidence of side effects in its clinical use. However, for detecting the efficacy of RuXian-I, a rapid and accurate evaluation method based on metabolomic data is still lacking. Therefore, we proposed a framework, named the metabolomics deep belief network (MDBN), to analyze breast hyperplasia metabolomic data. We obtained 168 samples of metabolomic data from an animal model experiment of RuXian-I, which were averaged from control groups, treatment groups, and model groups. In the process of training, unlabelled data were used to pretrain the Deep Belief Networks models, and then labelled data were used to complete fine-tuning based on a limited-memory Broyden Fletcher Goldfarb Shanno (L-BFGS) algorithm. To prevent overfitting, a dropout method was added to the pretraining and fine-tuning procedures. The experimental results showed that the proposed model is superior to other classical classification methods that are based on positive and negative spectra data. Further, the proposed model can be used as an extension of the classification method for metabolomic data. For the high accuracy of classification of the three groups, the model indicates obvious differences and boundaries between the three groups. It can be inferred that the animal model of RuXian-I is well established, which can lay a foundation for subsequent related experiments. This also shows that metabolomic data can be used as a means to verify the effectiveness of RuXian-I in the treatment of breast hyperplasia.
Collapse
|
6
|
Peng C, Lin Y, Luo H, Gao F. A Comprehensive Overview of Online Resources to Identify and Predict Bacterial Essential Genes. Front Microbiol 2017; 8:2331. [PMID: 29230204 PMCID: PMC5711816 DOI: 10.3389/fmicb.2017.02331] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2017] [Accepted: 11/13/2017] [Indexed: 12/15/2022] Open
Abstract
Genes critical for the survival or reproduction of an organism in certain circumstances are classified as essential genes. Essential genes play a significant role in deciphering the survival mechanism of life. They may be greatly applied to pharmaceutics and synthetic biology. The continuous progress of experimental method for essential gene identification has accelerated the accumulation of gene essentiality data which facilitates the study of essential genes in silico. In this article, we present some available online resources related to gene essentiality, including bioinformatic software tools for transposon sequencing (Tn-seq) analysis, essential gene databases and online services to predict bacterial essential genes. We review several computational approaches that have been used to predict essential genes, and summarize the features used for gene essentiality prediction. In addition, we evaluate the available online bacterial essential gene prediction servers based on the experimentally validated essential gene sets of 30 bacteria from DEG. This article is intended to be a quick reference guide for the microbiologists interested in the essential genes.
Collapse
Affiliation(s)
- Chong Peng
- Department of Physics, School of Science, Tianjin University, Tianjin, China
| | - Yan Lin
- Department of Physics, School of Science, Tianjin University, Tianjin, China
| | - Hao Luo
- Department of Physics, School of Science, Tianjin University, Tianjin, China
| | - Feng Gao
- Department of Physics, School of Science, Tianjin University, Tianjin, China
- Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin, China
- SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), Tianjin University, Tianjin, China
| |
Collapse
|
7
|
Hua ZG, Lin Y, Yuan YZ, Yang DC, Wei W, Guo FB. ZCURVE 3.0: identify prokaryotic genes with higher accuracy as well as automatically and accurately select essential genes. Nucleic Acids Res 2015; 43:W85-90. [PMID: 25977299 PMCID: PMC4489317 DOI: 10.1093/nar/gkv491] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2015] [Accepted: 05/02/2015] [Indexed: 01/09/2023] Open
Abstract
In 2003, we developed an ab initio program, ZCURVE 1.0, to find genes in bacterial and archaeal genomes. In this work, we present the updated version (i.e. ZCURVE 3.0). Using 422 prokaryotic genomes, the average accuracy was 93.7% with the updated version, compared with 88.7% with the original version. Such results also demonstrate that ZCURVE 3.0 is comparable with Glimmer 3.02 and may provide complementary predictions to it. In fact, the joint application of the two programs generated better results by correctly finding more annotated genes while also containing fewer false-positive predictions. As the exclusive function, ZCURVE 3.0 contains one post-processing program that can identify essential genes with high accuracy (generally >90%). We hope ZCURVE 3.0 will receive wide use with the web-based running mode. The updated ZCURVE can be freely accessed from http://cefg.uestc.edu.cn/zcurve/ or http://tubic.tju.edu.cn/zcurveb/ without any restrictions.
Collapse
Affiliation(s)
- Zhi-Gang Hua
- Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China Center for Information in BioMedicine, University of Electronic Science and Technology of China, Chengdu 610054, China Health Big Data Science Research Center, Big Data Research Center, University of Electronic Science and Technology of China, Chengdu 610054, China Key Laboratory for NeuroInformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Yan Lin
- Department of Physics, Tianjin University, Tianjin 300072, China Key Laboratory of Systems Bioengineering, Ministry of Education, Tianjin 300072, China Collaborative Innovation Center of Chemical Science and Engineering, Tianjin 300072, China
| | - Ya-Zhou Yuan
- Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China Center for Information in BioMedicine, University of Electronic Science and Technology of China, Chengdu 610054, China Health Big Data Science Research Center, Big Data Research Center, University of Electronic Science and Technology of China, Chengdu 610054, China Key Laboratory for NeuroInformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - De-Chang Yang
- Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China Center for Information in BioMedicine, University of Electronic Science and Technology of China, Chengdu 610054, China Health Big Data Science Research Center, Big Data Research Center, University of Electronic Science and Technology of China, Chengdu 610054, China Key Laboratory for NeuroInformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Wen Wei
- Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China Center for Information in BioMedicine, University of Electronic Science and Technology of China, Chengdu 610054, China Health Big Data Science Research Center, Big Data Research Center, University of Electronic Science and Technology of China, Chengdu 610054, China Key Laboratory for NeuroInformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Feng-Biao Guo
- Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China Center for Information in BioMedicine, University of Electronic Science and Technology of China, Chengdu 610054, China Health Big Data Science Research Center, Big Data Research Center, University of Electronic Science and Technology of China, Chengdu 610054, China Key Laboratory for NeuroInformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
Collapse
|