1
Kumar U, Khandia R, Singhal S, Puranik N, Tripathi M, Pateriya AK, Khan R, Emran TB, Dhama K, Munjal A, Alqahtani T, Alqahtani AM. Insight into Codon Utilization Pattern of Tumor Suppressor Gene EPB41L3 from Different Mammalian Species Indicates Dominant Role of Selection Force. Cancers (Basel) 2021; 13:cancers13112739. [PMID: 34205890 PMCID: PMC8198080 DOI: 10.3390/cancers13112739] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 03/29/2021] [Revised: 05/27/2021] [Accepted: 05/27/2021] [Indexed: 12/13/2022]
Simple Summary
The present study analyzed the codon usage pattern of the tumor suppressor gene EPB41L3 in the human, brown rat, domesticated cattle, and Sumatran orangutan. Most amino acids are coded by more than one synonymous codon, but these codons are used in a biased manner. Codon usage bias results from multiple factors such as compositional properties, dinucleotide abundance, neutrality, parity, and the tRNA pool. Understanding codon bias is central to fields as diverse as molecular evolution, gene expressivity, protein translation, and protein folding. Studies of this kind are important for revealing the effects of various evolutionary forces on codon usage. The present study indicated that selection is dominant over the other forces shaping codon usage in the studied organisms.
Abstract
Uneven codon usage within genes as well as among genomes is a common phenomenon across organisms. It plays a significant role in the translational efficiency and evolution of a particular gene. EPB41L3 is a tumor suppressor protein-coding gene, and in the present study its pattern of codon usage was examined. The full-length sequences of the EPB41L3 gene for the human, brown rat, domesticated cattle, and Sumatran orangutan available at the NCBI were retrieved and used to analyze codon usage bias (CUB) patterns across the selected mammalian species. Compositional properties, dinucleotide abundance, and parity analysis showed the dominance of A and G, whilst relative synonymous codon usage (RSCU) analysis indicated the dominance of G/C-ending codons. A neutrality plot of GC12 against GC3, used to gauge the relative contributions of mutation pressure and natural selection, indicated the dominance of selection pressure (R = 0.926; p < 0.00001) across the three codon positions of the gene. The result is in concordance with the codon adaptation index analysis and the ENc-GC3 plot analysis, as well as the translational selection index (P2). Overall, selection pressure is the dominant pressure acting during the evolution of the EPB41L3 gene.
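As an illustration of two quantities named in the abstract above, the following minimal Python sketch computes GC12 and GC3 (the two axes of a neutrality plot) and RSCU values for a single coding sequence. It is not the authors' pipeline: the function names `gc12_gc3` and `rscu` are hypothetical, the standard genetic code is assumed, and GC3 is taken over all complete codons (some studies exclude Met, Trp, and stop codons).

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: nuclear code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(seq):
    """Split a coding sequence into complete, unambiguous codons."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3) if seq[i:i + 3] in CODON_TABLE]


def gc12_gc3(seq):
    """Return (GC12, GC3): mean GC fraction at positions 1-2, and at position 3."""
    cods = split_codons(seq)
    gc = [sum(c[pos] in "GC" for c in cods) / len(cods) for pos in range(3)]
    return (gc[0] + gc[1]) / 2, gc[2]


def rscu(seq):
    """RSCU = observed codon count / expected count under equal synonymous usage."""
    counts = Counter(split_codons(seq))
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or total == 0:
            continue  # skip stop codons and amino acids absent from the sequence
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values


if __name__ == "__main__":
    toy = "ATGGCCGCCGCTTAA"  # toy sequence, not EPB41L3
    print(gc12_gc3(toy))
    print({k: round(v, 3) for k, v in rscu(toy).items()})
```

In a neutrality analysis, one (GC12, GC3) point per gene or species would be plotted and a regression line fitted; a slope near 0 is usually read as selection-dominated, a slope near 1 as mutation-dominated.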
Affiliation(s)
- Utsang Kumar
- Department of Biochemistry and Genetics, Barkatullah University, Bhopal 462026, India
- Rekha Khandia
- Department of Biochemistry and Genetics, Barkatullah University, Bhopal 462026, India
- Correspondence: (R.K.); (K.D.)
- Shailja Singhal
- Department of Biochemistry and Genetics, Barkatullah University, Bhopal 462026, India
- Nidhi Puranik
- Department of Biochemistry and Genetics, Barkatullah University, Bhopal 462026, India
- Meghna Tripathi
- ICAR-National Institute of High Security Animal Diseases, Bhopal 462043, India
- Atul Kumar Pateriya
- ICAR-National Institute of High Security Animal Diseases, Bhopal 462043, India
- Raju Khan
- Microfluidics & MEMS Center, (MRS & CFC), CSIR-Advanced Materials and Processes Research Institute (AMPRI), Hoshangabad Road, Bhopal 462026, India
- Talha Bin Emran
- Department of Pharmacy, BGC Trust University Bangladesh, Chittagong 4381, Bangladesh
- Kuldeep Dhama
- Division of Pathology, Indian Veterinary Research Institute, Izatnagar, Bareilly 243122, India
- Correspondence: (R.K.); (K.D.)
- Ashok Munjal
- Department of Biochemistry and Genetics, Barkatullah University, Bhopal 462026, India
- Taha Alqahtani
- Department of Pharmacology, College of Pharmacy, King Khalid University, Abha 62529, Saudi Arabia
- Ali M. Alqahtani
- Department of Pharmacology, College of Pharmacy, King Khalid University, Abha 62529, Saudi Arabia
2
Bouraoui A, Jamoussi S, BenAyed Y. A multi-objective genetic algorithm for simultaneous model and feature selection for support vector machines. Artif Intell Rev 2017. [DOI: 10.1007/s10462-017-9543-9] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Indexed: 10/20/2022]
3
Tran TA, Vo NT, Nguyen HD, Pham BT. A Novel Method to Predict Highly Expressed Genes Based on Radius Clustering and Relative Synonymous Codon Usage. J Comput Biol 2015; 22:1086-96. [PMID: 26540560 DOI: 10.1089/cmb.2015.0121] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
Abstract
Recombinant proteins play an important role in many aspects of life and have generated a huge income, notably in the industrial enzyme business. A gene is introduced into a vector and expressed in a host organism (for example, E. coli) to obtain a high yield of the target protein. However, genes transferred from particular organisms are not usually compatible with the host's expression system for various reasons, for example, codon usage bias, GC content, repetitive sequences, and secondary structure. One solution is to develop programs that design an optimized nucleotide sequence from a peptide sequence using properties of the highly expressed genes (HEGs) of the host organism. Existing HEG data, determined by experimental and computer-based methods, are inadequate in both quality and quantity. Therefore, a new HEG prediction method is needed. We propose a new method for predicting HEGs and criteria for evaluating gene optimization. Codon usage bias was weighted by amplifying the difference between HEGs and non-highly expressed genes (non-HEGs). The number of predicted HEGs is 5% of the genome. Compared with Puigbò's method, the proposed method performs twice as well in kernel ratio and kernel sensitivity. For transcription/translation factor proteins (TF), the proposed method gives low TF sensitivity, whereas Puigbò's method gives moderate sensitivity. In summary, the results indicate that the proposed method can be a good option for predicting optimized genes for particular organisms, and we generated an HEG database for further research in gene design.
Affiliation(s)
- Tuan-Anh Tran
- Faculty of Mathematics and Computer Science, VNUHCM-University of Science, Ho Chi Minh, Viet Nam
- Nam Tri Vo
- Faculty of Biology, VNUHCM-University of Science, Ho Chi Minh, Viet Nam
- Hoang Duc Nguyen
- Faculty of Biology, VNUHCM-University of Science, Ho Chi Minh, Viet Nam
- Bao The Pham
- Faculty of Mathematics and Computer Science, VNUHCM-University of Science, Ho Chi Minh, Viet Nam
4
Pan LL, Wang Y, Hu JH, Ding ZT, Li C. Analysis of codon use features of stearoyl-acyl carrier protein desaturase gene in Camellia sinensis. J Theor Biol 2013; 334:80-6. [PMID: 23774066 DOI: 10.1016/j.jtbi.2013.06.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 03/27/2013] [Revised: 06/03/2013] [Accepted: 06/06/2013] [Indexed: 11/19/2022]
Abstract
The stearoyl-acyl carrier protein desaturase (SAD) gene exists widely in all kinds of plants. In this paper, the Camellia sinensis SAD gene (CsSAD) sequence was first analyzed with the Codon W, CHIPS, and CUSP programs online, and then compared with genomes of the tea plant and other species, and with SAD genes from 11 plant species. The results show that the CsSAD gene and the 73 selected C. sinensis genes have similar codon usage bias. The CsSAD gene is biased toward synonymous codons with A and T at the third codon position, as are the 73 C. sinensis genes. The differences in codon usage frequency between the CsSAD gene and dicotyledons such as Arabidopsis thaliana and Nicotiana tabacum are smaller than those between the CsSAD gene and monocotyledons such as Triticum aestivum and Zea mays. Therefore, A. thaliana and N. tabacum expression systems may be more suitable for the expression of the CsSAD gene. The analysis of SAD genes from 12 plant species also shows that most of the SAD genes are biased toward synonymous codons with G and C at the third codon position. We believe that the codon usage bias analysis presented in this study will provide a theoretical basis for discussing the structure and function of the CsSAD gene.
Affiliation(s)
- Lu-Lu Pan
- Tea Research Institute, Qingdao Agricultural University, Changcheng Road 700#, Chengyang District, Qingdao, Shandong 266109, China.
6
McEachern A, Ashlock D, Schonfeld J. Sequence classification with side effect machines evolved via ring optimization. Biosystems 2013; 113:9-27. [PMID: 23603215 DOI: 10.1016/j.biosystems.2013.03.022] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/08/2011] [Revised: 03/29/2013] [Accepted: 03/31/2013] [Indexed: 10/26/2022]
Abstract
The explosion of available sequence data necessitates the development of sophisticated machine learning tools with which to analyze them. This study introduces a sequence-learning technology called side effect machines. It also applies a model of evolution, which simulates the evolution of a ring species, to the training of the side effect machines. Side effect machines evolved in the ring structure are compared with side effect machines evolved using a standard evolutionary algorithm based on tournament selection. At the core of the training of side effect machines is a nearest neighbor classifier. A parameter study was performed to investigate the impact of the division of training data into examples for nearest neighbor assessment and training cases. The parameter study demonstrates that parameter setting is important in the baseline runs but has little impact in the ring-optimization runs. The ring optimization technique was also found to exhibit improved and more reliable training performance. Side effect machines are tested on two types of synthetic data, one based on GC-content and the other checking for the ability of side effect machines to recognize an embedded motif. Three types of biological data are used: a data set with different types of immune-system genes, a data set with normal and retro-virally derived human genomic sequence, and standard and nonstandard initiation regions from the cytochrome-oxidase subunit one in the mitochondrial genome.
Affiliation(s)
- Andrew McEachern
- Department of Mathematics and Statistics, University of Guelph, Guelph, Ontario, Canada N1G 2W1.
7
Dass JFP, Sudandiradoss C. Insight into pattern of codon biasness and nucleotide base usage in serotonin receptor gene family from different mammalian species. Gene 2012; 503:92-100. [PMID: 22480817 DOI: 10.1016/j.gene.2012.03.057] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 06/08/2011] [Revised: 03/14/2012] [Accepted: 03/17/2012] [Indexed: 11/16/2022]
Abstract
5-HT (5-hydroxytryptamine), or serotonin, receptors are found in both the central and peripheral nervous systems as well as in non-neuronal tissues. In the animal and human nervous system, serotonin produces various functional effects through a variety of membrane-bound receptors. In this study, we focus on the 5-HT receptor family from different mammals and examine the factors that account for codon and nucleotide usage variation. A total of 110 homologous coding sequences from 11 different mammalian species were analyzed using relative synonymous codon usage (RSCU), correspondence analysis (COA), and hierarchical cluster analysis, together with nucleotide base usage frequency of chemically similar amino acid codons. The mean effective number of codons (ENc) value of 37.06 for 5-HT6 shows very high codon bias within the family and may be due to high selective translational efficiency. The COA and Spearman's rank correlation reveal that nucleotide compositional mutation bias is the major factor influencing codon usage in serotonin receptor genes. The hierarchical cluster analysis suggests that gene function is another dominant factor that affects the codon usage bias, while species is a minor factor. Nucleotide base usage, assessed using the Goldman, Engelman, Steitz (GES) scale, reveals the presence of high uracil (>45%) content at functionally important hydrophobic regions. Our in silico approach should help further investigations into the evolution, structure, function, and gene expression of the 5-HT receptor family, whose members are potential antipsychotic drug targets.
Affiliation(s)
- J Febin Prabhu Dass
- School of Biosciences and Technology, VIT University, Vellore, Tamil Nadu State, India
8
Tumu S, Patil A, Towns W, Dyavaiah M, Begley TJ. The gene-specific codon counting database: a genome-based catalog of one-, two-, three-, four- and five-codon combinations present in Saccharomyces cerevisiae genes. Database: The Journal of Biological Databases and Curation 2012; 2012:bas002. [PMID: 22323063 PMCID: PMC3275765 DOI: 10.1093/database/bas002] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 01/19/2023]
Abstract
A codon consists of three nucleotides and functions during translation to dictate the insertion of a specific amino acid in a growing peptide or, in the case of stop codons, to specify the completion of protein synthesis. There are 64 possible single codons and there are 4096 double, 262 144 triple, 16 777 216 quadruple and 1 073 741 824 quintuple codon combinations available for use by specific genes and genomes. In order to evaluate the use of specific single, double, triple, quadruple and quintuple codon combinations in genes and gene networks, we have developed a codon counting tool and employed it to analyze 5780 Saccharomyces cerevisiae genes. We have also developed visualization approaches, including codon painting, combination and bar graphs, and have used them to identify distinct codon usage patterns in specific genes and groups of genes. Using our developed Gene-Specific Codon Counting Database, we have identified extreme codon runs in specific genes. We have also demonstrated that specific codon combinations or usage patterns are over-represented in genes whose corresponding proteins belong to ribosome or translation-associated biological processes. Our resulting database provides a mineable list of multi-codon data and can be used to identify unique sequence runs and codon usage patterns in individual and functionally linked groups of genes. Database URL: http://www.cs.albany.edu/~tumu/GSCC.html
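The combination counts quoted in this abstract are simply 64 raised to the run length (64^k for k consecutive codons). The short Python sketch below reproduces those numbers and shows one plausible way to tally k-codon combinations in a gene. It is not the database's actual tool: the function names are hypothetical, and counting overlapping in-frame windows is an assumption.

```python
from collections import Counter


def combination_space(k):
    """Number of possible runs of k consecutive codons (64 codons per position)."""
    return 64 ** k


def count_codon_runs(cds, k):
    """Count every in-frame run of k consecutive codons, using overlapping windows.

    Assumption: windows advance one codon at a time; the original tool may differ.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return Counter(tuple(codons[i:i + k]) for i in range(len(codons) - k + 1))


if __name__ == "__main__":
    for k in range(1, 6):
        print(k, combination_space(k))
    # Toy sequence, not a real yeast gene.
    print(count_codon_runs("ATGGCCGCCGCC", 2).most_common(1))
```

Because the space grows so quickly, a sparse counter keyed by the observed runs is far more practical than a dense table once k exceeds 2 or 3.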
Affiliation(s)
- Sudheer Tumu
- Department of Computer Science, University at Albany, State University of New York, Albany, NY 12222, USA
9
Nguyen MN, Zurada JM, Rajapakse JC. Toward better understanding of protein secondary structure: extracting prediction rules. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 8:858-864. [PMID: 21393657 DOI: 10.1109/tcbb.2010.16] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 05/30/2023]
Abstract
Although numerous computational techniques have been applied to predict protein secondary structure (PSS), only limited studies have dealt with discovery of logic rules underlying the prediction itself. Such rules offer interesting links between the prediction model and the underlying biology. In addition, they enhance interpretability of PSS prediction by providing a degree of transparency to the predicting model usually regarded as a black box. In this paper, we explore the generation and use of C4.5 decision trees to extract relevant rules from PSS predictions modeled with two-stage support vector machines (TS-SVM). The proposed rules were derived on the RS126 data set of 126 nonhomologous globular proteins and on the PSIPRED data set of 1,923 protein sequences. Our approach has produced sets of comprehensible, and often interpretable, rules underlying the PSS predictions. Moreover, many of the rules seem to be strongly supported by biological evidence. Further, our approach resulted in good prediction accuracy, few and usually compact rules, and rules that are generally of higher confidence levels than those generated by other rule extraction techniques.
Affiliation(s)
- Minh N Nguyen
- Bioinformatics Institute, 30 Biopolis Street, #07-01 Matrix, Singapore.
10
Nguyen MN, Ma J, Fogel GB, Rajapakse JC. Di-codon usage for classification of genes. Biosystems 2009; 98:1-6. [PMID: 19577612 DOI: 10.1016/j.biosystems.2009.06.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 04/26/2009] [Revised: 06/11/2009] [Accepted: 06/14/2009] [Indexed: 11/17/2022]
Abstract
Genes are often classified into biologically related groups so that inferences on their functions can be made. This paper demonstrates that di-codon usage is a useful feature for gene classification and gives better classification accuracy than codon usage. Our experiments with different classifiers show that support vector machines perform better than other classifiers in classifying genes when di-codon usage is used as the feature set. The method is illustrated on 1841 HLA sequences, which are classified into two major classes, HLA-I and HLA-II, and further classified into the subclasses of the major classes. By using both codon and di-codon features, we show near-perfect accuracies in the classification of HLA molecules into major classes and their subclasses.
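To make the di-codon feature concrete, here is a minimal Python sketch that turns a coding sequence into a 4096-dimensional di-codon frequency vector. It is only an illustration under stated assumptions, not the paper's implementation: the name `dicodon_features` is hypothetical, adjacent in-frame codon pairs are counted with overlapping windows, and pairs containing ambiguous bases are skipped.

```python
from itertools import product

# All 4096 possible di-codons (6-mers over ACGT), in a fixed order.
DICODONS = ["".join(p) for p in product("ACGT", repeat=6)]
INDEX = {d: i for i, d in enumerate(DICODONS)}


def dicodon_features(cds):
    """Return relative frequencies of in-frame adjacent codon pairs.

    Assumptions: the reading frame starts at position 0, windows advance by one
    codon, and pairs with non-ACGT characters are ignored.
    """
    cds = cds.upper()
    vec = [0.0] * len(DICODONS)
    total = 0
    n_codons = len(cds) // 3
    for i in range(n_codons - 1):
        j = INDEX.get(cds[3 * i:3 * i + 6])
        if j is not None:
            vec[j] += 1
            total += 1
    if total:
        vec = [v / total for v in vec]
    return vec


if __name__ == "__main__":
    feats = dicodon_features("ATGGCCGCCGCC")  # toy sequence, not an HLA gene
    print(len(feats), feats[INDEX["GCCGCC"]])
```

Vectors of this kind, one per gene, could then be passed to any standard classifier, including a support vector machine, for the kind of comparison the abstract describes.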
11
Rajapakse J, Chen C, Ho SL. Comparative genomic workflow: discovery of conserved noncoding DNA patterns. IEEE Engineering in Medicine and Biology Magazine: The Quarterly Magazine of the Engineering in Medicine & Biology Society 2009; 28:19-24. [PMID: 19622420 DOI: 10.1109/memb.2009.932910] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/28/2023]
Affiliation(s)
- Jagath Rajapakse
- Bioinformatics Research Center, Nanyang Technological University, Singapore.