1
|
Zhang K, Wang Y, Zhang Y, Shan X. Codon usage characterization and phylogenetic analysis of the mitochondrial genome in Hemerocallis citrina. BMC Genom Data 2024; 25:6. [PMID: 38218810 PMCID: PMC10788020 DOI: 10.1186/s12863-024-01191-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Accepted: 01/04/2024] [Indexed: 01/15/2024] Open
Abstract
BACKGROUND Hemerocallis citrina Baroni is a traditional vegetable crop widely cultivated in eastern Asia for its high edible, medicinal, and ornamental value. The phenomenon of codon usage bias (CUB) is prevalent in various genomes and provides excellent clues for gaining insight into organism evolution and phylogeny. Comprehensive analysis of the CUB of mitochondrial (mt) genes can provide rich genetic information for improving the expression efficiency of exogenous genes and optimizing molecular-assisted breeding programmes in H. citrina. RESULTS Here, the CUB patterns in the mt genome of H. citrina were systematically analyzed, and the possible factors shaping CUB were further evaluated. Composition analysis of codons revealed that the overall GC (GCall) and GC at the third codon position (GC3) contents of mt genes were lower than 50%, presenting a preference for A/T-rich nucleotides and A/T-ending codons in H. citrina. The high values of the effective number of codons (ENC) are indicative of fairly weak CUB. Significant correlations of ENC with the GC3 and codon counts were observed, suggesting that not only compositional constraints but also gene length contributed greatly to CUB. Combined ENC-plot, neutrality plot, and Parity rule 2 (PR2)-plot analyses augmented the inference that the CUB patterns of the H. citrina mitogenome can be attributed to multiple factors. Natural selection, mutation pressure, and other factors might play a major role in shaping the CUB of mt genes, although natural selection is the decisive factor. Moreover, we identified a total of 29 high-frequency codons and 22 optimal codons, which exhibited a consistent preference for ending in A/T. Subsequent relative synonymous codon usage (RSCU)-based cluster and mt protein coding gene (PCG)-based phylogenetic analyses suggested that H. citrina is close to Asparagus officinalis, Chlorophytum comosum, Allium cepa, and Allium fistulosum in evolutionary terms, reflecting a certain correlation between CUB and evolutionary relationships. CONCLUSIONS There is weak CUB in the H. citrina mitogenome that is subject to the combined effects of multiple factors, especially natural selection. H. citrina was found to be closely related to Asparagus officinalis, Chlorophytum comosum, Allium cepa, and Allium fistulosum in terms of their evolutionary relationships as well as the CUB patterns of their mitogenomes. Our findings provide a fundamental reference for further studies on genetic modification and phylogenetic evolution in H. citrina.
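The abstract above reads high ENC values as weak codon usage bias. As a rough illustration of where such a value comes from, here is a minimal Python sketch of Wright's effective number of codons, assuming the standard genetic code (plant mitochondrial genes are usually analyzed with it, but that is an assumption here, not something the abstract states). The function name `enc` and the handling of sparse data are illustrative choices, not the authors' pipeline.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code (NCBI translation table 1): codon -> amino acid.
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def enc(cds):
    """Effective number of codons (Wright's Nc) for one coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    # Group sense codons into synonymous families, one per amino acid.
    family = defaultdict(list)
    for codon, aa in CODE.items():
        if aa != "*":
            family[aa].append(codon)

    # Codon homozygosity F for each amino acid, grouped by degeneracy class.
    f_by_class = defaultdict(list)
    for syn in family.values():
        k = len(syn)
        n = sum(counts[c] for c in syn)
        if k == 1 or n < 2:  # Met, Trp, or too few observations
            continue
        f = (n * sum((counts[c] / n) ** 2 for c in syn) - 1) / (n - 1)
        if f > 0:
            f_by_class[k].append(f)

    mean_f = {k: sum(v) / len(v) for k, v in f_by_class.items()}
    # Ile is the only threefold family; if it is missing, average F2 and F4.
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    if any(k not in mean_f for k in (2, 3, 4, 6)):
        raise ValueError("too few codons to estimate ENC")

    nc = 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]
    return min(nc, 61.0)  # Nc is capped at the number of sense codons
```

Under this definition the value runs from 20 (one codon used per amino acid, extreme bias) to 61 (all synonymous codons used equally, no bias), which is why high ENC is interpreted as weak bias.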
Affiliation(s)
- Kun Zhang
- College of Agriculture and Life Sciences, Shanxi Datong University, Datong, Shanxi, China.
- Key Laboratory of Organic Dry Farming for Special Crops in Datong City, Datong, Shanxi, China.
| | - Yiheng Wang
- State Key Laboratory of Vegetable Biobreeding, Tianjin Academy of Agricultural Sciences, Tianjin, China
| | - Yue Zhang
- College of Agriculture and Life Sciences, Shanxi Datong University, Datong, Shanxi, China
| | - Xiaofei Shan
- College of Agriculture and Life Sciences, Shanxi Datong University, Datong, Shanxi, China
| |
|
2
|
Rahman SU, Rehman HU, Rahman IU, Khan MA, Rahim F, Ali H, Chen D, Ma W. Evolution of codon usage in Taenia saginata genomes and its impact on the host. Front Vet Sci 2023; 9:1021440. [PMID: 36713873 PMCID: PMC9875090 DOI: 10.3389/fvets.2022.1021440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Accepted: 10/03/2022] [Indexed: 01/13/2023] Open
Abstract
The beef tapeworm, Taenia saginata, is a zoonotic tapeworm of the genus Taenia in the order Cyclophyllidea. It is a food-borne parasite with a worldwide distribution that poses serious health risks to the host and has a considerable negative socioeconomic impact. Previous studies have explained the population structure of T. saginata within the evolutionary time scale and adaptive evolution. However, it is still unknown how synonymous codons are used by T. saginata. In this study, we applied codon usage bias (CUB) analysis to 90 T. saginata strains. Both base content and relative synonymous codon usage (RSCU) analysis revealed that AT-ended codons were more frequently used in the genome of T. saginata. Furthermore, the effective number of codons (ENC) values indicated low CUB. The neutrality plot analysis suggested that natural selection was the dominant factor shaping CUB in T. saginata. Further analysis showed that T. saginata has adapted host-specific codon usage patterns to sustain successful replication and transmission chains within hosts (Bos taurus and Homo sapiens). Generally, both natural selection and mutational pressure have an impact on the codon usage patterns of the protein-coding genes in T. saginata. This study is important because it characterized the codon usage pattern in the T. saginata genomes and provided the necessary data for a basic evolutionary study on them.
Affiliation(s)
- Siddiq Ur Rahman
- Department of Computer Science and Bioinformatics, Khushal Khan Khattak University, Karak, Pakistan
| | - Hassan Ur Rehman
- Department of Computer Science and Bioinformatics, Khushal Khan Khattak University, Karak, Pakistan
| | - Inayat Ur Rahman
- Department of Botany, Khushal Khan Khattak University, Karak, Pakistan
| | - Muazzam Ali Khan
- Department of Botany, Bacha Khan University, Charsadda, KP, Pakistan
| | - Fazli Rahim
- Department of Botany, Bacha Khan University, Charsadda, KP, Pakistan
| | - Hamid Ali
- Department of Biotechnology and Genetic Engineering, Hazara University, Mansehra, Pakistan
| | - Dekun Chen
- College of Veterinary Medicine, Northwest A&F University, Yangling, Shaanxi, China
| | - Wentao Ma
- Veterinary Immunology Laboratory, College of Veterinary Medicine, Northwest A&F University, Yangling, Shaanxi, China. Correspondence: Wentao Ma
| |
|
3
|
Rahman SU, Rehman HU, Rahman IU, Rauf A, Alshammari A, Alharbi M, Haq NU, Suleria HAR, Raza SHA. Analysis of codon usage bias of lumpy skin disease virus causing livestock infection. Front Vet Sci 2022; 9:1071097. [PMID: 36544551 PMCID: PMC9762553 DOI: 10.3389/fvets.2022.1071097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2022] [Accepted: 11/10/2022] [Indexed: 12/07/2022] Open
Abstract
Lumpy skin disease virus (LSDV), a double-stranded DNA virus of the genus Capripoxvirus in the family Poxviridae, causes lumpy skin disease (LSD) in livestock. LSDV is an important poxvirus that has spread worldwide. It poses serious health risks to the host and has a considerable negative impact on farmers financially and on cattle through ruminant-related disease. Previous studies explained the population structure of LSDV within the evolutionary time scale and adaptive evolution. However, it is still unknown how synonymous codons are used by LSDV. Here, we applied codon usage bias (CUB) analysis to 53 LSDV strains. Both the base content and the relative synonymous codon usage (RSCU) analysis revealed that AT-ended codons were more frequently used in the genome of LSDV. Furthermore, the effective number of codons (ENC) values indicated low codon usage bias. The neutrality plot analysis suggested that natural selection was the dominant factor shaping CUB in LSDV. Additionally, the results from a comparative analysis suggested that LSDV has adapted host-specific codon usage patterns to sustain successful replication and transmission chains within hosts (Bos taurus and Homo sapiens). Both natural selection and mutational pressure have an impact on the codon usage patterns of the protein-coding genes in LSDV. This study is important because it has characterized the codon usage pattern in the LSDV genomes and has provided the necessary data for a basic evolutionary study on them.
Affiliation(s)
- Siddiq Ur Rahman
- Department of Computer Science and Bioinformatics, Khushal Khan Khattak University, Karak, Pakistan. Correspondence: Siddiq Ur Rahman
| | - Hassan Ur Rehman
- Department of Computer Science and Bioinformatics, Khushal Khan Khattak University, Karak, Pakistan
| | - Inayat Ur Rahman
- Department of Botany, Khushal Khan Khattak University, Karak, Pakistan
| | - Abdur Rauf
- Department of Chemistry, University of Swabi, Swabi, Pakistan
| | - Abdulrahman Alshammari
- Department of Pharmacology and Toxicology, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
| | - Metab Alharbi
- Department of Pharmacology and Toxicology, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
| | - Noor ul Haq
- Department of Computer Science and Bioinformatics, Khushal Khan Khattak University, Karak, Pakistan
| | - Hafiz Ansar Rasul Suleria
- Faculty of Veterinary and Agricultural Sciences, School of Agriculture and Food, The University of Melbourne, Melbourne, VIC, Australia
| | - Sayed Haidar Abbas Raza
- College of Animal Science and Technology, Northwest A&F University, Xianyang, China; Safety of Livestock and Poultry Products, College of Food Science, South China Agricultural University, Guangzhou, China
| |
|
4
|
Jiang S, Du Q, Feng C, Ma L, Zhang Z. CompoDynamics: a comprehensive database for characterizing sequence composition dynamics. Nucleic Acids Res 2022; 50:D962-D969. [PMID: 34718745 PMCID: PMC8728180 DOI: 10.1093/nar/gkab979] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/15/2021] [Revised: 10/02/2021] [Accepted: 10/06/2021] [Indexed: 11/15/2022] Open
Abstract
Sequence compositions of nucleic acids and proteins have significant impact on gene expression, RNA stability, translation efficiency, RNA/protein structure and molecular function, and are associated with genome evolution and adaptation across all kingdoms of life. Therefore, a devoted resource of sequence compositions and associated features is fundamentally crucial for a wide range of biological research. Here, we present CompoDynamics (https://ngdc.cncb.ac.cn/compodynamics/), a comprehensive database of sequence compositions of coding sequences (CDSs) and genomes for all kinds of species. Taking advantage of the exponential growth of RefSeq data, CompoDynamics presents a wealth of sequence compositions (nucleotide content, codon usage, amino acid usage) and derived features (coding potential, physicochemical property and phase separation) for 118 689 747 high-quality CDSs and 34 562 genomes across 24 995 species. Additionally, interactive analytical tools are provided to enable comparative analyses of sequence compositions and molecular features across different species and gene groups. Collectively, CompoDynamics bears the great potential to better understand the underlying roles of sequence composition dynamics across genes and genomes, providing a fundamental resource in support of a broad spectrum of biological studies.
Affiliation(s)
- Shuai Jiang
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
| | - Qiang Du
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Changrui Feng
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Lina Ma
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Zhang Zhang
- National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| |
|
5
|
Analysis of Codon Usage Patterns in Giardia duodenalis Based on Transcriptome Data from GiardiaDB. Genes (Basel) 2021; 12:genes12081169. [PMID: 34440343 PMCID: PMC8393687 DOI: 10.3390/genes12081169] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/29/2021] [Revised: 07/24/2021] [Accepted: 07/27/2021] [Indexed: 12/03/2022] Open
Abstract
Giardia duodenalis, a flagellated parasitic protozoan, is the most common cause of parasite-induced diarrheal diseases worldwide. Codon usage bias (CUB) is an important evolutionary character in most species. However, G. duodenalis CUB remains unclear. Thus, this study analyzes codon usage patterns to assess the restriction factors and obtain useful information on the shaping of G. duodenalis CUB. The neutrality analysis indicates that G. duodenalis has a wide GC3 distribution, which significantly correlates with GC12. The ENC-plot shows that most genes were close to the expected curve, with only a few points straying from it. This indicates that mutational pressure and natural selection played an important role in the development of CUB. The Parity Rule 2 (PR2) plot demonstrates that the usage of GC and AT was out of proportion. Interestingly, we identified 26 optimal codons in the G. duodenalis genome, all ending with G or C. In addition, GC content, gene expression, and protein size also influence G. duodenalis CUB formation. This study systematically analyzes the G. duodenalis codon usage pattern and clarifies the mechanisms of G. duodenalis CUB. These results will be very useful for identifying new genes, for molecular genetic manipulation, and for the study of G. duodenalis evolution.
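The PR2 result mentioned above compares A with T and G with C at third codon positions. A minimal Python sketch of one PR2 coordinate per gene follows, assuming the common convention of restricting the count to the eight fourfold-degenerate codon families of the standard genetic code; the abstract does not say which codon set the authors used, and the name `pr2_point` is hypothetical.

```python
# Two-base prefixes of the eight fourfold-degenerate codon families in the
# standard genetic code: Ala, Arg (CGN), Gly, Leu (CTN), Pro, Ser (TCN),
# Thr, Val. Restricting PR2 to these families is an assumption of this sketch.
FOURFOLD_PREFIXES = {"GC", "CG", "GG", "CT", "CC", "TC", "AC", "GT"}


def pr2_point(cds):
    """Return (G3/(G3+C3), A3/(A3+T3)) over fourfold-degenerate codons."""
    cds = cds.upper().replace("U", "T")
    third = {"A": 0, "C": 0, "G": 0, "T": 0}
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES and codon[2] in third:
            third[codon[2]] += 1
    gc = third["G"] + third["C"]
    at = third["A"] + third["T"]
    if gc == 0 or at == 0:
        raise ValueError("not enough fourfold-degenerate codons")
    return third["G"] / gc, third["A"] / at
```

A gene at (0.5, 0.5) uses A and T, and G and C, in equal proportion at these sites; displacement from that centre is what "out of proportion" refers to.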
|
6
|
|
7
|
Maldonado LL, Bertelli AM, Kamenetzky L. Molecular features similarities between SARS-CoV-2, SARS, MERS and key human genes could favour the viral infections and trigger collateral effects. Sci Rep 2021; 11:4108. [PMID: 33602998 PMCID: PMC7893037 DOI: 10.1038/s41598-021-83595-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 09/17/2020] [Accepted: 01/26/2021] [Indexed: 01/31/2023] Open
Abstract
In December 2019, rising pneumonia cases caused by a novel β-coronavirus (SARS-CoV-2) occurred in Wuhan, China, and the virus has since spread rapidly worldwide, causing thousands of deaths. The WHO declared the SARS-CoV-2 outbreak a public health emergency of international concern, and many scientists have since dedicated themselves to its study. It has been observed that many human viruses have codon usage biases that match highly expressed proteins in the tissues they infect and depend on the host cell machinery for replication and co-evolution. In this work, we analysed 91 molecular features and codon usage patterns for 339 viral genes and 463 human genes that consisted of 677,873 codon positions. We selected the highly expressed genes from human lung tissue to perform computational studies that allow their molecular features to be compared with those of SARS, SARS-CoV-2 and MERS genes. The integrated analysis of all the features revealed that certain viral genes and overexpressed human genes have similar codon usage patterns. The main pattern was the A/T bias, which together with other features could favour viral infection, enhanced by a host-dependent specialization of the translation machinery of only some of the overexpressed genes. The envelope protein E, the membrane glycoprotein M and ORF7 could benefit further. This could be the key to facilitated translation and viral replication, leading to different comorbidities depending on the genetic variability of the population due to the host translation machinery. This is the first codon usage approach that reveals which human genes could be potentially deregulated due to the codon usage similarities between the host and the viral genes when the virus is already inside the human cells of the lung tissues. Our work led to the identification of additional highly expressed human genes which are not the usual suspects but might play a role in the viral infection, and lays the basis for further research in the field of human genetics associated with new viral infections. Identifying the genes that could be deregulated under a viral infection is important for predicting the collateral effects and determining which individuals would be more susceptible based on their genetic features and associated comorbidities.
Affiliation(s)
- Lucas L Maldonado
- IMPaM, CONICET, Facultad de Medicina, Universidad de Buenos Aires, Ciudad Autónoma de Buenos Aires, Argentina.
| | | | - Laura Kamenetzky
- IMPaM, CONICET, Facultad de Medicina, Universidad de Buenos Aires, Ciudad Autónoma de Buenos Aires, Argentina
- iB3 | Instituto de Biociencias, Biotecnología y Biología traslacional, Departamento de Fisiologia y Biologia Molecular y Celular, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Ciudad Autónoma de Buenos Aires, Argentina
| |
|
8
|
Saha J, Bhattacharjee S, Pal Sarkar M, Saha BK, Basak HK, Adhikary S, Roy V, Mandal P, Chatterjee A, Pal A. A comparative genomics-based study of positive strand RNA viruses emphasizing on SARS-CoV-2 utilizing dinucleotide signature, codon usage and codon context analyses. Gene Reports 2021; 23:101055. [PMID: 33615042 PMCID: PMC7887452 DOI: 10.1016/j.genrep.2021.101055] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/09/2020] [Revised: 01/20/2021] [Accepted: 02/09/2021] [Indexed: 12/12/2022]
Abstract
The novel coronavirus disease, COVID-19, caused by a positive strand RNA virus (PRV) called SARS-CoV-2, is plaguing the entire planet as we conduct this study. In this study, a multifaceted analysis employing dinucleotide signature, codon usage and codon context was carried out to unravel the genomic as well as genic characteristics of the SARS-CoV-2 isolates and to compare them with other PRVs, which represent some of the most pathogenic human viruses. The main emphasis of this study was to comprehend the codon biology of SARS-CoV-2 against the backdrop of other PRVs such as Poliovirus, Japanese encephalitis virus, Hepatitis C virus, Norovirus, Rubella virus, Semliki Forest virus, Zika virus, Dengue virus, Human rhinoviruses and the Betacoronaviruses, since the codon usage pattern, along with the nucleotide composition prevalent within the viral genome, helps to understand the biology and evolution of viruses. Our results suggest a discrete genomic dinucleotide signature within the PRVs. Some of the genes from the different SARS-CoV-2 isolates were also found to demonstrate heterogeneity in terms of their dinucleotide signature. The SARS-CoV-2 isolates also demonstrated a codon context trend characteristically dissimilar to the other PRVs. The findings of this study are expected to contribute to the developing global knowledge base in countering COVID-19.
Key Words
- CAI, Codon Adaptation Index
- CNS, Central Nervous System
- COVID-19
- CRS, Congenital Rubella Syndrome
- CUB, Codon Usage Bias
- Codon context
- Codon usage bias
- Coronaviruses
- Fop, Frequency of optimal codons
- GC1, Guanine and Cytosine content on the first position of the codon
- GC2, Guanine and Cytosine content on the second position of the codon
- GC3, Guanine and Cytosine content on the third position of the codon
- HCV, Hepatitis C Virus
- MERS, Middle East Respiratory Syndrome
- MFE, Minimum Free Energy
- Nc, Effective Number of Codons
- PCA, Principal Component Analysis
- PRV, Positive strand RNA Virus
- Positive strand RNA virus
- RCDI, Relative Codon De-Optimization Index
- RSCU, Relative Synonymous Codon Usage
- SARS, Severe Acute Respiratory Syndrome
- SARS-CoV-2
- SARS-CoV-2, Severe Acute Respiratory Syndrome Coronavirus 2
- SCUO, Synonymous Codon Usage Order
- SiD, Similarity Index
Affiliation(s)
- Jayanti Saha
- Microbiology & Computational Biology Laboratory, Department of Botany, Raiganj University, Raiganj PIN-733 134, Uttar Dinajpur, West Bengal, India
| | - Sukanya Bhattacharjee
- Microbiology & Computational Biology Laboratory, Department of Botany, Raiganj University, Raiganj PIN-733 134, Uttar Dinajpur, West Bengal, India
| | - Monalisha Pal Sarkar
- Mycology & Plant Pathology Laboratory, Department of Botany, Raiganj University, Raiganj PIN-733 134, Uttar Dinajpur, West Bengal, India
| | - Barnan Kumar Saha
- Microbiology & Computational Biology Laboratory, Department of Botany, Raiganj University, Raiganj PIN-733 134, Uttar Dinajpur, West Bengal, India
| | - Hriday Kumar Basak
- Department of Chemistry, Raiganj University, Raiganj PIN-733 134, Uttar Dinajpur, West Bengal, India
| | - Samarpita Adhikary
- Microbiology & Computational Biology Laboratory, Department of Botany, Raiganj University, Raiganj PIN-733 134, Uttar Dinajpur, West Bengal, India
| | - Vivek Roy
- Microbiology & Computational Biology Laboratory, Department of Botany, Raiganj University, Raiganj PIN-733 134, Uttar Dinajpur, West Bengal, India
| | - Parimal Mandal
- Mycology & Plant Pathology Laboratory, Department of Botany, Raiganj University, Raiganj PIN-733 134, Uttar Dinajpur, West Bengal, India
| | - Abhik Chatterjee
- Department of Chemistry, Raiganj University, Raiganj PIN-733 134, Uttar Dinajpur, West Bengal, India
| | - Ayon Pal
- Microbiology & Computational Biology Laboratory, Department of Botany, Raiganj University, Raiganj PIN-733 134, Uttar Dinajpur, West Bengal, India
| |
|
9
|
Uddin A. Compositional Features and Codon Usage Pattern of Genes Associated with Anxiety in Human. Mol Neurobiol 2020; 57:4911-4920. [PMID: 32813237 DOI: 10.1007/s12035-020-02068-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/30/2020] [Accepted: 08/10/2020] [Indexed: 12/12/2022]
Abstract
Codon usage bias (CUB) is the unequal usage of synonymous codons; some codons are preferred over others. CUB analysis has applications in understanding the molecular organization of the genome, genetics, gene expression, and molecular evolution. A bioinformatic approach was used to analyze the protein-coding sequences of genes involved in anxiety to understand their patterns of codon usage, as no such work had been reported. The improved effective number of codons (Nc) values ranged from 43.55 to 55.06, with a mean of 44.57, suggesting that the overall CUB was low for genes associated with anxiety. The overall GC and AT contents were 54.76% and 45.24%, respectively. Relative synonymous codon usage (RSCU) analysis revealed that the most frequently used codons ended mostly with C or G. The over-represented codons in genes associated with anxiety were CTG, ATC, GTG, AGC, ACC, and GCC, while the under-represented codons were TTA, CTT, CTA, ATA, GTT, GTA, TCG, CCG, GCG, CAA, and CGT. Correlation analysis between the overall nucleotide composition and the composition at the 3rd codon position revealed a highly significant (p < 0.01) correlation, suggesting that both mutation pressure and natural selection might affect the pattern of CUB. A highly significant correlation (r = 0.598, p < 0.01) was also observed between GC12 and GC3, suggesting that directional mutation pressure might have acted on all codon positions for genes associated with anxiety.
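The over- and under-represented codons listed above come from RSCU values. As a hedged illustration, the Python sketch below computes RSCU for a single coding sequence under the standard genetic code: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The function name `rscu` is illustrative, and the cut-offs the author applied are not given in the abstract (a widely used convention treats RSCU above 1.6 as over-represented and below 0.6 as under-represented).

```python
from collections import Counter

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code (NCBI translation table 1): codon -> amino acid.
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def rscu(cds):
    """Map each codon of every amino acid present in cds to its RSCU value."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for aa in set(AMINO) - {"*"}:
        synonymous = [c for c, a in CODE.items() if a == aa]
        total = sum(counts[c] for c in synonymous)
        if total == 0:  # amino acid absent from this sequence
            continue
        for codon in synonymous:
            # observed count / (total / number of synonymous codons)
            values[codon] = counts[codon] * len(synonymous) / total
    return values
```

For example, a sequence with three CTG and one CTA codon gives CTG an RSCU of 4.5 and CTA 1.5, while the four unused leucine codons score 0.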
Affiliation(s)
- Arif Uddin
- Department of Zoology, Moinul Hoque Choudhury Memorial Science College, Algapur, Hailakandi, Assam, 788150, India.
| |
|
10
|
Priya R, Sneha P, Dass JFP, Doss C GP, Manickavasagam M, Siva R. Exploring the codon patterns between CCD and NCED genes among different plant species. Comput Biol Med 2019; 114:103449. [DOI: 10.1016/j.compbiomed.2019.103449] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/06/2019] [Revised: 09/13/2019] [Accepted: 09/13/2019] [Indexed: 01/16/2023]
|
11
|
Classification of Hot and Cold Recombination Regions in Saccharomyces cerevisiae: Comparative Analysis of Two Machine Learning Techniques. Proceedings of the National Academy of Sciences India Section A-Physical Sciences 2019. [DOI: 10.1007/s40010-017-0427-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
12
|
Uddin A, Paul N, Chakraborty S. The codon usage pattern of genes involved in ovarian cancer. Ann N Y Acad Sci 2019; 1440:67-78. [PMID: 30843242 DOI: 10.1111/nyas.14019] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 09/10/2018] [Revised: 01/04/2019] [Accepted: 01/14/2019] [Indexed: 12/20/2022]
Abstract
In this study, we analyzed the compositional dynamics and codon usage pattern of genes involved in ovarian cancer (OC) using a computational method. Mutations in specific genes are associated with OC, and some genes are risk factors for progression of OC, but no work has been reported yet on the codon usage pattern of genes involved in OC. Nucleotide composition analysis of OC-related genes suggested that the overall GC content was higher than AT content; that is, the genes were GC rich. The improved effective number of codons indicated that the overall extent of codon usage bias of genes involved in OC was low. The codons AGC, CTG, ATC, ACC, GTG, and GCC were overrepresented, while the codons TCG, TTA, CTA, CCG, CAA, CGT, ATA, ACG, GTA, GTT, GCG, and GGT were underrepresented in the genes. Correspondence analysis suggested that the codon usage pattern was different in different genes. A highly significant correlation was observed between GC12 and GC3 (r = 0.587, P < 0.01) of genes, suggesting that directional mutation affected the three codon positions. Our report on the codon usage pattern of genes involved in OC includes a new perspective for elucidating the mechanisms of biased usage of synonymous codons, as well as providing useful clues for molecular genetic engineering.
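The GC12 versus GC3 correlation reported above is the basis of a neutrality analysis. The Python sketch below shows one way to compute both quantities per gene and then the regression slope and Pearson r across genes; the function names are hypothetical and the statistics are written out by hand so the block has no dependencies. It is an illustration of the technique, not the authors' code.

```python
def gc12_gc3(cds):
    """Return (GC12, GC3) as fractions for one coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    if usable == 0:
        raise ValueError("sequence shorter than one codon")
    # Bases at codon positions 1, 2 and 3, taken with a stride of three.
    positions = [cds[i:usable:3] for i in range(3)]
    gc = [(p.count("G") + p.count("C")) / len(p) for p in positions]
    return (gc[0] + gc[1]) / 2, gc[2]


def neutrality(genes):
    """Regress GC12 (y) on GC3 (x); return (slope, Pearson r)."""
    points = [gc12_gc3(g) for g in genes]
    ys = [p[0] for p in points]
    xs = [p[1] for p in points]
    n = len(points)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("GC3 or GC12 does not vary across genes")
    return sxy / sxx, sxy / (sxx * syy) ** 0.5
```

In the usual reading, a slope near 1 with a strong correlation points to mutation pressure acting similarly on all three codon positions, while a slope near 0 points to selective constraint on positions 1 and 2.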
Affiliation(s)
- Arif Uddin
- Department of Zoology, Moinul Hoque Choudhury Memorial Science College, Assam, India
| | - Nirmal Paul
- Department of Biotechnology, Assam University, Assam, India
| | | |
|
13
|
Uddin A, Mazumder TH, Chakraborty S. Understanding molecular biology of codon usage in mitochondrial complex IV genes of electron transport system: Relevance to mitochondrial diseases. J Cell Physiol 2018; 234:6397-6413. [DOI: 10.1002/jcp.27375] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 06/14/2018] [Accepted: 08/17/2018] [Indexed: 12/17/2022]
Affiliation(s)
- Arif Uddin
- Department of Zoology, Moinul Hoque Choudhury Memorial Science College, Hailakandi, Assam, India
| | | | | |
|
14
|
Mazumder GA, Uddin A, Chakraborty S. Preference of A/T ending codons in mitochondrial ATP6 gene under phylum Platyhelminthes. Mol Biochem Parasitol 2018; 225:15-26. [DOI: 10.1016/j.molbiopara.2018.08.007] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 05/20/2018] [Revised: 08/17/2018] [Accepted: 08/22/2018] [Indexed: 11/27/2022]
|
15
|
Maldonado LL, Stegmayer G, Milone DH, Oliveira G, Rosenzvit M, Kamenetzky L. Whole genome analysis of codon usage in Echinococcus. Mol Biochem Parasitol 2018; 225:54-66. [DOI: 10.1016/j.molbiopara.2018.08.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/16/2018] [Revised: 07/20/2018] [Accepted: 08/01/2018] [Indexed: 01/15/2023]
|
16
|
Abrhámová K, Nemčko F, Libus J, Převorovský M, Hálová M, Půta F, Folk P. Introns provide a platform for intergenic regulatory feedback of RPL22 paralogs in yeast. PLoS One 2018; 13:e0190685. [PMID: 29304067 PMCID: PMC5755908 DOI: 10.1371/journal.pone.0190685] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 08/19/2017] [Accepted: 12/19/2017] [Indexed: 01/04/2023] Open
Abstract
Ribosomal protein genes (RPGs) in Saccharomyces cerevisiae are a remarkable regulatory group that may serve as a model for understanding genetic redundancy in evolutionary adaptations. Most RPGs exist as pairs of highly conserved functional paralogs with divergent untranslated regions and introns. We examined the roles of introns in strains with various combinations of intron and gene deletions in RPL22, RPL2, RPL16, RPL37, RPL17, RPS0, and RPS18 paralog pairs. We found that introns inhibited the expression of their genes in the RPL22 pair, with the RPL22B intron conferring a much stronger effect. While the WT RPL22A/RPL22B mRNA ratio was 93/7, the rpl22aΔi/RPL22B and RPL22A/rpl22bΔi ratios were >99/<1 and 60/40, respectively. The intron in RPL2A stimulated the expression of its own gene, but the removal of the other introns had little effect on expression of the corresponding gene pair. Rpl22 protein abundances corresponded to changes in mRNAs. Using splicing reporters containing endogenous intron sequences, we demonstrated that these effects were due to the inhibition of splicing by Rpl22 proteins but not by their RNA-binding mutant versions. Indeed, only WT Rpl22A/Rpl22B proteins (but not the mutants) interacted in a yeast three-hybrid system with an RPL22B intronic region between bp 165 and 236. Transcriptome analysis showed that both the total level of Rpl22 and the A/B ratio were important for maintaining the WT phenotype. The data presented here support the contention that the Rpl22B protein has a paralog-specific role. The RPL22 singleton of Kluyveromyces lactis, which did not undergo whole genome duplication, also responded to Rpl22-mediated inhibition in K. lactis cells. Vice versa, the overproduction of the K. lactis protein reduced the expression of RPL22A/B in S. cerevisiae. The extraribosomal function of the K. lactis Rpl22 suggests that the loop regulating RPL22 paralogs of S. cerevisiae evolved from autoregulation.
Affiliation(s)
- Kateřina Abrhámová: Department of Cell Biology, Faculty of Science, Charles University, Prague, Czech Republic
- Filip Nemčko: Department of Cell Biology, Faculty of Science, Charles University, Prague, Czech Republic
- Jiří Libus: Department of Cell Biology, Faculty of Science, Charles University, Prague, Czech Republic
- Martin Převorovský: Department of Cell Biology, Faculty of Science, Charles University, Prague, Czech Republic
- Martina Hálová: Department of Cell Biology, Faculty of Science, Charles University, Prague, Czech Republic
- František Půta: Department of Cell Biology, Faculty of Science, Charles University, Prague, Czech Republic
- Petr Folk: Department of Cell Biology, Faculty of Science, Charles University, Prague, Czech Republic
|
17
|
Pathak J, Kannaujiya VK, Singh SP, Sinha RP. Codon usage analysis of photolyase encoding genes of cyanobacteria inhabiting diverse habitats. 3 Biotech 2017; 7:192. [PMID: 28664377 DOI: 10.1007/s13205-017-0826-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/06/2017] [Accepted: 05/31/2017] [Indexed: 12/17/2022] Open
Abstract
Nucleotide and amino acid compositions were studied to determine the genomic and structural relationships of photolyase genes in freshwater, marine and hot spring cyanobacteria. Among the three habitats, photolyase-encoding genes from hot spring cyanobacteria had the highest GC content. The genomic GC content was found to influence codon usage and amino acid variability in photolyases. The third codon position had a greater effect on amino acid variability in photolyases than the first and second positions. The variation of the amino acids Ala, Asp, Glu, Gly, His, Leu, Pro, Gln, Arg and Val in photolyases of the three habitats was found to be controlled by the first codon position (G1C1), whereas the second position (G2C2) regulated variation in the Ala, Cys, Gly, Pro, Arg, Ser, Thr and Tyr contents. The third position (G3C3) controlled the incorporation of amino acids such as Ala, Phe, Gly, Leu, Gln, Pro, Arg, Ser, Thr and Tyr in photolyases from the three habitats. Photolyase-encoding genes of hot spring cyanobacteria had G or C at the third position in 85% of codons, whereas those of marine and freshwater cyanobacteria had G or C at the third position in 82% and 60% of codons, respectively. Principal component analysis (PCA) showed that GC content has a profound effect in separating the genes along the first major axis according to their RSCU (relative synonymous codon usage) values, and neutrality analysis indicated that mutational pressure has resulted in codon bias in photolyase genes of cyanobacteria.
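The position-specific GC values discussed in this abstract (first, second and third codon positions) can be illustrated with a short sketch. The function name `gc_by_position` and the example sequence are hypothetical and are not taken from the paper; the sketch assumes an in-frame coding sequence and simply drops any trailing partial codon.

```python
def gc_by_position(cds: str) -> dict:
    """Return the GC fraction at codon positions 1, 2 and 3, plus GC12.

    Assumes `cds` is an in-frame coding sequence (hypothetical input);
    a trailing partial codon is ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # length covered by complete codons
    n_codons = usable // 3
    if n_codons == 0:
        raise ValueError("sequence contains no complete codon")
    counts = [0, 0, 0]
    for i in range(usable):
        if seq[i] in "GC":
            counts[i % 3] += 1  # i % 3 is the 0-based codon position
    gc1, gc2, gc3 = (c / n_codons for c in counts)
    # GC12 is the mean of GC1 and GC2, the quantity plotted against GC3
    # in a neutrality analysis.
    return {"GC1": gc1, "GC2": gc2, "GC3": gc3, "GC12": (gc1 + gc2) / 2}


if __name__ == "__main__":
    # Codons ATG, GCG, AAC: every third position is G or C.
    print(gc_by_position("ATGGCGAAC"))
```

A GC3 value of 0.85 from such a function would correspond to the "85% of codons with G or C at the third position" reported for the hot spring genes.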
Affiliation(s)
- Jainendra Pathak: Laboratory of Photobiology and Molecular Microbiology, Centre of Advanced Study in Botany, Institute of Science, Banaras Hindu University, Varanasi, 221005, India
- Vinod K Kannaujiya: Laboratory of Photobiology and Molecular Microbiology, Centre of Advanced Study in Botany, Institute of Science, Banaras Hindu University, Varanasi, 221005, India
- Shailendra P Singh: Laboratory of Photobiology and Molecular Microbiology, Centre of Advanced Study in Botany, Institute of Science, Banaras Hindu University, Varanasi, 221005, India
- Rajeshwar P Sinha: Laboratory of Photobiology and Molecular Microbiology, Centre of Advanced Study in Botany, Institute of Science, Banaras Hindu University, Varanasi, 221005, India
|
18
|
Huang X, Xu J, Chen L, Wang Y, Gu X, Peng X, Yang G. Analysis of transcriptome data reveals multifactor constraint on codon usage in Taenia multiceps. BMC Genomics 2017; 18:308. [PMID: 28427327 PMCID: PMC5397707 DOI: 10.1186/s12864-017-3704-8] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 09/27/2016] [Accepted: 04/12/2017] [Indexed: 12/04/2022] Open
Abstract
Background Codon usage bias (CUB) is an important evolutionary feature in genomes that has been widely observed in many organisms. However, the synonymous codon usage pattern in the genome of T. multiceps remains to be clarified. In this study, we analyzed the codon usage of T. multiceps based on the transcriptome data to reveal the constraint factors and to gain an improved understanding of the mechanisms that shape synonymous CUB. Results Analysis of a total of 8,620 annotated mRNA sequences from T. multiceps indicated only a weak codon bias, with mean GC and GC3 content values of 49.29% and 51.43%, respectively. Our analysis indicated that nucleotide composition, mutational pressure, natural selection, gene expression level, the grand average of hydropathicity (GRAVY) and aromaticity (Aromo) of the encoded amino acids, and the effective selection of amino acids all contributed to the codon usage in T. multiceps. Among these factors, natural selection was implicated as the major factor affecting the codon usage variation in T. multiceps. The codon usage of ribosome genes was affected mainly by mutations, while the essential genes were affected mainly by selection. In addition, 21 codons were identified as “optimal codons”. Overall, the optimal codons were GC-rich (GC:AU, 41:22), and ended with G or C (except CGU). Furthermore, different degrees of variation in codon usage were found between T. multiceps and Escherichia coli, yeast, and Homo sapiens. However, little difference was found between T. multiceps and Taenia pisiformis. Conclusions In this study, the codon usage pattern of T. multiceps was analyzed systematically and factors affecting CUB were also identified. This is the first study of codon biology in T. multiceps. Understanding the codon usage pattern in T. multiceps can be helpful for the discovery of new genes, molecular genetic engineering and evolutionary studies.
Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-3704-8) contains supplementary material, which is available to authorized users.
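Two of the protein-level factors named in this abstract, GRAVY and aromaticity, are simple averages over a protein sequence. The sketch below is illustrative only: it assumes the Kyte-Doolittle hydropathy scale for GRAVY and defines aromaticity as the fraction of Phe, Trp and Tyr residues, which are the usual conventions but are not stated in the abstract. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values per residue (assumed scale for GRAVY).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standard_residues(protein: str) -> str:
    """Keep only the 20 standard residues (stop symbols, X, gaps are dropped)."""
    kept = "".join(r for r in protein.upper() if r in KYTE_DOOLITTLE)
    if not kept:
        raise ValueError("no standard amino acid residues found")
    return kept


def gravy(protein: str) -> float:
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    residues = _standard_residues(protein)
    return sum(KYTE_DOOLITTLE[r] for r in residues) / len(residues)


def aromaticity(protein: str) -> float:
    """Fraction of aromatic residues (Phe, Trp, Tyr) in the protein."""
    residues = _standard_residues(protein)
    return sum(residues.count(r) for r in "FWY") / len(residues)


if __name__ == "__main__":
    print(gravy("AL"), aromaticity("FWYA"))
```

Per-gene values computed this way are typically correlated against codon usage indices to judge whether protein properties contribute to the bias.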
Affiliation(s)
- Xing Huang: Department of Parasitology, College of Veterinary Medicine, Sichuan Agricultural University, Chengdu, 611130, China; Chengdu Agricultural College, Chengdu, 611130, China
- Jing Xu: Department of Parasitology, College of Veterinary Medicine, Sichuan Agricultural University, Chengdu, 611130, China
- Lin Chen: Meat-processing Application Key Laboratory of Sichuan Province, College of Pharmacy and Biological Engineering, Chengdu University, Chengdu, 610106, China
- Yu Wang: Department of Parasitology, College of Veterinary Medicine, Sichuan Agricultural University, Chengdu, 611130, China
- Xiaobin Gu: Department of Parasitology, College of Veterinary Medicine, Sichuan Agricultural University, Chengdu, 611130, China
- Xuerong Peng: College of Science, Sichuan Agricultural University, Ya'an, 625014, China
- Guangyou Yang: Department of Parasitology, College of Veterinary Medicine, Sichuan Agricultural University, Chengdu, 611130, China
|
19
|
Gene expression, nucleotide composition and codon usage bias of genes associated with human Y chromosome. Genetica 2017; 145:295-305. [PMID: 28421323 DOI: 10.1007/s10709-017-9965-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 07/16/2016] [Accepted: 04/08/2017] [Indexed: 10/19/2022]
Abstract
Analysis of codon usage pattern is important to understand the genetic and evolutionary characteristics of genomes. We have used bioinformatic approaches to analyze the codon usage bias (CUB) of the genes located in human Y chromosome. Codon bias index (CBI) indicated that the overall extent of codon usage bias was low. The relative synonymous codon usage (RSCU) analysis suggested that approximately half of the codons out of 59 synonymous codons were most frequently used, and possessed a T or G at the third codon position. The codon usage pattern was different in different genes as revealed from correspondence analysis (COA). A significant correlation between effective number of codons (ENC) and various GC contents suggests that both mutation pressure and natural selection affect the codon usage pattern of genes located in human Y chromosome. In addition, Y-linked genes have significant difference in GC contents at the second and third codon positions, expression level, and codon usage pattern of some codons like the SPANX genes in X chromosome.
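The relative synonymous codon usage (RSCU) values mentioned here are, for each codon, the observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below assumes the standard genetic code and an in-frame sequence; `rscu` and the example input are hypothetical names chosen for illustration.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> dict:
    """Return RSCU for the 59 codons that have at least one synonym.

    Met, Trp and stop codons are excluded. Families with no observed
    codons are reported as 0.0 here, although RSCU is strictly
    undefined in that case.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue  # skip stops and the single-codon amino acids
        total = sum(counts[c] for c in family)
        for codon in family:
            # observed / expected, where expected = total / family size
            values[codon] = counts[codon] * len(family) / total if total else 0.0
    return values


if __name__ == "__main__":
    result = rscu("GCTGCTGCC")
    print(len(result), result["GCT"], result["GCC"])
```

An RSCU above 1 marks a codon used more often than expected under equal usage, which is the usual basis for calling a codon "frequently used".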
|
20
|
Dwivedi AK, Chouhan U. Comparative study of artificial neural network for classification of hot and cold recombination regions in Saccharomyces cerevisiae. Neural Comput Appl 2016. [DOI: 10.1007/s00521-016-2466-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
|
21
|
Yang X, Ma X, Luo X, Ling H, Zhang X, Cai X. Codon Usage Bias and Determining Forces in Taenia solium Genome. THE KOREAN JOURNAL OF PARASITOLOGY 2015; 53:689-97. [PMID: 26797435 PMCID: PMC4725240 DOI: 10.3347/kjp.2015.53.6.689] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 05/16/2015] [Revised: 08/10/2015] [Accepted: 10/06/2015] [Indexed: 11/23/2022]
Abstract
The tapeworm Taenia solium is an important human zoonotic parasite that causes great economic loss and also endangers public health. At present, an effective vaccine that will prevent infection and chemotherapy without any side effect remains to be developed. In this study, codon usage patterns in the T. solium genome were examined through 8,484 protein-coding genes. Neutrality analysis showed that T. solium had a narrow GC distribution, and a significant correlation was observed between GC12 and GC3. Examination of an NC (ENC vs GC3s)-plot showed a few genes on or close to the expected curve, but the majority of points with low-ENC (the effective number of codons) values were detected below the expected curve, suggesting that mutational bias plays a major role in shaping codon usage. The Parity Rule 2 plot (PR2) analysis showed that GC and AT were not used proportionally. We also identified 26 optimal codons in the T. solium genome, all of which ended with either a G or C residue. These optimal codons in the T. solium genome are likely consistent with tRNAs that are highly expressed in the cell, suggesting that mutational and translational selection forces are probably driving factors of codon usage bias in the T. solium genome.
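The "expected curve" in an NC-plot gives the ENC value a gene would have if its codon usage were determined by GC3s alone. The sketch below uses the widely cited relation ENC = 2 + s + 29 / (s^2 + (1 - s)^2), with s the GC3s fraction; this formula is standard in the codon usage literature but is not quoted in the abstract, so treat it as an assumption about what the authors plotted. Function names are hypothetical.

```python
def expected_enc(gc3s: float) -> float:
    """Expected effective number of codons for a given GC3s fraction.

    Uses ENC = 2 + s + 29 / (s**2 + (1 - s)**2), the usual null
    expectation when only base composition shapes codon usage.
    """
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def lies_below_curve(observed_enc: float, gc3s: float) -> bool:
    """True if a gene's observed ENC falls below the expected curve."""
    return observed_enc < expected_enc(gc3s)


if __name__ == "__main__":
    print(expected_enc(0.5))            # peak of the curve
    print(lies_below_curve(45.0, 0.5))  # hypothetical gene
```

Points on or near the curve are the ones the abstract describes as consistent with compositional constraint, and points well below it indicate additional influences on codon choice.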
Affiliation(s)
- Xing Yang: College of Veterinary Medicine, Jilin University, Changchun, 130000, P. R. China; State Key Laboratory of Veterinary Etiological Biology, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou 730046, P. R. China
- Xusheng Ma: State Key Laboratory of Veterinary Etiological Biology, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou 730046, P. R. China
- Xuenong Luo: College of Veterinary Medicine, Jilin University, Changchun, 130000, P. R. China
- Houjun Ling: College of Veterinary Medicine, Jilin University, Changchun, 130000, P. R. China
- Xichen Zhang: College of Veterinary Medicine, Jilin University, Changchun, 130000, P. R. China
- Xuepeng Cai: College of Veterinary Medicine, Jilin University, Changchun, 130000, P. R. China; State Key Laboratory of Veterinary Etiological Biology, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou 730046, P. R. China
|
22
|
Yang X, Luo X, Cai X. Analysis of codon usage pattern in Taenia saginata based on a transcriptome dataset. Parasit Vectors 2014; 7:527. [PMID: 25440955 PMCID: PMC4268816 DOI: 10.1186/s13071-014-0527-1] [Citation(s) in RCA: 74] [Impact Index Per Article: 7.4] [Received: 04/11/2014] [Accepted: 11/06/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Codon usage bias is an important evolutionary feature in a genome and has been widely documented in many genomes. Analysis of codon usage bias has significance for mRNA translation, design of transgenes, new gene discovery, and studies of molecular biology and evolution. However, the synonymous codon usage pattern of the T. saginata genome remains unclear. T. saginata is a food-borne zoonotic cestode which infects approximately 50 million humans worldwide, and causes significant health problems to the host and considerable socio-economic losses as a consequence. In this study, synonymous codon usage in T. saginata was examined. METHODS Total RNA was isolated from T. saginata cysticerci and 91,487 unigenes were generated using Illumina sequencing technology. After filtering, the final sequence collection containing 11,399 CDSs was used for our analysis. RESULTS Neutrality analysis showed that T. saginata had a wide GC3 distribution and a significant correlation was observed between GC12 and GC3. The NC-plot showed most of the genes on or close to the expected curve, but only a few points with low-ENC values were below it, suggesting that mutational bias plays a major role in shaping codon usage. The Parity Rule 2 plot (PR2) analysis showed that GC and AT were not used proportionally. We also identified twenty-three optimal codons in the T. saginata genome, all of which ended with a G or C residue. These results suggest that mutational and selection forces are probably driving factors of codon usage bias in the T. saginata genome. Meanwhile, other factors such as protein length, gene expression, GC content of genes, and the hydropathicity of each protein also influence codon usage. CONCLUSIONS Here, we systematically analyzed the codon usage pattern and identified factors shaping codon usage bias in T. saginata. Currently, no complete nuclear genome is available for codon usage analysis at the genome level in T. saginata. This is the first report to investigate codon biology in T. saginata. Such information not only brings about a new perspective for understanding the mechanisms of biased usage of synonymous codons but also provides useful clues for molecular genetic engineering and evolutionary studies.
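The PR2 statement that "GC and AT were not used proportionally" refers to two third-position ratios, A3/(A3+T3) and G3/(G3+C3), which both equal 0.5 when complementary bases are used equally. The sketch below is a simplification: PR2 analyses are often restricted to four-fold degenerate codon families, whereas this version uses every third position, and the function name `pr2_bias` is hypothetical.

```python
def pr2_bias(cds: str) -> tuple:
    """Return (A3/(A3+T3), G3/(G3+C3)) for an in-frame coding sequence.

    Simplifying assumption: all third codon positions are used, not only
    those of four-fold degenerate families.
    """
    seq = cds.upper().replace("U", "T")
    third = seq[2::3]  # indices 2, 5, 8, ... are third codon positions
    a3, t3, g3, c3 = (third.count(base) for base in "ATGC")
    if a3 + t3 == 0 or g3 + c3 == 0:
        raise ValueError("need at least one A/T and one G/C at third positions")
    return a3 / (a3 + t3), g3 / (g3 + c3)


if __name__ == "__main__":
    # Third positions A, A, G, C: A is favoured over T, G and C are balanced.
    print(pr2_bias("GCAGCAGCGGCC"))
```

Plotting the G3/(G3+C3) value on the x-axis against A3/(A3+T3) on the y-axis for every gene gives the PR2 plot, and departure from the centre point (0.5, 0.5) indicates unequal use of complementary bases.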
Affiliation(s)
- Xing Yang: State Key Laboratory of Veterinary Etiological Biology, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, 730046, PR China; College of Veterinary Medicine, Jilin University, Changchun, 130000, PR China
- Xuenong Luo: State Key Laboratory of Veterinary Etiological Biology, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, 730046, PR China
- Xuepeng Cai: State Key Laboratory of Veterinary Etiological Biology, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, 730046, PR China; College of Veterinary Medicine, Jilin University, Changchun, 130000, PR China
|
23
|
Analysis of codon usage patterns in Taenia pisiformis through annotated transcriptome data. Biochem Biophys Res Commun 2013; 430:1344-8. [DOI: 10.1016/j.bbrc.2012.12.078] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 11/28/2012] [Accepted: 12/12/2012] [Indexed: 12/16/2022]
|
24
|
Hershberg R, Petrov DA. On the limitations of using ribosomal genes as references for the study of codon usage: a rebuttal. PLoS One 2012; 7:e49060. [PMID: 23284622 PMCID: PMC3527481 DOI: 10.1371/journal.pone.0049060] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 11/29/2011] [Accepted: 10/05/2012] [Indexed: 01/08/2023] Open
Abstract
In a recent paper published in PLOS ONE, Wang et al. challenge our finding that the identity of optimal codons in different genomes follows a set of clear rules. Here we provide a rebuttal of their paper and demonstrate that the results of our original PLOS Genetics paper stand. This provides us with an opportunity to bring up an aspect of how codon usage has been studied that should be of general interest. The Wang et al. study, as well as many other studies, used ribosomal genes as a reference set for the study of patterns of codon usage. We discuss here the assumptions that are made in order to justify using ribosomal genes to study codon bias, suggest that this practice can at times be problematic, and discuss its limitations.
Affiliation(s)
- Ruth Hershberg: Rachel & Menachem Mendelovitch Evolutionary Processes of Mutation & Natural Selection Research Laboratory, Department of Genetics, Technion-Israel Institute of Technology, Haifa, Israel
|
25
|
Novoa EM, Ribas de Pouplana L. Speeding with control: codon usage, tRNAs, and ribosomes. Trends Genet 2012; 28:574-81. [PMID: 22921354 DOI: 10.1016/j.tig.2012.07.006] [Citation(s) in RCA: 218] [Impact Index Per Article: 18.2] [Received: 05/28/2012] [Revised: 07/19/2012] [Accepted: 07/20/2012] [Indexed: 11/26/2022]
Abstract
Codon usage and tRNA abundance are critical parameters for gene synthesis. However, the forces determining codon usage bias within genomes and between organisms, as well as the functional roles of biased codon compositions, remain poorly understood. Similarly, the composition and dynamics of mature tRNA populations in cells in terms of isoacceptor abundances, and the prevalence and function of base modifications are not well understood. As we begin to decipher some of the rules that govern codon usage and tRNA abundances, it is becoming clear that these parameters are a way to not only increase gene expression, but also regulate the speed of ribosomal translation, the efficiency of protein folding, and the coordinated expression of functionally related gene families. Here, we discuss the importance of codon-anticodon interactions in translation regulation and highlight the contribution of non-random codon distributions and post-transcriptional base modifications to this regulation.
Affiliation(s)
- Eva Maria Novoa: Institute for Research in Biomedicine (IRB), c/Baldiri Reixac 15-21 08028, Barcelona, Catalonia, Spain
|
26
|
Atkinson GC, Kuzmenko A, Kamenski P, Vysokikh MY, Lakunina V, Tankov S, Smirnova E, Soosaar A, Tenson T, Hauryliuk V. Evolutionary and genetic analyses of mitochondrial translation initiation factors identify the missing mitochondrial IF3 in S. cerevisiae. Nucleic Acids Res 2012; 40:6122-34. [PMID: 22457064 PMCID: PMC3401457 DOI: 10.1093/nar/gks272] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Indexed: 01/22/2023] Open
Abstract
Mitochondrial translation is essentially bacteria-like, reflecting the bacterial endosymbiotic ancestry of the eukaryotic organelle. However, unlike the translation system of its bacterial ancestors, mitochondrial translation is limited to just a few mRNAs, mainly coding for components of the respiratory complex. The classical bacterial initiation factors (IFs) IF1, IF2 and IF3 are universal in bacteria, but only IF2 is universal in mitochondria (mIF2). We analyse the distribution of mitochondrial translation initiation factors and their sequence features, given two well-propagated claims: first, a sequence insertion in mitochondrial IF2 (mIF2) compensates for the universal lack of IF1 in mitochondria, and secondly, no homologue of mitochondrial IF3 (mIF3) is identifiable in Saccharomyces cerevisiae. Our comparative sequence analysis shows that, in fact, the mIF2 insertion is highly variable and restricted in length and primary sequence conservation to vertebrates, while phylogenetic and in vivo complementation analyses reveal that an uncharacterized S. cerevisiae mitochondrial protein currently named Aim23p is a bona fide evolutionary and functional orthologue of mIF3. Our results highlight the lineage-specific nature of mitochondrial translation and emphasise that comparative analyses among diverse taxa are essential for understanding whether generalizations from model organisms can be made across eukaryotes.
|
27
|
Nguyen MN, Ma J, Fogel GB, Rajapakse JC. Di-codon usage for classification of genes. Biosystems 2009; 98:1-6. [PMID: 19577612 DOI: 10.1016/j.biosystems.2009.06.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 04/26/2009] [Revised: 06/11/2009] [Accepted: 06/14/2009] [Indexed: 11/17/2022]
Abstract
Genes are often classified into biologically related groups so that inferences on their functions can be made. This paper demonstrates that di-codon usage is a useful feature for gene classification and gives better classification accuracy than codon usage. Our experiments with different classifiers show that support vector machines perform better than other classifiers in classifying genes by using di-codon usage as features. The method is illustrated on 1841 HLA sequences which are classified into two major classes, HLA-I and HLA-II, and further classified into the subclasses of the major classes. By using both codon and di-codon features, we show near perfect accuracies in the classification of HLA molecules into major classes and their sub-classes.
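Di-codon usage can be illustrated as the relative frequency of each pair of adjacent codons in a coding sequence. The sketch below assumes pairs are counted with a sliding window that advances one codon at a time; the abstract does not say how the authors counted pairs, so both that choice and the function name `dicodon_frequencies` are assumptions made for illustration.

```python
from collections import Counter


def dicodon_frequencies(cds: str) -> dict:
    """Return relative frequencies of adjacent codon pairs (6-mers in frame).

    Assumption: overlapping pairs are counted, so codons c1 c2 c3 give
    the pairs c1c2 and c2c3. A trailing partial codon is ignored.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    pairs = Counter(first + second for first, second in zip(codons, codons[1:]))
    total = sum(pairs.values())
    if total == 0:
        return {}  # fewer than two complete codons
    return {pair: count / total for pair, count in pairs.items()}


if __name__ == "__main__":
    print(dicodon_frequencies("ATGGCCGCCGCC"))
```

A classifier would typically receive these frequencies as a fixed-length vector over all possible codon pairs, with unobserved pairs set to zero.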
|
28
|
Liu H, He R, Zhang H, Huang Y, Tian M, Zhang J. Analysis of synonymous codon usage in Zea mays. Mol Biol Rep 2009; 37:677-84. [DOI: 10.1007/s11033-009-9521-7] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.5] [Received: 12/13/2008] [Accepted: 03/17/2009] [Indexed: 11/29/2022]
|
29
|
Ma J, Nguyen MN, Rajapakse JC. Gene classification using codon usage and support vector machines. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2009; 6:134-143. [PMID: 19179707 DOI: 10.1109/tcbb.2007.70240] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 05/27/2023]
Abstract
A novel approach for gene classification, which adopts codon usage bias as the input feature vector for classification by support vector machines (SVM), is proposed. The DNA sequence is first converted to a 59-dimensional feature vector where each element corresponds to the relative synonymous usage frequency of a codon. As the input to the classifier is independent of sequence length and variance, our approach is useful when the sequences to be classified are of different lengths, a condition under which homology-based methods tend to fail. The method is demonstrated by using 1,841 Human Leukocyte Antigen (HLA) sequences which are classified into two major classes: HLA-I and HLA-II; each major class is further subdivided into sub-groups of HLA-I and HLA-II molecules. Using codon usage frequencies, binary SVM achieved an accuracy rate of 99.3% for HLA major class classification and multi-class SVM achieved accuracy rates of 99.73% and 98.38% for sub-class classification of HLA-I and HLA-II molecules, respectively. The results show that gene classification based on codon usage bias is consistent with the molecular structures and biological functions of HLA molecules.
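The 59-dimensional vector described here has one element per codon that has at least one synonym (64 codons minus three stops, Met and Trp). The sketch below is one plausible reading, not the authors' implementation: it assumes the standard genetic code, defines each element as the codon's share of its amino acid family, and orders elements alphabetically by codon. The function name `codon_feature_vector` is hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Group codons by amino acid, then keep families with two or more codons.
FAMILIES = {}
for _codon, _aa in CODON_TABLE.items():
    FAMILIES.setdefault(_aa, []).append(_codon)
SYNONYMOUS = sorted(
    codon
    for aa, family in FAMILIES.items()
    if aa != "*" and len(family) > 1
    for codon in family
)  # 59 codons in a fixed order


def codon_feature_vector(cds: str):
    """Return (vector, labels): per-codon share within its amino acid family.

    The vector length is always 59, whatever the sequence length.
    Families absent from the sequence contribute zeros (an assumption).
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    vector = []
    for codon in SYNONYMOUS:
        family_total = sum(counts[c] for c in FAMILIES[CODON_TABLE[codon]])
        vector.append(counts[codon] / family_total if family_total else 0.0)
    return vector, SYNONYMOUS


if __name__ == "__main__":
    vec, labels = codon_feature_vector("GCTGCTGCC")
    print(len(vec), vec[labels.index("GCT")])
```

Because the vector length is fixed, sequences of different lengths map to the same feature space, which is the property the abstract highlights for SVM input.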
Affiliation(s)
- Jianmin Ma: BioInformatics Research Center, Nanyang Technological University, Singapore 637553
|
30
|
Mocellin S, Rossi CR. Principles of gene microarray data analysis. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2007; 593:19-30. [PMID: 17265713 DOI: 10.1007/978-0-387-39978-2_3] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 12/12/2022]
Abstract
The development of several gene expression profiling methods, such as comparative genomic hybridization (CGH), differential display, serial analysis of gene expression (SAGE), and gene microarray, together with the sequencing of the human genome, has provided an opportunity to monitor and investigate the complex cascade of molecular events leading to tumor development and progression. The availability of such large amounts of information has shifted the attention of scientists towards a nonreductionist approach to biological phenomena. High throughput technologies can be used to follow changing patterns of gene expression over time. Among them, gene microarray has become prominent because it is easier to use, does not require large-scale DNA sequencing, and allows for the parallel quantification of thousands of genes from multiple samples. Gene microarray technology is rapidly spreading worldwide and has the potential to drastically change the therapeutic approach to patients affected with tumor. Therefore, it is of paramount importance for both researchers and clinicians to know the principles underlying the analysis of the huge amount of data generated with microarray technology.
Affiliation(s)
- Simone Mocellin: Clinica Chirurgica II, Dipartimento di Scienze Oncologiche e Chirurgiche, University of Padova, Via Giustiniani 2, Italy
|
31
|
Dittmar KA, Goodenbour JM, Pan T. Tissue-specific differences in human transfer RNA expression. PLoS Genet 2006; 2:e221. [PMID: 17194224 DOI: 10.1371/journal.pgen.0020221.st006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/27/2006] [Accepted: 11/07/2006] [Indexed: 05/21/2023] Open
Abstract
Over 450 transfer RNA (tRNA) genes have been annotated in the human genome. Reliable quantitation of tRNA levels in human samples using microarray methods presents a technical challenge. We have developed a microarray method to quantify tRNAs based on a fluorescent dye-labeling technique. The first-generation tRNA microarray consists of 42 probes for nuclear encoded tRNAs and 21 probes for mitochondrial encoded tRNAs. These probes cover tRNAs for all 20 amino acids and 11 isoacceptor families. Using this array, we report that the amounts of tRNA within the total cellular RNA vary widely among eight different human tissues. The brain expresses higher overall levels of nuclear encoded tRNAs than every tissue examined but one and higher levels of mitochondrial encoded tRNAs than every tissue examined. We found tissue-specific differences in the expression of individual tRNA species, and tRNAs decoding amino acids with similar chemical properties exhibited coordinated expression in distinct tissue types. Relative tRNA abundance exhibits a statistically significant correlation to the codon usage of a collection of highly expressed, tissue-specific genes in a subset of tissues or tRNA isoacceptors. Our findings demonstrate the existence of tissue-specific expression of tRNA species that strongly implicates a role for tRNA heterogeneity in regulating translation and possibly additional processes in vertebrate organisms.
Affiliation(s)
- Kimberly A Dittmar: Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois, United States of America
|
32
|
Dittmar KA, Goodenbour JM, Pan T. Tissue-specific differences in human transfer RNA expression. PLoS Genet 2006; 2:e221. [PMID: 17194224 PMCID: PMC1713254 DOI: 10.1371/journal.pgen.0020221] [Citation(s) in RCA: 460] [Impact Index Per Article: 25.6] [Received: 07/27/2006] [Accepted: 11/07/2006] [Indexed: 12/02/2022] Open
Abstract
Over 450 transfer RNA (tRNA) genes have been annotated in the human genome. Reliable quantitation of tRNA levels in human samples using microarray methods presents a technical challenge. We have developed a microarray method to quantify tRNAs based on a fluorescent dye-labeling technique. The first-generation tRNA microarray consists of 42 probes for nuclear encoded tRNAs and 21 probes for mitochondrial encoded tRNAs. These probes cover tRNAs for all 20 amino acids and 11 isoacceptor families. Using this array, we report that the amounts of tRNA within the total cellular RNA vary widely among eight different human tissues. The brain expresses higher overall levels of nuclear encoded tRNAs than every tissue examined but one and higher levels of mitochondrial encoded tRNAs than every tissue examined. We found tissue-specific differences in the expression of individual tRNA species, and tRNAs decoding amino acids with similar chemical properties exhibited coordinated expression in distinct tissue types. Relative tRNA abundance exhibits a statistically significant correlation to the codon usage of a collection of highly expressed, tissue-specific genes in a subset of tissues or tRNA isoacceptors. Our findings demonstrate the existence of tissue-specific expression of tRNA species that strongly implicates a role for tRNA heterogeneity in regulating translation and possibly additional processes in vertebrate organisms.
Transfer RNAs (tRNAs) translate the genetic code of genes into the amino acid sequence of proteins. Most amino acids have two or more codons. Every organism has multiple tRNA species reading the codons for the same amino acid (tRNA isoacceptors). In bacteria and yeast, differences in the relative abundance of tRNA isoacceptors have been found to affect the level of highly expressed proteins. This tRNA abundance–codon distribution relationship can have predictive power on the expression of genes based on their codon usages. Approximately 450 tRNA genes consisting of 49 isoacceptors and 274 different sequences have been annotated in the human genome. This work describes the first comparative analysis of tRNA expression levels in eight human tissues using microarray methods. The authors find significant, tissue-specific differences in the expression of tRNA species and coordinated expression among tRNAs decoding amino acids with similar chemical properties in distinct tissue types. Correlation of relative tRNA abundance versus the codon usage of highly expressed, tissue-specific genes can be found among a subset of tissues or tRNA isoacceptors. Differential tRNA expression in human tissues suggests that tRNA may play a unique role in regulating translation and possibly other processes in humans.
Affiliation(s)
- Kimberly A Dittmar: Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois, United States of America
- Jeffrey M Goodenbour: Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Tao Pan: Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois, United States of America
|
33
|
Wang L, Roossinck MJ. Comparative analysis of expressed sequences reveals a conserved pattern of optimal codon usage in plants. Plant Mol Biol 2006; 61:699-710. [PMID: 16897485 DOI: 10.1007/s11103-006-0041-8] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 12/08/2005] [Accepted: 03/09/2006] [Indexed: 05/11/2023]
Abstract
Codon usage bias is a ubiquitous phenomenon, which may be caused by mutational bias, selection, or both. The patterns of codon usage in plants are not well understood. Datasets of expressed sequence tags (ESTs) available for many plant species provide the resources for large-scale comparative analysis of codon usage patterns. We developed a computational approach to translate EST or assembled contig sequences, and then used the coding information for comparative analysis of codon usage in 12 plant species, including 6 eudicots, 5 monocots and the green alga Chlamydomonas reinhardtii. While codon nucleotide composition is highly conserved within eudicots or monocots, there is a significant difference between these two major taxonomic groups of higher plants. The third nucleotide position of codons is AU-rich in the eudicot genomes (35-42% of G+C content), but GC-rich in the monocot genomes (59-61% of G+C content). To identify optimal codons in these species, we used EST counts to estimate gene transcript levels. It was demonstrated that codon usage bias is correlated positively with gene transcript levels. Interestingly, the use of optimal codons appears to be well conserved between eudicots and monocots, and to a lesser degree between the higher plants and C. reinhardtii. Most of the optimal codons end with a C or G base, regardless of the different nucleotide composition in these genomes. The results suggest that plant codon usage is affected by translational selection, and the selective pressure appears to be conserved in the plant kingdom.
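The GC3 figures quoted in this abstract (G+C content at the third codon position) can be illustrated with a minimal Python sketch. The function name `gc3_content` and the toy sequences are assumptions made for illustration only; they are not taken from the authors' pipeline, which worked from translated EST and contig sequences.

```python
def gc3_content(cds: str) -> float:
    """Return the fraction of codons whose third base is G or C.

    Assumes `cds` is an in-frame coding sequence. Trailing bases that
    do not complete a codon are ignored.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3      # drop an incomplete final codon
    third_bases = seq[2:usable:3]          # every third base, starting at index 2
    if not third_bases:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in third_bases) / len(third_bases)


# Hypothetical toy sequences, not real genes.
at_ending = "ATGGCTAAAGATTTA"   # codons: ATG GCT AAA GAT TTA
gc_ending = "ATGGCCAAGGACCTG"   # codons: ATG GCC AAG GAC CTG
print(gc3_content(at_ending))
print(gc3_content(gc_ending))
```

On real data the statistic would normally be pooled over many coding sequences per species before comparing groups such as eudicots and monocots.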
Affiliation(s)
- Liangjiang Wang
  - Bioinformatics Center, Division of Biology, Kansas State University, Manhattan, KS 66506, USA.
|
34
|
Pasamontes A, Garcia-Vallve S. Use of a multi-way method to analyze the amino acid composition of a conserved group of orthologous proteins in prokaryotes. BMC Bioinformatics 2006; 7:257. [PMID: 16709240 PMCID: PMC1489954 DOI: 10.1186/1471-2105-7-257] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 12/23/2005] [Accepted: 05/18/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Amino acids in proteins are not used equally. Some of the differences in the amino acid composition of proteins are between species (mainly due to nucleotide composition and lifestyle) and some are between proteins from the same species (related to protein function, expression or subcellular localization, for example). As several factors contribute to the different amino acid usage in proteins, it is difficult both to analyze these differences and to separate the contributions made by each factor. RESULTS Using a multi-way method called Tucker3, we have analyzed the amino composition of a set of 64 orthologous groups of proteins present in 62 archaea and bacteria. This dataset corresponds to essential proteins such as ribosomal proteins, tRNA synthetases and translational initiation or elongation factors, which are common to all the species analyzed. The Tucker3 model can be used to study the amino acid variability within and between species by taking into consideration the tridimensionality of the data set. We found that the main factor behind the amino acid composition of proteins is independent of the organism or protein function analyzed. This factor must be related to the biochemical characteristics of each amino acid. The difference between the non-ribosomal proteins and the ribosomal proteins (which are rich in arginine and lysine) is the main factor behind the differences in amino acid composition within species, while G+C content and optimal growth temperature are the main factors behind the differences in amino acid usage between species. CONCLUSION We show that a multi-way method is useful for comparing the amino acid composition of several groups of orthologous proteins from the same group of species. This kind of dataset is extremely useful for detecting differences between and within species.
Affiliation(s)
- Alberto Pasamontes
  - Chemometrics, Qualimetrics and Nanosensors Group, Analytical and Organic Chemistry Department, Rovira i Virgili University (URV). Campus Sescelades, c/Marcelli Domingo s/n., 43007 Tarragona, Spain
- Santiago Garcia-Vallve
  - Evolutionary Genomics Group, Biochemistry and Biotechnology Department, Rovira i Virgili University (URV). Campus Sescelades, c/Marcelli Domingo s/n., 43007 Tarragona, Spain
|
35
|
Zhou T, Weng J, Sun X, Lu Z. Support vector machine for classification of meiotic recombination hotspots and coldspots in Saccharomyces cerevisiae based on codon composition. BMC Bioinformatics 2006; 7:223. [PMID: 16640774 PMCID: PMC1463011 DOI: 10.1186/1471-2105-7-223] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 11/06/2005] [Accepted: 04/26/2006] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND Meiotic double-strand breaks occur at relatively high frequencies in some genomic regions (hotspots) and relatively low frequencies in others (coldspots). Hotspots and coldspots are receiving increasing attention in research into the mechanism of meiotic recombination. However, predicting hotspots and coldspots from DNA sequence information is still a challenging task. RESULTS We present a novel method for classifying hot and cold ORFs, located in hotspots and coldspots respectively, in Saccharomyces cerevisiae, using a support vector machine (SVM) that relies on codon composition differences. This method achieved a high classification accuracy of 85.0%. Since codon composition is a fusion of codon usage bias and amino acid composition signals, the ability of these two kinds of sequence attributes to discriminate hot ORFs from cold ORFs was also investigated separately. Our results indicate that neither codon usage bias nor amino acid composition taken separately performed as well as codon composition. Moreover, our SVM-based method was applied to the full genome: we predicted the hot/cold ORFs from the yeast genome by using cutoffs of recombination rate. We found that the performance of our method for predicting cold ORFs is not as good as that for predicting hot ORFs. We also observed a considerable correlation between meiotic recombination rate and amino acid composition of certain residues, which probably reflects the structural and functional dissimilarity between the hot and cold groups. CONCLUSION We have introduced a novel SVM-based method to discriminate hot ORFs from cold ones. Using codon composition as the sequence attribute, we achieved a high classification accuracy, which suggests that codon composition has strong potential as a sequence attribute in the prediction of hot and cold ORFs.
Affiliation(s)
- Tong Zhou
  - State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
- Jianhong Weng
  - State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
- Xiao Sun
  - State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
- Zuhong Lu
  - State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
|
36
|
Yang C, Mills D, Mathee K, Wang Y, Jayachandran K, Sikaroodi M, Gillevet P, Entry J, Narasimhan G. An ecoinformatics tool for microbial community studies: Supervised classification of Amplicon Length Heterogeneity (ALH) profiles of 16S rRNA. J Microbiol Methods 2006; 65:49-62. [PMID: 16054254 DOI: 10.1016/j.mimet.2005.06.012] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 01/18/2005] [Revised: 04/22/2005] [Accepted: 06/24/2005] [Indexed: 01/08/2023]
Abstract
Support vector machines (SVM) and K-nearest neighbors (KNN) are two computational machine learning tools that perform supervised classification. This paper presents a novel application of such supervised analytical tools for microbial community profiling and to distinguish patterning among ecosystems. Amplicon length heterogeneity (ALH) profiles from several hypervariable regions of the 16S rRNA gene of eubacterial communities from Idaho agricultural soil samples and from Chesapeake Bay marsh sediments were separately analyzed. The profiles from all available hypervariable regions were concatenated to obtain a combined profile, which was then provided to the SVM and KNN classifiers. Each profile was labeled with information about the location or time of its sampling. We hypothesized that after a learning phase using feature vectors from labeled ALH profiles, both these classifiers would have the capacity to predict the labels of previously unseen samples. The resulting classifiers were able to predict the labels of the Idaho soil samples with high accuracy. The classifiers were less accurate for the classification of the Chesapeake Bay sediments, suggesting greater similarity within the Bay's microbial community patterns in the sampled sites. The profiles obtained from the V1+V2 region were more informative than those obtained from any other single region. However, combining them with profiles from the V1 region (with or without the profiles from the V3 region) resulted in the most accurate classification of the samples. The addition of profiles from the V9 region appeared to confound the classifiers. Our results show that SVM and KNN classifiers can be effectively applied to distinguish between eubacterial community patterns from different ecosystems based only on their ALH profiles.
Affiliation(s)
- Chengyong Yang
  - Bioinformatics Research Group (BioRG), School of Computer Science, Florida International University, Miami, Florida, 33199, USA
|
37
|
Ishii K, Washio T, Uechi T, Yoshihama M, Kenmochi N, Tomita M. Characteristics and clustering of human ribosomal protein genes. BMC Genomics 2006; 7:37. [PMID: 16504170 PMCID: PMC1459141 DOI: 10.1186/1471-2164-7-37] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.4] [Received: 07/31/2005] [Accepted: 02/28/2006] [Indexed: 11/20/2022] Open
Abstract
BACKGROUND The ribosome is a central player in the translation system, which in mammals consists of four RNA species and 79 ribosomal proteins (RPs). The control mechanisms of gene expression and the functions of RPs are believed to be identical. Most RP genes have common promoters and were therefore assumed to have a unified gene expression control mechanism. RESULTS We systematically analyzed the homogeneity and heterogeneity of RP genes on the basis of their expression profiles, promoter structures, encoded amino acid compositions, and codon compositions. The results revealed that (1) most RP genes are coordinately expressed at the mRNA level, with higher signals in the spleen, lymph node dissection (LND), and fetal brain. However, 17 genes, including the P protein genes (RPLP0, RPLP1, RPLP2), are expressed in a tissue-specific manner. (2) Most promoters have GC boxes and possible binding sites for nuclear respiratory factor 2, Yin and Yang 1, and/or activator protein 1. However, they do not have canonical TATA boxes. (3) Analysis of the amino acid composition of the encoded proteins indicated a high lysine and arginine content. (4) The major RP genes exhibit a characteristic synonymous codon composition with high rates of G or C in the third-codon position and a high content of AAG, CAG, ATC, GAG, CAC, and CTG. CONCLUSION Eleven of the RP genes are still identified as being unique and did not exhibit at least some of the above characteristics, indicating that they may have unknown functions not present in other RP genes. Furthermore, we found sequences conserved between human and mouse genes around the transcription start sites and in the intronic regions. This study suggests certain overall trends and characteristic features of human RP genes.
Affiliation(s)
- Kyota Ishii
  - Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata 997-0035, Japan
  - Graduate School of Media and Governance, Keio University, Fujisawa, Kanagawa 252-8520, Japan
- Takanori Washio
  - Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata 997-0035, Japan
  - Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma, Nara 630-0192, Japan
- Tamayo Uechi
  - Frontier Science Research Center, University of Miyazaki, Kiyotake, Miyazaki 889-1692, Japan
- Maki Yoshihama
  - Frontier Science Research Center, University of Miyazaki, Kiyotake, Miyazaki 889-1692, Japan
- Naoya Kenmochi
  - Frontier Science Research Center, University of Miyazaki, Kiyotake, Miyazaki 889-1692, Japan
- Masaru Tomita
  - Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata 997-0035, Japan
  - Department of Environmental Information, Keio University, Fujisawa, Kanagawa 252-8520, Japan
|
38
|
Pascal G, Médigue C, Danchin A. Persistent biases in the amino acid composition of prokaryotic proteins. Bioessays 2006; 28:726-38. [PMID: 16850406 DOI: 10.1002/bies.20431] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 11/08/2022]
Abstract
Correspondence analysis of 28 proteomes selected to span the entire realm of prokaryotes revealed universal biases in the proteins' amino acid distribution. Integral Inner Membrane Proteins always form an individual cluster, which can then be used to predict protein localisation in unknown proteomes, independently of the organism's biotope or kingdom. Orphan proteins are consistently rich in aromatic residues. Another bias is also ubiquitous: the amino acid composition is driven by the G + C content of the first codon position. An unexpected bias is driven, in many proteomes, by the AAN box of the genetic code, suggesting some functional biochemical relationship between asparagine and lysine. Less-significant biases are driven by the rare amino acids, cysteine and tryptophan. Some allow identification of species-specific functions or localisation such as surface or exported proteins. Errors in genome annotations are also revealed by correspondence analysis, making it useful for quality control and correction.
Affiliation(s)
- Géraldine Pascal
  - Genoscope/CNRS UMR 8030, Atelier de Génomique Comparative, Evry, France
|
39
|
Abstract
The levels of cellular organization in living organisms are the results of a variety of selection pressures. We have investigated here the final outcome of this integrated selective process in proteins of the best known microbial models Escherichia coli, Bacillus subtilis, and Methanococcus jannaschii, supposed to have undergone separate evolution for more than 1 billion years. Using multivariate analysis methods, including correspondence analysis, we studied the overall amino acid composition of all proteins making a proteome. Starting from and further developing previous results that had pointed out some general forces driving the amino acid composition of the proteomes of these model bacteria, we explored the correlations existing between the structure and functions of the proteins forming a proteome and their amino acid composition. The electric charge of amino acids measured against hydrophobicity creates a highly homogeneous cluster, made exclusively of proteins that are core components of the cytoplasmic membrane of the cell (integral inner membrane proteins). A second bias is imposed by the G+C content of the genome, indicating that protein functions are so robust with respect to amino acid changes that they can accommodate a large shift in the nucleotide content of the genome. A remarkable role of aromatic amino acids was uncovered. Expressed orphan proteins are enriched in these residues, suggesting that they might participate in a process of gain of function during evolution.
Affiliation(s)
- Géraldine Pascal
  - Genoscope/CNRS UMR 8030, Atelier de Génomique Comparative, Evry, France.
|
40
|
Mocellin S, Provenzano M, Rossi CR, Pilati P, Nitti D, Lise M. DNA array-based gene profiling: from surgical specimen to the molecular portrait of cancer. Ann Surg 2005; 241:16-26. [PMID: 15621987 PMCID: PMC1356842 DOI: 10.1097/01.sla.0000150157.83537.53] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.5] [Indexed: 11/26/2022]
Abstract
Cancer is a heterogeneous disease in most respects, including its cellularity, different genetic alterations, and diverse clinical behaviors. Traditional molecular analyses are reductionist, assessing only 1 or a few genes at a time, thus working with a biologic model too specific and limited to confront a process whose clinical outcome is likely to be governed by the combined influence of many genes. The potential of functional genomics is enormous, because for each experiment, thousands of relevant observations can be made simultaneously. Accordingly, DNA array, like other high-throughput technologies, might catalyze and ultimately accelerate the development of knowledge in tumor cell biology. Although in its infancy, the implementation of DNA array technology in cancer research has already provided investigators with novel data and intriguing new hypotheses on the molecular cascade leading to carcinogenesis, tumor aggressiveness, and sensitivity to antiblastic agents. Given the revolutionary implications that the use of this technology might have in the clinical management of patients with cancer, principles of DNA array-based tumor gene profiling need to be clearly understood for the data to be correctly interpreted and appreciated. In the present work, we discuss the technical features characterizing this powerful laboratory tool and review the applications so far described in the field of oncology.
Affiliation(s)
- Simone Mocellin
  - Surgery Branch, Department of Oncological and Surgical Sciences, University of Padova, Italy.
|
41
|
Mocellin S, Wang E, Panelli M, Pilati P, Marincola FM. DNA array-based gene profiling in tumor immunology. Clin Cancer Res 2004; 10:4597-606. [PMID: 15269130 DOI: 10.1158/1078-0432.ccr-04-0327] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022]
Abstract
Recent advances in tumor immunology have fostered the clinical implementation of different immunotherapy modalities. However, the alternate success of such regimens underscores the fact that the molecular mechanisms underlying tumor immune rejection are still poorly understood. Given the complexity of the immune system network and the multidimensionality of tumor-host interactions, the comprehension of tumor immunology might greatly benefit from high-throughput DNA array analysis, which can portray the molecular kinetics of immune response on a genome-wide scale, thus accelerating the accumulation of knowledge and ultimately catalyzing the development of new hypotheses in cell biology. Although in its infancy, the implementation of DNA array technology in tumor immunology studies has already provided investigators with novel data and intriguing hypotheses on the cascade of molecular events leading to an effective immune response against cancer. Although the principles of DNA array-based gene profiling techniques have become common knowledge, the need for mastering this technique to produce meaningful data and correctly interpret this enormous output of information is critical and represents a tremendous challenge for investigators. In the present work, we summarize the main technical features and critical issues characterizing this powerful laboratory tool and review its applications in the fascinating field of cancer immunogenomics.
Affiliation(s)
- Simone Mocellin
  - Department of Oncological and Surgical Sciences, University of Padova, Padua, Italy.
|
42
|
Wang L, Chen K, Ong YS. Bio-kernel Self-organizing Map for HIV Drug Resistance Classification. Lecture Notes in Computer Science 2005. [PMCID: PMC7122014 DOI: 10.1007/11539087_20] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/28/2022]
Abstract
Kernel self-organizing map has been recently studied by Fyfe and his colleagues [1]. This paper investigates the use of a novel bio-kernel function for the kernel self-organizing map. For verification, the application of the proposed new kernel self-organizing map to HIV drug resistance classification using mutation patterns in protease sequences is presented. The original self-organizing map together with the distributed encoding method was compared. It has been found that the use of the kernel self-organizing map with the novel bio-kernel function leads to better classification and faster convergence rate ...
Affiliation(s)
- Lipo Wang
  - School of Electrical and Electronic Engineering, Nanyang Technological University, Block S1, Nanyang Avenue, 639798 Singapore
- Ke Chen
  - School of Software, Sun Yat-Sen University, 510275 Guangzhou, China
- Yew Soon Ong
  - School of Computer Engineering, Nanyang Technological University, BLK N4, 2b-39, Nanyang Avenue, 639798 Singapore
|
43
|
Hsiang T, Goodwin PH. Distinguishing plant and fungal sequences in ESTs from infected plant tissues. J Microbiol Methods 2003; 54:339-51. [PMID: 12842480 DOI: 10.1016/s0167-7012(03)00067-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
Expressed sequence tags (ESTs) from fungal-infected plant tissues are composed of a mixture of plant and fungal sequences. Using freely available software and tools, a novel procedure is described for distinguishing plant and fungal DNA sequences. Although the GenBank non-redundant (NR) database is larger and therefore one would presume that BLASTX analysis of it would be more accurate, superior resolution of 700 randomly selected fungal ESTs was found with Standalone TBLASTX analyses with a local matching database composed of a plant and a fungal genome. Standalone TBLASTX analyses of 3,983 ESTs from nine different fungal-infected plant EST libraries also proved to be superior in identifying the origin of sequences as either plant or fungal compared to GenBank BLASTX analysis. Standalone TBLASTX with a matching database comprised of a single plant and a single fungal genome appears to be a faster and more accurate method than BLASTX searches of the GenBank non-redundant database to distinguish fungal and plant sequences in mixed EST collections.
Affiliation(s)
- Tom Hsiang
  - Department of Environmental Biology, University of Guelph, Guelph, Ontario, Canada N1G 2W1.
|
44
|
Perrière G, Thioulouse J. Use and misuse of correspondence analysis in codon usage studies. Nucleic Acids Res 2002; 30:4548-55. [PMID: 12384602 PMCID: PMC137129 DOI: 10.1093/nar/gkf565] [Citation(s) in RCA: 120] [Impact Index Per Article: 5.5] [Indexed: 11/13/2022] Open
Abstract
Correspondence analysis has frequently been used for codon usage studies but this method is often misused. Because amino acid composition exerts constraints on codon usage, it is common to use tables containing relative codon frequencies (or ratios of frequencies) instead of simple codon counts to get rid of these amino acid biases. The problem is that some important properties of correspondence analysis, such as rows weighting, are lost in the process. Moreover, the use of relative measures sometimes introduces other biases and often diminishes the quantity of information to analyse, occasionally resulting in interpretation errors. For instance, in the case of an organism such as Borrelia burgdorferi, the use of relative measures led to the conclusion that there was no translational selection, while analyses based on codon counts show that there is a possibility of a selective effect at that level. In this paper, we expose these problems and we propose alternative strategies to correspondence analysis for studying codon usage biases when amino acid composition effects must be removed.
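To make the contrast between codon counts and relative measures concrete, the sketch below computes relative synonymous codon usage (RSCU), one widely used relative measure. Choosing RSCU here is an assumption for illustration; the abstract refers only to "relative codon frequencies (or ratios of frequencies)". The helper names `codon_counts` and `rscu` and the toy sequences are likewise hypothetical. The codon table is the standard genetic code.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons; an incomplete trailing codon is ignored."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def rscu(counts: Counter) -> dict:
    """RSCU = observed count / mean count over the codon's synonymous family."""
    families = defaultdict(list)
    for codon, aa in CODON_TO_AA.items():
        families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent from this gene
        for c in codons:
            values[c] = counts.get(c, 0) * len(codons) / total
    return values


# Hypothetical toy genes: same lysine codon proportions, very different sizes.
short_gene = "AAAAAG"                  # 1 x AAA, 1 x AAG
long_gene = "AAA" * 50 + "AAG" * 50    # 50 x AAA, 50 x AAG
print(codon_counts(short_gene), rscu(codon_counts(short_gene)))
print(codon_counts(long_gene)["AAA"], rscu(codon_counts(long_gene)) == rscu(codon_counts(short_gene)))
```

The two toy genes yield identical RSCU profiles although one carries fifty times more observations. That difference in gene size is the kind of row-weight information the abstract says is lost when relative measures replace simple counts.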
Affiliation(s)
- Guy Perrière
  - Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558, Université Claude Bernard - Lyon 1, 43 Boulevard du 11 Novembre 1918, 69622 Villeurbanne Cedex, France.
|
45
|
Current awareness on yeast. Yeast 2002; 19:1277-84. [PMID: 12400546 DOI: 10.1002/yea.829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
|
46
|
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2002. [PMCID: PMC2448418 DOI: 10.1002/cfg.121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022] Open
|