1
|
Wang Y, Jiang D, Guo K, Zhao L, Meng F, Xiao J, Niu Y, Sun Y. Comparative analysis of codon usage patterns in chloroplast genomes of ten Epimedium species. BMC Genom Data 2023; 24:3. [PMID: 36624369 PMCID: PMC9830715 DOI: 10.1186/s12863-023-01104-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2022] [Accepted: 01/05/2023] [Indexed: 01/11/2023] Open
Abstract
BACKGROUND The Phenomenon of codon usage bias exists in the genomes of prokaryotes and eukaryotes. The codon usage pattern is affected by environmental factors, base mutation, gene flow and gene expression level, among which natural selection and mutation pressure are the main factors. The study of codon preference is an effective method to analyze the source of evolutionary driving forces in organisms. Epimedium species are perennial herbs with ornamental and medicinal value distributed worldwide. The chloroplast genome is self-replicating and maternally inherited which is usually used to study species evolution, gene expression and genetic transformation. RESULTS The results suggested that chloroplast genomes of Epimedium species preferred to use codons ending with A/U. 17 common high-frequency codons and 2-6 optimal codons were found in the chloroplast genomes of Epimedium species, respectively. According to the ENc-plot, PR2-plot and neutrality-plot, the formation of codon preference in Epimedium was affected by multiple factors, and natural selection was the dominant factor. By comparing the codon usage frequency with 4 common model organisms, it was found that Arabidopsis thaliana, Populus trichocarpa, and Saccharomyces cerevisiae were suitable exogenous expression receptors. CONCLUSION The evolutionary driving force in the chloroplast genomes of 10 Epimedium species probably comes from mutation pressure. Our results provide an important theoretical basis for evolutionary analysis and transgenic research of chloroplast genes.
Collapse
Affiliation(s)
- Yingzhe Wang
- grid.449428.70000 0004 1797 7280College of Pharmacy, Jining Medical University, Rizhao, Shandong China ,grid.440665.50000 0004 1757 641XSchool of Pharmaceutical Sciences, Changchun University of Chinese Medicine, Changchun, Jilin China
| | - Dacheng Jiang
- grid.440665.50000 0004 1757 641XSchool of Pharmaceutical Sciences, Changchun University of Chinese Medicine, Changchun, Jilin China
| | - Kun Guo
- grid.440665.50000 0004 1757 641XSchool of Pharmaceutical Sciences, Changchun University of Chinese Medicine, Changchun, Jilin China
| | - Lei Zhao
- grid.440665.50000 0004 1757 641XSchool of Pharmaceutical Sciences, Changchun University of Chinese Medicine, Changchun, Jilin China
| | - Fangfang Meng
- grid.440665.50000 0004 1757 641XSchool of Pharmaceutical Sciences, Changchun University of Chinese Medicine, Changchun, Jilin China
| | - Jinglei Xiao
- grid.440665.50000 0004 1757 641XSchool of Pharmaceutical Sciences, Changchun University of Chinese Medicine, Changchun, Jilin China
| | - Yuan Niu
- Lanzhou Agro-Technical Research and Popularization Center, Lanzhou, Gansu China
| | - Yunlong Sun
- grid.449428.70000 0004 1797 7280College of Pharmacy, Jining Medical University, Rizhao, Shandong China
| |
Collapse
|
2
|
Gao Y, Lu Y, Song Y, Jing L. Analysis of codon usage bias of WRKY transcription factors in Helianthus annuus. BMC Genom Data 2022; 23:46. [PMID: 35725374 PMCID: PMC9210703 DOI: 10.1186/s12863-022-01064-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2021] [Accepted: 06/13/2022] [Indexed: 11/10/2022] Open
Abstract
Abstract
Background
The phenomenon of codon usage bias is known to exist in many genomes and is mainly determined by mutation and selection. Codon usage bias analysis is a suitable strategy for identifying the principal evolutionary driving forces in different organisms. Sunflower (Helianthus annuus L.) is an annual crop that is cultivated worldwide as ornamentals, food plants and for their valuable oil. The WRKY family genes in plants play a central role in diverse regulation and multiple stress responses. Evolutionary analysis of WRKY family genes of H. annuus can provide rich genetic information for developing hybridization resources of the genus Helianthus.
Results
Bases composition analysis showed the average GC content of WRKY genes of H. annuus was 43.42%, and the average GC3 content was 39.60%, suggesting that WRKY gene family prefers A/T(U) ending codons. There were 29 codons with relative synonymous codon usage (RSCU) greater than 1 and 22 codons ending with A and U base. The effective number of codons (ENC) and codon adaptation index (CAI) in WRKY genes ranged from 43.47–61.00 and 0.14–0.26, suggesting that the codon bias was weak and WRKY genes expression level was low. Neutrality analysis found a significant correlation between GC12 and GC3. ENC-plot showed most genes on or close to the expected curve, suggesting that mutational bias played a major role in shaping codon usage. The Parity Rule 2 plot (PR2) analysis showed that the usage of AT and GC was disproportionate. A total of three codons were identified as the optimal codons.
Conclusion
Apart from natural selection effects, most of the genetic evolution in the H. annuus WRKY genome might be driven by mutation pressure. Our results provide a theoretical foundation for elaborating the genetic architecture and mechanisms of H. annuus and contributing to enrich H. annuus genetic resources.
Collapse
|
3
|
Yang J, Chu Q, Meng G, Kong W. The complete chloroplast genome sequences of three Broussonetia species and comparative analysis within the Moraceae. PeerJ 2022; 10:e14293. [PMID: 36340196 PMCID: PMC9632464 DOI: 10.7717/peerj.14293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2022] [Accepted: 10/03/2022] [Indexed: 01/22/2023] Open
Abstract
Background Species of Broussonetia (family Moraceae) are commonly used to make textiles and high-grade paper. The distribution of Broussonetia papyrifera L. is considered to be related to the spread and location of humans. The complete chloroplast (cp) genomes of B. papyrifera, Broussonetia kazinoki Sieb., and Broussonetia kaempferi Sieb. were analyzed to better understand the status and evolutionary biology of the genus Broussonetia. Methods The cp genomes were assembled and characterized using SOAPdenovo2 and DOGMA. Phylogenetic and molecular dating analysis were performed using the concatenated nucleotide sequences of 35 species in the Moraceae family and were based on 66 protein-coding genes (PCGs). An analysis of the sequence divergence (pi) of each PCG among the 35 cp genomes was conducted using DnaSP v6. Codon usage indices were calculated using the CodonW program. Results All three cp genomes had the typical land plant quadripartite structure, ranging in size from 160,239 bp to 160,841 bp. The ribosomal protein L22 gene (RPL22) was either incomplete or missing in all three Broussonetia species. Phylogenetic analysis revealed two clades. Clade 1 included Morus and Artocarpus, whereas clade 2 included the other seven genera. Malaisia scandens Lour. was clustered within the genus Broussonetia. The differentiation of Broussonetia was estimated to have taken place 26 million years ago. The PCGs' pi values ranged from 0.0005 to 0.0419, indicating small differences within the Moraceae family. The distribution of most of the genes in the effective number of codons plot (ENc-plot) fell on or near the trend line; the slopes of the trend line of neutrality plots were within the range of 0.0363-0.171. These results will facilitate the identification, taxonomy, and utilization of the Broussonetia species and further the evolutionary studies of the Moraceae family.
Collapse
Affiliation(s)
- Jinhong Yang
- Shaanxi Key Laboratory of Sericulture, Ankang University, Ankang, China
| | - Qu Chu
- Shaanxi Key Laboratory of Sericulture, Ankang University, Ankang, China
| | - Gang Meng
- Shaanxi Key Laboratory of Sericulture, Ankang University, Ankang, China
| | - Weiqing Kong
- Shaanxi Key Laboratory of Sericulture, Ankang University, Ankang, China
| |
Collapse
|
4
|
Zhang Y, Shen Z, Meng X, Zhang L, Liu Z, Liu M, Zhang F, Zhao J. Codon usage patterns across seven Rosales species. BMC PLANT BIOLOGY 2022; 22:65. [PMID: 35123393 PMCID: PMC8817548 DOI: 10.1186/s12870-022-03450-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/24/2021] [Accepted: 01/31/2022] [Indexed: 05/03/2023]
Abstract
BACKGROUND Codon usage bias (CUB) analysis is an effective method for studying specificity, evolutionary relationships, and mRNA translation and discovering new genes among various species. In general, CUB analysis is mainly performed within one species or between closely related species and no such study has been applied among species with distant genetic relationships. Here, seven Rosales species with high economic value were selected to conduct CUB analysis. RESULTS The results showed that the average GC1, GC2 and GC3 contents were 51.08, 40.52 and 43.12%, respectively, indicating that the A/T content is more abundant and the Rosales species prefer A/T as the last codon. Neutrality plot and ENc plot analysis revealed that natural selection was the main factor leading to CUB during the evolution of Rosales species. All 7 Rosales species contained three high-frequency codons, AGA, GTT and TTG, encoding Arg, Val and Leu, respectively. The 7 Rosales species differed in high-frequency codon pairs and the distribution of GC3, though the usage patterns of closely related species were more consistent. The results of the biclustering heat map among 7 Rosales species and 20 other species were basically consistent with the results of genome data, suggesting that CUB analysis is an effective method for revealing evolutionary relationships among species at the family or order level. In addition, chlorophytes prefer using G/C as ending codon, while monocotyledonous and dicotyledonous plants prefer using A/T as ending codon. CONCLUSIONS The CUB pattern among Rosales species was mainly affected by natural selection. This work is the first to highlight the CUB patterns and characteristics of Rosales species and provides a new perspective for studying genetic relationships across a wide range of species.
Collapse
Affiliation(s)
- Yao Zhang
- College of Life Science, Hebei Agricultural University, Baoding, China
- Hebei Key Laboratory of Plant Physiology and Molecular Pathology, Hebei Agricultural University, Baoding, China
| | - Zenan Shen
- High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190 China
| | - Xiangrui Meng
- College of Life Science, Hebei Agricultural University, Baoding, China
- Hebei Key Laboratory of Plant Physiology and Molecular Pathology, Hebei Agricultural University, Baoding, China
| | - Liman Zhang
- College of Life Science, Hebei Agricultural University, Baoding, China
- Hebei Key Laboratory of Plant Physiology and Molecular Pathology, Hebei Agricultural University, Baoding, China
| | - Zhiguo Liu
- Research Center of Chinese Jujube, Hebei Agricultural University, Baoding, China
| | - Mengjun Liu
- Research Center of Chinese Jujube, Hebei Agricultural University, Baoding, China
| | - Fa Zhang
- High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190 China
| | - Jin Zhao
- College of Life Science, Hebei Agricultural University, Baoding, China
- Hebei Key Laboratory of Plant Physiology and Molecular Pathology, Hebei Agricultural University, Baoding, China
| |
Collapse
|
5
|
Allen QM, Febres VJ, Rathinasabapathi B, Chaparro JX. Engineering a Plant-Derived Astaxanthin Synthetic Pathway Into Nicotiana benthamiana. FRONTIERS IN PLANT SCIENCE 2022; 12:831785. [PMID: 35116052 PMCID: PMC8804313 DOI: 10.3389/fpls.2021.831785] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/08/2021] [Accepted: 12/27/2021] [Indexed: 06/14/2023]
Abstract
Carotenoids have been shown to be essential for human nutrition. Consumption of carotenoid-rich fruits and vegetables can reduce the risk of many diseases. The ketocarotenoid astaxanthin has become a commercially valuable compound due to its powerful antioxidant properties compared to other carotenoids. It is naturally produced in certain algae, bacteria, and the flowers of some species of the genus Adonis, although it is produced in such small quantities in these organisms that it is costly to extract. Chemical synthesis of this compound has also shown limited success with a high proportion of esterified forms of astaxanthin being produced, which decreases antioxidant properties by the conversion of hydroxyl groups to esters. Previously, transgenic astaxanthin-producing plants have been created using a β-carotene ketolase enzyme of either bacterial or algal origin. However, a novel astaxanthin pathway exists in the flowering plants of the genus Adonis which has not been utilized in the same manner. The pathway involves two unique enzymes, β-ring-4-dehydrogenase and 4-hydroxy-β-ring-4-dehydrogenase, which add the necessary hydroxyl and ketone groups to the rings of β-carotene. In the present study, Nicotiana benthamiana plants were transformed with chimeric constructs coding for these two enzymes. The regenerated, transgenic plants accumulate astaxanthin and their growth (height and weight) was unaffected, when compared to non-transformed N. benthamiana and to plants transformed with the bacterial β-carotene ketolase. The accumulation of astaxanthin also improved seedling survivability under harsh UV light, mitigated reactive oxygen accumulation, and provided a phenotype (color) that allowed the efficient identification and recovery of transgenic plants with and without selection.
Collapse
Affiliation(s)
- Quinton M. Allen
- Horticultural Sciences Department, University of Florida, Gainesville, FL, United States
| | - Vicente J. Febres
- Horticultural Sciences Department, University of Florida, Gainesville, FL, United States
| | - Bala Rathinasabapathi
- Horticultural Sciences Department, University of Florida, Gainesville, FL, United States
- Plant Molecular and Cellular Biology Program, Horticultural Sciences Department, University of Florida, Gainesville, FL, United States
| | - José X. Chaparro
- Horticultural Sciences Department, University of Florida, Gainesville, FL, United States
| |
Collapse
|
6
|
Abstract
Codon usage bias is the preferential or non-random use of synonymous codons, a ubiquitous phenomenon observed in bacteria, plants and animals. Different species have consistent and characteristic codon biases. Codon bias varies not only with species, family or group within kingdom, but also between the genes within an organism. Codon usage bias has evolved through mutation, natural selection, and genetic drift in various organisms. Genome composition, GC content, expression level and length of genes, position and context of codons in the genes, recombination rates, mRNA folding, and tRNA abundance and interactions are some factors influencing codon bias. The factors shaping codon bias may also be involved in evolution of the universal genetic code. Codon-usage bias is critical factor determining gene expression and cellular function by influencing diverse processes such as RNA processing, protein translation and protein folding. Codon usage bias reflects the origin, mutation patterns and evolution of the species or genes. Investigations of codon bias patterns in genomes can reveal phylogenetic relationships between organisms, horizontal gene transfers, molecular evolution of genes and identify selective forces that drive their evolution. Most important application of codon bias analysis is in the design of transgenes, to increase gene expression levels through codon optimization, for development of transgenic crops. The review gives an overview of deviations of genetic code, factors influencing codon usage or bias, codon usage bias of nuclear and organellar genes, computational methods to determine codon usage and the significance as well as applications of codon usage analysis in biological research, with emphasis on plants.
Collapse
Affiliation(s)
| | - Varatharajalu Udayasuriyan
- Department of Biotechnology, Centre for Plant Molecular Biology and Biotechnology, Tamil Nadu Agricultural University, Coimbatore, 641003, India
| | - Vijaipal Bhadana
- ICAR-Indian Institute of Agricultural Biotechnology, Ranchi, Jharkhand, 834010, India
| |
Collapse
|
7
|
Patil SS, Indrabalan UB, Suresh KP, Shome BR. Analysis of codon usage bias of classical swine fever virus. Vet World 2021; 14:1450-1458. [PMID: 34316191 PMCID: PMC8304411 DOI: 10.14202/vetworld.2021.1450-1458] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2021] [Accepted: 04/21/2021] [Indexed: 11/22/2022] Open
Abstract
Background and Aim: Classical swine fever (CSF), caused by CSF virus (CSFV), is a highly contagious disease in pigs causing 100% mortality in susceptible adult pigs and piglets. High mortality rate in pigs causes huge economic loss to pig farmers. CSFV has a positive-sense RNA genome of 12.3 kb in length flanked by untranslated regions at 5’ and 3’ end. The genome codes for a large polyprotein of 3900 amino acids coding for 11 viral proteins. The 1300 codons in the polyprotein are coded by different combinations of three nucleotides which help the infectious agent to evolve itself and adapt to the host environment. This study performed and employed various methods/techniques to estimate the changes occurring in the process of CSFV evolution by analyzing the codon usage pattern. Materials and Methods: The evolution of viruses is widely studied by analyzing their nucleotides and coding regions/codons using various methods. A total of 115 complete coding regions of CSFVs including one complete genome from our laboratory (MH734359) were included in this study and analysis was carried out using various methods in estimating codon usage bias and evolution. This study elaborates on the factors that influence the codon usage pattern. Results: The effective number of codons (ENC) and relative synonymous codon usage showed the presence of codon usage bias. The mononucleotide (A) has a higher frequency compared to the other mononucleotides (G, C, and T). The dinucleotides CG and CC are underrepresented and overrepresented. The codons CGT was underrepresented and AGG was overrepresented. The codon adaptation index value of 0.71 was obtained indicating that there is a similarity in the codon usage bias. The principal component analysis, ENC-plot, Neutrality plot, and Parity Rule 2 plot produced in this article indicate that the CSFV is influenced by the codon usage bias. The mutational pressure and natural selection are the important factors that influence the codon usage bias. Conclusion: The study provides useful information on the codon usage analysis of CSFV and may be utilized to understand the host adaptation to virus environment and its evolution. Further, such findings help in new gene discovery, design of primers/probes, design of transgenes, determination of the origin of species, prediction of gene expression level, and gene function of CSFV. To the best of our knowledge, this is the first study on codon usage bias involving such a large number of complete CSFVs including one sequence of CSFV from India.
Collapse
Affiliation(s)
- Sharanagouda S Patil
- ICAR-National Institute of Veterinary Epidemiology and Disease Informatics (NIVEDI), Yelahanka, Bengaluru, Karnataka, India
| | - Uma Bharathi Indrabalan
- ICAR-National Institute of Veterinary Epidemiology and Disease Informatics (NIVEDI), Yelahanka, Bengaluru, Karnataka, India
| | | | - Bibek Ranjan Shome
- ICAR-National Institute of Veterinary Epidemiology and Disease Informatics (NIVEDI), Yelahanka, Bengaluru, Karnataka, India
| |
Collapse
|
8
|
Shen Z, Gan Z, Zhang F, Yi X, Zhang J, Wan X. Analysis of codon usage patterns in citrus based on coding sequence data. BMC Genomics 2020; 21:234. [PMID: 33327935 PMCID: PMC7739459 DOI: 10.1186/s12864-020-6641-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2020] [Accepted: 03/03/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Codon usage is an important determinant of gene expression levels that can help us understand codon biology, evolution and mRNA translation of species. The majority of previous codon usage studies have focused on single species analysis, although few studies have focused on the species within the same genus. In this study, we proposed a multispecies codon usage analysis workflow to reveal the genetic features and correlation in citrus. RESULTS Our codon usage analysis workflow was based on the GC content, GC plot, and relative synonymous codon usage value of each codon in 8 citrus species. This approach allows for the comparison of codon usage bias of different citrus species. Next, we performed cluster analysis and obtained an overview of the relationship in citrus. However, traditional methods cannot conduct quantitative analysis of the correlation. To further estimate the correlation among the citrus species, we used the frequency profile to construct feature vectors of each species. The Pearson correlation coefficient was used to quantitatively analyze the distance among the citrus species. This result was consistent with the cluster analysis. CONCLUSIONS Our findings showed that the citrus species are conserved at the genetic level and demonstrated the existing genetic evolutionary relationship in citrus. This work provides new insights into codon biology and the evolution of citrus and other plant species.
Collapse
Affiliation(s)
- Zenan Shen
- High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China.,University of Chinese Academy of Sciences, Beijing, 100000, China
| | - Zhimeng Gan
- Key Laboratory of Horticultural Plant Biology (Ministry of Education), College of Horticulture and Forestry Science, Huazhong Agricultural University, Wuhan, 430070, China
| | - Fa Zhang
- High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China.,University of Chinese Academy of Sciences, Beijing, 100000, China
| | - Xinyao Yi
- Department of Computer Science and Engineering, University of South Carolina, Colombia, 29201, USA
| | - Jinzhi Zhang
- Key Laboratory of Horticultural Plant Biology (Ministry of Education), College of Horticulture and Forestry Science, Huazhong Agricultural University, Wuhan, 430070, China
| | - Xiaohua Wan
- High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China. .,University of Chinese Academy of Sciences, Beijing, 100000, China.
| |
Collapse
|
9
|
Biswas KK, Palchoudhury S, Chakraborty P, Bhattacharyya UK, Ghosh DK, Debnath P, Ramadugu C, Keremane ML, Khetarpal RK, Lee RF. Codon Usage Bias Analysis of Citrus tristeza Virus: Higher Codon Adaptation to Citrus reticulata Host. Viruses 2019; 11:v11040331. [PMID: 30965565 PMCID: PMC6521185 DOI: 10.3390/v11040331] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2019] [Revised: 03/25/2019] [Accepted: 04/03/2019] [Indexed: 12/16/2022] Open
Abstract
Citrus tristeza virus (CTV), a member of the aphid-transmitted closterovirus group, is the causal agent of the notorious tristeza disease in several citrus species worldwide. The codon usage patterns of viruses reflect the evolutionary changes for optimization of their survival and adaptation in their fitness to the external environment and the hosts. The codon usage adaptation of CTV to specific citrus hosts remains to be studied; thus, its role in CTV evolution is not clearly comprehended. Therefore, to better explain the host–virus interaction and evolutionary history of CTV, the codon usage patterns of the coat protein (CP) genes of 122 CTV isolates originating from three economically important citrus hosts (55 isolate from Citrus sinensis, 38 from C. reticulata, and 29 from C. aurantifolia) were studied using several codon usage indices and multivariate statistical methods. The present study shows that CTV displays low codon usage bias (CUB) and higher genomic stability. Neutrality plot and relative synonymous codon usage analyses revealed that the overall influence of natural selection was more profound than that of mutation pressure in shaping the CUB of CTV. The contribution of high-frequency codon analysis and codon adaptation index value show that CTV has host-specific codon usage patterns, resulting in higheradaptability of CTV isolates originating from C. reticulata (Cr-CTV), and low adaptability in the isolates originating from C. aurantifolia (Ca-CTV) and C. sinensis (Cs-CTV). The combination of codon analysis of CTV with citrus genealogy suggests that CTV evolved in C. reticulata or other Citrus progenitors. The outcome of the study enhances the understanding of the factors involved in viral adaptation, evolution, and fitness toward their hosts. This information will definitely help devise better management strategies of CTV.
Collapse
Affiliation(s)
- Kajal Kumar Biswas
- Advanced Centre for Plant Virology, Division of Plant Pathology, ICAR-Indian Agricultural Research Institute, New Delhi 11012, India.
| | - Supratik Palchoudhury
- Advanced Centre for Plant Virology, Division of Plant Pathology, ICAR-Indian Agricultural Research Institute, New Delhi 11012, India.
| | - Prosenjit Chakraborty
- Advanced Centre for Plant Virology, Division of Plant Pathology, ICAR-Indian Agricultural Research Institute, New Delhi 11012, India.
| | - Utpal K Bhattacharyya
- Advanced Centre for Plant Virology, Division of Plant Pathology, ICAR-Indian Agricultural Research Institute, New Delhi 11012, India.
| | - Dilip K Ghosh
- ICAR-Central Citrus Research Institute, Nagpur 440033, India.
| | - Palash Debnath
- Department of Plant Pathology, Assam Agricultural University, Jorhat 785013, India.
| | - Chandrika Ramadugu
- Department of Botany and Plant Sciences, University of California, Riverside, CA 92507, USA.
| | - Manjunath L Keremane
- National Clonal Germplasm Repository for Citrus & Dates, United States Department of Agriculture-Agricultural Research Service, Riverside, CA 92507, USA.
| | - Ravi K Khetarpal
- Asia-Pacific Association of Agricultural Research Institutions, Bangkok 10100, Thailand.
| | - Richard F Lee
- National Clonal Germplasm Repository for Citrus & Dates, United States Department of Agriculture-Agricultural Research Service, Riverside, CA 92507, USA.
| |
Collapse
|
10
|
Chan KL, Tatarinova TV, Rosli R, Amiruddin N, Azizi N, Halim MAA, Sanusi NSNM, Jayanthi N, Ponomarenko P, Triska M, Solovyev V, Firdaus-Raih M, Sambanthamurthi R, Murphy D, Low ETL. Evidence-based gene models for structural and functional annotations of the oil palm genome. Biol Direct 2017; 12:21. [PMID: 28886750 PMCID: PMC5591544 DOI: 10.1186/s13062-017-0191-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2017] [Accepted: 08/07/2017] [Indexed: 11/13/2022] Open
Abstract
Background Oil palm is an important source of edible oil. The importance of the crop, as well as its long breeding cycle (10-12 years) has led to the sequencing of its genome in 2013 to pave the way for genomics-guided breeding. Nevertheless, the first set of gene predictions, although useful, had many fragmented genes. Classification and characterization of genes associated with traits of interest, such as those for fatty acid biosynthesis and disease resistance, were also limited. Lipid-, especially fatty acid (FA)-related genes are of particular interest for the oil palm as they specify oil yields and quality. This paper presents the characterization of the oil palm genome using different gene prediction methods and comparative genomics analysis, identification of FA biosynthesis and disease resistance genes, and the development of an annotation database and bioinformatics tools. Results Using two independent gene-prediction pipelines, Fgenesh++ and Seqping, 26,059 oil palm genes with transcriptome and RefSeq support were identified from the oil palm genome. These coding regions of the genome have a characteristic broad distribution of GC3 (fraction of cytosine and guanine in the third position of a codon) with over half the GC3-rich genes (GC3 ≥ 0.75286) being intronless. In comparison, only one-seventh of the oil palm genes identified are intronless. Using comparative genomics analysis, characterization of conserved domains and active sites, and expression analysis, 42 key genes involved in FA biosynthesis in oil palm were identified. For three of them, namely EgFABF, EgFABH and EgFAD3, segmental duplication events were detected. Our analysis also identified 210 candidate resistance genes in six classes, grouped by their protein domain structures. Conclusions We present an accurate and comprehensive annotation of the oil palm genome, focusing on analysis of important categories of genes (GC3-rich and intronless), as well as those associated with important functions, such as FA biosynthesis and disease resistance. The study demonstrated the advantages of having an integrated approach to gene prediction and developed a computational framework for combining multiple genome annotations. These results, available in the oil palm annotation database (http://palmxplore.mpob.gov.my), will provide important resources for studies on the genomes of oil palm and related crops. Reviewers This article was reviewed by Alexander Kel, Igor Rogozin, and Vladimir A. Kuznetsov. Electronic supplementary material The online version of this article (doi:10.1186/s13062-017-0191-4) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Kuang-Lim Chan
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia.,Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
| | - Tatiana V Tatarinova
- Department of Biology, University of La Verne, La Verne, California, 91750, USA.,Spatial Sciences Institute, University of Southern California, Los Angeles, CA, 90089, USA
| | - Rozana Rosli
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia.,Genomics and Computational Biology Research Group, University of South Wales, Pontypridd, CF371DL, UK
| | - Nadzirah Amiruddin
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
| | - Norazah Azizi
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
| | - Mohd Amin Ab Halim
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
| | - Nik Shazana Nik Mohd Sanusi
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
| | - Nagappan Jayanthi
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
| | - Petr Ponomarenko
- Spatial Sciences Institute, University of Southern California, Los Angeles, CA, 90089, USA
| | - Martin Triska
- Children's Hospital Los Angeles, University of Southern California, Los Angeles, CA, 90089, USA
| | - Victor Solovyev
- Softberry Inc., 116 Radio Circle, Suite 400, Mount Kisco, NY, 10549, USA
| | - Mohd Firdaus-Raih
- Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
| | - Ravigadevi Sambanthamurthi
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
| | - Denis Murphy
- Genomics and Computational Biology Research Group, University of South Wales, Pontypridd, CF371DL, UK
| | - Eng-Ti Leslie Low
- Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia.
| |
Collapse
|
11
|
Sablok G, Chen TW, Lee CC, Yang C, Gan RC, Wegrzyn JL, Porta NL, Nayak KC, Huang PJ, Varotto C, Tang P. ChloroMitoCU: Codon patterns across organelle genomes for functional genomics and evolutionary applications. DNA Res 2017; 24:327-332. [PMID: 28419256 PMCID: PMC5499650 DOI: 10.1093/dnares/dsw044] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2016] [Accepted: 09/14/2016] [Indexed: 01/01/2023] Open
Abstract
Organelle genomes are widely thought to have arisen from reduction events involving cyanobacterial and archaeal genomes, in the case of chloroplasts, or α-proteobacterial genomes, in the case of mitochondria. Heterogeneity in base composition and codon preference has long been the subject of investigation of topics ranging from phylogenetic distortion to the design of overexpression cassettes for transgenic expression. From the overexpression point of view, it is critical to systematically analyze the codon usage patterns of the organelle genomes. In light of the importance of codon usage patterns in the development of hyper-expression organelle transgenics, we present ChloroMitoCU, the first-ever curated, web-based reference catalog of the codon usage patterns in organelle genomes. ChloroMitoCU contains the pre-compiled codon usage patterns of 328 chloroplast genomes (29,960 CDS) and 3,502 mitochondrial genomes (49,066 CDS), enabling genome-wide exploration and comparative analysis of codon usage patterns across species. ChloroMitoCU allows the phylogenetic comparison of codon usage patterns across organelle genomes, the prediction of codon usage patterns based on user-submitted transcripts or assembled organelle genes, and comparative analysis with the pre-compiled patterns across species of interest. ChloroMitoCU can increase our understanding of the biased patterns of codon usage in organelle genomes across multiple clades. ChloroMitoCU can be accessed at: http://chloromitocu.cgu.edu.tw/
Collapse
Affiliation(s)
- Gaurav Sablok
- Department of Biodiversity and Molecular Ecology, Research and Innovation Centre, Fondazione Edmund Mach, Via E. Mach 1, 38010 S. Michele all'Adige (TN), Italy
| | - Ting-Wen Chen
- Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan
| | - Chi-Ching Lee
- Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan
| | - Chi Yang
- Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan
| | - Ruei-Chi Gan
- Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan
| | - Jill L Wegrzyn
- Department of Ecology and Evolutionary Biology, University 10 of Connecticut, 75 North Eagleville Road, Storrs, CT 06269-3043 USA
| | - Nicola L Porta
- Department of Sustainable Agrobiosystems and Bioresources, Research and Innovation Centre, Fondazione Edmund Mach, Via E. Mach 1, 38010 S. Michele all'Adige (TN), Italy.,MOUNTFOR Project Centre, European Forest Institute, Via E. Mach 1, 38010 San Michele all'Adige, Trento, Italy
| | - Kinshuk C Nayak
- Bioinformatics Centre, Institute of Life Sciences, Department of Biotechnology, Govt. India, Nalco Square, Bhubaneswar - 751 023, India
| | - Po-Jung Huang
- Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan
| | - Claudio Varotto
- Department of Biodiversity and Molecular Ecology, Research and Innovation Centre, Fondazione Edmund Mach, Via E. Mach 1, 38010 S. Michele all'Adige (TN), Italy
| | - Petrus Tang
- Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan.,Molecular Infectious Diseases Research Center, Chang Gung Memorial Hospital, Kweishan, Taoyuan 333, Taiwan
| |
Collapse
|
12
|
Abstract
Mistranslation errors compromise fitness by wasting resources on nonfunctional proteins. In order to reduce the cost of mistranslations, natural selection chooses the most accurately translated codons at sites that are particularly important for protein structure and function. We investigated the determinants underlying selection for translational accuracy in several species of plants belonging to three clades: Brassicaceae, Fabidae, and Poaceae. Although signatures of translational selection were found in genes from a wide range of species, the underlying factors varied in nature and intensity. Indeed, the degree of synonymous codon bias at evolutionarily conserved sites varied among plant clades while remaining uniform within each clade. This is unlikely to solely reflect the diversity of tRNA pools because there is little correlation between synonymous codon bias and tRNA abundance, so other factors must affect codon choice and translational accuracy in plant genes. Accordingly, synonymous codon choice at a given site was affected not only by the selection pressure at that site, but also its participation in protein domains or mRNA secondary structures. Although these effects were detected in all the species we analyzed, their impact on translation accuracy was distinct in evolutionarily distant plant clades. The domain effect was found to enhance translational accuracy in dicot and monocot genes with a high GC content, but to oppose the selection of more accurate codons in monocot genes with a low GC content.
Collapse
|
13
|
Kong WQ, Yang JH. The complete chloroplast genome sequence of Morus cathayana and Morus multicaulis, and comparative analysis within genus Morus L. PeerJ 2017; 5:e3037. [PMID: 28286710 PMCID: PMC5345388 DOI: 10.7717/peerj.3037] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2016] [Accepted: 01/27/2017] [Indexed: 11/20/2022] Open
Abstract
Trees in the Morus genera belong to the Moraceae family. To better understand the species status of genus Morus and to provide information for studies on evolutionary biology within the genus, the complete chloroplast (cp) genomes of M. cathayana and M. multicaulis were sequenced. The plastomes of the two species are 159,265 bp and 159,103 bp, respectively, with corresponding 83 and 82 simple sequence repeats (SSRs). Similar to the SSRs of M. mongolica and M. indica cp genomes, more than 70% are mononucleotides, ten are in coding regions, and one exhibits nucleotide content polymorphism. Results for codon usage and relative synonymous codon usage show a strong bias towards NNA and NNT codons in the two cp genomes. Analysis of a plot of the effective number of codons (ENc) for five Morus spp. cp genomes showed that most genes follow the standard curve, but several genes have ENc values below the expected curve. The results indicate that both natural selection and mutational bias have contributed to the codon bias. Ten highly variable regions were identified among the five Morus spp. cp genomes, and 154 single-nucleotide polymorphism mutation events were accurately located in the gene coding region.
Collapse
Affiliation(s)
- Wei Qing Kong
- Shaanxi Key Laboratory of Sericulture, Ankang University , Ankang , Shaanxi , China
| | - Jin Hong Yang
- Shaanxi Key Laboratory of Sericulture, Ankang University , Ankang , Shaanxi , China
| |
Collapse
|
14
|
Chan KL, Rosli R, Tatarinova TV, Hogan M, Firdaus-Raih M, Low ETL. Seqping: gene prediction pipeline for plant genomes using self-training gene models and transcriptomic data. BMC Bioinformatics 2017; 18:1426. [PMID: 28466793 PMCID: PMC5333190 DOI: 10.1186/s12859-016-1426-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Gene prediction is one of the most important steps in the genome annotation process. A large number of software tools and pipelines developed by various computing techniques are available for gene prediction. However, these systems have yet to accurately predict all or even most of the protein-coding regions. Furthermore, none of the currently available gene-finders has a universal Hidden Markov Model (HMM) that can perform gene prediction for all organisms equally well in an automatic fashion. RESULTS We present an automated gene prediction pipeline, Seqping that uses self-training HMM models and transcriptomic data. The pipeline processes the genome and transcriptome sequences of the target species using GlimmerHMM, SNAP, and AUGUSTUS pipelines, followed by MAKER2 program to combine predictions from the three tools in association with the transcriptomic evidence. Seqping generates species-specific HMMs that are able to offer unbiased gene predictions. The pipeline was evaluated using the Oryza sativa and Arabidopsis thaliana genomes. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis showed that the pipeline was able to identify at least 95% of BUSCO's plantae dataset. Our evaluation shows that Seqping was able to generate better gene predictions compared to three HMM-based programs (MAKER2, GlimmerHMM and AUGUSTUS) using their respective available HMMs. Seqping had the highest accuracy in rice (0.5648 for CDS, 0.4468 for exon, and 0.6695 nucleotide structure) and A. thaliana (0.5808 for CDS, 0.5955 for exon, and 0.8839 nucleotide structure). CONCLUSIONS Seqping provides researchers a seamless pipeline to train species-specific HMMs and predict genes in newly sequenced or less-studied genomes. We conclude that the Seqping pipeline predictions are more accurate than gene predictions using the other three approaches with the default or available HMMs.
Collapse
Affiliation(s)
- Kuang-Lim Chan
- Advanced Biotechnology and Breeding Center, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor Malaysia
- Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor Malaysia
| | - Rozana Rosli
- Advanced Biotechnology and Breeding Center, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor Malaysia
| | - Tatiana V. Tatarinova
- Center for Personalized Medicine and Spatial Sciences Institute, University of Southern California, Los Angeles, CA USA
| | - Michael Hogan
- Orion Genomics, 4041 Forest Park Avenue, St. Louis, MO 63108 USA
| | - Mohd Firdaus-Raih
- Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor Malaysia
| | - Eng-Ti Leslie Low
- Advanced Biotechnology and Breeding Center, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor Malaysia
| |
Collapse
|
15
|
Tatarinova TV, Lysnyansky I, Nikolsky YV, Bolshoy A. The mysterious orphans of Mycoplasmataceae. Biol Direct 2016; 11:2. [PMID: 26747447 PMCID: PMC4706650 DOI: 10.1186/s13062-015-0104-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2015] [Accepted: 12/30/2015] [Indexed: 01/08/2023] Open
Abstract
Background The length of a protein sequence is largely determined by its function. In certain species, it may be also affected by additional factors, such as growth temperature or acidity. In 2002, it was shown that in the bacterium Escherichia coli and in the archaeon Archaeoglobus fulgidus, protein sequences with no homologs were, on average, shorter than those with homologs (BMC Evol Biol 2:20, 2002). It is now generally accepted that in bacterial and archaeal genomes the distributions of protein length are different between sequences with and without homologs. In this study, we examine this postulate by conducting a comprehensive analysis of all annotated prokaryotic genomes and by focusing on certain exceptions. Results We compared the distribution of lengths of “having homologs proteins” (HHPs) and “non-having homologs proteins” (orphans or ORFans) in all currently completely sequenced and COG-annotated prokaryotic genomes. As expected, the HHPs and ORFans have strikingly different length distributions in almost all genomes. As previously established, the HHPs, indeed are, on average, longer than the ORFans, and the length distributions for the ORFans have a relatively narrow peak, in contrast to the HHPs, whose lengths spread over a wider range of values. However, about thirty genomes do not obey these rules. Practically all genomes of Mycoplasma and Ureaplasma have atypical ORFans distributions, with the mean lengths of ORFan larger than the mean lengths of HHPs. These genera constitute over 80 % of atypical genomes. Conclusions We confirmed on a ubiquitous set of genomes that the previous observation of HHPs and ORFans have different gene length distributions. We also showed that Mycoplasmataceae genomes have very distinctive distributions of ORFans lengths. We offer several possible biological explanations of this phenomenon, such as an adaptation to Mycoplasmataceae’s ecological niche, specifically its “quiet” co-existence with host organisms, resulting in long ABC transporters. Electronic supplementary material The online version of this article (doi:10.1186/s13062-015-0104-3) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Tatiana V Tatarinova
- Children's Hospital Los Angeles, Keck School of Medicine, University of Southern California, Los Angeles, 90027, CA, USA. .,Spatial Sciences Institute, University of Southern California, Los Angeles, 90089, CA, USA.
| | - Inna Lysnyansky
- Mycoplasma Unit, Division of Avian and Aquatic Diseases, Kimron Veterinary Institute, POB 12, Beit Dagan, 50250, Israel.
| | - Yuri V Nikolsky
- School of Systems Biology, George Mason University, 10900 University Blvd, MSN 5B3, Manassas, VA, 20110, USA. .,Prosapia Genetics, LLC, 534 San Andres Dr., Solana Beach, CA, 92075, USA. .,Vavilov Institute of General Genetics, Moscow, Russian Federation.
| | - Alexander Bolshoy
- Department of Evolutionary and Environmental Biology and Institute of Evolution, University of Haifa, Haifa, Israel.
| |
Collapse
|
16
|
Camiolo S, Melito S, Porceddu A. New insights into the interplay between codon bias determinants in plants. DNA Res 2015; 22:461-70. [PMID: 26546225 PMCID: PMC4675714 DOI: 10.1093/dnares/dsv027] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2015] [Accepted: 10/01/2015] [Indexed: 12/28/2022] Open
Abstract
Codon bias is the non-random use of synonymous codons, a phenomenon that has been observed in species as diverse as bacteria, plants and mammals. The preferential use of particular synonymous codons may reflect neutral mechanisms (e.g. mutational bias, G|C-biased gene conversion, genetic drift) and/or selection for mRNA stability, translational efficiency and accuracy. The extent to which these different factors influence codon usage is unknown, so we dissected the contribution of mutational bias and selection towards codon bias in genes from 15 eudicots, 4 monocots and 2 mosses. We analysed the frequency of mononucleotides, dinucleotides and trinucleotides and investigated whether the compositional genomic background could account for the observed codon usage profiles. Neutral forces such as mutational pressure and G|C-biased gene conversion appeared to underlie most of the observed codon bias, although there was also evidence for the selection of optimal translational efficiency and mRNA folding. Our data confirmed the compositional differences between monocots and dicots, with the former featuring in general a lower background compositional bias but a higher overall codon bias.
Collapse
Affiliation(s)
- S Camiolo
- Dipartimento di Agraria, SACEG, Università degli Studi di Sassari, Sassari, Italy
| | - S Melito
- Dipartimento di Agraria, SACEG, Università degli Studi di Sassari, Sassari, Italy
| | - A Porceddu
- Dipartimento di Agraria, SACEG, Università degli Studi di Sassari, Sassari, Italy
| |
Collapse
|
17
|
Tatarinova T, Elhaik E, Pellegrini M. Cross-species analysis of genic GC3 content and DNA methylation patterns. Genome Biol Evol 2013; 5:1443-56. [PMID: 23833164 PMCID: PMC3762193 DOI: 10.1093/gbe/evt103] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Open
Abstract
The GC content in the third codon position (GC3) exhibits a unimodal distribution in many plant and animal genomes. Interestingly, grasses and homeotherm vertebrates exhibit a unique bimodal distribution. High GC3 was previously found to be associated with variable expression, higher frequency of upstream TATA boxes, and an increase of GC3 from 5′ to 3′. Moreover, GC3-rich genes are predominant in certain gene classes and are enriched in CpG dinucleotides that are potential targets for methylation. Based on the GC3 bimodal distribution we hypothesize that GC3 has a regulatory role involving methylation and gene expression. To test that hypothesis, we selected diverse taxa (rice, thale cress, bee, and human) that varied in the modality of their GC3 distribution and tested the association between GC3, DNA methylation, and gene expression. We examine the relationship between cytosine methylation levels and GC3, gene expression, genome signature, gene length, and other gene compositional features. We find a strong negative correlation (Pearson’s correlation coefficient r = −0.67, P value < 0.0001) between GC3 and genic CpG methylation. The comparison between 5′-3′ gradients of CG3-skew and genic methylation for the taxa in the study suggests interplay between gene-body methylation and transcription-coupled cytosine deamination effect. Compositional features are correlated with methylation levels of genes in rice, thale cress, human, bee, and fruit fly (which acts as an unmethylated control). These patterns allow us to generate evolutionary hypotheses about the relationships between GC3 and methylation and how these affect expression patterns. Specifically, we propose that the opposite effects of methylation and compositional gradients along coding regions of GC3-poor and GC3-rich genes are the products of several competing processes.
Collapse
Affiliation(s)
- Tatiana Tatarinova
- Laboratory of Applied Pharmacokinetics and Bioinformatics, University of Southern California.
| | | | | |
Collapse
|
18
|
Feng C, Xu CJ, Wang Y, Liu WL, Yin XR, Li X, Chen M, Chen KS. Codon usage patterns in Chinese bayberry (Myrica rubra) based on RNA-Seq data. BMC Genomics 2013; 14:732. [PMID: 24160180 PMCID: PMC4008310 DOI: 10.1186/1471-2164-14-732] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2013] [Accepted: 10/21/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Codon usage analysis has been a classical topic for decades and has significances for studies of evolution, mRNA translation, and new gene discovery, etc. While the codon usage varies among different members of the plant kingdom, indicating the necessity for species-specific study, this work has mostly been limited to model organisms. Recently, the development of deep sequencing, especial RNA-Seq, has made it possible to carry out studies in non-model species. RESULT RNA-Seq data of Chinese bayberry was analyzed to investigate the bias of codon usage and codon pairs. High frequency codons (AGG, GCU, AAG and GAU), as well as low frequency ones (NCG and NUA codons) were identified, and 397 high frequency codon pairs were observed. Meanwhile, 26 preferred and 141 avoided neighboring codon pairs were also identified, which showed more significant bias than the same pairs with one or more intervening codons. Codon patterns were also analyzed at the plant kingdom, organism and gene levels. Changes during plant evolution were evident using RSCU (relative synonymous codon usage), which was even more significant than GC3s (GC content of 3rd synonymous codons). Nine GO categories were differentially and independently influenced by CAI (codon adaptation index) or GC3s, especially in 'Molecular function' category. Within a gene, the average CAI increased from 0.720 to 0.785 in the first 50 codons, and then more slowly thereafter. Furthermore, the preferred as well as avoided codons at the position just following the start codon AUG were identified and discussed in relation to the key positions in Kozak sequences. CONCLUSION A comprehensive codon usage Table and number of high-frequency codon pairs were established. Bias in codon usage as well as in neighboring codon pairs was observed, and the significance of this in avoiding DNA mutation, increasing protein production and regulating protein synthesis rate was proposed. Codon usage patterns at three levels were revealed and the significance in plant evolution analysis, gene function classification, and protein translation start site predication were discussed. This work promotes the study of codon biology, and provides some reference for analysis and comprehensive application of RNA-Seq data from other non-model species.
Collapse
Affiliation(s)
- Chao Feng
- Laboratory of Fruit Quality Biology / The State Agriculture Ministry Laboratory of Horticultural Plant Growth, Development and Quality Improvement, Zhejiang University, Hangzhou, 310058, China
| | - Chang-jie Xu
- Laboratory of Fruit Quality Biology / The State Agriculture Ministry Laboratory of Horticultural Plant Growth, Development and Quality Improvement, Zhejiang University, Hangzhou, 310058, China
| | - Yue Wang
- Department of Bioinformatics / The State Key Laboratory of Plant Physiology and Biochemistry, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
| | - Wen-li Liu
- Department of Mathematics, Zhejiang University, Hangzhou, 310027, China
| | - Xue-ren Yin
- Laboratory of Fruit Quality Biology / The State Agriculture Ministry Laboratory of Horticultural Plant Growth, Development and Quality Improvement, Zhejiang University, Hangzhou, 310058, China
| | - Xian Li
- Laboratory of Fruit Quality Biology / The State Agriculture Ministry Laboratory of Horticultural Plant Growth, Development and Quality Improvement, Zhejiang University, Hangzhou, 310058, China
| | - Ming Chen
- Department of Bioinformatics / The State Key Laboratory of Plant Physiology and Biochemistry, College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
| | - Kun-song Chen
- Laboratory of Fruit Quality Biology / The State Agriculture Ministry Laboratory of Horticultural Plant Growth, Development and Quality Improvement, Zhejiang University, Hangzhou, 310058, China
| |
Collapse
|
19
|
Sablok G, Wu X, Kuo J, Nayak KC, Baev V, Varotto C, Zhou F. Combinational effect of mutational bias and translational selection for translation efficiency in tomato (Solanum lycopersicum) cv. Micro-Tom. Genomics 2013; 101:290-5. [PMID: 23474140 DOI: 10.1016/j.ygeno.2013.02.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2012] [Revised: 01/21/2013] [Accepted: 02/21/2013] [Indexed: 11/24/2022]
Abstract
We conducted a comprehensive analysis of codon usage bias (CUB) based on the available non-redundant full-length cDNA (nrFLcDNA) and expressed sequence tags (ESTs) data of cultivar Micro-Tom and evaluated the associations of observed CUB and measurements of transcriptional and translational effectiveness. The analysis presented in our study suggests a correlation, which is negative but highly correlated between Axis 1 and GC3s (r=-0.827, P<0.01), indicating that mutational bias has a significant and dominant repressive role to the choices of GC3. We also observed a strong positive correlation between codon adaptation index (CAI) and translational adaptation index (tAIg) (0.407, P<0.01), which demonstrates the facilitation of efficient translation by the optimal codon usage patterns of the highly expressed genes. We believe that the complete set of optimal codon usage patterns detected in this study will serve as a model to enhance the transgenesis in the studied cultivar of Solanum lycopersicum.
Collapse
Affiliation(s)
- Gaurav Sablok
- Department of Biodiversity and Molecular Ecology, Research and Innovation Centre, Fondazione Edmund Mach, Via E Mach 1, 38010 S. Michele all'Adige (TN), Italy.
| | | | | | | | | | | | | |
Collapse
|