Reference Citation Analysis

For: Ahmad T, Sablok G, Tatarinova TV, Xu Q, Deng XX, Guo WW. Evaluation of codon biology in citrus and Poncirus trifoliata based on genomic features and frame corrected expressed sequence tags. DNA Res 2013;20:135-50. [PMID: 23315666 PMCID: PMC3628444 DOI: 10.1093/dnares/dss039] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 11/29/2022]

Cited by Other Article(s)

Wang Y, Jiang D, Guo K, Zhao L, Meng F, Xiao J, Niu Y, Sun Y. Comparative analysis of codon usage patterns in chloroplast genomes of ten Epimedium species. BMC Genom Data 2023;24:3. [PMID: 36624369 PMCID: PMC9830715 DOI: 10.1186/s12863-023-01104-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/23/2022] [Accepted: 01/05/2023] [Indexed: 01/11/2023]

Gao Y, Lu Y, Song Y, Jing L. Analysis of codon usage bias of WRKY transcription factors in Helianthus annuus. BMC Genom Data 2022;23:46. [PMID: 35725374 PMCID: PMC9210703 DOI: 10.1186/s12863-022-01064-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 12/19/2021] [Accepted: 06/13/2022] [Indexed: 11/10/2022]

Abstract

Background

The phenomenon of codon usage bias is known to exist in many genomes and is mainly determined by mutation and selection. Codon usage bias analysis is a suitable strategy for identifying the principal evolutionary driving forces in different organisms. Sunflower (Helianthus annuus L.) is an annual crop that is cultivated worldwide as an ornamental, as a food plant, and for its valuable oil. The WRKY family genes in plants play a central role in diverse regulation and multiple stress responses. Evolutionary analysis of WRKY family genes of H. annuus can provide rich genetic information for developing hybridization resources of the genus Helianthus.

Results

Base composition analysis showed that the average GC content of WRKY genes of H. annuus was 43.42%, and the average GC3 content was 39.60%, suggesting that the WRKY gene family prefers A/T(U)-ending codons. There were 29 codons with relative synonymous codon usage (RSCU) greater than 1, and 22 codons ending with A or U bases. The effective number of codons (ENC) and codon adaptation index (CAI) in WRKY genes ranged from 43.47 to 61.00 and from 0.14 to 0.26, respectively, suggesting that the codon bias was weak and the expression level of WRKY genes was low. Neutrality analysis found a significant correlation between GC12 and GC3. The ENC-plot showed most genes on or close to the expected curve, suggesting that mutational bias played a major role in shaping codon usage. The Parity Rule 2 (PR2) plot analysis showed that the usage of AT and GC was disproportionate. A total of three codons were identified as the optimal codons.

Conclusion

Apart from natural selection effects, most of the genetic evolution in the H. annuus WRKY genome might be driven by mutation pressure. Our results provide a theoretical foundation for elaborating the genetic architecture and mechanisms of H. annuus and contribute to enriching H. annuus genetic resources.
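
The RSCU statistic reported in this abstract can be illustrated with a short sketch. It assumes the standard genetic code and the usual definition of RSCU (a codon's observed count divided by the mean count of all codons that encode the same amino acid). The function name `rscu` and the toy input are hypothetical and are not taken from the cited paper.

```python
from collections import Counter

# Standard genetic code in TCAG order (assumption: nuclear code, DNA alphabet).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def rscu(cds):
    """Return RSCU per codon for one in-frame coding sequence.

    Amino acids that never occur in the sequence are skipped, and stop
    codons are ignored.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for aa in set(AMINO) - {"*"}:
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in family)
        if total == 0:
            continue
        for codon in family:
            # observed count / (family total / family size)
            values[codon] = counts[codon] * len(family) / total
    return values


if __name__ == "__main__":
    # Toy input: four alanine codons, three GCT and one GCC.
    for codon, value in sorted(rscu("GCTGCTGCTGCC").items()):
        print(codon, round(value, 2))
```

Under this definition, a value above 1 means the codon is used more often than it would be if all synonymous codons were used equally, which is how the "29 codons with RSCU greater than 1" in the abstract would be counted.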

Yang J, Chu Q, Meng G, Kong W. The complete chloroplast genome sequences of three Broussonetia species and comparative analysis within the Moraceae. PeerJ 2022;10:e14293. [PMID: 36340196 PMCID: PMC9632464 DOI: 10.7717/peerj.14293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2022] [Accepted: 10/03/2022] [Indexed: 01/22/2023]

Abstract

Background

Species of Broussonetia (family Moraceae) are commonly used to make textiles and high-grade paper. The distribution of Broussonetia papyrifera L. is considered to be related to the spread and location of humans. The complete chloroplast (cp) genomes of B. papyrifera, Broussonetia kazinoki Sieb., and Broussonetia kaempferi Sieb. were analyzed to better understand the status and evolutionary biology of the genus Broussonetia.

Methods

The cp genomes were assembled and characterized using SOAPdenovo2 and DOGMA. Phylogenetic and molecular dating analyses were performed using the concatenated nucleotide sequences of 35 species in the Moraceae family and were based on 66 protein-coding genes (PCGs). An analysis of the sequence divergence (pi) of each PCG among the 35 cp genomes was conducted using DnaSP v6. Codon usage indices were calculated using the CodonW program.

Results

All three cp genomes had the typical land plant quadripartite structure, ranging in size from 160,239 bp to 160,841 bp. The ribosomal protein L22 gene (RPL22) was either incomplete or missing in all three Broussonetia species. Phylogenetic analysis revealed two clades. Clade 1 included Morus and Artocarpus, whereas clade 2 included the other seven genera. Malaisia scandens Lour. was clustered within the genus Broussonetia. The differentiation of Broussonetia was estimated to have taken place 26 million years ago. The PCGs' pi values ranged from 0.0005 to 0.0419, indicating small differences within the Moraceae family. The distribution of most of the genes in the effective number of codons plot (ENc-plot) fell on or near the trend line; the slopes of the trend line of neutrality plots were within the range of 0.0363-0.171. These results will facilitate the identification, taxonomy, and utilization of the Broussonetia species and further the evolutionary studies of the Moraceae family.
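
The neutrality-plot slopes quoted in the Results can be reproduced in outline as follows. This is a minimal sketch, assuming the common construction in which each gene contributes one point (GC3 on the x-axis, GC12 on the y-axis) and the slope is taken from an ordinary least-squares fit. The function names and the toy numbers are hypothetical, and the cited study may have used a different tool for the fit.

```python
def gc12_gc3(cds):
    """Return (GC12, GC3) for one in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    gc_by_position = [0, 0, 0]
    for i in range(usable):
        if cds[i] in "GC":
            gc_by_position[i % 3] += 1
    n_codons = usable // 3
    gc1, gc2, gc3 = (count / n_codons for count in gc_by_position)
    return (gc1 + gc2) / 2, gc3


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    genes = ["ATGGCC", "ATGAAATTT", "GGGCCCGGA"]  # toy sequences
    points = [gc12_gc3(g) for g in genes]
    slope = least_squares_slope([p[1] for p in points], [p[0] for p in points])
    print(round(slope, 4))
```

A slope close to 0 is usually read as GC12 being largely independent of GC3 (selection dominating), while a slope close to 1 is read as mutation pressure dominating; that reading is the conventional one and is stated here as background rather than as a claim from the cited paper.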

Zhang Y, Shen Z, Meng X, Zhang L, Liu Z, Liu M, Zhang F, Zhao J. Codon usage patterns across seven Rosales species. BMC Plant Biol 2022;22:65. [PMID: 35123393 PMCID: PMC8817548 DOI: 10.1186/s12870-022-03450-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/24/2021] [Accepted: 01/31/2022] [Indexed: 05/03/2023]

Allen QM, Febres VJ, Rathinasabapathi B, Chaparro JX. Engineering a Plant-Derived Astaxanthin Synthetic Pathway Into Nicotiana benthamiana. Front Plant Sci 2022;12:831785. [PMID: 35116052 PMCID: PMC8804313 DOI: 10.3389/fpls.2021.831785] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/08/2021] [Accepted: 12/27/2021] [Indexed: 06/14/2023]

Parvathy ST, Udayasuriyan V, Bhadana V. Codon usage bias. Mol Biol Rep 2021;49:539-565. [PMID: 34822069 PMCID: PMC8613526 DOI: 10.1007/s11033-021-06749-4] [Citation(s) in RCA: 136] [Impact Index Per Article: 45.3] [Received: 01/01/2021] [Accepted: 09/14/2021] [Indexed: 12/14/2022]

Patil SS, Indrabalan UB, Suresh KP, Shome BR. Analysis of codon usage bias of classical swine fever virus. Vet World 2021;14:1450-1458. [PMID: 34316191 PMCID: PMC8304411 DOI: 10.14202/vetworld.2021.1450-1458] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/12/2021] [Accepted: 04/21/2021] [Indexed: 11/22/2022]

Abstract

Background and Aim:

Classical swine fever (CSF), caused by CSF virus (CSFV), is a highly contagious disease in pigs causing 100% mortality in susceptible adult pigs and piglets. The high mortality rate in pigs causes huge economic loss to pig farmers. CSFV has a positive-sense RNA genome of 12.3 kb in length flanked by untranslated regions at the 5’ and 3’ ends. The genome codes for a large polyprotein of 3900 amino acids coding for 11 viral proteins. The 1300 codons in the polyprotein are coded by different combinations of three nucleotides, which help the infectious agent to evolve and adapt to the host environment. This study employed various methods to estimate the changes occurring in the process of CSFV evolution by analyzing the codon usage pattern.

Materials and Methods:

The evolution of viruses is widely studied by analyzing their nucleotides and coding regions/codons using various methods. A total of 115 complete coding regions of CSFVs including one complete genome from our laboratory (MH734359) were included in this study and analysis was carried out using various methods in estimating codon usage bias and evolution. This study elaborates on the factors that influence the codon usage pattern.

Results:

The effective number of codons (ENC) and relative synonymous codon usage showed the presence of codon usage bias. The mononucleotide A has a higher frequency compared to the other mononucleotides (G, C, and T). The dinucleotides CG and CC are underrepresented and overrepresented, respectively. The codon CGT was underrepresented and AGG was overrepresented. A codon adaptation index value of 0.71 was obtained, indicating that there is a similarity in the codon usage bias. The principal component analysis, ENC-plot, neutrality plot, and Parity Rule 2 plot produced in this article indicate that the CSFV is influenced by the codon usage bias. Mutational pressure and natural selection are the important factors that influence the codon usage bias.

Conclusion:

The study provides useful information on the codon usage analysis of CSFV and may be utilized to understand the host adaptation to virus environment and its evolution. Further, such findings help in new gene discovery, design of primers/probes, design of transgenes, determination of the origin of species, prediction of gene expression level, and gene function of CSFV. To the best of our knowledge, this is the first study on codon usage bias involving such a large number of complete CSFVs including one sequence of CSFV from India.
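
The abstract above reports under- and overrepresented dinucleotides without naming the measure. One widely used choice is the observed/expected odds ratio, the dinucleotide frequency divided by the product of the two mononucleotide frequencies. The sketch below assumes that measure; the function name and the toy sequence are hypothetical, and the cited study may have computed representation differently.

```python
from collections import Counter


def dinucleotide_odds_ratio(seq):
    """Return observed/expected ratios for every dinucleotide present in seq.

    ratio(XY) = f(XY) / (f(X) * f(Y)), where f(XY) is measured over all
    overlapping dinucleotides and f(X), f(Y) over single bases.
    """
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    ratios = {}
    for pair, count in di.items():
        f_pair = count / (n - 1)
        f_first = mono[pair[0]] / n
        f_second = mono[pair[1]] / n
        ratios[pair] = f_pair / (f_first * f_second)
    return ratios


if __name__ == "__main__":
    ratios = dinucleotide_odds_ratio("ATGCGCGATTACCGCCTTAG")  # toy sequence
    for pair in ("CG", "CC"):
        print(pair, round(ratios.get(pair, 0.0), 3))
```

Ratios well below 1 indicate a dinucleotide that occurs less often than base composition alone would predict, and ratios well above 1 indicate the opposite. Any cut-off for calling a pair under- or overrepresented is a convention that would have to be taken from the study itself.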

Shen Z, Gan Z, Zhang F, Yi X, Zhang J, Wan X. Analysis of codon usage patterns in citrus based on coding sequence data. BMC Genomics 2020;21:234. [PMID: 33327935 PMCID: PMC7739459 DOI: 10.1186/s12864-020-6641-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 02/22/2020] [Accepted: 03/03/2020] [Indexed: 11/10/2022]

Biswas KK, Palchoudhury S, Chakraborty P, Bhattacharyya UK, Ghosh DK, Debnath P, Ramadugu C, Keremane ML, Khetarpal RK, Lee RF. Codon Usage Bias Analysis of Citrus tristeza Virus: Higher Codon Adaptation to Citrus reticulata Host. Viruses 2019;11:v11040331. [PMID: 30965565 PMCID: PMC6521185 DOI: 10.3390/v11040331] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 02/18/2019] [Revised: 03/25/2019] [Accepted: 04/03/2019] [Indexed: 12/16/2022]

Abstract

Citrus tristeza virus (CTV), a member of the aphid-transmitted closterovirus group, is the causal agent of the notorious tristeza disease in several citrus species worldwide. The codon usage patterns of viruses reflect the evolutionary changes for optimization of their survival and adaptation in their fitness to the external environment and the hosts. The codon usage adaptation of CTV to specific citrus hosts remains to be studied; thus, its role in CTV evolution is not clearly comprehended. Therefore, to better explain the host–virus interaction and evolutionary history of CTV, the codon usage patterns of the coat protein (CP) genes of 122 CTV isolates originating from three economically important citrus hosts (55 isolates from Citrus sinensis, 38 from C. reticulata, and 29 from C. aurantifolia) were studied using several codon usage indices and multivariate statistical methods. The present study shows that CTV displays low codon usage bias (CUB) and higher genomic stability. Neutrality plot and relative synonymous codon usage analyses revealed that the overall influence of natural selection was more profound than that of mutation pressure in shaping the CUB of CTV. The contribution of high-frequency codon analysis and codon adaptation index value show that CTV has host-specific codon usage patterns, resulting in higher adaptability of CTV isolates originating from C. reticulata (Cr-CTV), and low adaptability in the isolates originating from C. aurantifolia (Ca-CTV) and C. sinensis (Cs-CTV). The combination of codon analysis of CTV with citrus genealogy suggests that CTV evolved in C. reticulata or other Citrus progenitors. The outcome of the study enhances the understanding of the factors involved in viral adaptation, evolution, and fitness toward their hosts. This information will definitely help devise better management strategies of CTV.

Chan KL, Tatarinova TV, Rosli R, Amiruddin N, Azizi N, Halim MAA, Sanusi NSNM, Jayanthi N, Ponomarenko P, Triska M, Solovyev V, Firdaus-Raih M, Sambanthamurthi R, Murphy D, Low ETL. Evidence-based gene models for structural and functional annotations of the oil palm genome. Biol Direct 2017;12:21. [PMID: 28886750 PMCID: PMC5591544 DOI: 10.1186/s13062-017-0191-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 04/27/2017] [Accepted: 08/07/2017] [Indexed: 11/13/2022]

Abstract

Background

Oil palm is an important source of edible oil. The importance of the crop, as well as its long breeding cycle (10-12 years) has led to the sequencing of its genome in 2013 to pave the way for genomics-guided breeding. Nevertheless, the first set of gene predictions, although useful, had many fragmented genes. Classification and characterization of genes associated with traits of interest, such as those for fatty acid biosynthesis and disease resistance, were also limited. Lipid-, especially fatty acid (FA)-related genes are of particular interest for the oil palm as they specify oil yields and quality. This paper presents the characterization of the oil palm genome using different gene prediction methods and comparative genomics analysis, identification of FA biosynthesis and disease resistance genes, and the development of an annotation database and bioinformatics tools.

Results

Using two independent gene-prediction pipelines, Fgenesh++ and Seqping, 26,059 oil palm genes with transcriptome and RefSeq support were identified from the oil palm genome. These coding regions of the genome have a characteristic broad distribution of GC₃ (fraction of cytosine and guanine in the third position of a codon) with over half the GC₃-rich genes (GC₃ ≥ 0.75286) being intronless. In comparison, only one-seventh of the oil palm genes identified are intronless. Using comparative genomics analysis, characterization of conserved domains and active sites, and expression analysis, 42 key genes involved in FA biosynthesis in oil palm were identified. For three of them, namely EgFABF, EgFABH and EgFAD3, segmental duplication events were detected. Our analysis also identified 210 candidate resistance genes in six classes, grouped by their protein domain structures.

Conclusions

We present an accurate and comprehensive annotation of the oil palm genome, focusing on analysis of important categories of genes (GC₃-rich and intronless), as well as those associated with important functions, such as FA biosynthesis and disease resistance. The study demonstrated the advantages of having an integrated approach to gene prediction and developed a computational framework for combining multiple genome annotations. These results, available in the oil palm annotation database (http://palmxplore.mpob.gov.my), will provide important resources for studies on the genomes of oil palm and related crops.

Reviewers

This article was reviewed by Alexander Kel, Igor Rogozin, and Vladimir A. Kuznetsov.

Electronic supplementary material

The online version of this article (doi:10.1186/s13062-017-0191-4) contains supplementary material, which is available to authorized users.

Affiliation(s)

Kuang-Lim Chan: Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia; Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
Tatiana V Tatarinova: Department of Biology, University of La Verne, La Verne, California, 91750, USA; Spatial Sciences Institute, University of Southern California, Los Angeles, CA, 90089, USA
Rozana Rosli: Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia; Genomics and Computational Biology Research Group, University of South Wales, Pontypridd, CF371DL, UK
Nadzirah Amiruddin: Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
Norazah Azizi: Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
Mohd Amin Ab Halim: Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
Nik Shazana Nik Mohd Sanusi: Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
Nagappan Jayanthi: Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
Petr Ponomarenko: Spatial Sciences Institute, University of Southern California, Los Angeles, CA, 90089, USA
Martin Triska: Children's Hospital Los Angeles, University of Southern California, Los Angeles, CA, 90089, USA
Victor Solovyev: Softberry Inc., 116 Radio Circle, Suite 400, Mount Kisco, NY, 10549, USA
Mohd Firdaus-Raih: Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
Ravigadevi Sambanthamurthi: Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
Denis Murphy: Genomics and Computational Biology Research Group, University of South Wales, Pontypridd, CF371DL, UK
Eng-Ti Leslie Low: Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia

Sablok G, Chen TW, Lee CC, Yang C, Gan RC, Wegrzyn JL, Porta NL, Nayak KC, Huang PJ, Varotto C, Tang P. ChloroMitoCU: Codon patterns across organelle genomes for functional genomics and evolutionary applications. DNA Res 2017;24:327-332. [PMID: 28419256 PMCID: PMC5499650 DOI: 10.1093/dnares/dsw044] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/18/2016] [Accepted: 09/14/2016] [Indexed: 01/01/2023]

Affiliation(s)

Gaurav Sablok: Department of Biodiversity and Molecular Ecology, Research and Innovation Centre, Fondazione Edmund Mach, Via E. Mach 1, 38010 S. Michele all'Adige (TN), Italy
Ting-Wen Chen: Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan
Chi-Ching Lee: Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan
Chi Yang: Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan
Ruei-Chi Gan: Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan
Jill L Wegrzyn: Department of Ecology and Evolutionary Biology, University of Connecticut, 75 North Eagleville Road, Storrs, CT 06269-3043 USA
Nicola L Porta: Department of Sustainable Agrobiosystems and Bioresources, Research and Innovation Centre, Fondazione Edmund Mach, Via E. Mach 1, 38010 S. Michele all'Adige (TN), Italy; MOUNTFOR Project Centre, European Forest Institute, Via E. Mach 1, 38010 San Michele all'Adige, Trento, Italy
Kinshuk C Nayak: Bioinformatics Centre, Institute of Life Sciences, Department of Biotechnology, Govt. India, Nalco Square, Bhubaneswar - 751 023, India
Po-Jung Huang: Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan
Claudio Varotto: Department of Biodiversity and Molecular Ecology, Research and Innovation Centre, Fondazione Edmund Mach, Via E. Mach 1, 38010 S. Michele all'Adige (TN), Italy
Petrus Tang: Bioinformatics Core Laboratory, Molecular Medicine Research Center, Chang Gung University, Kweishan, Taoyuan 333, Taiwan; Molecular Infectious Diseases Research Center, Chang Gung Memorial Hospital, Kweishan, Taoyuan 333, Taiwan

The Evolutionary Basis of Translational Accuracy in Plants. G3 (Bethesda) 2017;7:2363-2373. [PMID: 28533334 PMCID: PMC5499143 DOI: 10.1534/g3.117.040626] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]

Kong WQ, Yang JH. The complete chloroplast genome sequence of Morus cathayana and Morus multicaulis, and comparative analysis within genus Morus L. PeerJ 2017;5:e3037. [PMID: 28286710 PMCID: PMC5345388 DOI: 10.7717/peerj.3037] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 07/08/2016] [Accepted: 01/27/2017] [Indexed: 11/20/2022]

Chan KL, Rosli R, Tatarinova TV, Hogan M, Firdaus-Raih M, Low ETL. Seqping: gene prediction pipeline for plant genomes using self-training gene models and transcriptomic data. BMC Bioinformatics 2017;18:1426. [PMID: 28466793 PMCID: PMC5333190 DOI: 10.1186/s12859-016-1426-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Gene prediction is one of the most important steps in the genome annotation process. A large number of software tools and pipelines developed by various computing techniques are available for gene prediction. However, these systems have yet to accurately predict all or even most of the protein-coding regions. Furthermore, none of the currently available gene-finders has a universal Hidden Markov Model (HMM) that can perform gene prediction for all organisms equally well in an automatic fashion.

RESULTS

We present an automated gene prediction pipeline, Seqping, that uses self-training HMM models and transcriptomic data. The pipeline processes the genome and transcriptome sequences of the target species using the GlimmerHMM, SNAP, and AUGUSTUS pipelines, followed by the MAKER2 program to combine predictions from the three tools in association with the transcriptomic evidence. Seqping generates species-specific HMMs that are able to offer unbiased gene predictions. The pipeline was evaluated using the Oryza sativa and Arabidopsis thaliana genomes. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis showed that the pipeline was able to identify at least 95% of BUSCO's plantae dataset. Our evaluation shows that Seqping was able to generate better gene predictions compared to three HMM-based programs (MAKER2, GlimmerHMM and AUGUSTUS) using their respective available HMMs. Seqping had the highest accuracy in rice (0.5648 for CDS, 0.4468 for exon, and 0.6695 for nucleotide structure) and A. thaliana (0.5808 for CDS, 0.5955 for exon, and 0.8839 for nucleotide structure).

CONCLUSIONS

Seqping provides researchers a seamless pipeline to train species-specific HMMs and predict genes in newly sequenced or less-studied genomes. We conclude that the Seqping pipeline predictions are more accurate than gene predictions using the other three approaches with the default or available HMMs.

Tatarinova TV, Lysnyansky I, Nikolsky YV, Bolshoy A. The mysterious orphans of Mycoplasmataceae. Biol Direct 2016;11:2. [PMID: 26747447 PMCID: PMC4706650 DOI: 10.1186/s13062-015-0104-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 08/23/2015] [Accepted: 12/30/2015] [Indexed: 01/08/2023]

Abstract

Background

The length of a protein sequence is largely determined by its function. In certain species, it may be also affected by additional factors, such as growth temperature or acidity. In 2002, it was shown that in the bacterium Escherichia coli and in the archaeon Archaeoglobus fulgidus, protein sequences with no homologs were, on average, shorter than those with homologs (BMC Evol Biol 2:20, 2002). It is now generally accepted that in bacterial and archaeal genomes the distributions of protein length are different between sequences with and without homologs. In this study, we examine this postulate by conducting a comprehensive analysis of all annotated prokaryotic genomes and by focusing on certain exceptions.

Results

We compared the distribution of lengths of “having homologs proteins” (HHPs) and “non-having homologs proteins” (orphans or ORFans) in all currently completely sequenced and COG-annotated prokaryotic genomes. As expected, the HHPs and ORFans have strikingly different length distributions in almost all genomes. As previously established, the HHPs, indeed are, on average, longer than the ORFans, and the length distributions for the ORFans have a relatively narrow peak, in contrast to the HHPs, whose lengths spread over a wider range of values. However, about thirty genomes do not obey these rules. Practically all genomes of Mycoplasma and Ureaplasma have atypical ORFans distributions, with the mean lengths of ORFan larger than the mean lengths of HHPs. These genera constitute over 80 % of atypical genomes.

Conclusions

We confirmed on a ubiquitous set of genomes the previous observation that HHPs and ORFans have different gene length distributions. We also showed that Mycoplasmataceae genomes have very distinctive distributions of ORFan lengths. We offer several possible biological explanations of this phenomenon, such as an adaptation to Mycoplasmataceae’s ecological niche, specifically its “quiet” co-existence with host organisms, resulting in long ABC transporters.

Electronic supplementary material

The online version of this article (doi:10.1186/s13062-015-0104-3) contains supplementary material, which is available to authorized users.

Camiolo S, Melito S, Porceddu A. New insights into the interplay between codon bias determinants in plants. DNA Res 2015;22:461-70. [PMID: 26546225 PMCID: PMC4675714 DOI: 10.1093/dnares/dsv027] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 06/30/2015] [Accepted: 10/01/2015] [Indexed: 12/28/2022]

Tatarinova T, Elhaik E, Pellegrini M. Cross-species analysis of genic GC3 content and DNA methylation patterns. Genome Biol Evol 2013;5:1443-56. [PMID: 23833164 PMCID: PMC3762193 DOI: 10.1093/gbe/evt103] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Indexed: 12/18/2022]

Abstract

The GC content in the third codon position (GC₃) exhibits a unimodal distribution in many plant and animal genomes. Interestingly, grasses and homeotherm vertebrates exhibit a unique bimodal distribution. High GC₃ was previously found to be associated with variable expression, higher frequency of upstream TATA boxes, and an increase of GC₃ from 5′ to 3′. Moreover, GC₃-rich genes are predominant in certain gene classes and are enriched in CpG dinucleotides that are potential targets for methylation. Based on the GC₃ bimodal distribution we hypothesize that GC₃ has a regulatory role involving methylation and gene expression. To test that hypothesis, we selected diverse taxa (rice, thale cress, bee, and human) that varied in the modality of their GC₃ distribution and tested the association between GC₃, DNA methylation, and gene expression. We examine the relationship between cytosine methylation levels and GC₃, gene expression, genome signature, gene length, and other gene compositional features. We find a strong negative correlation (Pearson’s correlation coefficient r = −0.67, P value < 0.0001) between GC₃ and genic CpG methylation. The comparison between 5′-3′ gradients of CG₃-skew and genic methylation for the taxa in the study suggests interplay between gene-body methylation and transcription-coupled cytosine deamination effect. Compositional features are correlated with methylation levels of genes in rice, thale cress, human, bee, and fruit fly (which acts as an unmethylated control). These patterns allow us to generate evolutionary hypotheses about the relationships between GC₃ and methylation and how these affect expression patterns. Specifically, we propose that the opposite effects of methylation and compositional gradients along coding regions of GC₃-poor and GC₃-rich genes are the products of several competing processes.

Feng C, Xu CJ, Wang Y, Liu WL, Yin XR, Li X, Chen M, Chen KS. Codon usage patterns in Chinese bayberry (Myrica rubra) based on RNA-Seq data. BMC Genomics 2013;14:732. [PMID: 24160180 PMCID: PMC4008310 DOI: 10.1186/1471-2164-14-732] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 07/23/2013] [Accepted: 10/21/2013] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Codon usage analysis has been a classical topic for decades and is significant for studies of evolution, mRNA translation, and new gene discovery. While codon usage varies among different members of the plant kingdom, indicating the necessity for species-specific study, this work has mostly been limited to model organisms. Recently, the development of deep sequencing, especially RNA-Seq, has made it possible to carry out studies in non-model species.

RESULT

RNA-Seq data of Chinese bayberry was analyzed to investigate the bias of codon usage and codon pairs. High frequency codons (AGG, GCU, AAG and GAU), as well as low frequency ones (NCG and NUA codons) were identified, and 397 high frequency codon pairs were observed. Meanwhile, 26 preferred and 141 avoided neighboring codon pairs were also identified, which showed more significant bias than the same pairs with one or more intervening codons. Codon patterns were also analyzed at the plant kingdom, organism and gene levels. Changes during plant evolution were evident using RSCU (relative synonymous codon usage), which was even more significant than GC3s (GC content of 3rd synonymous codons). Nine GO categories were differentially and independently influenced by CAI (codon adaptation index) or GC3s, especially in 'Molecular function' category. Within a gene, the average CAI increased from 0.720 to 0.785 in the first 50 codons, and then more slowly thereafter. Furthermore, the preferred as well as avoided codons at the position just following the start codon AUG were identified and discussed in relation to the key positions in Kozak sequences.

CONCLUSION

A comprehensive codon usage table and a number of high-frequency codon pairs were established. Bias in codon usage as well as in neighboring codon pairs was observed, and the significance of this in avoiding DNA mutation, increasing protein production and regulating protein synthesis rate was proposed. Codon usage patterns at three levels were revealed, and their significance in plant evolution analysis, gene function classification, and protein translation start site prediction was discussed. This work promotes the study of codon biology, and provides some reference for analysis and comprehensive application of RNA-Seq data from other non-model species.
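
The codon adaptation index (CAI) referred to in this abstract can be sketched as follows. The sketch assumes the standard definition: each codon receives a relative adaptiveness weight equal to its count in a reference gene set divided by the count of the most frequent synonymous codon, and a gene's CAI is the geometric mean of the weights of its codons. The function names, the toy reference set, and the 0.5 pseudocount for unseen codons are assumptions made for illustration and are not taken from the cited paper.

```python
import math
from collections import Counter

# Standard genetic code in TCAG order (assumption: nuclear code, DNA alphabet).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def split_codons(cds):
    """Split an in-frame coding sequence into codons, dropping any remainder."""
    cds = cds.upper().replace("U", "T")
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def relative_adaptiveness(reference_cds):
    """Weights w = count / max synonymous count, from a reference gene set."""
    counts = Counter(c for cds in reference_cds for c in split_codons(cds))
    weights = {}
    for aa in set(AMINO) - {"*"}:
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        # Skip single-codon amino acids (Met, Trp) and unobserved families.
        if len(family) == 1 or sum(counts[c] for c in family) == 0:
            continue
        # Assumption: unseen codons get a pseudocount of 0.5 to avoid log(0).
        adjusted = {c: counts[c] or 0.5 for c in family}
        top = max(adjusted.values())
        for codon in family:
            weights[codon] = adjusted[codon] / top
    return weights


def cai(cds, weights):
    """Geometric mean of the weights of all scoreable codons in cds."""
    logs = [math.log(weights[c]) for c in split_codons(cds) if c in weights]
    if not logs:
        raise ValueError("no scoreable codons in sequence")
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    w = relative_adaptiveness(["GCTGCTGCTGCC"])  # toy reference set
    print(round(cai("GCTGCC", w), 4))
```

Applying `cai` to successive windows of a gene (for example the first 50 codons versus the remainder) is one way a positional trend like the one described in the Results could be examined, although the windowing used in the study is not specified here.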

Sablok G, Wu X, Kuo J, Nayak KC, Baev V, Varotto C, Zhou F. Combinational effect of mutational bias and translational selection for translation efficiency in tomato (Solanum lycopersicum) cv. Micro-Tom. Genomics 2013;101:290-5. [PMID: 23474140 DOI: 10.1016/j.ygeno.2013.02.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 09/04/2012] [Revised: 01/21/2013] [Accepted: 02/21/2013] [Indexed: 11/24/2022]